To classify and extract product features from reviews, display reviews separately by product feature, and help consumers make purchasing decisions more efficiently, this paper proposes a product feature extraction method based on SC-LDA (Seed Constraint-Latent Dirichlet Allocation). First, the TF-IDF (Term Frequency–Inverse Document Frequency) algorithm is used to automatically extract keywords as a feature seed set. Second, documents are reorganized to address both the co-occurrence of multiple features in long texts and the sparsity of short texts, and to improve the rate of document reorganization. Then, must-link and cannot-link seed constraints are applied to define probability expansion and contraction values, which influence topic allocation in the LDA model and make the training results more reasonable. Finally, the topics generated by SC-LDA are mapped to the prior feature categories. The advantages of the proposed method are verified through qualitative analysis of feature categories and feature words and quantitative analysis of accuracy, entropy, and purity.
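The seed-extraction and seed-constraint steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `tfidf_seed_words` and `adjust_topic_probs`, the `expand` and `contract` factors, and the rule that a seed word is boosted for its own topic and damped for topics seeded by other words are all assumptions made for illustration. The abstract does not give the actual expansion and contraction formula.

```python
import math
from collections import Counter


def tfidf_seed_words(docs, top_k=3):
    """Rank words by their highest TF-IDF score in any document.

    docs is a list of tokenized reviews (lists of words).
    Returns the top_k words as a candidate feature seed set.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    best = {}
    for doc in docs:
        counts = Counter(doc)
        for word, count in counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[word])
            best[word] = max(best.get(word, 0.0), tf * idf)
    # Sort by score (descending), breaking ties alphabetically.
    ranked = sorted(best, key=lambda w: (-best[w], w))
    return ranked[:top_k]


def adjust_topic_probs(probs, word, topic_seeds, expand=2.0, contract=0.5):
    """Rescale a word's topic probabilities using seed constraints.

    Assumed rule (hypothetical, not taken from the paper):
      - must-link: if the word is a seed of topic k, multiply p_k by expand;
      - cannot-link: if the word is a seed of some other topic only,
        multiply p_k by contract;
      - non-seed words are left unchanged.
    The result is renormalized to sum to 1.
    """
    is_seed = any(word in seeds for seeds in topic_seeds)
    adjusted = []
    for p, seeds in zip(probs, topic_seeds):
        if word in seeds:
            adjusted.append(p * expand)
        elif is_seed:
            adjusted.append(p * contract)
        else:
            adjusted.append(p)
    total = sum(adjusted)
    return [a / total for a in adjusted]


if __name__ == "__main__":
    reviews = [
        ["battery", "battery", "life", "good"],
        ["screen", "screen", "bright", "good"],
        ["battery", "screen", "good"],
    ]
    print(tfidf_seed_words(reviews, top_k=4))
    seeds = [{"battery", "life"}, {"screen", "bright"}]
    print(adjust_topic_probs([0.5, 0.5], "battery", seeds))
```

In a full topic model, an adjustment of this kind would be applied to the sampling distribution for each word during training, so that seed words are pulled toward their intended feature topic and pushed away from the others.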
CHEN Ke-Jia, ZHENG Jing-Jing. Product Attribute Extraction Method Based on Seed-Constraint-LDA[J]. Journal of South China University of Technology (Natural Science), 2022, 50(6): 37-48,70.
DOI: 10.12141/j.issn.1000-565X.210124