Journal of South China University of Technology (Natural Science Edition) ›› 2022, Vol. 50 ›› Issue (6): 37-48, 70. doi: 10.12141/j.issn.1000-565X.210124

Special Issue: 2022 Computer Science and Technology

• Computer Science & Technology •

Product Feature Extraction Method Based on Seed Constraint-LDA

CHEN Kejia, ZHENG Jingjing

  1. School of Economics and Management, Fuzhou University, Fuzhou 350116, Fujian, China
  • Received: 2021-03-10 Revised: 2021-11-25 Online: 2022-06-25 Published: 2021-12-17
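  • Contact: CHEN Kejia (born 1978), male, Ph.D., professor, whose research focuses on text mining and systems engineering. E-mail: kjchen@fzu.edu.cn
  • About author: CHEN Kejia (born 1978), male, Ph.D., professor, whose research focuses on text mining and systems engineering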
  • Supported by: the National Natural Science Foundation of China (71701019) and the National Social Science Foundation of China (19BTQ072)

Abstract: To classify and extract product features from reviews, so that reviews can be displayed separately by product feature and consumers can make purchasing decisions more efficiently, this paper proposes a product feature extraction method based on SC-LDA (Seed Constraint-Latent Dirichlet Allocation). First, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm automatically extracts keywords to form a feature seed set. Second, documents are reorganized to address both multi-feature co-occurrence in long texts and sparsity in short texts, and to improve the document reorganization rate. Third, must-link and cannot-link seed constraints are used to define probability expansion and contraction values, which influence topic assignment in the LDA model and make the training results more reasonable. Finally, the topics generated by SC-LDA are mapped to the prior feature categories. The advantages of the proposed method are verified through qualitative analysis of the feature categories and feature words, and through quantitative analysis of accuracy, entropy, and purity.
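
The first step described in the abstract, selecting seed words by TF-IDF, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `tfidf_seed_words`, the toy `reviews` corpus, the plain `tf * log(N / df)` weighting, and the choice to rank each term by its maximum score across documents are all assumptions made for illustration. Real review text, especially Chinese, would also need word segmentation and stop-word removal first.

```python
import math
from collections import Counter


def tfidf_seed_words(docs, top_k=3):
    """Return the top_k terms ranked by their highest TF-IDF score in any document.

    docs is a list of already-tokenized documents (lists of strings).
    Hypothetical helper; the paper's exact weighting and ranking may differ.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))

    best = {}
    for doc in docs:
        if not doc:
            continue  # skip empty documents to avoid division by zero
        counts = Counter(doc)
        for term, count in counts.items():
            tf = count / len(doc)               # relative term frequency
            idf = math.log(n_docs / df[term])   # 0 when the term is in every document
            best[term] = max(best.get(term, 0.0), tf * idf)

    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


# Toy, pre-tokenized reviews (illustrative data only).
reviews = [
    ["the", "battery", "battery", "battery", "lasts"],
    ["the", "screen", "screen", "is", "bright"],
    ["the", "battery", "is", "fine"],
    ["the", "price", "is", "fine"],
]

seeds = tfidf_seed_words(reviews, top_k=3)
print(seeds)  # ['screen', 'battery', 'price']
```

In this toy corpus the word "the" occurs in every review, so its inverse document frequency is zero and it cannot become a seed, while words concentrated in a few reviews rise to the top.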

Key words: feature extraction, LDA, seed constraint, document reorganization, feature category mapping

CLC Number: