Computer Science & Technology

Product Attribute Extraction Method Based on Seed-Constraint-LDA

  • CHEN Ke-Jia,
  • ZHENG Jing-Jing
  • School of Economics and Management, Fuzhou University, Fuzhou 350116, Fujian, China
CHEN Ke-Jia (born 1978), male, Ph.D., professor, mainly engaged in research on text mining and systems engineering

Received date: 2021-03-10

Revised date: 2021-11-25

Online published: 2021-12-15

Supported by

Supported by the National Natural Science Foundation of China (71701019) and the National Social Science Foundation of China (19BTQ072)

Abstract

To classify and extract product features from reviews, so that reviews can be displayed separately by product feature and consumers can make purchasing decisions more efficiently, this paper proposes a product feature extraction method based on SC-LDA (Seed-Constraint Latent Dirichlet Allocation). First, the TF-IDF (Term Frequency–Inverse Document Frequency) algorithm automatically extracts keywords to form a feature seed set. Second, documents are reorganized to address both multi-feature co-occurrence in long texts and sparsity in short texts, and to improve the rate of document reorganization. Then, must-link and cannot-link seed constraints are used to define probability expansion and contraction values, which influence topic allocation in the LDA model and make the training results more reasonable. Finally, the topics generated by SC-LDA are mapped to the prior feature categories. The advantages of the proposed method are verified through qualitative analysis of feature categories and feature words, and through quantitative analysis of accuracy, entropy, and purity.
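
The first step, selecting seed words by TF-IDF, might look roughly like the sketch below. It is only an illustration under stated assumptions: the abstract does not give the exact weighting, ranking rule, or seed-set size, so the function name `tfidf_seed_words`, the use of each word's maximum score across documents, and the toy reviews are all hypothetical.

```python
import math
from collections import Counter


def tfidf_seed_words(docs, top_k=3):
    """Rank words by their highest TF-IDF score in any document.

    docs: list of token lists (assumed already segmented and filtered).
    Hypothetical helper; not the authors' implementation.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    best = {}
    for doc in docs:
        for word, count in Counter(doc).items():
            tf = count / len(doc)              # relative term frequency
            idf = math.log(n_docs / df[word])  # plain, unsmoothed IDF
            best[word] = max(best.get(word, 0.0), tf * idf)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


# Toy reviews, invented for illustration only.
reviews = [
    ["battery", "battery", "lasts", "long"],
    ["screen", "bright", "screen", "clear"],
    ["battery", "charges", "fast"],
]
seeds = tfidf_seed_words(reviews, top_k=2)
```

Words concentrated in a single review score higher than words spread across many reviews, which is why a term such as "battery" that occurs in two of the three toy reviews ranks below "screen" here.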

Cite this article

CHEN Ke-Jia, ZHENG Jing-Jing. Product Attribute Extraction Method Based on Seed-Constraint-LDA[J]. Journal of South China University of Technology (Natural Science), 2022, 50(6): 37-48, 70. DOI: 10.12141/j.issn.1000-565X.210124
