High-Efficiency Text Clustering Algorithm Based on Semantic Distance

Journal of South China University of Technology (Natural Science Edition), 2008, Vol. 36, Issue 5: 30-37.

• Computer Science & Technology •

Feng Shao-rong, Xiao Wen-jun

School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China

Received: 2007-06-27 Revised: 2007-09-03 Online: 2008-05-25 Published: 2008-05-25
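Contact: Feng Shao-rong, E-mail: shaorong@xmu.edu.cn
About author: Feng Shao-rong (born 1964), male, in-service doctoral candidate and associate professor at Xiamen University; his research covers parallel and distributed databases, data warehouses, and data mining.
Supported by: the National Natural Science Foundation of China (50474033)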

Abstract

Existing text clustering algorithms overlook the semantic information between words and therefore calculate text similarity with low accuracy. To address this, the paper proposes a new text clustering algorithm based on semantic distance. The method analyzes each text semantically and uses the specific semantics of the text to calculate similarity. It adopts the nearest neighbor clustering algorithm and presents a second clustering algorithm to overcome the sensitivity of nearest neighbor clustering to the input order of the texts. Feature words that represent each cluster are then chosen according to their similarity weights, so that the retained feature words stay close to the theme of the cluster. Experimental results indicate that the proposed algorithm achieves higher clustering precision and recall than the k-Means algorithm based on the vector space model.

Key words: text clustering, semantic distance, similarity, nearest neighbor clustering, clustering algorithm
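
The order sensitivity mentioned in the abstract can be illustrated with a minimal sketch of threshold-based nearest neighbor clustering. This is not the authors' implementation: the `jaccard` similarity on word sets is only a stand-in for the paper's semantic distance, and the function names, the threshold value, and the toy documents are all assumptions made for illustration. The paper's second clustering algorithm is not reproduced here.

```python
from typing import Callable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Placeholder similarity (assumption): word-set overlap,
    standing in for the paper's semantic-distance similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nearest_neighbor_cluster(
    docs: List[Set[str]],
    threshold: float,
    sim: Callable[[Set[str], Set[str]], float] = jaccard,
) -> List[List[int]]:
    """Single pass over docs in input order. Each doc joins the cluster
    holding its most similar already-seen doc if that similarity reaches
    the threshold; otherwise it starts a new cluster.
    Returns clusters as lists of indices into docs."""
    clusters: List[List[int]] = []
    for i, doc in enumerate(docs):
        best_sim, best_cluster = -1.0, None
        for cluster in clusters:
            # Similarity to a cluster = similarity to its nearest member.
            s = max(sim(doc, docs[j]) for j in cluster)
            if s > best_sim:
                best_sim, best_cluster = s, cluster
        if best_cluster is not None and best_sim >= threshold:
            best_cluster.append(i)
        else:
            clusters.append([i])
    return clusters


# Hypothetical toy documents: A~B and B~C are similar (0.5), A~C is not (0.2).
A = {"stock", "market", "price"}
B = {"market", "price", "trade"}
C = {"price", "trade", "tariff"}

print(nearest_neighbor_cluster([A, B, C], 0.4))  # [[0, 1, 2]]
print(nearest_neighbor_cluster([A, C, B], 0.4))  # [[0, 2], [1]]
```

The same three documents yield one cluster or two depending only on the order in which they arrive, which is the weakness the abstract says the second clustering algorithm is meant to overcome.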

Feng Shao-rong, Xiao Wen-jun. High-Efficiency Text Clustering Algorithm Based on Semantic Distance[J]. Journal of South China University of Technology (Natural Science Edition), 2008, 36(5): 30-37.

