Computer Science & Technology

Improved DPC Clustering Algorithm with Neighbor Density Distribution Optimized Sample Assignment
 

  •  1. School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, China; 2. Key Laboratory of Intelligent Computing and Signal Processing of the Ministry of Education, Anhui University, Hefei 230039, Anhui, China
JI Xia (born 1982), female, Ph.D., lecturer; her research focuses on data mining, machine learning, and intelligent information processing.

Received date: 2017-12-14

  Revised date: 2018-06-25

  Online published: 2019-01-02

Supported by

Supported by the National Natural Science Foundation of China (61602004, 61672034), the Natural Science Foundation of Anhui Province (1708085MF160, 1508085MF127, 1408085MF122), the Key Research and Development Program of Anhui Province (1804d8020309), and the Natural Science Foundation of Anhui Higher Education Institutions (KJ2016A041, KJ2017A011)

Abstract

The DPC algorithm is a recent density-based clustering algorithm that can automatically determine the number of clusters and the cluster centers. However, its sample allocation strategy makes the clustering quality unstable. KNN-DPC, an improved variant of DPC, produces better clusterings, but its low efficiency limits its practicality. To overcome the deficiencies of both DPC and KNN-DPC, a DPC clustering algorithm with sample assignment optimized by neighbor density distribution is proposed. First, the algorithm finds the cluster centers using DPC. Then, based on the neighbor density distribution of each sample, two sample allocation strategies are applied in turn to assign the remaining samples to their clusters. Theoretical analysis and experiments on several widely used test sets, including synthetic datasets and real-world datasets from the UCI Machine Learning Repository, show that the proposed algorithm can quickly determine the cluster centers of arbitrarily shaped data and effectively assign samples to clusters. Compared with DPC and KNN-DPC, the proposed algorithm achieves a better balance between clustering quality and running time, and is highly stable. The proposed algorithm is an effective adaptive clustering algorithm that can be applied to large-scale datasets.
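For orientation, the baseline DPC procedure that the abstract builds on can be sketched as follows. This is only a minimal illustration of standard DPC as it is commonly described (local density, distance to the nearest denser sample, then assignment by following that neighbor); it does not implement the two neighbor-density-based allocation strategies of the proposed algorithm, which the abstract does not specify. The function name `dpc`, the Gaussian density kernel, the cutoff `d_c`, and the use of a fixed `n_clusters` to pick centers are all assumptions made for this sketch.

```python
import numpy as np

def dpc(points, d_c, n_clusters):
    """Baseline DPC sketch; hypothetical helper, not the paper's improved method."""
    X = np.asarray(points, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Local density with a Gaussian kernel (assumed); drop the self term exp(0) = 1.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    # Visit samples from densest to sparsest.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)  # nearest sample that comes earlier in the density order
    for rank, i in enumerate(order):
        if rank == 0:
            # The global density peak has no denser sample; use its largest distance.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        parent[i] = j
    # Cluster centers: samples with the largest rho * delta (assumed selection rule).
    gamma = rho * delta
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Original DPC assignment: each remaining sample inherits the label of its parent.
    for i in order:
        if labels[i] != -1:
            continue
        if parent[i] == -1:
            # Guard for the rare case where the density peak was not picked as a center.
            labels[i] = labels[centers[np.argmin(dist[i, centers])]]
        else:
            labels[i] = labels[parent[i]]
    return centers, labels
```

Because every non-center sample simply inherits its label from a single denser neighbor, one wrong assignment propagates to all samples that depend on it. This is the kind of instability in the allocation step that the abstract refers to.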

Cite this article

JI Xia, ZHANG Tao, ZHU Jianlei, LIU Shicheng, LI Xuejun. Improved DPC Clustering Algorithm with Neighbor Density Distribution Optimized Sample Assignment[J]. Journal of South China University of Technology (Natural Science), 2019, 47(2): 98-105. DOI: 10.12141/j.issn.1000-565X.170550
