Journal of South China University of Technology (Natural Science Edition) ›› 2019, Vol. 47 ›› Issue (2): 98-105. doi: 10.12141/j.issn.1000-565X.170550

• Computer Science & Technology •

Improved DPC Clustering Algorithm with Neighbor Density Distribution Optimized Sample Assignment
 

 JI Xia (1, 2), ZHANG Tao (1), ZHU Jianlei (1), LIU Shicheng (1), LI Xuejun (1, 2)

  1. School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, China; 2. Key Laboratory of Intelligent Computing and Signal Processing of the Ministry of Education, Anhui University, Hefei 230039, Anhui, China
  • Received:2017-12-14 Revised:2018-06-25 Online:2019-02-25 Published:2019-01-02
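  • Contact: JI Xia (born 1982), female, Ph.D., lecturer; her research focuses on data mining, machine learning, and intelligent information processing. E-mail: jixia1983@163.com
  • About author: JI Xia (born 1982), female, Ph.D., lecturer; her research focuses on data mining, machine learning, and intelligent information processing.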
  • Supported by:
     the National Natural Science Foundation of China (61602004, 61672034), the Natural Science Foundation of Anhui Province (1708085MF160, 1508085MF127, 1408085MF122), the Key Research and Development Program of Anhui Province (1804d8020309), and the Natural Science Foundation of Anhui Higher Education Institutions (KJ2016A041, KJ2017A011)

Abstract: DPC (density peaks clustering) is a recent density-based clustering algorithm that can automatically determine the number of clusters and the cluster centers. However, its sample allocation strategy makes clustering quality unstable. KNN-DPC, an improved variant of DPC, clusters more accurately, but its low efficiency limits its practicality. To overcome the deficiencies of both DPC and KNN-DPC, a DPC clustering algorithm optimized by neighbor density distribution is proposed. First, the algorithm finds the cluster centers using DPC. Then, according to the neighbor density distribution of each sample, two sample allocation strategies are applied in turn to assign the remaining samples to their clusters. Theoretical analysis and extensive experiments on several popular test sets, including synthetic datasets and real-world datasets from the UCI Machine Learning Repository, show that the proposed algorithm can quickly determine the cluster centers of arbitrarily shaped data and allocate samples to clusters effectively. Compared with DPC and KNN-DPC, the proposed algorithm strikes a better balance between clustering quality and running time, and it is highly stable. It is an effective adaptive clustering algorithm that can be applied to large-scale datasets.
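
For orientation, the baseline DPC procedure that the abstract builds on can be sketched roughly as follows. This is a minimal illustration of standard density peaks clustering, not the improved algorithm proposed in the paper: the abstract does not specify the two neighbor-density-based allocation strategies, so the sketch uses only the baseline rule of assigning each remaining sample to the cluster of its nearest denser neighbor. The function name `dpc`, the Gaussian density kernel, the cutoff distance `dc`, and the explicit `n_clusters` argument are all assumptions made for this example.

```python
import numpy as np

def dpc(X, dc, n_clusters):
    """Baseline density peaks clustering (illustrative sketch only)."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Local density via a Gaussian kernel (an assumed choice);
    # subtract 1 to remove each sample's contribution to itself.
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0
    # Indices sorted from densest to sparsest.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    # By convention the densest sample gets its largest distance as delta.
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        # Nearest sample among those with higher density.
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        parent[i] = j
    # Centers: samples with the largest gamma = rho * delta.
    gamma = rho * delta
    centers = np.argsort(-gamma)[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Baseline allocation: in decreasing density order, each remaining
    # sample inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

Because each sample simply copies the label of one denser neighbor, a single early misassignment can propagate to every sample that depends on it. This is one common explanation for the stability defect the abstract attributes to DPC's allocation strategy.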

Key words:

 

CLC Number: