Journal of South China University of Technology (Natural Science Edition) ›› 2009, Vol. 37 ›› Issue (4): 18-23,45.

• Computer Science & Technology • Previous Articles     Next Articles

Clustering Algorithm Based on Grid Density and Distance Information Characteristics

Dai Wei-di1  Zhang Lu2  Wang Wen-jun1  Hou Yue-xian1   

  1. 1. School of Computer Science and Technology, Tianjin University, Tianjin 300072, China; 2. School of Software, Tianjin University, Tianjin 300072, China
  • Received:2008-05-12 Revised:2008-07-04 Online:2009-04-25 Published:2009-04-25
  • Contact: 戴维迪(1977-),男,博士,副教授,主要从事数据挖掘、模式识别研究. E-mail:davidy@126.com
  • About author:戴维迪(1977-),男,博士,副教授,主要从事数据挖掘、模式识别研究.
  • Supported by:

    国家自然科学基金资助项目(60603027);天津市科技计划项目(08ZCKFGX01800,08ZCKFGX01600)

Abstract:

When disposing of a real data set with skewed data distribution using most grid- and density-based clustering algorithms, effective clustering cannot be obtained due to the monotonic search employed in the algorithms. In order to solve this problem, a new clustering algorithm GDD based on grid density and distance is proposed. In GDD, the data space is divided into many grid cells and a transition function related to the distance from the current clustering center is constructed. Then, the density transition ratios of grid cells in the local area are compared with the computed transition function values of the current grid cell to determine whether the current cluster should be extended. Moreover, by using a transition function, some experiments are made with real and synthetic data sets. The results show that the proposed algorithm which is insensitive to noise data, can discover clusters with arbitrary shape, with a time complexity linear to grid number, and that the algorithm is suitable for the clustering of real large-scale data sets.

Key words: clustering, density, grid, distance, transition function