Journal of South China University of Technology (Natural Science Edition), 2004, Vol. 32, Issue 2: 58-61.


Interactive Data Migration System and Its Approximately-detecting Efficiency Optimization

Chen Wei (陈伟), Ding Qiu-lin (丁秋林), Xie Qiang (谢强)

  1. Computer Application Institute, Nanjing Univ. of Aeronautics and Astronautics, Nanjing 210016, Jiangsu, China
  • Received: 2003-06-18 Online: 2004-02-20 Published: 2015-09-07
  • Contact: Chen Wei (born 1976), male, Ph.D. candidate, whose research focuses mainly on data cleaning and enterprise informatization. E-mail: chenweich@163.net

Abstract: To ensure the data quality of the new system after data migration, data cleaning is applied to the migration process. An interactive data migration system with integrated data cleaning is proposed, and its working principle is analyzed. To improve the efficiency of approximately duplicated record detection in this system, the detection algorithm is optimized with length filtration and related methods. These avoid unnecessary edit distance computations and thereby increase the data migration speed of the whole system. In addition, a suitable experimental environment was constructed and a large number of detection experiments were carried out. The experimental results support the soundness of the length filtration method.

Key words: data migration, data quality, data cleaning, approximately-detecting, length filtration
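
The abstract does not give the algorithm itself, so the following is only a minimal sketch of how a length filter can skip edit distance computations. It assumes the filter relies on a standard property: the edit distance between two strings is never smaller than the difference of their lengths, so any pair whose lengths differ by more than the threshold can be rejected without running the dynamic program. The names `edit_distance`, `find_similar_pairs`, and `max_dist` are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance using the standard two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def find_similar_pairs(records, max_dist):
    """Return (similar_pairs, skipped), where skipped counts the pairs
    rejected by the length filter without computing an edit distance."""
    pairs = []
    skipped = 0
    for a, b in combinations(records, 2):
        # Length filter (assumed form): the edit distance is at least
        # the length difference, so this pair cannot be within max_dist.
        if abs(len(a) - len(b)) > max_dist:
            skipped += 1
            continue
        if edit_distance(a, b) <= max_dist:
            pairs.append((a, b))
    return pairs, skipped
```

Calling `find_similar_pairs(records, max_dist)` on a list of field values returns both the candidate duplicate pairs and the number of comparisons the filter avoided, which gives a rough way to observe the kind of saving the abstract describes.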
