Journal of South China University of Technology (Natural Science Edition) ›› 2017, Vol. 45 ›› Issue (1): 112-117. doi: 10.3969/j.issn.1000-565X.2017.01.016

• Computer Science and Technology •

Improved Log Data-Merging Method for Process Mining

XU Yang LIN Qi LI Dong

  1. School of Software Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2016-03-02 Revised: 2016-10-12 Online: 2017-01-25 Published: 2016-12-01
  • Contact: LI Dong (born 1970), male, professor, doctoral supervisor, mainly engaged in research on databases, mobile computing, and big data technology. E-mail: cslidong@scut.edu.cn
  • About author: XU Yang (born 1970), male, Ph.D., lecturer, mainly engaged in research on business process modeling and parallel and distributed computing. E-mail: xuyang@scut.edu.cn
  • Supported by:

    the National Natural Science Foundation of China (71090403); the Science and Technology Planning Projects of Guangdong Province (2014B090901001, 2015B010103002, 2016B090918062); the Science and Technology Planning Project of Guangzhou (201604010127); and the special discipline-construction guidance fund of the School of Software Engineering under the "985 Project" of South China University of Technology (x2rjD615015III)

Abstract:

Existing process mining techniques and tools assume a single log file. In real business environments, however, one business process is often supported by several information systems, so its execution data are scattered across multiple log files. These scattered logs must be merged into one log before the global process can be mined and analyzed. This paper proposes an automatic log-merging method based on a hybrid of simulated annealing and an artificial immune algorithm. To reflect the characteristics of process logs from multiple IT systems, the affinity function takes two operators into account, namely the occurrence frequency of activity sequences (process paths) and the time overlap between mergeable cases, which improves the accuracy of case matching and the practicality of the method. In addition, simulated annealing selection is introduced into population evolution to address the premature convergence and continuous degradation of the artificial immune algorithm, and an immunological memory mechanism is added to preserve the diversity of each generation and avoid local convergence. Experimental results show that the proposed method achieves a merging success rate of more than 90%, which ensures the correctness of the process mining results, and that it converges markedly faster than the traditional log-merging method based on artificial immunity, thereby improving merging efficiency.

Key words: process mining, log data merging, artificial immune algorithm, simulated annealing
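
The abstract names two building blocks without giving their formulas: the time overlap between mergeable cases and simulated annealing selection. The Python sketch below illustrates one plausible reading of each. It is not the authors' implementation: the function names, the representation of a case as a `(start, end)` pair, and the use of the standard Metropolis acceptance rule are all assumptions made for illustration.

```python
import math
import random
from datetime import datetime


def time_overlap_seconds(case_a, case_b):
    """Length in seconds of the time interval shared by two cases.

    Each case is a hypothetical (start, end) pair of datetimes; the
    paper's actual overlap operator may be defined differently.
    """
    latest_start = max(case_a[0], case_b[0])
    earliest_end = min(case_a[1], case_b[1])
    # Disjoint intervals yield a negative difference, clamped to zero.
    return max(0.0, (earliest_end - latest_start).total_seconds())


def sa_accept(current_affinity, candidate_affinity, temperature, rng=random):
    """Standard Metropolis acceptance rule for a maximization problem.

    A better candidate is always kept; a worse one is kept with
    probability exp(delta / temperature), which shrinks as the
    temperature cools. Assumed here as the selection step.
    """
    if candidate_affinity >= current_affinity:
        return True
    if temperature <= 0:
        return False
    prob = math.exp((candidate_affinity - current_affinity) / temperature)
    return rng.random() < prob
```

Occasionally accepting a worse candidate at high temperature is what lets an annealing-style search leave a local optimum, which matches the abstract's stated motivation of avoiding premature convergence.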