Journal of South China University of Technology (Natural Science Edition) ›› 2009, Vol. 37 ›› Issue (4): 37-41.

• Computer Science & Technology • Previous Articles     Next Articles

A Fast Regular Expression Set Matching Algorithm Based on Bloom Filter

Xu Ke-fu    Qi De-yu   Zheng Wei-ping   Qian Zheng-ping   

  1. Research Institute of Computer System, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received:2008-02-26 Revised:2008-05-21 Online:2009-04-25 Published:2009-04-25
  • Contact: 徐克付(1977-),男,博士生,主要从事网络信息安全、计算机系统体系结构研究. E-mail:xkfool@163.com
  • About author:徐克付(1977-),男,博士生,主要从事网络信息安全、计算机系统体系结构研究.
  • Supported by:

    中国博士后自然科学基金资助项目(2005037582);粤港关键领域重点突破项目(2005A10307007)

Abstract:

The effectiveness of the regular expression searching algorithms are proportional to the shortest path Lmin from the initial state to the final state of NFA and is inversely proportional to the prefix set Pref(RE) of the language that denotes the regular expression. In general, the elements in Pref(RE) are difficult to locate in the target text because the set of Pref(RE) is large. Proposed in this paper is a regular expression searching algorithm based on the Bloom Filter of which computation time to perform the query is independent of the string number. The proposed algorithm can fast locate Pref(RE) and perform a search with the speed immune from Pref(RE) , and, particularly, when multiple parallel Bloom Filters are employed, the algorithm may indirectly lengthen the shortest path. Analysis and experimental results indicate that the proposed algorithm greatly accelerates the search of regular expressions, especially for the search of an regular expression set, and that the searching speed increases several times and even up to tens of times when Lminand Pref(RE) values are both large. It is thus concluded that the proposed algorithm is suitable for the fast search of multiple regular expressions on a large scale.

Key words:regular expression matching; Bloom Filter; automaton ; pattern matching

Key words: Regular Expression, Bloom filter, Automaton, Fast Matching