Journal of South China University of Technology (Natural Science Edition), 2017, Vol. 45, Issue (3): 61-67. doi: 10.3969/j.issn.1000-565X.2017.03.009

• Computer Science & Technology

Chinese Word Segmentation Method on the Basis of Bidirectional Long-Short Term Memory Model

ZHANG Hong-gang, LI Huan

  1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2016-12-08 Online: 2017-03-25 Published: 2017-02-02
  • Contact: ZHANG Hong-gang (张洪刚, born 1974), male, associate professor, whose research focuses on pattern recognition. E-mail: zhhg@bupt.edu.cn
  • Supported by: the National Natural Science Foundation of China for Young Scientists (61601042)

Abstract: Chinese word segmentation is one of the fundamental technologies of Chinese natural language processing. At present, most conventional Chinese word segmentation methods rely on feature engineering, which requires intensive manual effort to design features and verify their effectiveness. With the rapid development of deep learning, it has become feasible to learn features automatically with neural networks. In this paper, a novel Chinese word segmentation method based on the bidirectional long short-term memory (BLSTM) model is proposed. In this method, Chinese characters are represented as embedding vectors learned from a large-scale corpus, and these vectors are then fed into the BLSTM model for segmentation. Experiments conducted without any feature engineering show that the proposed method achieves high segmentation performance on the simplified Chinese datasets (PKU, MSRA and CTB) and on the traditional Chinese dataset (HKCityU).
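The pipeline summarized in the abstract (character embeddings fed into a BLSTM that labels each character) can be illustrated with a minimal, untrained pure-Python sketch. It runs a forward LSTM and a backward LSTM over the sentence, concatenates the two hidden states at each position, and projects them to per-character tag scores. Several details here are assumptions for illustration and are not taken from the paper: the BMES tag set (Begin/Middle/End/Single), greedy argmax decoding, random stand-in embeddings in place of corpus-trained ones, and every name (`LSTMParams`, `blstm_scores`, `tags_to_words`, `segment`). The paper's actual embedding training, tag set and decoder may differ.

```python
import math
import random

# Assumed tag set (not stated in the abstract): B = begin, M = middle,
# E = end of a multi-character word, S = single-character word.
TAGS = ["B", "M", "E", "S"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Dense matrix-vector product on plain Python lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class LSTMParams:
    """Weights of one LSTM direction: input (i), forget (f), output (o) gates and candidate (g)."""
    def __init__(self, in_dim, hid_dim, rng):
        self.hid = hid_dim
        # Every gate reads the concatenation [x_t; h_{t-1}].
        self.W = {k: rand_matrix(hid_dim, in_dim + hid_dim, rng) for k in "ifog"}
        self.b = {k: [0.0] * hid_dim for k in "ifog"}

def lstm_pass(p, xs):
    """Run one LSTM direction over a sequence of vectors and return every hidden state."""
    h, c, outputs = [0.0] * p.hid, [0.0] * p.hid, []
    for x in xs:
        z = x + h  # list concatenation, i.e. [x_t; h_{t-1}]
        pre = {k: [a + b for a, b in zip(matvec(p.W[k], z), p.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
        h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        outputs.append(h)
    return outputs

def blstm_scores(xs, fwd, bwd, W_out):
    """Concatenate forward and backward states per character, then project to tag scores."""
    forward = lstm_pass(fwd, xs)
    backward = lstm_pass(bwd, xs[::-1])[::-1]  # re-align backward states with positions
    return [matvec(W_out, hf + hb) for hf, hb in zip(forward, backward)]

def tags_to_words(chars, tags):
    """Turn a BMES tag sequence into words, tolerating ill-formed sequences."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:  # a new word starts: close the open one
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):  # the word ends at this character
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words

def segment(sentence, embeddings, fwd, bwd, W_out):
    xs = [embeddings[ch] for ch in sentence]
    scores = blstm_scores(xs, fwd, bwd, W_out)
    # Greedy per-character argmax; a real system might use Viterbi/CRF decoding instead.
    tags = [TAGS[max(range(len(TAGS)), key=lambda k: s[k])] for s in scores]
    return tags, tags_to_words(sentence, tags)

# Toy usage with random, untrained weights and random stand-in embeddings.
rng = random.Random(0)
EMB_DIM, HID_DIM = 8, 6
sentence = "我爱北京天安门"
embeddings = {ch: [rng.uniform(-1, 1) for _ in range(EMB_DIM)] for ch in dict.fromkeys(sentence)}
fwd = LSTMParams(EMB_DIM, HID_DIM, rng)
bwd = LSTMParams(EMB_DIM, HID_DIM, rng)
W_out = rand_matrix(len(TAGS), 2 * HID_DIM, rng)
tags, words = segment(sentence, embeddings, fwd, bwd, W_out)
print(tags, words)
```

Because the weights are random, the printed tags are arbitrary; in a working system the embeddings and LSTM weights would typically be trained on a labelled corpus such as PKU or MSRA. Concatenating the two directions is the point of the bidirectional design: each character's tag can depend on both its left and its right context, which matters for Chinese, where whether a character ends a word often depends on the characters that follow it.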

Key words: deep learning, neural network, long short-term memory, Chinese word segmentation