Journal of South China University of Technology (Natural Science Edition) ›› 2018, Vol. 46 ›› Issue (8): 122-129. doi: 10.3969/j.issn.1000-565X.2018.08.017

• Computer Science & Technology •

Collaborative Learning of Word and Character Representation

LIU Huiting1,2, LING Chao1,2

  1. Key Laboratory of Intelligent Computing and Signal Processing of the Ministry of Education, Anhui University, Hefei 230039, Anhui, China; 2. School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, China
  • Received:2017-10-23 Revised:2018-02-04 Online:2018-08-25 Published:2018-07-01
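  • Contact: LIU Huiting (born 1978), female, associate professor, whose research mainly focuses on natural language processing. E-mail: htliu@ahu.edu.cn
  • About author: LIU Huiting (born 1978), female, associate professor, whose research mainly focuses on natural language processing.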
  • Supported by:
    The National Natural Science Foundation of China(61202227)

Abstract: Most word embedding models are based on the distributional hypothesis: they take the word as the basic unit and infer its representation from its external contexts. However, in languages such as Chinese, a word is composed of several characters, and these characters carry rich internal information. The semantics of a word are closely related to the semantics of its component characters. Taking Chinese as an example, this paper therefore presents two models that collaboratively learn word and character representations. To address homonymy and polysemy, multiple-prototype character embeddings and a word selection method are proposed. The proposed models are evaluated on similarity and analogy tasks. The results demonstrate that the proposed models outperform the baseline models.

Key words: word representation, external contexts, internal information, collaborative learning
