Computer Science and Technology

Semi-Supervised Classification Based on Regularization of Minimum Entropy

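  • 1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China; 2. School of Science, South China University of Technology, Guangzhou 510640, Guangdong, China; 3. Faculty of Computer, Guangdong University of Technology, Guangzhou 510090, Guangdong, China; 4. School of Software Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
Liu Xiaolan (刘小兰, b. 1979), female, Ph.D. candidate and lecturer; her research focuses on artificial intelligence and machine learning.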

Received date: 2008-12-30

Revised date: 2009-04-26

Published online: 2010-01-25

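Supported by

Guangdong Province and Ministry of Education Industry-University-Research Cooperation Project (2007B090400031); Guangdong Provincial Science and Technology Plan Project (2008B080701005)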

Semi-Supervised Classification Based on Regularization of Minimum Entropy

  • 1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China; 2. School of Science, South China University of Technology, Guangzhou 510640, Guangdong, China; 3. Faculty of Computer, Guangdong University of Technology, Guangzhou 510090, Guangdong, China; 4. School of Software Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China

Received date: 2008-12-30

Revised date: 2009-04-26

Published online: 2010-01-25


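Abstract (Chinese version)

We first analyze why the conditional Havrda-Charvat structural α-entropy is a good clustering criterion. Then, based on the observation that a good clustering criterion is also a good characterization of unlabeled data, we propose a semi-supervised classification model based on minimum entropy regularization and solve it with a quasi-Newton method. The algorithm is both discriminative and transductive, which reduces its dependence on the model, and it can readily predict the labels of examples outside the training set. Test results on UCI datasets verify the effectiveness of the algorithm.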

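Cite this article

Liu Xiaolan, Hao Zhifeng, Yang Xiaowei, Ma Xianheng. Semi-Supervised Classification Based on Regularization of Minimum Entropy[J]. Journal of South China University of Technology (Natural Science Edition), 2010, 38(1): 87-91. DOI: 10.3969/j.issn.1000-565X.2010.01.017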

Abstract

Because generative models must model a complex joint probability density and estimate many parameters, a discriminative semi-supervised classification algorithm based on minimum entropy regularization is proposed. The algorithm uses Havrda-Charvat's structural α-entropy as the regularization term of the objective and solves the objective with a quasi-Newton method. It is both discriminative and inductive, which reduces its dependence on the model, and it can easily predict the labels of out-of-sample data points. Experimental results on several UCI datasets show that the proposed algorithm achieves a low classification error even with few labeled data.
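
The kind of objective described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a linear softmax model and a negative log-likelihood loss on the labeled data, neither of which is specified in the abstract, and the names `hc_entropy`, `predict_proba`, `objective`, and the weight `lam` are hypothetical. The entropy follows the standard Havrda-Charvat definition, H_α(p) = (2^(1-α) - 1)^(-1) (Σ p_i^α - 1) for α > 0, α ≠ 1.

```python
import math


def hc_entropy(p, alpha=2.0):
    """Havrda-Charvat structural alpha-entropy of a discrete distribution p."""
    if alpha <= 0 or alpha == 1.0:
        raise ValueError("alpha must be positive and different from 1")
    scale = 1.0 / (2.0 ** (1.0 - alpha) - 1.0)
    return scale * (sum(pi ** alpha for pi in p) - 1.0)


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, x):
    """Class posteriors of an assumed linear softmax model (one weight row per class)."""
    return softmax([sum(wj * xj for wj, xj in zip(w, x)) for w in W])


def objective(W, labeled, unlabeled, lam=0.5, alpha=2.0):
    """Labeled negative log-likelihood plus lam times the mean entropy on unlabeled points.

    labeled:   list of (x, y) pairs, y being a class index
    unlabeled: list of feature vectors x
    """
    nll = -sum(math.log(predict_proba(W, x)[y]) for x, y in labeled) / len(labeled)
    ent = sum(hc_entropy(predict_proba(W, x), alpha) for x in unlabeled) / len(unlabeled)
    return nll + lam * ent


if __name__ == "__main__":
    W0 = [[0.0, 0.0], [0.0, 0.0]]            # two classes, two features
    labeled = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
    unlabeled = [[0.9, 0.1], [0.2, 0.8]]
    print(objective(W0, labeled, unlabeled))
```

The entropy term is zero for a fully confident prediction and largest for a uniform one, so minimizing it pushes the decision boundary away from unlabeled points. A quasi-Newton routine would then minimize `objective` over `W`; that step is omitted here.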
