Regularized Kernel Density Estimation Algorithm Based on Sparse Bayesian Regression

Journal of South China University of Technology (Natural Science Edition), 2009, Vol. 37, Issue (5): 123-129.

Yin Xun-fu¹ Hao Zhi-feng²

1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China; 2. Faculty of Computer, Guangdong University of Technology, Guangzhou 510090, Guangdong, China

Received: 2008-07-24 Revised: 2008-12-27 Online: 2009-05-25 Published: 2009-05-25
Contact: Yin Xun-fu (born 1979), male, Ph.D. candidate, whose research focuses on statistical machine learning, kernel methods, and information-theoretic learning. E-mail: xunfuyin@yahoo.com.cn
Supported by:
National Natural Science Foundation of China (60433020, 10471045); Guangdong Provincial Science and Technology Program (20088080701005); Open Project Fund of the State Key Laboratory of Information Security (04-01); Huizhou Technology Research and Development Fund (08-117)

Abstract

To speed up kernel density estimation (KDE) and reduce model complexity, a sparse KDE construction algorithm based on sparse Bayesian regression, SBR-KDE, is proposed. The algorithm takes jittered (artificially noise-perturbed) approximations of the distribution function as its input and obtains a very sparse representation of the KDE. Experimental results indicate that, compared with the conventional KDE algorithm, the proposed algorithm greatly improves time and space efficiency while maintaining comparable accuracy (and reducing the model error in most cases), and that it yields a smoother density estimate when trained on a small sample set. Moreover, applying independent component analysis and Gaussianization allows the proposed algorithm to alleviate the curse of dimensionality to some extent.

Key words: machine learning, kernel density estimation, Bayesian regression, ill-posed inverse problem, jittering regularization, Gaussianization
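The abstract does not spell out the steps of the algorithm, so the following is only a minimal sketch of the general idea it describes, under stated assumptions: the empirical distribution function is perturbed with artificial noise, then regressed onto Gaussian-CDF basis functions with a relevance-vector-machine-style sparse Bayesian update, and the surviving weights define a sparse KDE. The function names (`fit_sparse_kde`, `sparse_kde_pdf`), the rule-of-thumb bandwidth, the fixed noise precision, and the final clipping and renormalisation of the weights are all assumptions made for illustration and are not taken from the paper.

```python
import math
import numpy as np

def gaussian_cdf(z):
    """Standard normal CDF applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def fit_sparse_kde(samples, h, jitter=0.01, n_iter=200, prune_at=1e6, seed=0):
    """Sketch: sparse Bayesian (ARD) regression of a jittered empirical CDF
    onto Gaussian-CDF basis functions centred at the samples."""
    rng = np.random.default_rng(seed)
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    # Regression targets: empirical CDF values plus artificial noise (jitter).
    targets = (np.arange(1, n + 1) - 0.5) / n + jitter * rng.standard_normal(n)
    # Column j is the CDF of a Gaussian kernel of width h centred at x[j].
    design = gaussian_cdf((x[:, None] - x[None, :]) / h)
    alpha = np.ones(n)          # one precision hyperparameter per weight
    beta = 1.0 / jitter ** 2    # noise precision tied to the jitter (assumption)
    keep = np.arange(n)         # indices of basis functions still active
    for _ in range(n_iter):
        phi = design[:, keep]
        sigma = np.linalg.inv(np.diag(alpha[keep]) + beta * phi.T @ phi)
        mu = beta * sigma @ (phi.T @ targets)
        # Standard ARD re-estimation, clipped for numerical safety.
        gamma = np.clip(1.0 - alpha[keep] * np.diag(sigma), 1e-12, 1.0)
        alpha[keep] = np.clip(gamma / (mu ** 2 + 1e-12), 1e-6, 1e12)
        active = alpha[keep] < prune_at
        if active.any():
            keep = keep[active]   # drop weights whose precision has blown up
    # Final posterior mean on the surviving basis functions.
    phi = design[:, keep]
    mu = beta * np.linalg.solve(np.diag(alpha[keep]) + beta * phi.T @ phi,
                                phi.T @ targets)
    # Assumption: clip negative weights and renormalise to get a valid density.
    w = np.clip(mu, 0.0, None)
    pos = w > 0
    return x[keep][pos], w[pos] / w[pos].sum()

def sparse_kde_pdf(grid, centres, weights, h):
    """Evaluate the weighted Gaussian kernel sum on a grid of points."""
    z = (np.asarray(grid, dtype=float)[:, None] - centres[None, :]) / h
    return (np.exp(-0.5 * z ** 2) / (h * math.sqrt(2.0 * math.pi))) @ weights

# Toy data: a two-component Gaussian mixture with 80 samples.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 0.5, 40), rng.normal(1.0, 1.0, 40)])
h = 1.06 * x.std() * x.size ** (-0.2)   # normal-reference rule of thumb
centres, weights = fit_sparse_kde(x, h)
grid = np.linspace(x.min() - 8 * h, x.max() + 8 * h, 4001)
density = sparse_kde_pdf(grid, centres, weights, h)
print(f"{centres.size} of {x.size} kernels kept")
```

Evaluating the sparse estimate costs one kernel evaluation per retained centre rather than one per training sample, which is the kind of time and space saving the abstract refers to.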

Yin Xun-fu, Hao Zhi-feng. Regularized Kernel Density Estimation Algorithm Based on Sparse Bayesian Regression[J]. Journal of South China University of Technology (Natural Science Edition), 2009, 37(5): 123-129.
