Journal of South China University of Technology (Natural Science Edition) ›› 2021, Vol. 49 ›› Issue (1): 47-57. doi: 10.12141/j.issn.1000-565X.200210

Special topic: Computer Science and Technology, 2021

• Computer Science and Technology •

Collaborative Score Prediction Method for Non-Random Missing Data

GU Wanrong1 XIE Xianfen2 ZHANG Ziye3 MAO Yijun1† LIANG Zaoqing1 HE Yichen1

  1. School of Mathematics and Information, South China Agricultural University, Guangzhou 510642, Guangdong, China; 2. School of Economics, Jinan University, Guangzhou 510632, Guangdong, China; 3. School of Mathematics, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2020-05-06 Revised: 2020-06-17 Online: 2021-01-25 Published: 2021-01-01
  • Contact: MAO Yijun (born 1979), male, Ph.D., lecturer and master's supervisor, mainly engaged in research on big data analysis and bioinformatics. E-mail: yijunmao@163.com
  • About author: GU Wanrong (born 1982), male, Ph.D., lecturer and master's supervisor, mainly engaged in research on Internet big data processing and analysis and on recommendation models. E-mail: guwanrong@scau.edu.cn
  • Supported by:
    the National Key R&D Program of China (2017YFC1601701), the Science and Technology Planning Project of Guangdong Province (2018A070712021), the National Key Project of Statistical Science Research (2019LZ37), and the Philosophy and Social Sciences Planning Project of Guangdong Province (GD18CXW01, GD19CGL34)

Abstract: Most score prediction studies assume that missing values are missing at random. However, in real online recommendation systems the missing entries of the score matrix are non-random. Incorrect assumptions about the missing data lead to biased parameter estimates and predictions. To improve the accuracy of filling score matrices with non-random missing data, this paper analyzes the underlying structure of the user-item score matrix and proposes a method that uses row or column transformations to convert the matrix into an equivalent bilateral block diagonal matrix, and then applies matrix decomposition separately within each block to decompose it and predict scores, which makes local data updates and local decomposition feasible. Experimental results on public datasets show that the proposed method improves score filling, effectively addresses the problem of non-random missing scores, and thereby improves the prediction accuracy of the recommendation system. The transformed block matrix also achieves a good speedup ratio in distributed processing experiments, which indicates that the proposed method scales well.

Key words: matrix decomposition, recommendation system, singular value decomposition, score prediction
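
As a rough illustration of the idea summarized in the abstract, the Python sketch below covers only the simplest case, in which the score matrix splits into fully independent blocks. It groups users and items into connected components of the rating graph (equivalent to a row and column permutation into block diagonal form) and then fills each block with a truncated singular value decomposition. The function names `find_blocks`, `predict_block` and `blockwise_predict`, the convention that 0 means "missing", the mean-fill step and the chosen rank are all assumptions made for illustration. This is not the authors' algorithm, and it does not handle border rows or columns shared between blocks.

```python
import numpy as np

def find_blocks(R):
    """Label users and items by connected component of the rating graph.

    Assumption: a zero entry in R means the score is missing.
    Users or items without any score keep the label -1.
    """
    n_users, n_items = R.shape
    user_label = -np.ones(n_users, dtype=int)
    item_label = -np.ones(n_items, dtype=int)
    n_blocks = 0
    for start in range(n_users):
        if user_label[start] != -1 or not R[start].any():
            continue
        user_label[start] = n_blocks
        stack = [("u", start)]
        while stack:  # depth-first search over users and items
            kind, idx = stack.pop()
            if kind == "u":
                for j in np.flatnonzero(R[idx]):
                    if item_label[j] == -1:
                        item_label[j] = n_blocks
                        stack.append(("i", j))
            else:
                for u in np.flatnonzero(R[:, idx]):
                    if user_label[u] == -1:
                        user_label[u] = n_blocks
                        stack.append(("u", u))
        n_blocks += 1
    return user_label, item_label, n_blocks

def predict_block(block, rank=1):
    """Fill one block: replace missing scores by the block mean, then
    keep only the leading `rank` singular components."""
    observed = block > 0
    filled = np.where(observed, block, block[observed].mean())
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    k = min(rank, len(s))
    return (U[:, :k] * s[:k]) @ Vt[:k]

def blockwise_predict(R, rank=1):
    """Decompose each block separately; entries outside every block stay 0."""
    user_label, item_label, n_blocks = find_blocks(R)
    pred = np.zeros_like(R, dtype=float)
    for b in range(n_blocks):
        rows = np.flatnonzero(user_label == b)
        cols = np.flatnonzero(item_label == b)
        pred[np.ix_(rows, cols)] = predict_block(R[np.ix_(rows, cols)], rank)
    return pred, user_label, item_label

# Toy matrix: users 0 and 2 score items 0 and 3; users 1 and 3 score items 1, 2, 4.
R = np.array([
    [5, 0, 0, 4, 0],
    [0, 3, 0, 0, 2],
    [4, 0, 0, 5, 0],
    [0, 0, 1, 0, 3],
])
pred, user_label, item_label = blockwise_predict(R, rank=1)
```

Because each block is decomposed on its own, a new score only requires recomputing the block it falls in, and the blocks can be processed in parallel, which is the kind of local update and distributed speedup the abstract refers to.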

CLC number: