Deep Reinforcement Learning Recommendation Model Based on Multi-Interest Contrast

Journal of South China University of Technology (Natural Science Edition), 2025, Vol. 53, Issue 9: 11-21. doi: 10.12141/j.issn.1000-565X.240088

LIU Huiting¹,², LIU Shaoxiong¹, WANG Jiale³, ZHAO Peng¹

1. School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, China
2. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, Anhui, China
3. Stony Brook Institute, Anhui University, Hefei 230039, Anhui, China

Received: 2024-02-27  Published: 2025-09-25  Released online: 2025-04-27
About the author: LIU Huiting (born 1978), female, Ph.D., associate professor; her research focuses on natural language processing and personalized recommendation. E-mail: htliu@ahu.edu.cn
Supported by:
the National Natural Science Foundation of China (62576003); the University Synergy Innovation Program of Anhui Province (GXXT-2022-040); the Natural Science Foundation of Anhui Province (2008085MF219, 2108085MF212); the Natural Science Research Project of Anhui Higher Education Institutions (KJ2021-A0040, KJ2021-A0043)

Abstract

Deep reinforcement learning (DRL) is widely used in recommender systems to model user interests dynamically and to maximize users' cumulative return. However, the sparsity of user feedback remains one of the main challenges for DRL-based recommendation algorithms. Contrastive learning, a self-supervised learning method, strengthens the representation of user interests by constructing multiple views of them, which alleviates the feedback sparsity problem. Existing contrastive learning methods typically rely on heuristic augmentation strategies, which tend to discard key information, and they do not fully exploit heterogeneous interaction information. To address these issues, this paper proposes a multi-interest oriented contrastive deep reinforcement learning recommendation model (MOCIR). The model consists of two modules: a contrastive representation module and a policy network module. The contrastive representation module uses a heterogeneous information network (HIN) to model a user's local interests from different aspects, and models the user's global interest from the raw interaction data. It then treats the global and local interests of the same user as positive pairs, and the global and local interests of different users as negative pairs, for contrastive learning, so that user interests are captured effectively. The policy network module aggregates the user state representation and then generates recommendations. The two modules are trained with an alternating update mechanism. Experimental results on three datasets show that the proposed model outperforms several DRL-based models in recommendation performance and effectively addresses the problem of sparse user feedback in recommendation.

Key words: multi-interest; reinforcement learning; contrastive learning; heterogeneous information network
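
The positive/negative pairing described in the abstract can be illustrated with an InfoNCE-style loss. The sketch below is not the authors' implementation: the InfoNCE form, the cosine similarity, the temperature value, and the names `multi_interest_info_nce`, `global_emb`, and `local_emb` are all assumptions made for illustration. The abstract only states which pairs are positive (same user, global vs. local interest) and which are negative (different users).

```python
import numpy as np


def multi_interest_info_nce(global_emb, local_emb, temperature=0.2):
    """InfoNCE-style loss between global and local interest vectors.

    global_emb: (n_users, d), one global-interest vector per user.
    local_emb:  (n_users, n_aspects, d), one local-interest vector per
                user and aspect (hypothetical layout).
    """
    n_users, n_aspects, _ = local_emb.shape
    # L2-normalise so that dot products are cosine similarities.
    g = global_emb / np.linalg.norm(global_emb, axis=1, keepdims=True)
    loc = local_emb / np.linalg.norm(local_emb, axis=2, keepdims=True)
    # sim[u, v, a]: user u's global interest vs. user v's aspect-a interest.
    sim = np.einsum("ud,vad->uva", g, loc) / temperature
    logits = sim.reshape(n_users, n_users * n_aspects)
    # Numerically stable log-softmax over all (user, aspect) candidates.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_prob = log_prob.reshape(n_users, n_users, n_aspects)
    # Positives: same user (u == v), every aspect. Everything else in the
    # softmax denominator acts as a negative or a competing positive.
    idx = np.arange(n_users)
    positives = log_prob[idx, idx, :]  # shape (n_users, n_aspects)
    return float(-positives.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.normal(size=(4, 8))
    aligned = g[:, None, :] + 0.01 * rng.normal(size=(4, 3, 8))
    print("aligned loss:   ", multi_interest_info_nce(g, aligned))
    print("misaligned loss:", multi_interest_info_nce(g, np.roll(aligned, 1, axis=0)))
```

In this toy setup the loss is lower when each user's local interests lie close to that user's own global interest than when they are shifted to another user, which is the behaviour the pairing scheme is meant to encourage.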

CLC number: TP391

Cite this article:

LIU Huiting, LIU Shaoxiong, WANG Jiale, ZHAO Peng. Deep Reinforcement Learning Recommendation Model Based on Multi-Interest Contrast[J]. Journal of South China University of Technology (Natural Science Edition), 2025, 53(9): 11-21.

Figures and tables: 10 (Figs. 1-7, Tables 1-3)


Table 1 (dataset statistics)
Dataset	Users	Items	Interactions	Sparsity/%
MovieLens 1M	6,040	3,706	1,000,209	4.47
EachMovie	61,265	1,623	2,811,718	2.83
Amazon	6,170	2,753	195,791	1.15
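
As a quick check on the table's last column, the short script below recomputes interactions / (users × items) from the other three columns. This formula is an assumption, since the source does not state how the column was derived, and the function name `interaction_density_percent` is hypothetical. Under that assumption the reported values are reproduced to two decimals, which suggests the column gives the share of filled user-item cells (the density), so the share of empty cells would be 100 minus each value.

```python
# (users, items, interactions, value reported in the last column, in %)
datasets = {
    "MovieLens 1M": (6040, 3706, 1000209, 4.47),
    "EachMovie": (61265, 1623, 2811718, 2.83),
    "Amazon": (6170, 2753, 195791, 1.15),
}


def interaction_density_percent(n_users, n_items, n_interactions):
    """Share of user-item cells that contain an interaction, in percent."""
    return 100.0 * n_interactions / (n_users * n_items)


for name, (n_users, n_items, n_inter, reported) in datasets.items():
    computed = interaction_density_percent(n_users, n_items, n_inter)
    print(f"{name}: computed {computed:.2f}%, reported {reported:.2f}%")
```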

Table 2 (cumulative precision)
Model	Cumulative precision on MovieLens 1M			Cumulative precision on EachMovie			Cumulative precision on Amazon
	T = 5	T = 10	T = 20	T = 5	T = 10	T = 20	T = 5	T = 10	T = 20
Random	0.2150	0.4400	0.8983	0.0600	0.1180	0.2500	0.0400	0.0800	0.1400
Pop	0.6205	1.0240	1.6800	1.0605	2.2729	4.4164	0.1940	0.3939	0.7550
BPR	1.0442	2.2024	4.4888	0.9851	2.1188	4.0099	0.1773	0.3455	0.6576
ϵ-Greedy	2.0099	3.8182	7.0347	1.6044	2.7555	4.4738	0.2642	0.4765	0.8282
DQNR	2.0050	3.8562	7.0595	1.5618	2.8400	4.3266	0.2609	0.4522	0.9109
NICF	2.0545	3.7603	6.9636	1.5700	2.7349	4.5900	0.2340	0.2990	0.3290
SGL	1.6612	3.1190	4.9686	1.3362	2.1764	2.6685	0.1799	0.2366	0.3371
GreedyRM	2.0413	3.8760	7.1322	1.6120	2.8268	4.7181	0.3193	0.4943	0.8525
MOCIR	2.0911	3.9007	7.2036	1.6295	2.8670	4.8019	0.3323	0.5170	0.9449
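
To put the margins in the table above into perspective, the sketch below computes MOCIR's relative gain over the strongest baseline at T = 20 for each dataset. The numbers are copied from the table; the helper name `gain_over_best_baseline` and the choice of relative gain as the summary statistic are assumptions for illustration, not something the source reports.

```python
# Cumulative precision at T = 20, copied from the table above.
precision_t20 = {
    "MovieLens 1M": {
        "Random": 0.8983, "Pop": 1.6800, "BPR": 4.4888, "ϵ-Greedy": 7.0347,
        "DQNR": 7.0595, "NICF": 6.9636, "SGL": 4.9686, "GreedyRM": 7.1322,
        "MOCIR": 7.2036,
    },
    "EachMovie": {
        "Random": 0.2500, "Pop": 4.4164, "BPR": 4.0099, "ϵ-Greedy": 4.4738,
        "DQNR": 4.3266, "NICF": 4.5900, "SGL": 2.6685, "GreedyRM": 4.7181,
        "MOCIR": 4.8019,
    },
    "Amazon": {
        "Random": 0.1400, "Pop": 0.7550, "BPR": 0.6576, "ϵ-Greedy": 0.8282,
        "DQNR": 0.9109, "NICF": 0.3290, "SGL": 0.3371, "GreedyRM": 0.8525,
        "MOCIR": 0.9449,
    },
}


def gain_over_best_baseline(scores, model="MOCIR"):
    """Return (best baseline name, relative gain of `model` in percent)."""
    baselines = {name: s for name, s in scores.items() if name != model}
    best_name = max(baselines, key=baselines.get)
    best = baselines[best_name]
    return best_name, 100.0 * (scores[model] - best) / best


for dataset, scores in precision_t20.items():
    best_name, gain = gain_over_best_baseline(scores)
    print(f"{dataset}: best baseline {best_name}, MOCIR gain {gain:.2f}%")
```

With these figures the gain comes out at roughly 1.0% on MovieLens 1M, 1.8% on EachMovie, and 3.7% on Amazon, so the largest relative margin is on the dataset with the fewest interactions.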

Table 3 (cumulative recall)
Model	Cumulative recall on MovieLens 1M			Cumulative recall on EachMovie			Cumulative recall on Amazon
	T = 5	T = 10	T = 20	T = 5	T = 10	T = 20	T = 5	T = 10	T = 20
Random	0.0011	0.0027	0.0051	0.0026	0.0055	0.0130	0.0008	0.0028	0.0056
Pop	0.0265	0.0461	0.0822	0.1094	0.2078	0.2265	0.0062	0.0180	0.0357
BPR	0.0096	0.0185	0.0366	0.0666	0.1014	0.1537	0.0068	0.0118	0.0203
ϵ-Greedy	0.0332	0.0613	0.1028	0.1292	0.1945	0.3355	0.0112	0.0211	0.0379
DQNR	0.0326	0.0592	0.1066	0.1295	0.2093	0.3139	0.0110	0.0216	0.0364
NICF	0.0209	0.0383	0.0678	0.0801	0.1260	0.1820	0.0090	0.0162	0.0246
SGL	0.0224	0.0413	0.0645	0.1204	0.1643	0.1957	0.0077	0.0098	0.0137
GreedyRM	0.0343	0.0615	0.1076	0.1389	0.2172	0.3481	0.0130	0.0212	0.0363
MOCIR	0.0352	0.0618	0.1100	0.1464	0.2276	0.3544	0.0140	0.0217	0.0393
