Deep Reinforcement Learning Recommendation Model Based on Multi-Interest Contrast

doi:10.12141/j.issn.1000-565X.240088

Abstract

Deep reinforcement learning (DRL) is widely applied in recommender systems to model user interests dynamically and to maximize users' long-term cumulative benefit. However, the sparsity of user feedback remains a major challenge for DRL-based recommendation algorithms. Contrastive learning, a self-supervised learning method, can alleviate this sparsity by constructing multiple views of the data to enhance user interest representations. Existing contrastive learning methods, however, typically rely on heuristic augmentation strategies, which often discard key information and fail to make full use of heterogeneous interaction data. To address these issues, this paper proposes a multi-interest oriented contrastive deep reinforcement learning recommendation (MOCIR) model, which consists of two key modules: a contrastive representation module and a policy network module. The contrastive representation module uses a heterogeneous information network (HIN) to model a user's local interests from different aspects, and captures the user's global interests from the raw interaction data. It then performs contrastive learning by treating the global and local interests of the same user as positive pairs and those of different users as negative pairs, which enhances the user interest representations. The policy network module aggregates the user state representations and generates recommendations. The two modules are trained with an alternating update mechanism. Experimental results on three benchmark datasets show that the proposed model outperforms several DRL-based models in recommendation performance, which indicates that it effectively alleviates the problem of sparse user feedback in recommendation.

Key words: multi-interest, reinforcement learning, contrastive learning, heterogeneous information network
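
The abstract describes the contrastive objective only in words: the global and local interests of the same user form a positive pair, and those of different users form negative pairs. A common way to implement such an objective is the InfoNCE loss. The sketch below assumes InfoNCE with cosine similarity, a temperature `tau`, in-batch negatives (the other users in the batch), and a simple average over several local views, one per HIN aspect. These choices and the names `info_nce` and `multi_view_contrastive_loss` are illustrative assumptions, not the paper's published formulation.

```python
import numpy as np


def info_nce(global_emb: np.ndarray, local_emb: np.ndarray, tau: float = 0.2) -> float:
    """InfoNCE loss for one local-interest view (assumed form of the contrastive loss).

    Row i of both matrices belongs to user i: (global_i, local_i) is the positive
    pair, and (global_i, local_j) for j != i are negatives taken from other users.
    """
    # L2-normalise so that dot products are cosine similarities
    g_norm = global_emb / np.linalg.norm(global_emb, axis=1, keepdims=True)
    l_norm = local_emb / np.linalg.norm(local_emb, axis=1, keepdims=True)
    logits = (g_norm @ l_norm.T) / tau  # (n_users, n_users); positives on the diagonal
    # Row-wise log-softmax, shifted by the row max for numerical stability
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def multi_view_contrastive_loss(global_emb, local_views, tau=0.2):
    """Average InfoNCE over several local views (hypothetically, one per HIN aspect)."""
    return float(np.mean([info_nce(global_emb, view, tau) for view in local_views]))


rng = np.random.default_rng(0)
global_emb = rng.normal(size=(16, 32))  # 16 users, 32-dimensional global interests
# Local views that agree with each user's global interest (small perturbations)
aligned_views = [global_emb + 0.05 * rng.normal(size=global_emb.shape) for _ in range(3)]
# Local views unrelated to the global interests
random_views = [rng.normal(size=global_emb.shape) for _ in range(3)]

print(multi_view_contrastive_loss(global_emb, aligned_views))
print(multi_view_contrastive_loss(global_emb, random_views))
```

With local views that agree with the global interests, the loss is much lower than with unrelated views; minimizing it therefore pulls each user's global and local interests together while pushing them away from other users' interests.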

CLC Number: TP391

LIU Huiting, LIU Shaoxiong, WANG Jiale, ZHAO Peng. Deep Reinforcement Learning Recommendation Model Based on Multi-Interest Contrast[J]. Journal of South China University of Technology (Natural Science Edition), 2025, 53(9): 11-21.

Figures/Tables: 10 (Figs. 1-7, Tables 1-3)

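Table 1 Dataset statistics (Sparsity/% gives the number of interactions as a percentage of all possible user-item pairs)
Dataset	Users	Items	Interactions	Sparsity/%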
MovieLens 1M	6 040	3 706	1 000 209	4.47
EachMovie	61 265	1 623	2 811 718	2.83
Amazon	6 170	2 753	195 791	1.15
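
The Sparsity/% column can be reproduced from the other three columns: it is the number of interactions divided by the number of possible user-item pairs (users × items), times 100. A smaller value therefore means a sparser dataset, which makes Amazon the sparsest of the three. The short check below recomputes the column from the table's own counts; `observed_ratio_percent` is an illustrative name.

```python
# User, item and interaction counts copied from the dataset table above
DATASETS = {
    "MovieLens 1M": (6_040, 3_706, 1_000_209),
    "EachMovie": (61_265, 1_623, 2_811_718),
    "Amazon": (6_170, 2_753, 195_791),
}


def observed_ratio_percent(users: int, items: int, interactions: int) -> float:
    """Observed interactions as a percentage of all possible user-item pairs."""
    return 100.0 * interactions / (users * items)


for name, (users, items, interactions) in DATASETS.items():
    print(f"{name}: {observed_ratio_percent(users, items, interactions):.2f}%")
# prints 4.47%, 2.83% and 1.15%, matching the Sparsity/% column
```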

Table 2 Cumulative precision of each model
Model	Cumulative precision on MovieLens 1M			Cumulative precision on EachMovie			Cumulative precision on Amazon
	T = 5	T = 10	T = 20	T = 5	T = 10	T = 20	T = 5	T = 10	T = 20
Random	0.215 0	0.440 0	0.898 3	0.060 0	0.118 0	0.250 0	0.040 0	0.080 0	0.140 0
Pop	0.620 5	1.024 0	1.680 0	1.060 5	2.272 9	4.416 4	0.194 0	0.393 9	0.755 0
BPR	1.044 2	2.202 4	4.488 8	0.985 1	2.118 8	4.009 9	0.177 3	0.345 5	0.657 6
ϵ-Greedy	2.009 9	3.818 2	7.034 7	1.604 4	2.755 5	4.473 8	0.264 2	0.476 5	0.828 2
DQNR	2.005 0	3.856 2	7.059 5	1.561 8	2.840 0	4.326 6	0.260 9	0.452 2	0.910 9
NICF	2.054 5	3.760 3	6.963 6	1.570 0	2.734 9	4.590 0	0.234 0	0.299 0	0.329 0
SGL	1.661 2	3.119 0	4.968 6	1.336 2	2.176 4	2.668 5	0.179 9	0.236 6	0.337 1
GreedyRM	2.041 3	3.876 0	7.132 2	1.612 0	2.826 8	4.718 1	0.319 3	0.494 3	0.852 5
MOCIR	2.091 1	3.900 7	7.203 6	1.629 5	2.867 0	4.801 9	0.332 3	0.517 0	0.944 9
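
The table reports absolute values only. To gauge the size of MOCIR's advantage, the sketch below computes, for each dataset and each T, MOCIR's relative gain over the strongest other model in that column, using the cumulative precision values copied from the table. Every row other than MOCIR is treated as a baseline; `PRECISION` and `relative_gains` are illustrative names, not taken from the paper.

```python
T_STEPS = (5, 10, 20)

# Cumulative precision copied from the table above; each tuple is (T = 5, T = 10, T = 20)
PRECISION = {
    "MovieLens 1M": {
        "Random": (0.2150, 0.4400, 0.8983),
        "Pop": (0.6205, 1.0240, 1.6800),
        "BPR": (1.0442, 2.2024, 4.4888),
        "ϵ-Greedy": (2.0099, 3.8182, 7.0347),
        "DQNR": (2.0050, 3.8562, 7.0595),
        "NICF": (2.0545, 3.7603, 6.9636),
        "SGL": (1.6612, 3.1190, 4.9686),
        "GreedyRM": (2.0413, 3.8760, 7.1322),
        "MOCIR": (2.0911, 3.9007, 7.2036),
    },
    "EachMovie": {
        "Random": (0.0600, 0.1180, 0.2500),
        "Pop": (1.0605, 2.2729, 4.4164),
        "BPR": (0.9851, 2.1188, 4.0099),
        "ϵ-Greedy": (1.6044, 2.7555, 4.4738),
        "DQNR": (1.5618, 2.8400, 4.3266),
        "NICF": (1.5700, 2.7349, 4.5900),
        "SGL": (1.3362, 2.1764, 2.6685),
        "GreedyRM": (1.6120, 2.8268, 4.7181),
        "MOCIR": (1.6295, 2.8670, 4.8019),
    },
    "Amazon": {
        "Random": (0.0400, 0.0800, 0.1400),
        "Pop": (0.1940, 0.3939, 0.7550),
        "BPR": (0.1773, 0.3455, 0.6576),
        "ϵ-Greedy": (0.2642, 0.4765, 0.8282),
        "DQNR": (0.2609, 0.4522, 0.9109),
        "NICF": (0.2340, 0.2990, 0.3290),
        "SGL": (0.1799, 0.2366, 0.3371),
        "GreedyRM": (0.3193, 0.4943, 0.8525),
        "MOCIR": (0.3323, 0.5170, 0.9449),
    },
}


def relative_gains(table, proposed="MOCIR"):
    """For every (dataset, T) column, return the strongest other model and the
    proposed model's relative gain over it, in percent."""
    rows = []
    for dataset, scores in table.items():
        for col, t in enumerate(T_STEPS):
            baselines = {m: v[col] for m, v in scores.items() if m != proposed}
            best = max(baselines, key=baselines.get)
            gain = 100.0 * (scores[proposed][col] - baselines[best]) / baselines[best]
            rows.append((dataset, t, best, gain))
    return rows


for dataset, t, best, gain in relative_gains(PRECISION):
    print(f"{dataset:<13} T={t:<3} best baseline: {best:<9} MOCIR gain: {gain:+.2f}%")
```

On these values the strongest baseline changes from column to column (NICF, GreedyRM or DQNR), and MOCIR's margin over it ranges from about 0.6% to 4.6%, with the largest gains on the sparsest dataset, Amazon.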

Table 3 Cumulative recall of each model
Model	Cumulative recall on MovieLens 1M			Cumulative recall on EachMovie			Cumulative recall on Amazon
	T = 5	T = 10	T = 20	T = 5	T = 10	T = 20	T = 5	T = 10	T = 20
Random	0.001 1	0.002 7	0.005 1	0.002 6	0.005 5	0.013 0	0.000 8	0.002 8	0.005 6
Pop	0.026 5	0.046 1	0.082 2	0.109 4	0.207 8	0.226 5	0.006 2	0.018 0	0.035 7
BPR	0.009 6	0.018 5	0.036 6	0.066 6	0.101 4	0.153 7	0.006 8	0.011 8	0.020 3
ϵ-Greedy	0.033 2	0.061 3	0.102 8	0.129 2	0.194 5	0.335 5	0.011 2	0.021 1	0.037 9
DQNR	0.032 6	0.059 2	0.106 6	0.129 5	0.209 3	0.313 9	0.011 0	0.021 6	0.036 4
NICF	0.020 9	0.038 3	0.067 8	0.080 1	0.126 0	0.182 0	0.009 0	0.016 2	0.024 6
SGL	0.022 4	0.041 3	0.064 5	0.120 4	0.164 3	0.195 7	0.007 7	0.009 8	0.013 7
GreedyRM	0.034 3	0.061 5	0.107 6	0.138 9	0.217 2	0.348 1	0.013 0	0.021 2	0.036 3
MOCIR	0.035 2	0.061 8	0.110 0	0.146 4	0.227 6	0.354 4	0.014 0	0.021 7	0.039 3
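
The metrics themselves are not defined in this section. Cumulative precision values above 1 (for example 7.203 6 for MOCIR on MovieLens 1M at T = 20) suggest that hits are summed over the T recommendation steps rather than averaged. The sketch below assumes the definitions commonly used in interactive recommendation work such as NICF: cumulative precision@T is the average number of hits per user in the first T steps, and cumulative recall@T divides each user's hit count by that user's number of relevant items. Both definitions, the name `cumulative_metrics`, and the toy hit sequences are assumptions for illustration.

```python
from typing import Sequence, Tuple


def cumulative_metrics(
    hits: Sequence[Sequence[int]], n_relevant: Sequence[int], T: int
) -> Tuple[float, float]:
    """Cumulative precision@T and recall@T averaged over users (assumed definitions).

    hits[u][t] is 1 if the item recommended to user u at step t was relevant, else 0.
    n_relevant[u] is the number of relevant items user u has in the test data.
    """
    n_users = len(hits)
    # Precision: hits summed over the first T steps (not divided by T)
    precision = sum(sum(h[:T]) for h in hits) / n_users
    # Recall: the same hit count divided by each user's number of relevant items
    recall = sum(sum(h[:T]) / r for h, r in zip(hits, n_relevant)) / n_users
    return precision, recall


# Two hypothetical users, five recommendation steps each
hits = [[1, 0, 1, 1, 0], [0, 1, 0, 0, 0]]
n_relevant = [10, 4]
print(cumulative_metrics(hits, n_relevant, T=5))  # (2.0, 0.275)
```

Under this reading, cumulative precision grows roughly with T (it counts hits), whereas cumulative recall stays below 1 because each user's hits are normalized by that user's number of relevant items.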