Deep Reinforcement Learning Recommendation Model Based on Multi-Interest Contrast

LIU Huiting; LIU Shaoxiong; WANG Jiale; ZHAO Peng

doi:10.12141/j.issn.1000-565X.240088

Journal of South China University of Technology (Natural Science Edition), 2025, Vol. 53, Issue 9: 11-21

Computer Science and Technology

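1. School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, China
2. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, Anhui, China
3. Stony Brook Institute, Anhui University, Hefei 230039, Anhui, China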

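LIU Huiting (b. 1978), female, Ph.D., associate professor; her research focuses on natural language processing and personalized recommendation. E-mail: htliu@ahu.edu.cn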

Received date: 2024-02-27

Online published: 2025-04-27

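Supported by

the National Natural Science Foundation of China (62576003); the University Synergy Innovation Program of Anhui Province (GXXT-2022-040); the Natural Science Foundation of Anhui Province (2008085MF219, 2108085MF212); the Provincial Natural Science Foundation of Anhui Higher Education Institution of China (KJ2021-A0040, KJ2021-A0043)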

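Cite this article

LIU Huiting, LIU Shaoxiong, WANG Jiale, ZHAO Peng. Deep Reinforcement Learning Recommendation Model Based on Multi-Interest Contrast[J]. Journal of South China University of Technology (Natural Science Edition), 2025, 53(9): 11-21. DOI: 10.12141/j.issn.1000-565X.240088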

Abstract

Deep reinforcement learning (DRL) is widely applied in recommender systems to dynamically model user interests and maximize cumulative user benefit. However, sparse user feedback remains a major challenge for DRL-based recommendation algorithms. Contrastive learning, a self-supervised method, strengthens user interest representations by constructing multiple views of those interests, thereby alleviating feedback sparsity. Existing contrastive learning methods typically rely on heuristic augmentation strategies, which can discard key information and fail to fully exploit heterogeneous interaction data. To address these issues, this paper proposes a multi-interest oriented contrastive deep reinforcement learning recommendation (MOCIR) model. The model consists of two modules: a contrastive representation module and a policy network module. The contrastive representation module uses a heterogeneous information network (HIN) to model a user's local interests from different aspects, and models the user's global interest from the raw interaction data. It then treats the global and local interests of the same user as positive pairs, and those of different users as negative pairs, for contrastive learning, so as to capture user interests effectively. The policy network module aggregates the user state representation and generates recommendations. The two modules are trained with an alternating update mechanism. Experimental results on three benchmark datasets show that the proposed model outperforms several DRL-based models in recommendation performance and effectively addresses the problem of sparse user feedback in recommendation.

Keywords: multi-interest; reinforcement learning; contrastive learning; heterogeneous information network
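
The positive/negative pairing described in the abstract (the global and local interests of the same user as positives, those of different users as negatives) can be illustrated with a minimal sketch. The abstract does not state the loss function, so the InfoNCE-style objective, the name `info_nce_loss`, the `temperature` value, and the use of a single local-interest vector per user are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def info_nce_loss(global_emb, local_emb, temperature=0.2):
    """InfoNCE-style contrastive loss (assumed form, not taken from the paper).

    Row i of global_emb and row i of local_emb belong to the same user and
    form a positive pair; every other row of local_emb is a negative for
    user i. Rows are assumed to be non-zero vectors.
    """
    # L2-normalise rows so that dot products are cosine similarities.
    g = global_emb / np.linalg.norm(global_emb, axis=1, keepdims=True)
    l = local_emb / np.linalg.norm(local_emb, axis=1, keepdims=True)

    # Pairwise similarities, shape (n_users, n_users), scaled by temperature.
    logits = (g @ l.T) / temperature

    # Subtract the row maximum for numerical stability, then log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The diagonal holds the log-probability of each user's positive pair.
    return float(-np.mean(np.diag(log_prob)))


# Hypothetical toy data: four users with orthogonal interest vectors.
eye = np.eye(4)
aligned = info_nce_loss(eye, eye)         # local interests match their own user
shuffled = info_nce_loss(eye, eye[::-1])  # local interests belong to other users
print(aligned < shuffled)
```

In this toy setup the loss is lower when each user's local interest matches their own global interest than when it matches another user's, which is the behaviour a contrastive objective of this kind is meant to reward.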
