Journal of South China University of Technology (Natural Science Edition) ›› 2025, Vol. 53 ›› Issue (9): 11-21. doi: 10.12141/j.issn.1000-565X.240088

• Computer Science & Technology •


Multi-Interest Oriented Contrastive for Deep Reinforcement Learning-based Recommendation

LIU Huiting1,2, LIU Shaoxiong1, WANG Jiale1,3, ZHAO Peng1

  1. School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, China;

    2. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, Anhui, China;

    3. Stony Brook Institute, Anhui University, Hefei 230039, Anhui, China

  • Online: 2025-09-25  Published: 2025-04-27


Abstract:

Deep reinforcement learning (DRL) algorithms have been widely applied in recommender systems to dynamically model user interests and maximize users' cumulative rewards. However, sparse user feedback remains a major challenge for DRL-based recommendation methods. Contrastive learning, a self-supervised technique, can construct multiple views of user interests, enriching interest representations while alleviating the sparsity of feedback data. Yet existing contrastive learning methods typically rely on heuristic augmentation strategies, which lose critical information and fail to fully exploit heterogeneous interaction information. To address these shortcomings, this paper proposes a multi-interest oriented contrastive method for deep reinforcement learning-based recommendation (MOCIR). The proposed model comprises a contrastive representation module and a policy network. The contrastive representation module uses heterogeneous information networks (HINs) to model different aspects of a user's local interests and the original interaction data to model the user's global interests; it follows metapaths in the HINs to find aspect-specific neighbors for items, aggregates them into item representations, and then treats the global and local interests as positive pairs for contrastive learning, thereby effectively capturing user interests. The policy network makes recommendations after the user state representation is aggregated, and the policy network and the contrastive module are updated alternately. Experiments on three benchmark datasets demonstrate that the proposed method significantly outperforms state-of-the-art methods.
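The abstract does not specify the contrastive objective. Below is a minimal sketch, assuming an InfoNCE-style loss in which each user's global interest vector and each of their metapath-specific local interest vectors form a positive pair, with other users in the batch serving as negatives; all function and variable names are illustrative, not from the paper.

```python
# Hypothetical sketch of a global-local interest contrastive objective
# (InfoNCE-style); assumed, not taken from the MOCIR paper.
import torch
import torch.nn.functional as F

def global_local_contrastive_loss(global_z, local_z, temperature=0.1):
    """global_z: [B, d] global interests built from raw interaction data.
    local_z:  [B, M, d] local interests, one per metapath aspect."""
    B, M, d = local_z.shape
    g = F.normalize(global_z, dim=-1)                 # [B, d] unit vectors
    l = F.normalize(local_z, dim=-1)                  # [B, M, d]
    loss = g.new_zeros(())
    for m in range(M):
        logits = g @ l[:, m, :].T / temperature       # [B, B] similarities
        labels = torch.arange(B, device=g.device)     # diagonal = positive pairs
        loss = loss + F.cross_entropy(logits, labels)
    return loss / M
```

In-batch negatives keep the sketch simple; the paper's actual positive/negative construction over global and local interests may differ.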

Key words: multi-interest, reinforcement learning, contrastive learning, heterogeneous information network
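The abstract states that the contrastive module and the policy network are updated alternately rather than through one joint loss. A minimal sketch of such a schedule follows, reusing the contrastive loss sketched above; the module interfaces (`aggregate_state`, `policy_net.loss`) are assumptions for illustration only.

```python
# Hypothetical alternating-update schedule; names are illustrative,
# not from the paper.
import torch

def train_step(batch, repr_module, policy_net, repr_opt, policy_opt,
               contrastive_loss_fn):
    # 1) Update the contrastive representation module, policy held fixed.
    global_z, local_z = repr_module(batch)            # global/local interest views
    c_loss = contrastive_loss_fn(global_z, local_z)
    repr_opt.zero_grad()
    c_loss.backward()
    repr_opt.step()

    # 2) Update the policy network on the aggregated user state, with the
    #    representation module held fixed (detached here).
    with torch.no_grad():
        state = repr_module.aggregate_state(batch)    # assumed aggregation method
    rl_loss = policy_net.loss(state, batch["action"], batch["reward"])
    policy_opt.zero_grad()
    rl_loss.backward()
    policy_opt.step()
    return c_loss.item(), rl_loss.item()
```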