Journal of South China University of Technology (Natural Science Edition) ›› 2025, Vol. 53 ›› Issue (9): 11-21. doi: 10.12141/j.issn.1000-565X.240088

• Computer Science & Technology •

Multi-Interest Oriented Contrastive for Deep Reinforcement Learning-based Recommendation

LIU Huiting1,2, LIU Shaoxiong1, WANG Jiale1,3, ZHAO Peng1

  1. School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, China;

    2. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, Anhui, China;

    3. Stony Brook Institute, Anhui University, Hefei 230039, Anhui, China

  • Online: 2025-09-25  Published: 2025-04-27

Abstract:

Deep reinforcement learning (DRL) algorithms have been incorporated into recommender systems to dynamically model users' interests and maximize cumulative rewards. However, data sparsity remains a challenge for most DRL-based interactive recommendation methods. Contrastive learning is a promising way to alleviate data sparsity, but most existing contrastive learning methods rely on heuristic augmentation strategies, which discard critical information and fail to fully exploit heterogeneous information. To address these shortcomings, we propose a multi-interest oriented contrastive approach for deep reinforcement learning-based recommendation (MOCIR). Specifically, we use heterogeneous information networks (HINs) to model different aspects of a user's local interests, and use the original interaction data to model the user's global interests. The proposed method comprises a contrastive learning module and a policy network. The contrastive learning module follows metapaths in the HIN to find aspect-specific neighbors for each item, aggregates them into item representations, and then treats a user's global and local interests as positive pairs for contrastive learning, thereby capturing the user's interests effectively. The policy network makes recommendations from the aggregated user state representation, and the two modules are updated jointly. Experiments on three benchmark datasets demonstrate that the proposed method outperforms its state-of-the-art counterparts.
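
The abstract does not give MOCIR's exact formulation, but the two ingredients it names, aggregating metapath-based neighbors into aspect-specific (local) item representations and contrasting a user's global and local interests as a positive pair, can be illustrated with a minimal PyTorch sketch. All names here are hypothetical, and the mean aggregation and InfoNCE-style objective are assumptions standing in for the paper's actual design choices:

    import torch
    import torch.nn.functional as F

    def aggregate_metapath_neighbors(item_emb: torch.Tensor,
                                     neighbor_ids: torch.Tensor) -> torch.Tensor:
        """Mean-aggregate the embeddings of neighbors reached via one metapath
        into an aspect-specific (local) representation.

        item_emb:     (num_items, d) item embedding table
        neighbor_ids: (batch, k) neighbor indices found along the metapath
        returns:      (batch, d) local representations
        """
        return item_emb[neighbor_ids].mean(dim=1)

    def info_nce(z_global: torch.Tensor, z_local: torch.Tensor,
                 temperature: float = 0.2) -> torch.Tensor:
        """InfoNCE-style loss: each user's (global, local) interest pair is
        the positive; all other in-batch pairings serve as negatives."""
        g = F.normalize(z_global, dim=-1)
        z = F.normalize(z_local, dim=-1)
        logits = g @ z.t() / temperature                    # (batch, batch)
        labels = torch.arange(g.size(0), device=g.device)   # diagonal = positives
        return F.cross_entropy(logits, labels)

    # Toy usage: 4 users, 16-dim interests, 3 metapath neighbors per item.
    item_emb = torch.randn(100, 16)
    neighbors = torch.randint(0, 100, (4, 3))
    z_local = aggregate_metapath_neighbors(item_emb, neighbors)
    z_global = torch.randn(4, 16)   # e.g. output of a sequence encoder
    loss = info_nce(z_global, z_local)

In the joint update the abstract describes, a contrastive loss of this kind would be added to the policy network's reinforcement-learning objective (e.g. total_loss = rl_loss + lambda * loss) and both modules optimized together; the weighting term is, again, an assumed detail rather than one stated in the abstract.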

Key words: multi-interest, reinforcement learning, contrastive learning, heterogeneous information network