Computer Science & Technology

Enhancing Multimodal Product Summarization through Claim-Based Evaluation and Preference Optimization

1. Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, Guangdong, China;
2. School of Computer Science, Peking University, Beijing 100871, China;
3. School of Computer Science and Technology, Shandong University, Qingdao 266237, Shandong, China

Online published: 2026-01-23

Abstract

Multimodal product summarization aims to generate concise and accurate summaries that highlight the key selling points of a product from its textual and visual information. Existing approaches, however, face two major challenges. First, traditional overlap-based metrics such as ROUGE cannot reliably assess how well a summary captures essential product information. Second, the mainstream supervised fine-tuning paradigm does not model users' implicit preferences about which key elements should be made prominent, so the generated summaries drift away from what users actually need. To address these issues, this paper proposes a claim-based summarization evaluation metric (CSE), which assesses how well key information is expressed along two dimensions: claim hit accuracy (CHA) and claim quantity ratio (CQR). We further introduce PAMPS, a preference-aligned multimodal product summarization model trained in four stages (supervised fine-tuning, summary resampling, CSE-driven preference pair construction, and direct preference optimization) that progressively align the model with user preferences regarding key product elements. Experiments on the large-scale Chinese e-commerce dataset CEPSUM demonstrate the effectiveness of the proposed method. PAMPS achieves notable improvements on ROUGE: DPO-ROUGE improves ROUGE-1/2/L by 0.25, 0.44, and 1.21 on average over SFT, indicating better overall generation quality. Under the CSE evaluation framework, DPO-CSE yields the largest gains in claim hit accuracy, with an average improvement exceeding 4%, showing that element-oriented preference optimization strengthens the model's ability to capture and express core product information. Overall, the results validate the effectiveness and practical value of the proposed approach for improving multimodal product summarization.
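The abstract names CHA, CQR, and CSE-driven preference pairs but does not give their formulas. The Python sketch below shows one plausible reading, purely for illustration. Every definition in it is an assumption rather than the authors' method: claims are plain strings that "hit" when they match after normalization, CHA is taken as the share of generated claims that hit a reference claim, CQR as the min/max ratio of claim counts, and CSE as their weighted average with a hypothetical weight `alpha`. The pair builder picks the highest- and lowest-scoring resampled summaries as chosen and rejected. `dpo_loss` implements the standard DPO objective, -log σ(β[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]), on precomputed sequence log-probabilities. All function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


def normalize(claim: str) -> str:
    # Assumption: two claims match if equal after lowercasing and collapsing whitespace.
    return " ".join(claim.lower().split())


def claim_hit_accuracy(gen_claims: List[str], ref_claims: List[str]) -> float:
    """Hypothetical CHA: fraction of generated claims that hit some reference claim."""
    if not gen_claims:
        return 0.0
    refs = {normalize(c) for c in ref_claims}
    hits = sum(1 for c in gen_claims if normalize(c) in refs)
    return hits / len(gen_claims)


def claim_quantity_ratio(gen_claims: List[str], ref_claims: List[str]) -> float:
    """Hypothetical CQR: min/max of claim counts, so 1.0 means equal counts."""
    n_gen, n_ref = len(gen_claims), len(ref_claims)
    if max(n_gen, n_ref) == 0:
        return 1.0
    return min(n_gen, n_ref) / max(n_gen, n_ref)


def cse_score(gen_claims: List[str], ref_claims: List[str], alpha: float = 0.5) -> float:
    # Assumption: CSE is a weighted average of CHA and CQR with weight alpha.
    cha = claim_hit_accuracy(gen_claims, ref_claims)
    cqr = claim_quantity_ratio(gen_claims, ref_claims)
    return alpha * cha + (1 - alpha) * cqr


@dataclass
class PreferencePair:
    chosen: str
    rejected: str


def build_pair(candidates: Dict[str, List[str]], ref_claims: List[str]) -> Optional[PreferencePair]:
    """Map each resampled summary to its extracted claims; keep best vs. worst by CSE."""
    if len(candidates) < 2:
        return None
    scores = {s: cse_score(claims, ref_claims) for s, claims in candidates.items()}
    best = max(scores, key=scores.get)
    worst = min(scores, key=scores.get)
    if scores[best] == scores[worst]:
        return None  # no preference signal when all candidates tie
    return PreferencePair(chosen=best, rejected=worst)


def dpo_loss(pol_chosen: float, pol_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one pair, given sequence log-probs under policy and reference."""
    z = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    # -log(sigmoid(z)), computed in a numerically stable way for either sign of z
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    ref = ["waterproof fabric", "lightweight design", "adjustable hood"]
    candidates = {
        "summary A": ["Waterproof fabric", "lightweight design"],
        "summary B": ["bright colors"],
    }
    pair = build_pair(candidates, ref)
    print(pair)  # chosen='summary A', rejected='summary B'
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

In a real pipeline the string-matching step would likely be replaced by a semantic or model-based claim matcher; exact matching is used here only to keep the sketch self-contained.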

Cite this article

SONG Xuemeng, LI Zhimo, HOU Bohan, et al. Enhancing Multimodal Product Summarization through Claim-Based Evaluation and Preference Optimization[J]. Journal of South China University of Technology (Natural Science), 0: 1. DOI: 10.12141/j.issn.1000-565X.250375
