Enhancing Multimodal Product Summarization through Claim-Based Evaluation and Preference Optimization
Published online: 2026-01-23
Multimodal product summarization aims to generate concise, accurate summaries that highlight a product's key selling points from its textual and visual information. Existing approaches face two major challenges. First, traditional overlap-based metrics such as ROUGE cannot reliably assess how well a summary captures essential product information. Second, mainstream supervised fine-tuning (SFT) paradigms do not model users' implicit preferences about which key elements should be prominent, so the generated summaries can deviate from what users actually need. To address these issues, this paper proposes a claim-based summarization evaluation metric (CSE), which measures how well key information is expressed along two dimensions: claim hit accuracy (CHA) and claim quantity ratio (CQR). We further introduce PAMPS, a preference-aligned multimodal product summarization model trained in four stages (supervised fine-tuning, summary resampling, CSE-driven preference pair construction, and direct preference optimization (DPO)) that progressively align the model with user preferences regarding key product elements. Experiments on CEPSUM, a large-scale Chinese e-commerce dataset, demonstrate the effectiveness of the proposed method. PAMPS also improves ROUGE: the DPO-ROUGE variant raises ROUGE-1/2/L over SFT by 0.25, 0.44, and 1.21 on average, indicating better overall generation quality. Under the CSE evaluation framework, DPO-CSE yields the largest gains in claim hit accuracy, with an average improvement exceeding 4%, showing that element-oriented preference optimization strengthens the model's ability to capture and express core product information. Overall, the results validate the effectiveness and practical value of the proposed approach for improving multimodal product summarization.
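The abstract names the two CSE dimensions but does not give their formulas. The sketch below is one plausible reading, not the authors' definition: it assumes claims have already been extracted from the generated and reference summaries as short strings, treats CHA as the fraction of generated claims that match a reference claim, and treats CQR as the number of generated claims divided by the number of reference claims. The exact-match `normalize` helper is hypothetical; the paper may match claims semantically or with a model, and how CHA and CQR are combined into a single CSE score is not specified in the abstract.

```python
from dataclasses import dataclass


@dataclass
class CSEScores:
    cha: float  # claim hit accuracy (assumed: share of generated claims found in the reference)
    cqr: float  # claim quantity ratio (assumed: |generated claims| / |reference claims|)


def normalize(claim: str) -> str:
    # Hypothetical normalization: lowercase and collapse whitespace so that
    # trivially different spellings of the same claim count as a hit.
    return " ".join(claim.lower().split())


def cse(generated_claims: list[str], reference_claims: list[str]) -> CSEScores:
    """Illustrative claim-based scores under the assumptions stated above."""
    gen = {normalize(c) for c in generated_claims}
    ref = {normalize(c) for c in reference_claims}
    if not ref:
        raise ValueError("reference summary must contain at least one claim")
    hits = len(gen & ref)                 # generated claims that hit a reference claim
    cha = hits / len(gen) if gen else 0.0
    cqr = len(gen) / len(ref)             # >1 means more claims than the reference
    return CSEScores(cha=cha, cqr=cqr)


scores = cse(
    ["waterproof fabric", "lightweight", "ships free"],
    ["Waterproof  fabric", "lightweight", "breathable", "adjustable hood"],
)
print(scores)  # CSEScores(cha=0.666..., cqr=0.75)
```

Separating the two scores makes the trade-off visible: a summary can reach a high CHA by stating very few claims, and CQR is what exposes that.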
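Stages two through four (resampling, metric-driven pair construction, and DPO) can be sketched as follows. This is a simplified illustration under stated assumptions, not the PAMPS implementation. It assumes each input yields several resampled summaries, a scoring function ranks them (a CSE-style score for DPO-CSE, presumably ROUGE for DPO-ROUGE), and the best and worst candidates become the chosen/rejected pair. The helper names are hypothetical. The loss is the standard DPO objective, $-\log\sigma\big(\beta[(\log\pi_\theta(y_w)-\log\pi_{\text{ref}}(y_w))-(\log\pi_\theta(y_l)-\log\pi_{\text{ref}}(y_l))]\big)$, computed here on scalar sequence log-probabilities. The $\beta=0.1$ default is an assumption.

```python
import math
from typing import Callable, Sequence


def build_preference_pair(
    candidates: Sequence[str], score: Callable[[str], float]
) -> tuple[str, str]:
    """Return (chosen, rejected): the highest- and lowest-scoring resampled summaries."""
    if len(candidates) < 2:
        raise ValueError("need at least two resampled candidates")
    ranked = sorted(candidates, key=score)
    return ranked[-1], ranked[0]


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one pair, given summed sequence log-probabilities."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return softplus(-margin)  # equals -log(sigmoid(margin))


# Toy usage: summary length stands in for a real CSE score (hypothetical).
chosen, rejected = build_preference_pair(["a", "bbb", "cc"], score=len)
print(chosen, rejected)               # bbb a
print(dpo_loss(0.0, 0.0, 0.0, 0.0))   # log(2) when policy equals reference
```

Because the loss depends only on log-probability ratios against a frozen reference model, swapping the ranking metric (ROUGE versus CSE) changes which pairs are built but leaves the optimization step unchanged. That is consistent with the abstract reporting DPO-ROUGE and DPO-CSE as separate variants.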
SONG Xuemeng, LI Zhimo, HOU Bohan, et al. Enhancing Multimodal Product Summarization through Claim-Based Evaluation and Preference Optimization[J]. Journal of South China University of Technology (Natural Science), 0: 1. DOI: 10.12141/j.issn.1000-565X.250375