Researchers from HKU and Tencent have proposed DiffMM, a new paradigm for multimodal recommender systems that aims to improve the accuracy of short-video recommendation. The system builds a graph containing information about users and videos, then applies graph diffusion and contrastive learning to better model the relationships between users and videos.
DiffMM's modeling approach has three main parts: a multimodal graph diffusion model, multimodal graph aggregation, and cross-modal contrastive augmentation. The multimodal graph diffusion model unifies user-item collaborative signals with multimodal information through a modality-aware denoising diffusion probabilistic model, which is meant to counter the negative effects that arise in multimodal recommender systems. Modality-aware user-item graph generation and optimization are achieved through a graph probabilistic diffusion paradigm and modality-aware graph diffusion optimization.
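To make the diffusion idea concrete, here is a minimal sketch of the forward (noising) half of a denoising diffusion process applied to one user's interaction vector. It is not DiffMM's implementation: the linear noise schedule, the function names, and the toy interaction vector are all assumptions chosen for illustration, and the learned, modality-aware denoising step is omitted.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule; the paper's schedule may differ."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t is the running product of (1 - beta_s) for s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(a) * x_0, (1 - a) * I), a = alpha_bar_t."""
    a = alpha_bars[t]
    return [
        math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for x in x0
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
alpha_bars = cumulative_alphas(linear_beta_schedule(50))

# Toy row of a user-item graph: 1.0 marks a video the user interacted with.
user_interactions = [1.0, 0.0, 0.0, 1.0, 0.0]
noisy_interactions = forward_diffuse(user_interactions, t=49, alpha_bars=alpha_bars, rng=rng)
```

In a full model, a denoising network would be trained to recover the original interactions from `noisy_interactions`, and that reverse step is where modality features would be injected.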
For cross-modal contrastive augmentation, DiffMM uses modality-aware contrastive views and contrastive augmentation to capture the consistency of user interaction patterns across different item modalities, which improves recommender system performance.
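The contrastive objective can be illustrated with an InfoNCE-style loss between two views of the same users. This is a generic sketch rather than DiffMM's exact loss: the function names, the temperature value, and the toy two-dimensional embeddings are assumptions, and embeddings are assumed to be non-zero vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss; row i of view_a is the positive for row i of view_b."""
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Log-sum-exp with a max shift for numerical stability.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        total += log_denominator - logits[i]
    return total / len(view_a)


# Toy user embeddings from two hypothetical modality-aware views.
visual_view = [[1.0, 0.0], [0.0, 1.0]]
textual_view_aligned = [[1.0, 0.0], [0.0, 1.0]]
textual_view_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = info_nce(visual_view, textual_view_aligned)
loss_swapped = info_nce(visual_view, textual_view_swapped)
```

The loss is lower when a user's embeddings agree across the two views (`loss_aligned`) than when they are mismatched (`loss_swapped`), which is the consistency signal the contrastive step rewards.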
Paper: https://arxiv.org/abs/2406.1178