Conference Paper: Multi-Modal Self-Supervised Learning for Recommendation
| Title | Multi-Modal Self-Supervised Learning for Recommendation |
|---|---|
| Authors | Wei, Wei; Huang, Chao; Xia, Lianghao; Zhang, Chuxu |
| Keywords | Multi-Modal Recommendation; Self-Supervised Learning |
| Issue Date | 2023 |
| Citation | ACM Web Conference 2023 - Proceedings of the World Wide Web Conference, WWW 2023, 2023, p. 790-800 |
| Abstract | The online emergence of multi-modal sharing platforms (e.g., TikTok, YouTube) is powering personalized recommender systems to incorporate various modalities (e.g., visual, textual, and acoustic) into latent user representations. While existing works on multi-modal recommendation exploit multimedia content features to enhance item embeddings, their representation capability is limited by heavy label reliance and weak robustness on sparse user behavior data. Inspired by recent progress of self-supervised learning in alleviating the label scarcity issue, we explore deriving self-supervision signals by effectively learning modality-aware user preferences and cross-modal dependencies. To this end, we propose a new Multi-Modal Self-Supervised Learning (MMSSL) method that tackles two key challenges. Specifically, to characterize the inter-dependency between the user-item collaborative view and the item multi-modal semantic view, we design a modality-aware interactive structure learning paradigm with adversarial perturbations for data augmentation. In addition, to capture how users' modality-aware interaction patterns interweave with each other, a cross-modal contrastive learning approach is introduced to jointly preserve inter-modal semantic commonality and user preference diversity. Experiments on real-world datasets verify the superiority of our method over various state-of-the-art baselines for multimedia recommendation. The implementation is released at: https://github.com/HKUDS/MMSSL. |
| Persistent Identifier | http://hdl.handle.net/10722/355938 |
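
The abstract above mentions a cross-modal contrastive learning component that preserves inter-modal semantic commonality while retaining user preference diversity. As a rough illustration only, the sketch below shows a generic symmetric InfoNCE objective over two modality-specific user embedding matrices; the function name `cross_modal_infonce`, the temperature, and the tensor shapes are illustrative assumptions and are not taken from the released MMSSL code (see https://github.com/HKUDS/MMSSL for the authors' implementation).

```python
# Hedged sketch of a cross-modal contrastive (InfoNCE-style) loss.
# All names, shapes, and hyperparameters here are assumptions for illustration.
import torch
import torch.nn.functional as F


def cross_modal_infonce(user_emb_a: torch.Tensor,
                        user_emb_b: torch.Tensor,
                        temperature: float = 0.2) -> torch.Tensor:
    """Contrast the same user's embeddings from two modalities (positives)
    against other users in the batch (negatives)."""
    a = F.normalize(user_emb_a, dim=-1)                 # [batch, dim]
    b = F.normalize(user_emb_b, dim=-1)                 # [batch, dim]
    logits = a @ b.t() / temperature                    # pairwise cosine similarities
    labels = torch.arange(a.size(0), device=a.device)   # diagonal = matching user
    # Symmetric InfoNCE: align modality A -> B and B -> A.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))


if __name__ == "__main__":
    # Example usage with random stand-ins for, e.g., visual and textual user views.
    visual_u = torch.randn(256, 64)
    textual_u = torch.randn(256, 64)
    print(cross_modal_infonce(visual_u, textual_u).item())
```
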
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Wei, Wei | - |
| dc.contributor.author | Huang, Chao | - |
| dc.contributor.author | Xia, Lianghao | - |
| dc.contributor.author | Zhang, Chuxu | - |
| dc.date.accessioned | 2025-05-19T05:46:47Z | - |
| dc.date.available | 2025-05-19T05:46:47Z | - |
| dc.date.issued | 2023 | - |
| dc.identifier.citation | ACM Web Conference 2023 - Proceedings of the World Wide Web Conference, WWW 2023, 2023, p. 790-800 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/355938 | - |
| dc.description.abstract | The online emergence of multi-modal sharing platforms (e.g., TikTok, YouTube) is powering personalized recommender systems to incorporate various modalities (e.g., visual, textual, and acoustic) into latent user representations. While existing works on multi-modal recommendation exploit multimedia content features to enhance item embeddings, their representation capability is limited by heavy label reliance and weak robustness on sparse user behavior data. Inspired by recent progress of self-supervised learning in alleviating the label scarcity issue, we explore deriving self-supervision signals by effectively learning modality-aware user preferences and cross-modal dependencies. To this end, we propose a new Multi-Modal Self-Supervised Learning (MMSSL) method that tackles two key challenges. Specifically, to characterize the inter-dependency between the user-item collaborative view and the item multi-modal semantic view, we design a modality-aware interactive structure learning paradigm with adversarial perturbations for data augmentation. In addition, to capture how users' modality-aware interaction patterns interweave with each other, a cross-modal contrastive learning approach is introduced to jointly preserve inter-modal semantic commonality and user preference diversity. Experiments on real-world datasets verify the superiority of our method over various state-of-the-art baselines for multimedia recommendation. The implementation is released at: https://github.com/HKUDS/MMSSL. | - |
| dc.language | eng | - |
| dc.relation.ispartof | ACM Web Conference 2023 - Proceedings of the World Wide Web Conference, WWW 2023 | - |
| dc.subject | Multi-Modal Recommendation | - |
| dc.subject | Self-Supervised Learning | - |
| dc.title | Multi-Modal Self-Supervised Learning for Recommendation | - |
| dc.type | Conference_Paper | - |
| dc.description.nature | link_to_subscribed_fulltext | - |
| dc.identifier.doi | 10.1145/3543507.3583206 | - |
| dc.identifier.scopus | eid_2-s2.0-85159378843 | - |
| dc.identifier.spage | 790 | - |
| dc.identifier.epage | 800 | - |
