Addressing Out-of-Distribution Challenges in Image Semantic Communication Systems with Multi-modal Large Language Models

Zhang, Feifan; Du, Yuyang; Chen, Kexin; Shao, Yulin; Liew, Soung Chang

Scopus: eid_2-s2.0-85215525063
Appears in Collections:
- Electrical & Electronic Engineering: Conference papers

Conference Paper: Addressing Out-of-Distribution Challenges in Image Semantic Communication Systems with Multi-modal Large Language Models

Title	Addressing Out-of-Distribution Challenges in Image Semantic Communication Systems with Multi-modal Large Language Models
Authors	Zhang, Feifan; Du, Yuyang; Chen, Kexin; Shao, Yulin; Liew, Soung Chang
Keywords	generative AIs; multi-modal foundation model; out-of-distribution problem; Semantic communication
Issue Date	2024
Citation	Proceedings of the International Symposium on Modeling and Optimization in Mobile Ad Hoc and Wireless Networks Wiopt, 2024, p. 7-14
Abstract	Semantic communication is a promising technology for next-generation wireless networks. However, the out-of-distribution (OOD) problem, where a pre-trained machine learning (ML) model is applied to unseen tasks that are outside the distribution of its training data, may compromise the integrity of semantic compression. This paper explores the use of multi-modal large language models (MLLMs) to address the OOD issue in image semantic communication. We propose a novel 'Plan A - Plan B' framework that leverages the broad knowledge and strong generalization ability of an MLLM to assist a conventional ML model when the latter encounters an OOD input in the semantic encoding process. The novel framework integrates the anti-OOD ability of MLLMs with the domain expertise of ML models in tasks they have been trained for, thus enhancing the accuracy in the semantic encoding process. Further, at the receiver side of the communication system, we put forth a 'generate-criticize' framework that allows one MLLM to challenge the image generated by another MLLM, which then revises the generated image in the next iteration. The joint effort of the two MLLMs significantly enhances the reliability of image reconstruction.
Persistent Identifier	http://hdl.handle.net/10722/362952
ISSN	2690-3334
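
The two control flows named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every function name, the confidence score, and the 0.5 threshold are hypothetical. The abstract says only that the MLLM assists when the conventional model meets an OOD input; it does not say how OOD inputs are detected, so a confidence threshold is assumed here. Images and descriptions are plain strings, and the models are stand-in stubs.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces; the record does not describe the paper's actual APIs.
PlanA = Callable[[str], Tuple[str, float]]       # image -> (description, confidence)
PlanB = Callable[[str], str]                     # image -> description (MLLM stand-in)
Generator = Callable[[str, Optional[str]], str]  # (description, feedback) -> image
Critic = Callable[[str, str], Optional[str]]     # (description, image) -> feedback or None


def encode_with_fallback(image: str, plan_a: PlanA, plan_b: PlanB,
                         threshold: float = 0.5) -> Tuple[str, str]:
    """Sender side: use the task-specific model unless its confidence is low.

    Assumption: low confidence is treated as a sign of an OOD input.
    """
    description, confidence = plan_a(image)
    if confidence >= threshold:
        return description, "plan_a"
    return plan_b(image), "plan_b"


def generate_criticize(description: str, generate: Generator, criticize: Critic,
                       max_rounds: int = 3) -> Tuple[str, int]:
    """Receiver side: one model generates, another criticizes, then revise."""
    feedback: Optional[str] = None
    image = ""
    for round_index in range(1, max_rounds + 1):
        image = generate(description, feedback)
        feedback = criticize(description, image)
        if feedback is None:  # critic has no objection, stop early
            return image, round_index
    return image, max_rounds


# Stand-in stubs so the sketch runs without any real model.
def stub_plan_a(image: str) -> Tuple[str, float]:
    return ("cat", 0.9) if image == "cat.png" else ("unknown", 0.1)


def stub_plan_b(image: str) -> str:
    return "a red panda"


def stub_generate(description: str, feedback: Optional[str]) -> str:
    # The first attempt is a rough draft; a revision follows the description.
    return "draft" if feedback is None else description


def stub_criticize(description: str, image: str) -> Optional[str]:
    return None if image == description else "image does not match description"


in_dist = encode_with_fallback("cat.png", stub_plan_a, stub_plan_b)
out_dist = encode_with_fallback("panda.png", stub_plan_a, stub_plan_b)
final_image, rounds_used = generate_criticize(out_dist[0], stub_generate, stub_criticize)
```

With these stubs, the in-distribution input stays on Plan A, the unfamiliar input falls through to Plan B, and the receiver loop stops as soon as the critic raises no objection.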

DC Field	Value	Language
dc.contributor.author	Zhang, Feifan	-
dc.contributor.author	Du, Yuyang	-
dc.contributor.author	Chen, Kexin	-
dc.contributor.author	Shao, Yulin	-
dc.contributor.author	Liew, Soung Chang	-
dc.date.accessioned	2025-10-10T07:43:38Z	-
dc.date.available	2025-10-10T07:43:38Z	-
dc.date.issued	2024	-
dc.identifier.citation	Proceedings of the International Symposium on Modeling and Optimization in Mobile Ad Hoc and Wireless Networks Wiopt, 2024, p. 7-14	-
dc.identifier.issn	2690-3334	-
dc.identifier.uri	http://hdl.handle.net/10722/362952	-
dc.description.abstract	Semantic communication is a promising technology for next-generation wireless networks. However, the out-of-distribution (OOD) problem, where a pre-trained machine learning (ML) model is applied to unseen tasks that are outside the distribution of its training data, may compromise the integrity of semantic compression. This paper explores the use of multi-modal large language models (MLLMs) to address the OOD issue in image semantic communication. We propose a novel 'Plan A - Plan B' framework that leverages the broad knowledge and strong generalization ability of an MLLM to assist a conventional ML model when the latter encounters an OOD input in the semantic encoding process. The novel framework integrates the anti-OOD ability of MLLMs with the domain expertise of ML models in tasks they have been trained for, thus enhancing the accuracy in the semantic encoding process. Further, at the receiver side of the communication system, we put forth a 'generate-criticize' framework that allows one MLLM to challenge the image generated by another MLLM, which then revises the generated image in the next iteration. The joint effort of the two MLLMs significantly enhances the reliability of image reconstruction.	-
dc.language	eng	-
dc.relation.ispartof	Proceedings of the International Symposium on Modeling and Optimization in Mobile Ad Hoc and Wireless Networks Wiopt	-
dc.subject	generative AIs	-
dc.subject	multi-modal foundation model	-
dc.subject	out-of-distribution problem	-
dc.subject	Semantic communication	-
dc.title	Addressing Out-of-Distribution Challenges in Image Semantic Communication Systems with Multi-modal Large Language Models	-
dc.type	Conference_Paper	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.scopus	eid_2-s2.0-85215525063	-
dc.identifier.spage	7	-
dc.identifier.epage	14	-
dc.identifier.eissn	2690-3342	-
