Article: Semantic-Topology Preserving Quantization of Word Embeddings for Human-to-Machine Communications

Title: Semantic-Topology Preserving Quantization of Word Embeddings for Human-to-Machine Communications
Authors: Lin, Zhenyi; Yang, Lin; Gong, Yi; Huang, Kaibin
Keywords: human-robot interaction; Semantics; vector quantization (VQ)
Issue Date: 1-Jan-2025
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Communications, 2025, v. 73, n. 4, p. 2401-2415
Abstract: The vision of 6G mobile networks aims to connect intelligent machines to humans to provide the latter with cooperation, care, and assistance. The mainstream approach for human-to-machine (H2M) semantic communication is to map words into (word) embedding vectors, which are clustered according to their semantic similarity to facilitate machines’ interpretation of human languages. The computation-intensive tasks of text-to-embedding mapping are usually delegated to an edge server that senses human commands, maps them into embedding vectors, and then transmits the vectors to a machine over a wireless link. In this work, we propose a quantization framework customized for embedding vectors, called semantic-topology preserving VQ (SemTop-VQ), to overcome the communication bottleneck due to the vectors’ high dimensionality. While traditional VQ focuses on minimizing the distortion of individual vectors, SemTop-VQ aims to minimize the distortion of the topology of the embedding matrix, referring to the vectors’ relative positions that represent semantics. To this end, we adopt a topology-distortion metric, termed pointwise-inner-product (PIP) loss, and a hierarchical VQ architecture targeting high-dimensional VQ. In this architecture, an embedding vector is decomposed into blocks; the norm and shape (normalized vector) of each block are quantized separately using a scalar quantizer and a Grassmannian quantizer, respectively. The main feature of SemTop-VQ lies in deriving from the PIP loss a set of so-called semantic-importance indicators, which reflect the level of influence of individual blocks’ quantization errors on the topology distortion. The indicators are then applied to optimize quantization-bit allocation for the decomposed vector blocks under the criterion of PIP-loss minimization. In practice, the usage probabilities of embedding vectors for a specific machine task are highly skewed and the task is time-varying. We exploit this fact to further develop SemTop-VQ to feature task adaptation that can attain a higher communication efficiency. The task-adaptive VQ is realized via the use of a frequently used (quantization) codebook that is much smaller in size than the original codebook and continuously updated via estimation of the embedding-usage distribution. Our experiments using real embedding datasets, namely Word2Vec and GloVe, demonstrate the effectiveness of SemTop-VQ as a goal-oriented technique for efficient H2M communications.
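The two building blocks named in the abstract, the PIP loss and block-wise norm/shape quantization, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common definition of PIP loss as the Frobenius norm of the difference between the Gram matrices of the original and quantized embeddings, uses a uniform scalar quantizer for block norms, and substitutes a random unit-norm codebook for the Grassmannian quantizer. The function names, the equal bit budget per block, and the synthetic data are all hypothetical; the paper instead allocates bits per block using its semantic-importance indicators.

```python
import numpy as np


def pip_loss(E, E_hat):
    """PIP loss under the assumed definition: ||E E^T - E_hat E_hat^T||_F."""
    return float(np.linalg.norm(E @ E.T - E_hat @ E_hat.T, ord="fro"))


def quantize_blockwise(E, num_blocks, shape_bits, norm_bits, rng):
    """Quantize each column block of E as (norm, shape) separately.

    Assumptions for illustration only: every block gets the same bit
    budget, norms use a uniform scalar quantizer, and shapes use a random
    unit-norm codebook as a simple stand-in for a Grassmannian codebook.
    """
    n, d = E.shape
    E_hat = np.empty((n, d), dtype=float)
    for idx in np.array_split(np.arange(d), num_blocks):
        block = E[:, idx]
        norms = np.linalg.norm(block, axis=1)
        safe = np.where(norms > 0, norms, 1.0)  # avoid division by zero
        shapes = block / safe[:, None]

        # Scalar quantizer: snap each norm to the nearest uniform level.
        levels = np.linspace(norms.min(), norms.max(), 2 ** norm_bits)
        nearest = np.argmin(np.abs(norms[:, None] - levels[None, :]), axis=1)
        q_norms = levels[nearest]

        # Shape quantizer: pick the codeword with the largest inner product.
        codebook = rng.standard_normal((2 ** shape_bits, idx.size))
        codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)
        q_shapes = codebook[np.argmax(shapes @ codebook.T, axis=1)]

        E_hat[:, idx] = q_norms[:, None] * q_shapes
    return E_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    E = rng.standard_normal((200, 32))  # synthetic stand-in for embeddings
    E_hat = quantize_blockwise(E, num_blocks=4, shape_bits=6, norm_bits=4, rng=rng)
    print("PIP loss of quantized embeddings:", pip_loss(E, E_hat))
```

In this sketch the PIP loss is evaluated only after quantization. The framework described in the abstract goes further and uses the loss to decide how many bits each block receives.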
Persistent Identifier: http://hdl.handle.net/10722/362127
ISSN: 0090-6778
2023 Impact Factor: 7.2
2020 SCImago Journal Rankings: 1.468

 

DC Field | Value | Language
dc.contributor.author | Lin, Zhenyi | -
dc.contributor.author | Yang, Lin | -
dc.contributor.author | Gong, Yi | -
dc.contributor.author | Huang, Kaibin | -
dc.date.accessioned | 2025-09-19T00:32:26Z | -
dc.date.available | 2025-09-19T00:32:26Z | -
dc.date.issued | 2025-01-01 | -
dc.identifier.citation | IEEE Transactions on Communications, 2025, v. 73, n. 4, p. 2401-2415 | -
dc.identifier.issn | 0090-6778 | -
dc.identifier.uri | http://hdl.handle.net/10722/362127 | -
dc.language | eng | -
dc.publisher | Institute of Electrical and Electronics Engineers | -
dc.relation.ispartof | IEEE Transactions on Communications | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject | human-robot interaction | -
dc.subject | Semantics | -
dc.subject | vector quantization (VQ) | -
dc.title | Semantic-Topology Preserving Quantization of Word Embeddings for Human-to-Machine Communications | -
dc.type | Article | -
dc.identifier.doi | 10.1109/TCOMM.2024.3471992 | -
dc.identifier.scopus | eid_2-s2.0-105003047688 | -
dc.identifier.volume | 73 | -
dc.identifier.issue | 4 | -
dc.identifier.spage | 2401 | -
dc.identifier.epage | 2415 | -
dc.identifier.eissn | 1558-0857 | -
dc.identifier.issnl | 0090-6778 | -
