A new view of multi-modal language analysis: Audio and video features as text “Styles”

Sun, Zhongkai; Sarma, Prathusha K.; Liang, Yingyu; Sethares, William A.


Scopus: eid_2-s2.0-85107273286


Citations:
- Scopus: 0
Appears in Collections:
- HKU Musketeers Foundation Institute of Data Science: Conference papers

Conference Paper: A new view of multi-modal language analysis: Audio and video features as text “Styles”

Title	A new view of multi-modal language analysis: Audio and video features as text “Styles”
Authors	Sun, Zhongkai; Sarma, Prathusha K.; Liang, Yingyu; Sethares, William A.
Issue Date	2021
Citation	EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, 2021, p. 1956-1965
Abstract	Imposing the style of one image onto another is called style transfer. For example, the style of a Van Gogh painting might be imposed on a photograph to yield an interesting hybrid. This paper applies the adaptive normalization used for image style transfer to language semantics: the style is the way the words are said (tone of voice and facial expressions), and this style is transferred onto the text. The goal is to learn richer representations for multi-modal utterances using style-transferred multi-modal features. The proposed Style-Transfer Transformer (STT) grafts a stepped styled adaptive layer normalization onto a transformer network, whose output is used for sentiment analysis and emotion recognition. In addition to achieving performance on par with the state of the art (while using less than a third of the model parameters), we examine the relative contributions of each modality in the downstream applications.
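The abstract names adaptive normalization from image style transfer as the core mechanism. As an illustration only, the sketch below shows AdaIN (adaptive instance normalization, a well-known form of this idea) on a single feature channel, plus a hypothetical layer-normalization variant whose scale and shift are predicted from an audio/video "style" vector. It is not the authors' STT: the "stepped" scheme, the transformer, and the projection weights `w_gamma`/`w_beta` are assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_std(xs: Sequence[float], eps: float = 1e-5) -> Tuple[float, float]:
    """Mean and biased standard deviation of xs; eps keeps the std non-zero."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, math.sqrt(var + eps)


def adain(content: Sequence[float], style: Sequence[float]) -> Vector:
    """AdaIN on one feature channel: normalize the content values, then
    re-scale and re-shift them with the style's mean and std."""
    c_mu, c_sigma = mean_std(content)
    s_mu, s_sigma = mean_std(style)
    return [s_sigma * (x - c_mu) / c_sigma + s_mu for x in content]


def matvec(w: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Plain matrix-vector product (rows of w dotted with v)."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def style_adaptive_layer_norm(
    token: Sequence[float],
    style: Sequence[float],
    w_gamma: Sequence[Sequence[float]],
    w_beta: Sequence[Sequence[float]],
) -> Vector:
    """Hypothetical styled layer norm: normalize a text token embedding across
    its features, then apply a per-feature scale (gamma) and shift (beta)
    predicted from a non-verbal style vector. w_gamma and w_beta stand in for
    learned projections and are assumptions, not the paper's parameters."""
    mu, sigma = mean_std(token)
    gamma = matvec(w_gamma, style)  # one scale per token feature
    beta = matvec(w_beta, style)    # one shift per token feature
    return [g * (x - mu) / sigma + b for x, g, b in zip(token, gamma, beta)]
```

With `adain([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])` the content keeps its shape but takes on the style's statistics, giving values close to `[10, 20, 30]`. In the layer-norm variant, changing only the style vector changes the normalized text representation, which is the intuition behind treating tone of voice and facial expressions as a "style" applied to words.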
Persistent Identifier	http://hdl.handle.net/10722/341315

DC Field	Value	Language
dc.contributor.author	Sun, Zhongkai	-
dc.contributor.author	Sarma, Prathusha K.	-
dc.contributor.author	Liang, Yingyu	-
dc.contributor.author	Sethares, William A.	-
dc.date.accessioned	2024-03-13T08:41:51Z	-
dc.date.available	2024-03-13T08:41:51Z	-
dc.date.issued	2021	-
dc.identifier.citation	EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, 2021, p. 1956-1965	-
dc.identifier.uri	http://hdl.handle.net/10722/341315	-
dc.description.abstract	Imposing the style of one image onto another is called style transfer. For example, the style of a Van Gogh painting might be imposed on a photograph to yield an interesting hybrid. This paper applies the adaptive normalization used for image style transfer to language semantics: the style is the way the words are said (tone of voice and facial expressions), and this style is transferred onto the text. The goal is to learn richer representations for multi-modal utterances using style-transferred multi-modal features. The proposed Style-Transfer Transformer (STT) grafts a stepped styled adaptive layer normalization onto a transformer network, whose output is used for sentiment analysis and emotion recognition. In addition to achieving performance on par with the state of the art (while using less than a third of the model parameters), we examine the relative contributions of each modality in the downstream applications.	-
dc.language	eng	-
dc.relation.ispartof	EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference	-
dc.title	A new view of multi-modal language analysis: Audio and video features as text “Styles”	-
dc.type	Conference_Paper	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.scopus	eid_2-s2.0-85107273286	-
dc.identifier.spage	1956	-
dc.identifier.epage	1965	-
