Conference Paper: PBoS: Probabilistic bag-of-subwords for generalizing word embedding

Title: PBoS: Probabilistic bag-of-subwords for generalizing word embedding
Authors: Jinman, Zhao; Zhong, Shawn; Zhang, Xiaomin; Liang, Yingyu
Issue Date: 2020
Citation: Findings of the Association for Computational Linguistics: EMNLP 2020, 2020, p. 596-611
Abstract: We look into the task of generalizing word embeddings: given a set of pre-trained word vectors over a finite vocabulary, the goal is to predict embedding vectors for out-of-vocabulary words, without extra contextual information. We rely solely on the spellings of words and propose a model, along with an efficient algorithm, that simultaneously models subword segmentation and computes subword-based compositional word embeddings. We call the model probabilistic bag-of-subwords (PBoS), as it applies bag-of-subwords over all possible segmentations, weighted by their likelihood. Inspections and an affix prediction experiment show that PBoS is able to produce meaningful subword segmentations and subword rankings without any source of explicit morphological knowledge. Word similarity and POS tagging experiments show clear advantages of PBoS over previous subword-level models in the quality of generated word embeddings across languages.
Persistent Identifier: http://hdl.handle.net/10722/341332
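The likelihood-weighted bag-of-subwords idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the subword inventory, probabilities, and vectors below are invented for illustration, and segmentation likelihood is modeled here as a simple product of unigram subword probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical subword inventory with unigram probabilities and vectors
# (purely illustrative; a real model would learn these from data).
subword_prob = {"un": 0.3, "like": 0.4, "ly": 0.3}
subword_vec = {s: rng.normal(size=4) for s in subword_prob}

def segmentations(word):
    """Enumerate all ways to split `word` into known subwords."""
    if not word:
        yield []
        return
    for i in range(1, len(word) + 1):
        piece = word[:i]
        if piece in subword_prob:
            for rest in segmentations(word[i:]):
                yield [piece] + rest

def pbos_embedding(word):
    """Bag-of-subwords over all segmentations, weighted by likelihood."""
    total_vec = np.zeros(4)
    total_p = 0.0
    for seg in segmentations(word):
        # Segmentation likelihood under a unigram subword model.
        p = np.prod([subword_prob[s] for s in seg])
        total_vec += p * sum(subword_vec[s] for s in seg)
        total_p += p
    return total_vec / total_p  # normalize over segmentation likelihoods

vec = pbos_embedding("unlikely")
```

Because every segmentation is weighted, an out-of-vocabulary word like "unlikely" gets a vector even though it never appeared in training, which is the generalization setting the paper targets.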

 

DC Field: Value
dc.contributor.author: Jinman, Zhao
dc.contributor.author: Zhong, Shawn
dc.contributor.author: Zhang, Xiaomin
dc.contributor.author: Liang, Yingyu
dc.date.accessioned: 2024-03-13T08:41:59Z
dc.date.available: 2024-03-13T08:41:59Z
dc.date.issued: 2020
dc.identifier.citation: Findings of the Association for Computational Linguistics: EMNLP 2020, 2020, p. 596-611
dc.identifier.uri: http://hdl.handle.net/10722/341332
dc.description.abstract: (same abstract as above)
dc.language: eng
dc.relation.ispartof: Findings of the Association for Computational Linguistics: EMNLP 2020
dc.title: PBoS: Probabilistic bag-of-subwords for generalizing word embedding
dc.type: Conference_Paper
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.scopus: eid_2-s2.0-85118432460
dc.identifier.spage: 596
dc.identifier.epage: 611
