Conference Paper: Distilling an ensemble of greedy dependency parsers into one MST parser

Title: Distilling an ensemble of greedy dependency parsers into one MST parser
Authors: Kuncoro, Adhiguna; Ballesteros, Miguel; Kong, Lingpeng; Dyer, Chris; Smith, Noah A.
Issue Date: 2016
Citation: 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, 1-5 November 2016. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, p. 1744-1753
Abstract: © 2016 Association for Computational Linguistics. We introduce two first-order graph-based dependency parsers achieving a new state of the art. The first is a consensus parser built from an ensemble of independently trained greedy LSTM transition-based parsers with different random initializations. We cast this approach as minimum Bayes risk decoding (under the Hamming cost) and argue that weaker consensus within the ensemble is a useful signal of difficulty or ambiguity. The second parser is a “distillation” of the ensemble into a single model. We train the distillation parser using a structured hinge loss objective with a novel cost that incorporates ensemble uncertainty estimates for each possible attachment, thereby avoiding the intractable cross-entropy computations required by applying standard distillation objectives to problems with structured outputs. The first-order distillation parser matches or surpasses the state of the art on English, Chinese, and German.
Persistent Identifier: http://hdl.handle.net/10722/296149
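
The abstract's consensus parser can be illustrated with a short sketch. Under the Hamming cost, minimum Bayes risk decoding over an ensemble reduces to finding the tree that maximizes the total per-arc vote count, which a first-order maximum-spanning-tree decoder solves exactly. The vote format and the networkx-based decoder below are illustrative assumptions, not the authors' implementation:

```python
# Minimal sketch of the consensus (MBR) parser described in the abstract:
# under the Hamming cost, the minimum-Bayes-risk tree maximizes the sum of
# per-arc ensemble votes, i.e. a first-order MST problem. The vote format
# and the networkx decoder are assumptions for illustration only.
import networkx as nx
from networkx.algorithms.tree.branchings import maximum_spanning_arborescence

def consensus_parse(head_predictions):
    """head_predictions: one list per parser, where heads[i] is the
    predicted head of token i + 1 (0 denotes the root).
    Returns {modifier: head} for the MBR consensus tree."""
    votes = {}  # (head, modifier) -> number of parsers proposing that arc
    for heads in head_predictions:
        for mod, head in enumerate(heads, start=1):
            votes[(head, mod)] = votes.get((head, mod), 0) + 1

    g = nx.DiGraph()
    for (head, mod), count in votes.items():
        g.add_edge(head, mod, weight=count)  # arc score = ensemble votes

    tree = maximum_spanning_arborescence(g, attr="weight")
    return {mod: head for head, mod in tree.edges()}

# Toy usage: three greedy parsers disagree on token 1's head; the MST
# consensus keeps the majority arcs while guaranteeing a valid tree.
ensemble = [
    [2, 0, 2],  # parser A: heads of tokens 1..3
    [2, 0, 2],  # parser B agrees
    [3, 0, 2],  # parser C attaches token 1 to token 3 instead
]
print(consensus_parse(ensemble))  # -> {1: 2, 2: 0, 3: 2}
```

The distillation objective can be sketched in the same spirit: a structured hinge loss whose margin cost for a wrong arc shrinks when the ensemble itself was uncertain about that token's head. The specific cost form below (1 minus the ensemble vote share) and all names are assumptions for illustration; the paper defines its own cost.

```python
# Hedged sketch of a structured hinge loss with an uncertainty-based cost,
# in the spirit of the distillation objective described in the abstract.
# The cost form and names are illustrative assumptions, not the paper's.
import networkx as nx
from networkx.algorithms.tree.branchings import maximum_spanning_arborescence

def distillation_hinge_loss(arc_scores, gold_heads, vote_share):
    """arc_scores[(h, m)]: model score for the arc h -> m (gold arcs included).
    gold_heads[m]: gold head of token m (0 denotes the root).
    vote_share[(h, m)]: fraction of ensemble parsers attaching m to h."""
    def cost(h, m):
        # No cost for the gold arc; otherwise penalize in proportion to
        # how strongly the ensemble disagreed with this attachment.
        return 0.0 if gold_heads[m] == h else 1.0 - vote_share.get((h, m), 0.0)

    # Cost-augmented decoding: the highest-scoring tree under
    # score + cost is found with the same first-order MST machinery.
    g = nx.DiGraph()
    for (h, m), s in arc_scores.items():
        g.add_edge(h, m, weight=s + cost(h, m))
    augmented = maximum_spanning_arborescence(g, attr="weight")

    augmented_value = sum(arc_scores[(h, m)] + cost(h, m)
                          for h, m in augmented.edges())
    gold_value = sum(arc_scores[(h, m)] for m, h in gold_heads.items())
    return max(0.0, augmented_value - gold_value)
```

Because both sketches factor over single arcs, the same MST routine serves consensus decoding and cost-augmented decoding alike, which reflects the abstract's point that this construction avoids the intractable cross-entropy computations of standard distillation over structured outputs.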

 

DC Field | Value | Language
dc.contributor.author | Kuncoro, Adhiguna | -
dc.contributor.author | Ballesteros, Miguel | -
dc.contributor.author | Kong, Lingpeng | -
dc.contributor.author | Dyer, Chris | -
dc.contributor.author | Smith, Noah A. | -
dc.date.accessioned | 2021-02-11T04:52:56Z | -
dc.date.available | 2021-02-11T04:52:56Z | -
dc.date.issued | 2016 | -
dc.identifier.citation | 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, 1-5 November 2016. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, p. 1744-1753 | -
dc.identifier.uri | http://hdl.handle.net/10722/296149 | -
dc.description.abstract | © 2016 Association for Computational Linguistics. We introduce two first-order graph-based dependency parsers achieving a new state of the art. The first is a consensus parser built from an ensemble of independently trained greedy LSTM transition-based parsers with different random initializations. We cast this approach as minimum Bayes risk decoding (under the Hamming cost) and argue that weaker consensus within the ensemble is a useful signal of difficulty or ambiguity. The second parser is a “distillation” of the ensemble into a single model. We train the distillation parser using a structured hinge loss objective with a novel cost that incorporates ensemble uncertainty estimates for each possible attachment, thereby avoiding the intractable cross-entropy computations required by applying standard distillation objectives to problems with structured outputs. The first-order distillation parser matches or surpasses the state of the art on English, Chinese, and German. | -
dc.language | eng | -
dc.relation.ispartof | Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.title | Distilling an ensemble of greedy dependency parsers into one MST parser | -
dc.type | Conference_Paper | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.18653/v1/d16-1180 | -
dc.identifier.scopus | eid_2-s2.0-85021649927 | -
dc.identifier.spage | 1744 | -
dc.identifier.epage | 1753 | -
