End-to-End Neural Segmental Models for Speech Recognition

Tang, Hao; Lu, Liang; Kong, Lingpeng; Gimpel, Kevin; Livescu, Karen; Dyer, Chris; Smith, Noah A.; Renals, Steve

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1109/JSTSP.2017.2752462
Scopus: eid_2-s2.0-85030313902
WOS: WOS:000416226000003

Appears in Collections:
- Computer Science: Journal/Magazine Articles

Title	End-to-End Neural Segmental Models for Speech Recognition
Authors	Tang, Hao; Lu, Liang; Kong, Lingpeng; Gimpel, Kevin; Livescu, Karen; Dyer, Chris; Smith, Noah A.; Renals, Steve
Keywords	Connectionist temporal classification; end-to-end training; segmental models; multitask training
Issue Date	2017
Citation	IEEE Journal of Selected Topics in Signal Processing, 2017, v. 11, n. 8, p. 1254-1264. DOI: http://dx.doi.org/10.1109/JSTSP.2017.2752462
Abstract	Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time. Neural segmental models are segmental models that use neural network-based weight functions. Neural segmental models have achieved competitive results for speech recognition, and their end-to-end training has been explored in several studies. In this work, we review neural segmental models, which can be viewed as consisting of a neural network-based acoustic encoder and a finite-state transducer decoder. We study end-to-end segmental models with different weight functions, including ones based on frame-level neural classifiers and on segmental recurrent neural networks. We study how reducing the search space size impacts performance under different weight functions. We also compare several loss functions for end-to-end training. Finally, we explore training approaches, including multistage versus end-to-end training and multitask training that combines segmental and frame-level losses.
Persistent Identifier	http://hdl.handle.net/10722/296158
ISSN	1932-4553 (2023 Impact Factor: 8.7; 2023 SCImago Journal Rankings: 3.818)
ISI Accession Number ID	WOS:000416226000003

DC Field	Value	Language
dc.contributor.author	Tang, Hao	-
dc.contributor.author	Lu, Liang	-
dc.contributor.author	Kong, Lingpeng	-
dc.contributor.author	Gimpel, Kevin	-
dc.contributor.author	Livescu, Karen	-
dc.contributor.author	Dyer, Chris	-
dc.contributor.author	Smith, Noah A.	-
dc.contributor.author	Renals, Steve	-
dc.date.accessioned	2021-02-11T04:52:57Z	-
dc.date.available	2021-02-11T04:52:57Z	-
dc.date.issued	2017	-
dc.identifier.citation	IEEE Journal of Selected Topics in Signal Processing, 2017, v. 11, n. 8, p. 1254-1264	-
dc.identifier.issn	1932-4553	-
dc.identifier.uri	http://hdl.handle.net/10722/296158	-
dc.description.abstract	Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time. Neural segmental models are segmental models that use neural network-based weight functions. Neural segmental models have achieved competitive results for speech recognition, and their end-to-end training has been explored in several studies. In this work, we review neural segmental models, which can be viewed as consisting of a neural network-based acoustic encoder and a finite-state transducer decoder. We study end-to-end segmental models with different weight functions, including ones based on frame-level neural classifiers and on segmental recurrent neural networks. We study how reducing the search space size impacts performance under different weight functions. We also compare several loss functions for end-to-end training. Finally, we explore training approaches, including multistage versus end-to-end training and multitask training that combines segmental and frame-level losses.	-
dc.language	eng	-
dc.relation.ispartof	IEEE Journal of Selected Topics in Signal Processing	-
dc.subject	Connectionist temporal classification	-
dc.subject	end-to-end training	-
dc.subject	segmental models	-
dc.subject	multitask training	-
dc.title	End-to-End Neural Segmental Models for Speech Recognition	-
dc.type	Article	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1109/JSTSP.2017.2752462	-
dc.identifier.scopus	eid_2-s2.0-85030313902	-
dc.identifier.volume	11	-
dc.identifier.issue	8	-
dc.identifier.spage	1254	-
dc.identifier.epage	1264	-
dc.identifier.isi	WOS:000416226000003	-
dc.identifier.issnl	1932-4553	-
