CoSQL: A conversational text-to-SQL challenge towards cross-domain natural language interfaces to databases

Yu, Tao; Zhang, Rui; Er, He Yang; Li, Suyi; Xue, Eric; Pang, Bo; Lin, Xi Victoria; Tan, Yi Chern; Shi, Tianze; Li, Zihan; Jiang, Youxuan; Yasunaga, Michihiro; Shim, Sungrok; Chen, Tao; Fabbri, Alexander; Li, Zifan; Chen, Luyao; Zhang, Yuwen; Dixit, Shreya; Zhang, Vincent; Xiong, Caiming; Socher, Richard; Lasecki, Walter S.; Radev, Dragomir

Publisher Website: 10.18653/v1/D19-1204
Scopus: eid_2-s2.0-85084321905

Appears in Collections:
- Computer Science: Conference papers

Conference Paper: CoSQL: A conversational text-to-SQL challenge towards cross-domain natural language interfaces to databases

Title	CoSQL: A conversational text-to-SQL challenge towards cross-domain natural language interfaces to databases
Authors	Yu, Tao; Zhang, Rui; Er, He Yang; Li, Suyi; Xue, Eric; Pang, Bo; Lin, Xi Victoria; Tan, Yi Chern; Shi, Tianze; Li, Zihan; Jiang, Youxuan; Yasunaga, Michihiro; Shim, Sungrok; Chen, Tao; Fabbri, Alexander; Li, Zifan; Chen, Luyao; Zhang, Yuwen; Dixit, Shreya; Zhang, Vincent; Xiong, Caiming; Socher, Richard; Lasecki, Walter S.; Radev, Dragomir
Issue Date	2019
Citation	2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), Hong Kong, 3-7 November 2019. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 1962-1979. DOI: http://dx.doi.org/10.18653/v1/D19-1204
Abstract	We present CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing of unanswerable questions. When user questions are answerable by SQL, the expert describes the SQL and execution results to the user, hence maintaining a natural interaction flow. CoSQL introduces new challenges compared to existing task-oriented dialogue datasets: (1) the dialogue states are grounded in SQL, a domain-independent executable representation, instead of domain-specific slot-value pairs, and (2) because testing is done on unseen databases, success requires generalizing to new domains. CoSQL includes three tasks: SQL-grounded dialogue state tracking, response generation from query results, and user dialogue act prediction. We evaluate a set of strong baselines for each task and show that CoSQL presents significant challenges for future research. The dataset, baselines, and leaderboard will be released at https://yale-lily.github.io/cosql.
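To make the abstract's first point concrete (dialogue states grounded in SQL rather than in slot-value pairs), here is a minimal sketch. It is not taken from the paper or the CoSQL release. The `singer` table, its rows, the user turns, and the helper names `build_db` and `track` are all hypothetical. The per-turn SQL is hand-written for illustration; in the SQL-grounded dialogue state tracking task, a model would have to predict it from the dialogue history and the database schema.

```python
import sqlite3

# Hypothetical toy database; none of CoSQL's 200 databases are reproduced here.
def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE singer (
            singer_id INTEGER PRIMARY KEY,
            name TEXT,
            country TEXT,
            age INTEGER
        );
        INSERT INTO singer VALUES
            (1, 'Ana', 'France', 29),
            (2, 'Ben', 'USA', 41),
            (3, 'Chloe', 'France', 35);
    """)
    return conn

# A slot-value state such as {"country": "France"} only covers constraints its
# predefined, domain-specific schema anticipates (it has no slot for COUNT, for example).
# A SQL-grounded state is a full executable query that is rewritten after every
# user turn. These (turn, SQL) pairs are hand-written, not model predictions.
dialogue = [
    ("Which singers are from France?",
     "SELECT name FROM singer WHERE country = 'France' ORDER BY name"),
    ("Only the ones older than 30.",
     "SELECT name FROM singer WHERE country = 'France' AND age > 30 ORDER BY name"),
    ("How many is that?",
     "SELECT COUNT(*) FROM singer WHERE country = 'France' AND age > 30"),
]

def track(conn, dialogue):
    """Execute the SQL state after each turn and collect (turn, state, result) triples."""
    results = []
    for user_turn, sql_state in dialogue:
        rows = conn.execute(sql_state).fetchall()
        results.append((user_turn, sql_state, rows))
    return results

if __name__ == "__main__":
    conn = build_db()
    for user_turn, sql_state, rows in track(conn, dialogue):
        print(f"User: {user_turn}\n  State: {sql_state}\n  Result: {rows}")
```

Because each state is an ordinary query, the same tracking loop works unchanged on any database schema. This is the sense in which the abstract calls SQL a domain-independent executable representation.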
Persistent Identifier	http://hdl.handle.net/10722/303669

DC Field	Value	Language
dc.contributor.author	Yu, Tao	-
dc.contributor.author	Zhang, Rui	-
dc.contributor.author	Er, He Yang	-
dc.contributor.author	Li, Suyi	-
dc.contributor.author	Xue, Eric	-
dc.contributor.author	Pang, Bo	-
dc.contributor.author	Lin, Xi Victoria	-
dc.contributor.author	Tan, Yi Chern	-
dc.contributor.author	Shi, Tianze	-
dc.contributor.author	Li, Zihan	-
dc.contributor.author	Jiang, Youxuan	-
dc.contributor.author	Yasunaga, Michihiro	-
dc.contributor.author	Shim, Sungrok	-
dc.contributor.author	Chen, Tao	-
dc.contributor.author	Fabbri, Alexander	-
dc.contributor.author	Li, Zifan	-
dc.contributor.author	Chen, Luyao	-
dc.contributor.author	Zhang, Yuwen	-
dc.contributor.author	Dixit, Shreya	-
dc.contributor.author	Zhang, Vincent	-
dc.contributor.author	Xiong, Caiming	-
dc.contributor.author	Socher, Richard	-
dc.contributor.author	Lasecki, Walter S.	-
dc.contributor.author	Radev, Dragomir	-
dc.date.accessioned	2021-09-15T08:25:47Z	-
dc.date.available	2021-09-15T08:25:47Z	-
dc.date.issued	2019	-
dc.identifier.citation	2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), Hong Kong, 3-7 November 2019. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, p. 1962-1979	-
dc.identifier.uri	http://hdl.handle.net/10722/303669	-
dc.description.abstract	We present CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing of unanswerable questions. When user questions are answerable by SQL, the expert describes the SQL and execution results to the user, hence maintaining a natural interaction flow. CoSQL introduces new challenges compared to existing task-oriented dialogue datasets: (1) the dialogue states are grounded in SQL, a domain-independent executable representation, instead of domain-specific slot-value pairs, and (2) because testing is done on unseen databases, success requires generalizing to new domains. CoSQL includes three tasks: SQL-grounded dialogue state tracking, response generation from query results, and user dialogue act prediction. We evaluate a set of strong baselines for each task and show that CoSQL presents significant challenges for future research. The dataset, baselines, and leaderboard will be released at https://yale-lily.github.io/cosql.	-
dc.language	eng	-
dc.relation.ispartof	Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.title	CoSQL: A conversational text-to-SQL challenge towards cross-domain natural language interfaces to databases	-
dc.type	Conference_Paper	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.18653/v1/D19-1204	-
dc.identifier.scopus	eid_2-s2.0-85084321905	-
dc.identifier.spage	1962	-
dc.identifier.epage	1979	-