TAQA: An Open Knowledge-based Question-Answering System for Answering Questions with Complex Semantics Constraints


Grant Data
Project Title
TAQA: An Open Knowledge-based Question-Answering System for Answering Questions with Complex Semantics Constraints
Principal Investigator
Professor Kao, Chi Ming (Principal Investigator (PI))
Duration
30
Start Date
2016-11-01
Amount
563039
Keywords
database, information retrieval, knowledge base, question answering system
Discipline
Database and data science
Panel
Engineering (E)
HKU Project Code
17254016
Grant Type
General Research Fund (GRF)
Funding Year
2016
Status
Completed
Objectives
1) [Knowledge Base and Benchmarking] To evaluate the performance of TAQA, we need to collect a large number of n-tuple assertions and create an n-tuple open knowledge base (OKB). We explore various ways of collecting assertions via web crawling. Moreover, we aim to collect at least 10,000 complex questions and to devise semi-automated methods for pairing the questions with their gold-standard answers. Together, this data will form a benchmark that helps the research community evaluate how well open KB-QA systems answer complex questions.
2) [Question Paraphrasing, Question Parsing, Query Answering, and Answer Ranking] Answering a question in a KB-QA system is a multi-step process. Question paraphrasing, question parsing, query answering, and answer ranking are the most critical steps, and they need to be redesigned when we migrate from a triplet OKB to an n-tuple OKB. We investigate new algorithms for implementing these steps.
3) [Canonicalizing an n-tuple OKB] Open KBs that are extracted automatically from web sources using Open IE techniques are highly redundant: factual knowledge is often repeated and scattered across multiple assertions. This high degree of redundancy makes the KB unnecessarily large and significantly slows response times. We investigate techniques that canonicalize an n-tuple OKB by removing redundant assertions and merging related ones into integrated assertions.
4) [Prototyping and Evaluation] We integrate our research ideas by developing a prototype of TAQA and conduct extensive experiments with it to evaluate them.
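To make the canonicalization objective more concrete, the following is a minimal illustrative sketch in Python. It is not the project's method: the `Assertion` type, the `normalize` and `canonicalize` helpers, and the merging rule (group by a normalized subject, relation, and object, then union the extra arguments) are all assumptions introduced here for illustration, and the sample assertions are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Assertion(NamedTuple):
    """Hypothetical n-tuple assertion: a triple plus optional extra arguments."""
    subject: str
    relation: str
    obj: str
    extras: Tuple[str, ...] = ()  # e.g. time or location constraints


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


def canonicalize(assertions: List[Assertion]) -> List[Assertion]:
    """Merge assertions that share a normalized (subject, relation, obj) key.

    Assumed rule for this sketch only: redundant assertions collapse into one
    integrated assertion whose extras are the order-preserving union of the
    extras seen for that key.
    """
    merged: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for a in assertions:
        key = (normalize(a.subject), normalize(a.relation), normalize(a.obj))
        bucket = merged[key]  # registers the key even when there are no extras
        for extra in a.extras:
            e = normalize(extra)
            if e not in bucket:
                bucket.append(e)
    return [Assertion(s, r, o, tuple(ex)) for (s, r, o), ex in merged.items()]


# Illustrative input: three overlapping assertions about the same fact.
raw = [
    Assertion("Barack Obama", "was born in", "Honolulu", ("in 1961",)),
    Assertion("barack obama", "was born in", "Honolulu"),
    Assertion("Barack  Obama", "was born in", "honolulu",
              ("in 1961", "at Kapiolani Medical Center")),
]
canonical = canonicalize(raw)
```

In this sketch the three input assertions collapse into a single integrated assertion carrying both extra arguments. A real system would likely need entity and relation linking rather than plain string normalization, which this example does not attempt.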