On improving knowledge modeling in knowledge-based systems

Wu, Tien Hsuan Kevin; 吳典軒

Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: On improving knowledge modeling in knowledge-based systems

Title	On improving knowledge modeling in knowledge-based systems
Authors	Wu, Tien Hsuan Kevin 吳典軒
Advisors	Kao, CM
Issue Date	2021
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Wu, T. H. K. [吳典軒]. (2021). On improving knowledge modeling in knowledge-based systems. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	Knowledge bases (KBs), such as Freebase and Wikidata, have received considerable attention in the research community. The development of KBs has benefited a number of applications, including knowledge-based question answering (KB-QA) systems, recommendation systems and machine reading comprehension systems. One type of KB, the Open Knowledge Base (OKB), is constructed by applying an open information extraction system to extract assertions from a large corpus. OKBs are noisy because they contain ambiguous facts; for instance, the same entity may appear under several different noun phrases. In the first part of the thesis, we put forward algorithms that resolve ambiguities in OKBs. We then turn to building a domain-specific KB: in particular, we build a legal KB from court judgments and demonstrate that it benefits a number of legal applications.
Firstly, we propose an efficient algorithm to resolve ambiguities in OKBs. A canonicalization approach based on hierarchical agglomerative clustering was previously proposed for entity resolution in OKBs. Despite its high effectiveness, this approach cannot be applied to large OKBs because of its high computational cost. We propose Fast Assertion Canonicalization (FAC) to address this efficiency issue. FAC employs several optimization strategies, including pruning techniques that avoid unnecessary similarity computations and bounding techniques that efficiently approximate small similarities. We demonstrate that FAC achieves order-of-magnitude speedups over other approaches.
Secondly, with the development of language models, many NLP problems, including OKB canonicalization, can be solved with higher accuracy. We propose Multi-Level Canonicalization with Embeddings (MULCE), which solves the canonicalization problem with a two-step framework. In the first step, assertions are coarsely grouped into clusters according to the GloVe vectors of their subjects. The second step refines these clusters by modeling relation and object information with BERT embeddings. We show that combining multiple embedding approaches yields better canonicalization than employing a single embedding method, and that MULCE outperforms other state-of-the-art approaches.
Finally, we describe how to build a legal KB from court judgments. Judgments are an important source of legal knowledge, but their sheer number makes it difficult for anyone to master all the knowledge they contain. We study the problems of machine-assisted extraction and modeling of legal knowledge from legal texts, leveraging domain knowledge provided by human experts. Through examples, we show how our knowledge models can lead to the development of a number of essential legal applications that facilitate legal studies and research. In particular, we develop a prediction model for sentencing in illegal drug trafficking cases. We demonstrate how the extracted legal knowledge and the sentencing prediction model help address a number of issues in legal information processing, including (1) extraction and cleaning of legal data, (2) discovery of model drifts in legal rules, (3) identification of critical features in legal judgments, (4) recommendation of similar judgments, (5) fairness in machine predictions, and (6) explainability of machine predictions.
Degree	Doctor of Philosophy
Subject	Expert systems (Computer science)
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/308602
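
The abstract names FAC's two kinds of optimization (pruning and bounding) without detailing them. The Python sketch below illustrates the general idea on a toy scale, under assumptions that are not from the thesis: it clusters only subject noun phrases rather than full assertions, measures similarity with token-set Jaccard, prunes candidate pairs with an inverted index (pairs sharing no token have similarity 0), and skips a pair whenever the size-ratio upper bound on its Jaccard similarity already falls below the threshold. These are standard set-similarity filters chosen for illustration, not necessarily FAC's actual rules, and `canonicalize`, `tokens` and `jaccard` are hypothetical names. The sketch also relies on the fact that single-linkage hierarchical agglomerative clustering cut at a fixed threshold produces the connected components of the graph whose edges are the pairs meeting that threshold, so a union-find structure stands in for the explicit merge loop.

```python
from collections import defaultdict
from itertools import combinations


def tokens(phrase: str) -> frozenset[str]:
    """Lower-cased set of whitespace-separated words in a noun phrase."""
    return frozenset(phrase.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def canonicalize(phrases: list[str], threshold: float) -> list[set[str]]:
    """Single-linkage clustering of phrases, cut at a Jaccard `threshold`."""
    toks = [tokens(p) for p in phrases]

    # Pruning (hypothetical choice): pairs that share no token have Jaccard 0,
    # so only pairs found through an inverted index are ever compared.
    index: dict[str, list[int]] = defaultdict(list)
    for i, t in enumerate(toks):
        for word in t:
            index[word].append(i)
    candidates: set[tuple[int, int]] = set()
    for ids in index.values():
        candidates.update(combinations(ids, 2))

    # Union-find over phrase indices.
    parent = list(range(len(phrases)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in candidates:
        a, b = toks[i], toks[j]
        # Bounding (hypothetical choice): Jaccard(a, b) <= min(|a|,|b|) / max(|a|,|b|),
        # so the exact similarity is skipped when this cheap bound is too small.
        if min(len(a), len(b)) / max(len(a), len(b)) < threshold:
            continue
        if jaccard(a, b) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, set[str]] = defaultdict(set)
    for i, phrase in enumerate(phrases):
        groups[find(i)].add(phrase)
    return list(groups.values())


clusters = canonicalize(
    ["Barack Obama", "Obama", "President Barack Obama", "Hong Kong", "HKU"],
    threshold=0.5,
)
# -> three clusters: the three Obama phrases together, "Hong Kong" alone, "HKU" alone
```

With a threshold of 0.5, the pair ("Obama", "President Barack Obama") is rejected by the size bound (1/3 < 0.5) without its Jaccard similarity ever being computed, yet both phrases still land in one cluster through "Barack Obama", which is the chaining behavior of single linkage.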

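MULCE's two-step framework, as summarized in the abstract, can be sketched in a similar spirit. This is a minimal illustration under assumptions the abstract does not state: embeddings are supplied precomputed (a hypothetical `subj_vec` dict standing in for GloVe vectors of the subjects, and a hypothetical per-assertion `ctx_vec` list standing in for BERT embeddings of relation and object), similarity is cosine, and a simple greedy leader clustering with a threshold is swapped in for whatever clustering MULCE actually uses in each step. All function names and the toy vectors are hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Assertion = tuple[str, str, str]  # (subject, relation, object)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def leader_cluster(items: list[int], vec: Callable[[int], Vector],
                   threshold: float) -> list[list[int]]:
    """Greedy leader clustering: each item joins the first cluster whose first
    member (its leader) is at least `threshold`-similar, else starts a new one."""
    clusters: list[list[int]] = []
    for i in items:
        for cluster in clusters:
            if cosine(vec(cluster[0]), vec(i)) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def two_level_canonicalize(assertions: list[Assertion],
                           subj_vec: dict[str, Vector],
                           ctx_vec: list[Vector],
                           coarse_t: float, fine_t: float) -> list[list[Assertion]]:
    # Step 1 (coarse): group assertions by subject vectors
    # (hypothetical stand-in for GloVe vectors of the subjects).
    coarse = leader_cluster(list(range(len(assertions))),
                            lambda i: subj_vec[assertions[i][0]], coarse_t)
    # Step 2 (fine): split each coarse group by relation/object context
    # vectors (hypothetical stand-in for BERT embeddings).
    fine: list[list[int]] = []
    for group in coarse:
        fine.extend(leader_cluster(group, lambda i: ctx_vec[i], fine_t))
    return [[assertions[i] for i in cluster] for cluster in fine]


# Toy data: hand-picked 2-d vectors, not real GloVe or BERT output.
assertions = [
    ("Apple", "released", "iPhone"),
    ("Apple Inc.", "is based in", "Cupertino"),
    ("apple", "is a kind of", "fruit"),
    ("Hong Kong", "is a", "city"),
]
subj_vec = {"Apple": [1.0, 0.0], "Apple Inc.": [0.9, 0.1],
            "apple": [0.95, 0.05], "Hong Kong": [0.0, 1.0]}
ctx_vec = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.5, 0.5]]

clusters = two_level_canonicalize(assertions, subj_vec, ctx_vec,
                                  coarse_t=0.9, fine_t=0.8)
# clusters == [[assertions[0], assertions[1]], [assertions[2]], [assertions[3]]]
```

In the toy data, the three "Apple"/"apple" assertions share one coarse group because their subject vectors are close, and the second step separates the fruit assertion from the two company assertions because its context vector points elsewhere. Running the contextual comparison only within each coarse group also keeps the number of pairwise comparisons in the second step small.
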
DC Field	Value	Language
dc.contributor.advisor	Kao, CM	-
dc.contributor.author	Wu, Tien Hsuan Kevin	-
dc.contributor.author	吳典軒	-
dc.date.accessioned	2021-12-06T01:03:58Z	-
dc.date.available	2021-12-06T01:03:58Z	-
dc.date.issued	2021	-
dc.identifier.citation	Wu, T. H. K. [吳典軒]. (2021). On improving knowledge modeling in knowledge-based systems. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/308602	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Expert systems (Computer science)	-
dc.title	On improving knowledge modeling in knowledge-based systems	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2021	-
dc.identifier.mmsid	991044448917103414	-