Identification of approximately duplicate material records in ERP systems

Zong, W; Wu, F; Chu, LK; Sculli, D

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1080/17517575.2015.1065513
Scopus: eid_2-s2.0-84936972848
WOS: WOS:000392601700006

Appears in Collections:
- Industrial & Manufacturing Systems Engineering: Journal/Magazine Articles

Title	Identification of approximately duplicate material records in ERP systems
Authors	Zong, W; Wu, F; Chu, LK; Sculli, D
Keywords	approximately duplicate material records; data quality; enterprise resource planning (ERP) systems; probabilistic neural network (PNN); records de-duplication
Issue Date	2017
Citation	Enterprise Information Systems, 2017, v. 11 n. 3, p. 434-451
DOI	http://dx.doi.org/10.1080/17517575.2015.1065513
Abstract	The quality of master data is crucial for the accurate functioning of the various modules of an enterprise resource planning (ERP) system. This study addresses specific data problems arising from the generation of approximately duplicate material records in ERP databases. Such problems are mainly due to the firm’s lack of unique and global identifiers for the material records, and to the arbitrary assignment of alternative names for the same material by various users. Traditional duplicate detection methods are ineffective in identifying such approximately duplicate material records because these methods typically rely on string comparisons of each field. To address this problem, a machine learning-based framework is developed to recognise semantic similarity between strings and to further identify and reunify approximately duplicate material records – a process referred to as de-duplication in this article. First, the keywords of the material records are extracted to form vectors of discriminating words. Second, a machine learning method using a probabilistic neural network is applied to determine the semantic similarity between these material records. The approach was evaluated using data from a real case study. The test results indicate that the proposed method outperforms traditional algorithms in identifying approximately duplicate material records.
Persistent Identifier	http://hdl.handle.net/10722/211779
ISSN	1751-7575
2021 Impact Factor	4.407
2020 SCImago Journal Rankings	0.596
ISI Accession Number ID	WOS:000392601700006
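
The two steps named in the abstract (keyword extraction, then classification with a probabilistic neural network) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the two overlap features, the smoothing parameter `sigma`, and the four labelled example pairs are all hypothetical, and plain keyword overlap stands in for the semantic similarity measure the article develops.

```python
import math
import re

# Hypothetical stop-word list; the article does not specify one here.
STOPWORDS = {"the", "of", "for", "and", "with"}


def keywords(description):
    """Step 1: lowercase, split on non-alphanumerics, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    return {t for t in tokens if t not in STOPWORDS}


def pair_features(a, b):
    """Assumed feature vector for a record pair: Jaccard overlap of the
    keyword sets, and the share of the smaller set found in the larger."""
    ka, kb = keywords(a), keywords(b)
    if not ka or not kb:
        return (0.0, 0.0)
    common = len(ka & kb)
    return (common / len(ka | kb), common / min(len(ka), len(kb)))


def pnn_classify(x, training, sigma=0.3):
    """Step 2: probabilistic neural network.
    Pattern layer: one Gaussian kernel per training example.
    Summation layer: average kernel response per class.
    Decision layer: class with the highest average."""
    sums, counts = {}, {}
    for features, label in training:
        dist2 = sum((xi - fi) ** 2 for xi, fi in zip(x, features))
        sums[label] = sums.get(label, 0.0) + math.exp(-dist2 / (2 * sigma ** 2))
        counts[label] = counts.get(label, 0) + 1
    scores = {label: sums[label] / counts[label] for label in sums}
    return max(scores, key=scores.get), scores


# Invented labelled pairs, for illustration only.
LABELLED_PAIRS = [
    ("hex bolt M8 steel", "steel bolt hex M8", "duplicate"),
    ("copper wire 2mm", "wire copper 2mm roll", "duplicate"),
    ("hex bolt M8 steel", "copper wire 2mm", "distinct"),
    ("rubber gasket 50mm", "steel bolt hex M10", "distinct"),
]
TRAINING = [(pair_features(a, b), label) for a, b, label in LABELLED_PAIRS]


def is_duplicate(a, b):
    """Classify a pair of material descriptions with the toy PNN."""
    label, _ = pnn_classify(pair_features(a, b), TRAINING)
    return label == "duplicate"
```

Under these assumptions, `is_duplicate("M8 hex bolt, steel", "bolt hex steel M8")` classifies the pair as a duplicate even though the two strings differ character by character, which is the failure case the abstract attributes to field-level string comparison. Overlap features alone cannot recognise alternative names that share no words, so a real system would need the richer similarity features the article describes.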

DC Field	Value	Language
dc.contributor.author	Zong, W	-
dc.contributor.author	Wu, F	-
dc.contributor.author	Chu, LK	-
dc.contributor.author	Sculli, D	-
dc.date.accessioned	2015-07-21T02:10:33Z	-
dc.date.available	2015-07-21T02:10:33Z	-
dc.date.issued	2017	-
dc.identifier.citation	Enterprise Information Systems, 2017, v. 11 n. 3, p. 434-451	-
dc.identifier.issn	1751-7575	-
dc.identifier.uri	http://hdl.handle.net/10722/211779	-
dc.language	eng	-
dc.relation.ispartof	Enterprise Information Systems	-
dc.subject	approximately duplicate material records	-
dc.subject	data quality	-
dc.subject	enterprise resource planning (ERP) systems	-
dc.subject	probabilistic neural network (PNN)	-
dc.subject	records de-duplication	-
dc.title	Identification of approximately duplicate material records in ERP systems	-
dc.type	Article	-
dc.identifier.email	Chu, LK: lkchu@hkucc.hku.hk	-
dc.identifier.email	Sculli, D: hreidsc@hkucc.hku.hk	-
dc.identifier.authority	Chu, LK=rp00113	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1080/17517575.2015.1065513	-
dc.identifier.scopus	eid_2-s2.0-84936972848	-
dc.identifier.hkuros	245672	-
dc.identifier.volume	11	-
dc.identifier.issue	3	-
dc.identifier.spage	434	-
dc.identifier.epage	451	-
dc.identifier.eissn	1751-7583	-
dc.identifier.isi	WOS:000392601700006	-
dc.identifier.issnl	1751-7575	-
