Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1080/17517575.2015.1065513
- Scopus: eid_2-s2.0-84936972848
- WOS: WOS:000392601700006
Article: Identification of approximately duplicate material records in ERP systems
Title | Identification of approximately duplicate material records in ERP systems |
---|---|
Authors | Zong, W; Wu, F; Chu, LK; Sculli, D |
Keywords | approximately duplicate material records; data quality; enterprise resource planning (ERP) systems; probabilistic neural network (PNN); records de-duplication |
Issue Date | 2017 |
Citation | Enterprise Information Systems, 2017, v. 11 n. 3, p. 434-451 |
Abstract | The quality of master data is crucial for the accurate functioning of the various modules of an enterprise resource planning (ERP) system. This study addresses specific data problems arising from the generation of approximately duplicate material records in ERP databases. Such problems are mainly due to the firm’s lack of unique and global identifiers for the material records, and to the arbitrary assignment of alternative names for the same material by various users. Traditional duplicate detection methods are ineffective in identifying such approximately duplicate material records because these methods typically rely on string comparisons of each field. To address this problem, a machine learning-based framework is developed to recognise semantic similarity between strings and to further identify and reunify approximately duplicate material records – a process referred to as de-duplication in this article. First, the keywords of the material records are extracted to form vectors of discriminating words. Second, a machine learning method using a probabilistic neural network is applied to determine the semantic similarity between these material records. The approach was evaluated using data from a real case study. The test results indicate that the proposed method outperforms traditional algorithms in identifying approximately duplicate material records. |
Persistent Identifier | http://hdl.handle.net/10722/211779 |
ISSN | 1751-7575 (print); 1751-7583 (electronic); 2023 Impact Factor: 4.4; 2023 SCImago Journal Rankings: 0.875 |
ISI Accession Number ID | WOS:000392601700006 |
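The abstract describes two steps: extracting keywords from each material record into a vector of discriminating words, then applying a probabilistic neural network (PNN) to judge whether records refer to the same material. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes binary keyword vectors, a Parzen-window PNN in the classic Specht style that assigns a new record to one of the existing material classes, a hand-picked smoothing parameter `sigma`, and a hypothetical acceptance threshold `min_density` below which a record is treated as a new material. The names `STOP_WORDS`, `extract_keywords`, `to_vector`, `PNN` and `match`, as well as the material descriptions and IDs, are invented for illustration.

```python
import math
import re
from collections import defaultdict

# Hypothetical stop-word list; real material master data would need its own.
STOP_WORDS = {"and", "for", "of", "the", "with"}


def extract_keywords(description):
    """Lower-case the text, split on non-alphanumerics and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def to_vector(keywords, vocabulary):
    """Binary keyword vector: 1.0 where the vocabulary word occurs in the record."""
    return [1.0 if word in keywords else 0.0 for word in vocabulary]


class PNN:
    """Minimal Parzen-window probabilistic neural network (assumed formulation)."""

    def __init__(self, sigma=1.0):
        self.sigma = sigma
        self.vocabulary = []
        self.patterns = defaultdict(list)  # material id -> training vectors

    def fit(self, records):
        """records: iterable of (description, material_id) pairs."""
        records = list(records)
        keyword_sets = [extract_keywords(d) for d, _ in records]
        self.vocabulary = sorted(set().union(*keyword_sets))
        self.patterns = defaultdict(list)
        for kws, (_, label) in zip(keyword_sets, records):
            self.patterns[label].append(to_vector(kws, self.vocabulary))
        return self

    def class_densities(self, description):
        """Average Gaussian kernel between the query and each class's patterns."""
        x = to_vector(extract_keywords(description), self.vocabulary)
        densities = {}
        for label, vectors in self.patterns.items():
            total = 0.0
            for v in vectors:
                sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
                total += math.exp(-sq_dist / (2 * self.sigma ** 2))
            densities[label] = total / len(vectors)
        return densities

    def match(self, description, min_density=0.2):
        """Return the best-matching material id, or None if nothing is close enough."""
        densities = self.class_densities(description)
        best = max(densities, key=densities.get)
        return best if densities[best] >= min_density else None


# Hypothetical existing material master records: (description, material id).
existing = [
    ("hex bolt M8 stainless steel", "BOLT-M8"),
    ("stainless steel bolt, hex head, M8", "BOLT-M8"),
    ("nitrile glove, size L", "GLOVE-L"),
    ("disposable nitrile gloves L", "GLOVE-L"),
]
pnn = PNN(sigma=1.0).fit(existing)

new_record = "M8 hex head bolt (stainless)"
# An exact string comparison finds no duplicate...
print(any(new_record.lower() == d.lower() for d, _ in existing))  # False
# ...but the keyword-vector PNN assigns the record to the existing bolt material.
print(pnn.match(new_record))  # BOLT-M8
# A record sharing no keywords falls below min_density and is kept as new.
print(pnn.match("copper pipe 15mm"))  # None
```

Because the keyword vectors ignore word order and punctuation, reordered or reworded descriptions of the same material end up close together, which is what a field-by-field string comparison misses. A smaller `sigma` makes the kernel narrower and the matching stricter; in practice both `sigma` and any acceptance threshold would have to be tuned on labelled duplicate pairs.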
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zong, W | - |
dc.contributor.author | Wu, F | - |
dc.contributor.author | Chu, LK | - |
dc.contributor.author | Sculli, D | - |
dc.date.accessioned | 2015-07-21T02:10:33Z | - |
dc.date.available | 2015-07-21T02:10:33Z | - |
dc.date.issued | 2017 | - |
dc.identifier.citation | Enterprise Information Systems, 2017, v. 11 n. 3, p. 434-451 | - |
dc.identifier.issn | 1751-7575 | - |
dc.identifier.uri | http://hdl.handle.net/10722/211779 | - |
dc.language | eng | - |
dc.relation.ispartof | Enterprise Information Systems | - |
dc.subject | approximately duplicate material records | - |
dc.subject | data quality | - |
dc.subject | enterprise resource planning (ERP) systems | - |
dc.subject | probabilistic neural network (PNN) | - |
dc.subject | records de-duplication | - |
dc.title | Identification of approximately duplicate material records in ERP systems | - |
dc.type | Article | - |
dc.identifier.email | Chu, LK: lkchu@hkucc.hku.hk | - |
dc.identifier.email | Sculli, D: hreidsc@hkucc.hku.hk | - |
dc.identifier.authority | Chu, LK=rp00113 | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1080/17517575.2015.1065513 | - |
dc.identifier.scopus | eid_2-s2.0-84936972848 | - |
dc.identifier.hkuros | 245672 | - |
dc.identifier.volume | 11 | - |
dc.identifier.issue | 3 | - |
dc.identifier.spage | 434 | - |
dc.identifier.epage | 451 | - |
dc.identifier.eissn | 1751-7583 | - |
dc.identifier.isi | WOS:000392601700006 | - |
dc.identifier.issnl | 1751-7575 | - |