Article: Discovering and reconciling value conflicts for numerical data integration

Title: Discovering and reconciling value conflicts for numerical data integration
Authors: Fan, W; Lu, H; Madnick, SE; Cheung, D
Keywords: Conversion function; Data integration; Data mining; Data quality; Robust regression; Semantic conflicts
Issue Date: 2001
Publisher: Pergamon. The Journal's web site is located at http://www.elsevier.com/locate/is
Citation: Information Systems, 2001, v. 26 n. 8, p. 635-656
Abstract: The build-up in Information Technology capital, fueled by the Internet and the cost-effectiveness of new telecommunications technologies, has led to a proliferation of information systems that are in dire need of exchanging information but incapable of doing so due to the lack of semantic interoperability. It is now evident that physical connectivity (the ability to exchange bits and bytes) is no longer adequate: the integration of data from autonomous and heterogeneous systems calls for the prior identification and resolution of semantic conflicts that may be present. Unfortunately, this requires the system integrator to sift through the data from disparate systems in a painstaking manner. We suggest that this process can be partially automated, and we present a methodology and technique for the discovery of potential semantic conflicts as well as the underlying data transformation needed to resolve the conflicts. Our methodology begins by classifying data value conflicts into two categories: context independent and context dependent. While context independent conflicts are usually caused by unexpected errors, context dependent conflicts are primarily a result of the heterogeneity of the underlying data sources. To facilitate data integration, data value conversion rules are proposed to describe the quantitative relationships among data values involving context dependent conflicts. A general approach is proposed to discover data value conversion rules from the data. The approach consists of five major steps: relevant attribute analysis, candidate model selection, conversion function generation, conversion function selection and conversion rule formation. It is being implemented in a prototype system, DIRECT, for business data using statistics-based techniques. A preliminary study using both synthetic and real-world data indicated that the proposed approach is promising.
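The idea of a conversion function can be illustrated with a minimal sketch. It assumes the simplest case: one numeric attribute reported by two sources, related by a linear rule, with a few erroneous values mixed in. The abstract does not say which robust regression method DIRECT uses, so the Theil-Sen estimator below is a stand-in chosen for illustration. The function names, the sample values and the 1% tolerance are all hypothetical.

```python
from itertools import combinations
from statistics import median


def fit_conversion(xs, ys):
    """Fit y ~ slope * x + intercept with the Theil-Sen estimator.

    Assumption: a single linear conversion function relates the two sources.
    """
    # The median of all pairwise slopes is insensitive to a few bad points.
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i, j in combinations(range(len(xs)), 2)
        if xs[j] != xs[i]
    ]
    if not slopes:
        raise ValueError("need at least two distinct x values")
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


def flag_outliers(xs, ys, slope, intercept, tolerance=0.01):
    """Return indices whose relative residual exceeds the tolerance."""
    flagged = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        predicted = slope * x + intercept
        scale = max(abs(predicted), 1e-12)  # avoid division by zero
        if abs(y - predicted) / scale > tolerance:
            flagged.append(i)
    return flagged


# Hypothetical data: source A reports an amount in units, source B in thousands.
source_a = [1000.0, 2500.0, 4000.0, 5200.0, 8000.0, 9100.0]
source_b = [1.0, 2.5, 4.0, 52.0, 8.0, 9.1]  # index 3 holds a data-entry error

slope, intercept = fit_conversion(source_a, source_b)
outliers = flag_outliers(source_a, source_b, slope, intercept)
print(f"b = {slope:.6f} * a + {intercept:.6f}; suspect rows: {outliers}")
```

In this sketch the recovered scaling factor plays the role of a context dependent conversion rule, while the rows that do not fit it are candidates for context independent conflicts, that is, errors. The attribute analysis, model selection and rule formation steps named in the abstract are not covered here.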
Persistent Identifier: http://hdl.handle.net/10722/88983
ISSN: 0306-4379
2021 Impact Factor: 3.180
2020 SCImago Journal Rankings: 0.547
ISI Accession Number ID: WOS:000172158400006

DC Field | Value | Language
dc.contributor.author | Fan, W | en_HK
dc.contributor.author | Lu, H | en_HK
dc.contributor.author | Madnick, SE | en_HK
dc.contributor.author | Cheung, D | en_HK
dc.date.accessioned | 2010-09-06T09:50:55Z | -
dc.date.available | 2010-09-06T09:50:55Z | -
dc.date.issued | 2001 | en_HK
dc.identifier.citation | Information Systems, 2001, v. 26 n. 8, p. 635-656 | en_HK
dc.identifier.issn | 0306-4379 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/88983 | -
dc.language | eng | en_HK
dc.publisher | Pergamon. The Journal's web site is located at http://www.elsevier.com/locate/is | en_HK
dc.relation.ispartof | Information Systems | en_HK
dc.subject | Conversion function | en_HK
dc.subject | Data integration | en_HK
dc.subject | Data mining | en_HK
dc.subject | Data quality | en_HK
dc.subject | Robust regression | en_HK
dc.subject | Semantic conflicts | en_HK
dc.title | Discovering and reconciling value conflicts for numerical data integration | en_HK
dc.type | Article | en_HK
dc.identifier.openurl | http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=0306-4379&volume=9&spage=635&epage=656&date=2001&atitle=Discovering+and+Reconciling+Value+Conflicts+for+Numerical+Data+Integration | en_HK
dc.identifier.email | Cheung, D: dcheung@cs.hku.hk | en_HK
dc.identifier.authority | Cheung, D=rp00101 | en_HK
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1016/S0306-4379(01)00043-6 | en_HK
dc.identifier.scopus | eid_2-s2.0-0035545935 | en_HK
dc.identifier.hkuros | 70939 | en_HK
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-0035545935&selection=ref&src=s&origin=recordpage | en_HK
dc.identifier.volume | 26 | en_HK
dc.identifier.issue | 8 | en_HK
dc.identifier.spage | 635 | en_HK
dc.identifier.epage | 656 | en_HK
dc.identifier.isi | WOS:000172158400006 | -
dc.publisher.place | United Kingdom | en_HK
dc.identifier.scopusauthorid | Fan, W=7401635358 | en_HK
dc.identifier.scopusauthorid | Lu, H=7404843983 | en_HK
dc.identifier.scopusauthorid | Madnick, SE=7003477810 | en_HK
dc.identifier.scopusauthorid | Cheung, D=34567902600 | en_HK
dc.identifier.issnl | 0306-4379 | -
