Article: Efficient management of uncertainty in XML schema matching

Title: Efficient management of uncertainty in XML schema matching
Authors: Gong, J; Cheng, R; Cheung, DW
Keywords: Data Integration; Schema Matching; Uncertainty; XML
Issue Date: 2012
Publisher: Springer Verlag. The Journal's web site is located at http://link.springer.de/link/service/journals/00778/index.htm
Citation: VLDB Journal, 2012, v. 21 n. 3, p. 385-409
Abstract: Despite advances in machine learning technologies, a schema matching result between two database schemas (e.g., those derived from COMA++) is likely to be imprecise. In particular, numerous instances of "possible mappings" between the schemas may be derived from the matching result. In this paper, we study problems related to managing possible mappings between two heterogeneous XML schemas. First, we study how to efficiently generate possible mappings for a given schema matching task. While this problem can be solved by existing algorithms, we show how to improve the performance of the solution by using a divide-and-conquer approach. Second, storing and querying a large set of possible mappings can incur large storage and evaluation overhead. For XML schemas, we observe that their possible mappings often exhibit a high degree of overlap. We hence propose a novel data structure, called the block tree, to capture the commonalities among possible mappings. The block tree is useful for representing the possible mappings in a compact manner and can be efficiently generated. Moreover, it facilitates the evaluation of a probabilistic twig query (PTQ), which returns the non-zero probability that a fragment of an XML document matches a given query. For users who are interested only in answers with the k highest probabilities, we also propose the top-k PTQ and present an efficient solution for it. An extensive evaluation on real-world data sets shows that our approaches significantly improve the efficiency of generating, storing, and querying possible mappings. © 2011 Springer-Verlag.
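Illustrative sketch (not part of the repository record, and not the paper's block-tree method): the idea of querying over possible mappings can be made concrete with a naive evaluation. It assumes that each possible mapping carries a probability, that an answer's probability is the sum of the probabilities of the mappings that produce it, and that a top-k query keeps the k most probable answers. The function names, element names, and probabilities below are all hypothetical.

```python
import heapq
from collections import defaultdict


def answer_probabilities(mappings, evaluate):
    """Sum, per answer, the probabilities of the mappings that yield it.

    `mappings` is a list of (mapping, probability) pairs; `evaluate` is a
    caller-supplied function returning the set of answers under one mapping.
    """
    totals = defaultdict(float)
    for mapping, prob in mappings:
        for answer in evaluate(mapping):
            totals[answer] += prob
    return dict(totals)


def top_k(probabilities, k):
    """Return the k (answer, probability) pairs with the highest probability."""
    return heapq.nlargest(k, probabilities.items(), key=lambda item: item[1])


# Hypothetical possible mappings from source element names to target names.
POSSIBLE_MAPPINGS = [
    ({"author": "writer", "title": "name"}, 0.5),
    ({"author": "writer", "title": "heading"}, 0.3),
    ({"author": "editor", "title": "name"}, 0.2),
]


def evaluate(mapping):
    # Toy "query": report which target element each source element maps to.
    return {(source, target) for source, target in mapping.items()}


if __name__ == "__main__":
    probs = answer_probabilities(POSSIBLE_MAPPINGS, evaluate)
    for answer, prob in top_k(probs, 2):
        print(answer, round(prob, 3))
```

This naive version evaluates the query once per mapping, which is the overhead the abstract says grows with the number of possible mappings; the block tree is described there as a way to exploit the overlap among mappings instead.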
Persistent Identifier: http://hdl.handle.net/10722/152502
ISSN: 1066-8888
2021 Impact Factor: 4.243
2020 SCImago Journal Rankings: 0.653
ISI Accession Number ID: WOS:000304145500005
Funding Agency | Grant Number
Research Grants Council of Hong Kong (GRF) | 711309E
Funding Information:

"This work was supported by the Research Grants Council of Hong Kong (GRF Project 711309E). We would like to thank the anonymous reviewers for their insightful comments."


DC Field | Value | Language
dc.contributor.author | Gong, J | en_US
dc.contributor.author | Cheng, R | en_US
dc.contributor.author | Cheung, DW | en_US
dc.date.accessioned | 2012-06-26T06:39:44Z | -
dc.date.available | 2012-06-26T06:39:44Z | -
dc.date.issued | 2012 | en_US
dc.identifier.citation | VLDB Journal, 2012, v. 21 n. 3, p. 385-409 | en_US
dc.identifier.issn | 1066-8888 | en_US
dc.identifier.uri | http://hdl.handle.net/10722/152502 | -
dc.description.abstract | (identical to the Abstract field above) | en_US
dc.language | eng | en_US
dc.publisher | Springer Verlag. The Journal's web site is located at http://link.springer.de/link/service/journals/00778/index.htm | en_US
dc.relation.ispartof | VLDB Journal | en_US
dc.subject | Data Integration | en_US
dc.subject | Schema Matching | en_US
dc.subject | Uncertainty | en_US
dc.subject | Xml | en_US
dc.title | Efficient management of uncertainty in XML schema matching | en_US
dc.type | Article | en_US
dc.identifier.email | Cheng, R: ckcheng@cs.hku.hk | en_US
dc.identifier.email | Cheung, DW: dcheung@cs.hku.hk | en_US
dc.identifier.authority | Cheng, R=rp00074 | en_US
dc.identifier.authority | Cheung, DW=rp00101 | en_US
dc.description.nature | link_to_subscribed_fulltext | en_US
dc.identifier.doi | 10.1007/s00778-011-0248-4 | en_US
dc.identifier.scopus | eid_2-s2.0-84861188683 | en_US
dc.identifier.hkuros | 190770 | -
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-84861188683&selection=ref&src=s&origin=recordpage | en_US
dc.identifier.volume | 21 | en_US
dc.identifier.issue | 3 | en_US
dc.identifier.spage | 385 | en_US
dc.identifier.epage | 409 | en_US
dc.identifier.eissn | 0949-877X | -
dc.identifier.isi | WOS:000304145500005 | -
dc.publisher.place | Germany | en_US
dc.identifier.scopusauthorid | Gong, J=47961908400 | en_US
dc.identifier.scopusauthorid | Cheng, R=7201955416 | en_US
dc.identifier.scopusauthorid | Cheung, DW=34567902600 | en_US
dc.identifier.citeulike | 9742107 | -
dc.identifier.issnl | 1066-8888 | -
