Links for fulltext
(May Require Subscription)
- Publisher Website: 10.14778/2809974.2809976
- Scopus: eid_2-s2.0-84953875174
- WOS: WOS:000362283300002
Conference Paper: Scaling Similarity Joins over Tree-Structured Data
Title | Scaling Similarity Joins over Tree-Structured Data |
---|---|
Authors | Tang, Y; Cai, Y; Mamoulis, N |
Issue Date | 2015 |
Publisher | Very Large Data Base (VLDB) Endowment Inc. The Journal's web site is located at http://vldb.org/pvldb/index.html |
Citation | Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, Hawaii, 31 August-4th September 2015. In Proceedings of the VLDB Endowment, 2015, v. 8 n. 11, p. 1130-1141 |
Abstract | Given a large collection of tree-structured objects (e.g., XML documents), the similarity join finds the pairs of objects that are similar to each other, based on a similarity threshold and a tree edit distance measure. The state-of-the-art similarity join methods compare simpler approximations of the objects (e.g., strings), in order to prune pairs that cannot be part of the similarity join result based on distance bounds derived by the approximations. In this paper, we propose a novel similarity join approach, which is based on the dynamic decomposition of the tree objects into subgraphs, according to the similarity threshold. Our technique avoids computing the exact distance between two tree objects, if the objects do not share at least one common subgraph. In order to scale up the join, the computed subgraphs are managed in a two-layer index. Our experimental results on real and synthetic data collections show that our approach outperforms the state-of-the-art methods by up to an order of magnitude. |
Persistent Identifier | http://hdl.handle.net/10722/213720 |
ISSN | 2150-8097 (2023 Impact Factor: 2.6; 2023 SCImago Journal Rankings: 2.666) |
ISI Accession Number ID | WOS:000362283300002 |
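
The abstract describes a filter-and-verify pattern: a cheap test prunes pairs that cannot be within the threshold, and the exact tree edit distance is computed only for the survivors. The following is a minimal sketch of that general pattern, not the paper's method. It assumes ordered labeled trees encoded as nested `(label, children)` tuples and unit edit costs, and it swaps in a simple label-bag lower bound in place of the paper's subgraph decomposition and two-layer index. All function names (`forest_dist`, `label_lower_bound`, `similarity_join`) are hypothetical.

```python
from collections import Counter
from functools import lru_cache
from itertools import combinations


def size(forest):
    """Number of nodes in a forest (a tuple of (label, children) trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Exact unit-cost edit distance between two ordered forests."""
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (fl, fc), (gl, gc) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + fc, g) + 1,  # delete the rightmost root of f
        forest_dist(f, g[:-1] + gc) + 1,  # insert the rightmost root of g
        # map the two rightmost roots onto each other (rename if labels differ)
        forest_dist(f[:-1], g[:-1]) + forest_dist(fc, gc) + (fl != gl),
    )


def labels(tree):
    """Multiset of node labels in a tree."""
    label, children = tree
    bag = Counter([label])
    for child in children:
        bag += labels(child)
    return bag


def label_lower_bound(a, b):
    """Lower bound on the edit distance: every edit operation fixes at most
    one surplus label on each side, so the larger surplus is a bound.
    (Assumed stand-in filter, not the paper's subgraph-based filter.)"""
    bag_a, bag_b = labels(a), labels(b)
    return max(sum((bag_a - bag_b).values()), sum((bag_b - bag_a).values()))


def similarity_join(trees, tau):
    """Return index pairs (i, j), i < j, whose tree edit distance is <= tau."""
    result = []
    for i, j in combinations(range(len(trees)), 2):
        if label_lower_bound(trees[i], trees[j]) > tau:
            continue  # pruned without computing the exact distance
        if forest_dist((trees[i],), (trees[j],)) <= tau:
            result.append((i, j))
    return result


t0 = ("a", (("b", ()), ("c", ())))
t1 = ("a", (("b", ()), ("d", ())))
t2 = ("x", (("y", (("z", ()),)),))
print(similarity_join([t0, t1, t2], tau=1))
```

In this toy input only the pair `(t0, t1)` reaches the exact distance computation; both pairs involving `t2` are discarded by the lower bound. The sketch still enumerates all pairs, whereas the approach in the abstract indexes the computed subgraphs so that candidate pairs are found without a full pairwise scan.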
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Tang, Y | - |
dc.contributor.author | Cai, Y | - |
dc.contributor.author | Mamoulis, N | - |
dc.date.accessioned | 2015-08-13T01:43:40Z | - |
dc.date.available | 2015-08-13T01:43:40Z | - |
dc.date.issued | 2015 | - |
dc.identifier.citation | Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, Hawaii, 31 August-4th September 2015. In Proceedings of the VLDB Endowment, 2015, v. 8 n. 11, p. 1130-1141 | - |
dc.identifier.issn | 2150-8097 | - |
dc.identifier.uri | http://hdl.handle.net/10722/213720 | - |
dc.description.abstract | Given a large collection of tree-structured objects (e.g., XML documents), the similarity join finds the pairs of objects that are similar to each other, based on a similarity threshold and a tree edit distance measure. The state-of-the-art similarity join methods compare simpler approximations of the objects (e.g., strings), in order to prune pairs that cannot be part of the similarity join result based on distance bounds derived by the approximations. In this paper, we propose a novel similarity join approach, which is based on the dynamic decomposition of the tree objects into subgraphs, according to the similarity threshold. Our technique avoids computing the exact distance between two tree objects, if the objects do not share at least one common subgraph. In order to scale up the join, the computed subgraphs are managed in a two-layer index. Our experimental results on real and synthetic data collections show that our approach outperforms the state-of-the-art methods by up to an order of magnitude. | - |
dc.language | eng | - |
dc.publisher | Very Large Data Base (VLDB) Endowment Inc. The Journal's web site is located at http://vldb.org/pvldb/index.html | - |
dc.relation.ispartof | Proceedings of the VLDB Endowment | - |
dc.title | Scaling Similarity Joins over Tree-Structured Data | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Tang, Y: ytang@cs.hku.hk | - |
dc.identifier.email | Cai, Y: ylcai@cs.hku.hk | - |
dc.identifier.email | Mamoulis, N: nikos@cs.hku.hk | - |
dc.identifier.authority | Mamoulis, N=rp00155 | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.14778/2809974.2809976 | - |
dc.identifier.scopus | eid_2-s2.0-84953875174 | - |
dc.identifier.hkuros | 246267 | - |
dc.identifier.volume | 8 | - |
dc.identifier.issue | 11 | - |
dc.identifier.spage | 1130 | - |
dc.identifier.epage | 1141 | - |
dc.identifier.isi | WOS:000362283300002 | - |
dc.publisher.place | United States | - |
dc.identifier.issnl | 2150-8097 | - |