Conference Paper: Dictionary matching with uneven gaps

Title: Dictionary matching with uneven gaps
Authors: Hon, WK; Lam, TW; Shah, R; Thankachan, SV; Ting, HF; Yang, Y
Keywords: Dictionary matching; Point enclosure queries
Issue Date: 2015
Publisher: Springer Verlag. The Journal's web site is located at http://springerlink.com/content/105633/
Citation: The 26th Annual Symposium on Combinatorial Pattern Matching (CPM 2015), Ischia Island, Italy, 29 June-1 July 2015. In Lecture Notes in Computer Science, 2015, v. 9133, p. 247-260
Abstract: A gap-pattern is a sequence of sub-patterns separated by bounded sequences of don’t care characters (called gaps). A one-gap-pattern is a pattern of the form $P[\alpha,\beta]Q$, where $P$ and $Q$ are strings drawn from alphabet $\Sigma$ and $[\alpha,\beta]$ are lower and upper bounds on the gap size $g$. The gap size $g$ is the number of don’t care characters between $P$ and $Q$. The dictionary matching problem with one-gap is to index a collection of one-gap-patterns, so as to identify all sub-strings of a query text $T$ that match any one-gap-pattern in the collection. Let $D$ be such a collection of $d$ patterns, where $D=\{P_i[\alpha_i,\beta_i]Q_i \mid 1\le i\le d\}$. Let $n=\sum_{i=1}^{d}(|P_i|+|Q_i|)$. Let $\gamma$ and $\lambda$ be two parameters defined on $D$ as follows: $\gamma=|\{j \mid j\in[\alpha_i,\beta_i],\ 1\le i\le d\}|$ and $\lambda=|\{\alpha_i,\beta_i \mid 1\le i\le d\}|$. Specifically, $\gamma$ is the total number of gap lengths possible over all patterns in $D$ and $\lambda$ is the number of distinct gap boundaries across all the patterns. We present a linear space solution (i.e., $O(n)$ words) for answering a dictionary matching query on $D$ in time $O(|T|\gamma\log\lambda\log d+\mathrm{occ})$, where $\mathrm{occ}$ is the output size. The query time can be improved to $O(|T|\gamma+\mathrm{occ})$ using $O(n+d^{1+\epsilon})$ space, where $\epsilon>0$ is an arbitrarily small constant. Additionally, we show a compact/succinct space index offering a space-time trade-off. In the special case where the parameters $\alpha_i$ and $\beta_i$ are the same for all the patterns, our results improve upon the work by Amir et al. [CPM, 2014]. We also explore several related cases where gaps can occur at arbitrary locations and where the gap can be induced in the text rather than the pattern.
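To make the definitions in the abstract concrete, the following is a minimal Python sketch that computes the parameters n, γ and λ for a small dictionary and reports matches by brute force. It is not the paper's index and does not achieve the stated query bounds; the names `OneGapPattern`, `parameters` and `naive_matches` are hypothetical and chosen only for illustration, and sub-patterns are assumed to be non-empty.

```python
from typing import List, NamedTuple, Tuple


class OneGapPattern(NamedTuple):
    """A pattern P[alpha,beta]Q (illustrative type, not from the paper)."""
    p: str
    alpha: int
    beta: int
    q: str


def parameters(patterns: List[OneGapPattern]) -> Tuple[int, int, int]:
    """Return (n, gamma, lambda) as defined in the abstract."""
    # n: total length of all sub-patterns P_i and Q_i.
    n = sum(len(x.p) + len(x.q) for x in patterns)
    # gamma: number of distinct gap lengths j lying in some [alpha_i, beta_i].
    gap_lengths = {j for x in patterns for j in range(x.alpha, x.beta + 1)}
    # lambda: number of distinct gap boundaries alpha_i, beta_i.
    boundaries = {b for x in patterns for b in (x.alpha, x.beta)}
    return n, len(gap_lengths), len(boundaries)


def naive_matches(text: str, patterns: List[OneGapPattern]) -> List[Tuple[int, int, int]]:
    """Brute-force matcher: sorted (start, end, pattern_index), end exclusive."""
    out = []
    for idx, x in enumerate(patterns):
        start = text.find(x.p)
        while start != -1:
            # Try every allowed gap size g between P and Q.
            for g in range(x.alpha, x.beta + 1):
                q_start = start + len(x.p) + g
                if text.startswith(x.q, q_start):
                    out.append((start, q_start + len(x.q), idx))
            start = text.find(x.p, start + 1)
    return sorted(out)


# Example dictionary: ab[1,2]cd and a[2,4]d.
example = [OneGapPattern("ab", 1, 2, "cd"), OneGapPattern("a", 2, 4, "d")]
```

For the example dictionary, the gap lengths are {1, 2} and {2, 3, 4}, so γ counts the four distinct values 1 to 4, while λ counts the three distinct boundaries 1, 2 and 4.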
Description: LNCS v. 9133 entitled: Combinatorial Pattern Matching: 26th Annual Symposium, CPM 2015 ... Proceedings
Persistent Identifier: http://hdl.handle.net/10722/214759
ISBN: 978-3-319-19928-3
ISSN: 0302-9743
2020 SCImago Journal Rankings: 0.249

 

DC Field | Value | Language
dc.contributor.author | Hon, WK | -
dc.contributor.author | Lam, TW | -
dc.contributor.author | Shah, R | -
dc.contributor.author | Thankachan, SV | -
dc.contributor.author | Ting, HF | -
dc.contributor.author | Yang, Y | -
dc.date.accessioned | 2015-08-21T11:54:24Z | -
dc.date.available | 2015-08-21T11:54:24Z | -
dc.date.issued | 2015 | -
dc.identifier.citation | The 26th Annual Symposium on Combinatorial Pattern Matching (CPM 2015), Ischia Island, Italy, 29 June-1 July 2015. In Lecture Notes in Computer Science, 2015, v. 9133, p. 247-260 | -
dc.identifier.isbn | 978-3-319-19928-3 | -
dc.identifier.issn | 0302-9743 | -
dc.identifier.uri | http://hdl.handle.net/10722/214759 | -
dc.description | LNCS v. 9133 entitled: Combinatorial Pattern Matching: 26th Annual Symposium, CPM 2015 ... Proceedings | -
dc.description.abstract | A gap-pattern is a sequence of sub-patterns separated by bounded sequences of don’t care characters (called gaps). A one-gap-pattern is a pattern of the form $P[\alpha,\beta]Q$, where $P$ and $Q$ are strings drawn from alphabet $\Sigma$ and $[\alpha,\beta]$ are lower and upper bounds on the gap size $g$. The gap size $g$ is the number of don’t care characters between $P$ and $Q$. The dictionary matching problem with one-gap is to index a collection of one-gap-patterns, so as to identify all sub-strings of a query text $T$ that match any one-gap-pattern in the collection. Let $D$ be such a collection of $d$ patterns, where $D=\{P_i[\alpha_i,\beta_i]Q_i \mid 1\le i\le d\}$. Let $n=\sum_{i=1}^{d}(|P_i|+|Q_i|)$. Let $\gamma$ and $\lambda$ be two parameters defined on $D$ as follows: $\gamma=|\{j \mid j\in[\alpha_i,\beta_i],\ 1\le i\le d\}|$ and $\lambda=|\{\alpha_i,\beta_i \mid 1\le i\le d\}|$. Specifically, $\gamma$ is the total number of gap lengths possible over all patterns in $D$ and $\lambda$ is the number of distinct gap boundaries across all the patterns. We present a linear space solution (i.e., $O(n)$ words) for answering a dictionary matching query on $D$ in time $O(|T|\gamma\log\lambda\log d+\mathrm{occ})$, where $\mathrm{occ}$ is the output size. The query time can be improved to $O(|T|\gamma+\mathrm{occ})$ using $O(n+d^{1+\epsilon})$ space, where $\epsilon>0$ is an arbitrarily small constant. Additionally, we show a compact/succinct space index offering a space-time trade-off. In the special case where the parameters $\alpha_i$ and $\beta_i$ are the same for all the patterns, our results improve upon the work by Amir et al. [CPM, 2014]. We also explore several related cases where gaps can occur at arbitrary locations and where the gap can be induced in the text rather than the pattern. | -
dc.language | eng | -
dc.publisher | Springer Verlag. The Journal's web site is located at http://springerlink.com/content/105633/ | -
dc.relation.ispartof | Lecture Notes in Computer Science | -
dc.rights | The final publication is available at Springer via http://dx.doi.org/[insert DOI] | -
dc.subject | Dictionary matching | -
dc.subject | Point enclosure queries | -
dc.title | Dictionary matching with uneven gaps | -
dc.type | Conference_Paper | -
dc.identifier.email | Lam, TW: twlam@cs.hku.hk | -
dc.identifier.email | Ting, HF: hfting@cs.hku.hk | -
dc.identifier.authority | Lam, TW=rp00135 | -
dc.identifier.authority | Ting, HF=rp00177 | -
dc.identifier.doi | 10.1007/978-3-319-19929-0_21 | -
dc.identifier.scopus | eid_2-s2.0-84949008915 | -
dc.identifier.hkuros | 249628 | -
dc.identifier.volume | 9133 | -
dc.identifier.spage | 247 | -
dc.identifier.epage | 260 | -
dc.publisher.place | Germany | -
dc.customcontrol.immutable | sml 150929 | -
dc.identifier.issnl | 0302-9743 | -
