Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1145/1281192.1281276
- Scopus: eid_2-s2.0-36849036336
Appears in Collections:
Conference Paper: Mining correlated bursty topic patterns from coordinated text streams
Title | Mining correlated bursty topic patterns from coordinated text streams |
---|---|
Authors | Wang, X; Zhai, C; Hu, X; Sproat, R |
Keywords | Clustering; Coordinated streams; Correlated bursty patterns; Reinforcement; Data sets; Probabilistic algorithms; Text mining; Clustering algorithms; Correlation methods; Database systems; Probabilistic logics; Problem solving; Set theory; Data mining |
Issue Date | 2007 |
Citation | The 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA., 12-15 August 2007. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007, p. 784-793 |
Abstract | Previous work on text mining has almost exclusively focused on a single stream. However, we often have available multiple text streams indexed by the same set of time points (called coordinated text streams), which offer new opportunities for text mining. For example, when a major event happens, all the news articles published by different agencies in different languages tend to cover the same event for a certain period, exhibiting a correlated bursty topic pattern in all the news article streams. In general, mining correlated bursty topic patterns from coordinated text streams can reveal interesting latent associations or events behind these streams. In this paper, we define and study this novel text mining problem. We propose a general probabilistic algorithm which can effectively discover correlated bursty patterns and their bursty periods across text streams even if the streams have completely different vocabularies (e.g., English vs Chinese). Evaluation of the proposed method on a news data set and a literature data set shows that it can effectively discover quite meaningful topic patterns from both data sets: the patterns discovered from the news data set accurately reveal the major common events covered in the two streams of news articles (in English and Chinese, respectively), while the patterns discovered from two database publication streams match well with the major research paradigm shifts in database research. Since the proposed method is general and does not require the streams to share vocabulary, it can be applied to any coordinated text streams to discover correlated topic patterns that burst in multiple streams in the same period. © 2007 ACM. |
Persistent Identifier | http://hdl.handle.net/10722/180712 |
ISBN | 1595936092; 9781595936097 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Wang, X | en_US |
dc.contributor.author | Zhai, C | en_US |
dc.contributor.author | Hu, X | en_US |
dc.contributor.author | Sproat, R | en_US |
dc.date.accessioned | 2013-01-28T01:41:33Z | - |
dc.date.available | 2013-01-28T01:41:33Z | - |
dc.date.issued | 2007 | en_US |
dc.identifier.citation | The 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA., 12-15 August 2007. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007, p. 784-793 | en_US |
dc.identifier.isbn | 1595936092 | en_US |
dc.identifier.isbn | 9781595936097 | en_US |
dc.identifier.uri | http://hdl.handle.net/10722/180712 | - |
dc.language | eng | en_US |
dc.relation.ispartof | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining | - |
dc.subject | Clustering | en_US |
dc.subject | Coordinated streams | en_US |
dc.subject | Correlated bursty patterns | en_US |
dc.subject | Reinforcement | en_US |
dc.subject | Data sets | en_US |
dc.subject | Probabilistic algorithms | en_US |
dc.subject | Text mining | en_US |
dc.subject | Clustering algorithms | en_US |
dc.subject | Correlation methods | en_US |
dc.subject | Database systems | en_US |
dc.subject | Probabilistic logics | en_US |
dc.subject | Problem solving | en_US |
dc.subject | Set theory | en_US |
dc.subject | Data mining | en_US |
dc.title | Mining correlated bursty topic patterns from coordinated text streams | en_US |
dc.type | Conference_Paper | en_US |
dc.identifier.email | Hu, X: xiaoxhu@hku.hk | en_US |
dc.identifier.authority | Hu, X=rp01711 | en_US |
dc.description.nature | link_to_subscribed_fulltext | en_US |
dc.identifier.doi | 10.1145/1281192.1281276 | en_US |
dc.identifier.scopus | eid_2-s2.0-36849036336 | - |
dc.identifier.spage | 784 | en_US |
dc.identifier.epage | 793 | en_US |