Abstract
A statistics-based approach for clustering documents and for extracting cluster topics is described. Relevant (meaningful) Expressions (REs) automatically extracted from corpora are used as clustering base features. These features are transformed and their number is strongly reduced in order to obtain a small set of document classification features. This is achieved on the basis of Principal Components Analysis. Model-Based Clustering Analysis finds the best number of clusters. Then, the most important REs are extracted from each cluster and taken as document cluster topics.
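The pipeline outlined in the abstract (RE-based features, reduction by Principal Components Analysis, model-based choice of the number of clusters, topic extraction per cluster) can be illustrated with a minimal sketch. It is not the authors' implementation. The toy count matrix, the three example REs, the use of a single principal component, the equal-variance one-dimensional Gaussian mixture, BIC as the selection criterion, and picking the RE with the highest mean count as the topic are all assumptions made for illustration.

```python
import math

# Hypothetical toy data (not from the paper): rows are documents, columns are
# counts of three made-up relevant expressions (REs).
RES = ["common agricultural policy", "interest rate", "fishing quota"]
DOCS = [
    [5, 0, 4], [6, 1, 5], [4, 0, 6],   # agriculture/fisheries-like documents
    [0, 6, 0], [1, 5, 0], [0, 7, 1],   # finance-like documents
]

def first_principal_component(rows, iters=200):
    """Return (column means, unit vector) using power iteration on the covariance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def fit_mixture(xs, k, iters=100):
    """EM for a 1-D Gaussian mixture with one shared variance. Returns (BIC, labels)."""
    n = len(xs)
    srt = sorted(xs)
    means = [srt[(2 * i + 1) * n // (2 * k)] for i in range(k)]  # spread-out start
    weights = [1.0 / k] * k
    mu = sum(xs) / n
    var = max(sum((x - mu) ** 2 for x in xs) / n, 1e-6)

    def e_step():
        # Responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [weights[c] * math.exp(-(x - means[c]) ** 2 / (2 * var))
                 for c in range(k)]
            total = max(sum(p), 1e-300)
            resp.append([pc / total for pc in p])
        return resp

    for _ in range(iters):
        resp = e_step()
        # M-step: update weights and means, then the shared variance.
        for c in range(k):
            nc = max(sum(r[c] for r in resp), 1e-12)
            weights[c] = nc / n
            means[c] = sum(r[c] * x for r, x in zip(resp, xs)) / nc
        var = max(sum(r[c] * (x - means[c]) ** 2
                      for r, x in zip(resp, xs) for c in range(k)) / n, 1e-6)

    # Log-likelihood of the fitted mixture.
    loglik = 0.0
    for x in xs:
        dens = sum(weights[c] * math.exp(-(x - means[c]) ** 2 / (2 * var))
                   for c in range(k)) / math.sqrt(2 * math.pi * var)
        loglik += math.log(max(dens, 1e-300))
    n_params = 2 * k  # k means + (k - 1) weights + 1 shared variance
    bic = -2 * loglik + n_params * math.log(n)
    labels = [r.index(max(r)) for r in e_step()]
    return bic, labels

def cluster_topics(rows, labels, names):
    """For each cluster, pick the RE with the highest mean count (an assumption)."""
    topics = {}
    for c in set(labels):
        members = [r for r, l in zip(rows, labels) if l == c]
        mean_counts = [sum(r[j] for r in members) / len(members)
                       for j in range(len(names))]
        topics[c] = names[mean_counts.index(max(mean_counts))]
    return topics

# 1) Reduce the RE features to one principal-component score per document.
col_means, axis = first_principal_component(DOCS)
scores = [sum((r[j] - col_means[j]) * axis[j] for j in range(len(axis)))
          for r in DOCS]

# 2) Fit mixtures with 1 to 3 components and keep the one with the lowest BIC.
fits = {k: fit_mixture(scores, k) for k in (1, 2, 3)}
best_k = min(fits, key=lambda k: fits[k][0])
labels = fits[best_k][1]

# 3) Take the dominant RE of each cluster as its topic.
topics = cluster_topics(DOCS, labels, RES)
print(best_k, labels, topics)
```

On a real corpus the feature matrix would have far more REs and documents, several principal components would normally be kept, and a library implementation of Gaussian mixtures with richer covariance models would replace the hand-written EM loop.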
Original language | English |
---|---|
Title of host publication | Proceedings of the 2001 IEEE International Conference on Data Mining, ICDM 2001 |
Editors | N. Cercone, T. Y. Lin, X. Wu |
Place of Publication | Los Alamitos, California |
Publisher | IEEE Computer Society |
Pages | 513-520 |
Number of pages | 8 |
ISBN (Print) | 978-0-7695-1119-8 |
Publication status | Published - 1 Jan 2001 |
Event | 2001 IEEE International Conference on Data Mining - San Jose, United States; Duration: 29 Nov 2001 → 2 Dec 2001 |
Conference
Conference | 2001 IEEE International Conference on Data Mining |
---|---|
Country/Territory | United States |
City | San Jose |
Period | 29/11/01 → 2/12/01 |
Keywords
- Information retrieval systems
- Principal component analysis
- Data mining