Document clustering and cluster topic extraction in multilingual corpora

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

A statistics-based approach for clustering documents and for extracting cluster topics is described. Relevant (meaningful) Expressions (REs) automatically extracted from corpora are used as clustering base features. These features are transformed and their number is strongly reduced in order to obtain a small set of document classification features. This is achieved on the basis of Principal Components Analysis. Model-Based Clustering Analysis finds the best number of clusters. Then, the most important REs are extracted from each cluster and taken as document cluster topics.
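The feature-reduction step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a tiny, made-up matrix of expression counts per document and finds only the first principal component by power iteration on the covariance matrix. The data, the function names, and the choice of power iteration are illustrative assumptions and are not taken from the paper, which states only that Principal Components Analysis is used.

```python
import math

def center_columns(rows):
    """Subtract each column's mean so every feature has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iters=200):
    """Power iteration: approximates the leading eigenvector of cov.

    The start vector is deliberately non-uniform; a start that happens to be
    orthogonal to the leading eigenvector would not converge to it.
    """
    d = len(cov)
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

# Hypothetical counts: rows are documents, columns are three expressions.
docs = [[2, 0, 1], [3, 0, 1], [0, 2, 1], [0, 3, 1]]

centered = center_columns(docs)
component = first_component(covariance(centered))

# One reduced feature per document: its projection on the first component.
scores = [sum(a * b for a, b in zip(row, component)) for row in centered]
print(scores)
```

In this toy example the first two documents and the last two fall on opposite sides of zero along the single reduced feature, which is the kind of separation a subsequent clustering step could exploit. A full pipeline would keep several components and pass them to a model-based clustering method.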
Original language: English
Title of host publication: Proceedings of the 2001 IEEE International Conference on Data Mining, ICDM 2001
Editors: N. Cercone, T. Y. Lin, X. Wu
Place of Publication: Los Alamitos, California
Publisher: IEEE Computer Society
Pages: 513-520
Number of pages: 8
ISBN (Print): 978-0-7695-1119-8
Publication status: Published - 1 Jan 2001
Event: 2001 IEEE International Conference on Data Mining - San Jose, United States
Duration: 29 Nov 2001 to 2 Dec 2001

Conference

Conference: 2001 IEEE International Conference on Data Mining
Country: United States
City: San Jose
Period: 29/11/01 to 2/12/01

Keywords

  • Information retrieval systems
  • Principal component analysis
  • Data mining
