Automatic Extraction of Document Topics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

A keyword or topic for a document is a word or multi-word (a sequence of two or more words) that by itself summarizes part of that document's content. In this paper we compare several statistics-based, language-independent methodologies for automatically extracting keywords. We rank words, multi-words, and word prefixes (of a fixed length of 5 characters) using several similarity measures, some widely known and some newly coined, and we evaluate the results obtained as well as the agreement between evaluators. The experiments covered Portuguese, English, and Czech.
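The ranking idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the measures the paper compares, so tf-idf is used here purely as an assumed stand-in; the function names (`tokenize`, `candidates`, `rank_terms`) are hypothetical, and multi-words are limited to two-word sequences for brevity. Only the three candidate types and the 5-character prefix length come from the abstract.

```python
import math
import re
from collections import Counter

PREFIX_LEN = 5  # fixed prefix length stated in the abstract


def tokenize(text):
    """Lowercase a text and split it into alphabetic (Unicode) word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def candidates(text):
    """Yield candidate terms: words, 5-char prefixes, and 2-word sequences.

    Assumption: multi-words are restricted to bigrams here; the abstract
    allows sequences of two or more words.
    """
    words = tokenize(text)
    for w in words:
        yield ("word", w)
        if len(w) >= PREFIX_LEN:
            yield ("prefix", w[:PREFIX_LEN])
    for a, b in zip(words, words[1:]):
        yield ("multiword", f"{a} {b}")


def rank_terms(docs, doc_index, top_n=5):
    """Rank the candidates of docs[doc_index] by tf-idf.

    Assumption: tf-idf is a stand-in, not one of the paper's stated measures.
    """
    counts = [Counter(candidates(d)) for d in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each document counts once per term
    target = counts[doc_index]
    total = sum(target.values())
    scores = {
        term: (freq / total) * math.log(len(docs) / doc_freq[term])
        for term, freq in target.items()
    }
    # Highest score first; ties broken by the term itself for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Because the sketch relies only on token counts, it needs no stemmer, stop-word list, or other language-specific resource, which is the sense in which such methods are language independent. Terms that occur in every document of the collection receive a score of zero and fall to the bottom of the ranking.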
Original language: Unknown
Title of host publication: IFIP Advances in Information and Communication Technology
Editors: LM Camarinha-Matos
Publisher: IFIP (Austria)
Pages: 101-108
ISBN (Print): 978-3-642-19170-1
DOIs
Publication status: Published - 1 Jan 2011
Event: DoCEIS’11 – 2nd Edition of the Doctoral Conference on Computing, Electrical and Industrial Systems
Duration: 1 Jan 2010 → …

Conference

Conference: DoCEIS’11 – 2nd Edition of the Doctoral Conference on Computing, Electrical and Industrial Systems
Period: 1/01/10 → …