How to detect a small cluster in big data?

Paulo João, Victor Lobo

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review


Detecting small clusters in a large amount of data is a difficult problem, especially when the clusters contain only a few samples. General-purpose solutions for small cluster detection exist, but they are often inadequate for specific data. Artificial intelligence techniques have been proposed because they require little or no a priori assumption about the data distributions. The volume and high dimensionality of big data make it too complex to be processed and analyzed by traditional methods. Hierarchical Self-Organizing Maps (HSOM) can improve decision making through an approach based on the specialization of Self-Organizing Maps (SOM), dimensionality reduction, and visualization of clusters. The goal is to propose a methodology for detecting and visualizing small clusters in data, illustrated with a toy case, in situations where traditional human-based approaches are not possible or the data are too complex to process. The results demonstrate that the HSOM-based method outperforms the most widely adopted traditional methods, revealing a number of small clusters hidden in the data.

Original language: English
Title of host publication: Atas da 14ª Conferência da Associação Portuguesa de Sistemas de Informação
Subtitle of host publication: Os Sistemas de Informação na Saúde
Publisher: Fundação Luis de Molina
Number of pages: 12
ISBN (Print): 978-989-8132-13-0
Publication status: Published - 1 Jan 2014
Event: 14th Portuguese Association for Information Systems Conference, CAPSI 2014 - Evora, Portugal
Duration: 3 Oct 2014 to 4 Oct 2014

Publication series

Name: Atas da Conferencia da Associacao Portuguesa de Sistemas de Informacao


Conference: 14th Portuguese Association for Information Systems Conference, CAPSI 2014


  • Big data
  • Cluster
  • Data mining
  • HSOM
  • Outlier detection
  • SOM

