A Hybrid Approach to the Analysis of a Collection of Research Papers

Boris Mirkin, Dmitry Frolov, Alex Vlasov, Susana Nascimento, Trevor Fenner

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

We define and find a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. This generalization lifts the set to a “head subject” in the higher ranks of the taxonomy that is supposed to “tightly” cover the query set, possibly at the cost of some errors, both “gaps” and “offshoots”. Our hybrid method involves two more automated analysis techniques: a fuzzy clustering method, FADDIS, involving both additive and spectral properties, and a purely structural string-to-text relevance measure based on suffix trees annotated by frequencies. We apply this to extract research tendencies from two collections of research papers: (a) about 18000 research papers published in Springer journals on data science over 20 years, and (b) about 27000 research papers retrieved from Springer and Elsevier journals in response to data science related queries. We consider a taxonomy of Data Science based on the Association for Computing Machinery Computing Classification System (ACM-CCS 2012). Our findings allow us to make some comments on the tendencies of research that cannot be derived by using more conventional techniques.

Original language: English
Title of host publication: Intelligent Data Engineering and Automated Learning – IDEAL 2020 - 21st International Conference, 2020, Proceedings
Editors: Cesar Analide, Paulo Novais, David Camacho, Hujun Yin
Place of Publication: Cham
Publisher: Springer
Pages: 423-433
Number of pages: 11
ISBN (Electronic): 978-3-030-62365-4
ISBN (Print): 978-3-030-62364-7
Publication status: Published - 2020
Event: 21st International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2020 - Guimaraes, Portugal
Duration: 4 Nov 2020 to 6 Nov 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer
Volume: 12490 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 21st International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2020
Country/Territory: Portugal
City: Guimaraes
Period: 4/11/20 to 6/11/20

Keywords

  • Annotated Suffix Tree
  • Fuzzy cluster
  • Generalization
  • Hybrid approach
  • Research tendency
