TY - GEN
T1 - Using Taxonomy Tree to Generalize a Fuzzy Thematic Cluster
AU - Frolov, Dmitry
AU - Nascimento, Susana
AU - Fenner, Trevor
AU - Mirkin, Boris
N1 - D.F. and B.M. acknowledge continuing support by the Academic Fund Program at the National Research University Higher School of Economics (grant 19-04-019 in 2018-2019) and by the International Decision Choice and Analysis Laboratory (DECAN) NRU HSE, in the framework of a subsidy granted to the HSE by the Government of the Russian Federation for the implementation of the Russian Academic Excellence Project “5-100”. S.N. acknowledges the support by FCT/MCTES, NOVA LINCS (UID/CEC/04516/2019).
PY - 2019/6
Y1 - 2019/6
N2 - This paper presents an algorithm, ParGenFS, for generalizing, or 'lifting', a fuzzy set of topics to higher ranks of a hierarchical taxonomy of a research domain. The algorithm ParGenFS finds a globally optimal generalization of the topic set to minimize a penalty function, by balancing the number of introduced 'head subjects' and related errors, the 'gaps' and 'offshoots', differently weighted. This leads to a generalization of the topic set in the taxonomy. The usefulness of the method is illustrated on a set of 17685 abstracts of research papers on Data Science published in Springer journals for the past 20 years. We extracted a taxonomy of Data Science from the international Association for Computing Machinery Computing Classification System 2012 (ACM-CCS). We find fuzzy clusters of leaf topics over the text collection, lift them in the taxonomy, and interpret found head subjects to comment on the tendencies of current research.
KW - annotated suffix tree
KW - fuzzy cluster
KW - gap-offshoot penalty
KW - generalization
UR - http://www.scopus.com/inward/record.url?scp=85073786103&partnerID=8YFLogxK
U2 - 10.1109/FUZZ-IEEE.2019.8859015
DO - 10.1109/FUZZ-IEEE.2019.8859015
M3 - Conference contribution
AN - SCOPUS:85073786103
T3 - IEEE International Conference on Fuzzy Systems
BT - 2019 IEEE International Conference on Fuzzy Systems, FUZZ 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Conference on Fuzzy Systems, FUZZ 2019
Y2 - 23 June 2019 through 26 June 2019
ER -