Toward a token-based approach to concern detection in MATLAB sources

Miguel P. Monteiro, Nuno C. Marques, Bruno Silva, Bruno Palma, João Cardoso

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Matrix and data manipulation programming languages are an essential tool for data analysts. However, these languages are often unstructured and lack modularity mechanisms. This paper presents a business intelligence approach for studying how the lack of modularity support manifests itself in languages of this kind. The study focuses on MATLAB as a well-established representative of those languages. We present a technique for the automatic detection and quantification of concerns in MATLAB, as well as their exploration in a code base. Ubiquitous Self Organizing Map (UbiSOM) is applied directly to indicators representing different sets of tokens in the code. UbiSOM is quite effective at detecting patterns of co-occurrence between multiple concerns. To illustrate, a repository comprising over 35,000 MATLAB files is analyzed using the technique and relevant conclusions are drawn.

Original language: English
Title of host publication: Progress in Artificial Intelligence - 18th EPIA Conference on Artificial Intelligence, EPIA 2017, Proceedings
Editors: Z. Vale, E. Oliveira, J. Gama, H. Lopes Cardoso
Place of Publication: Cham
Publisher: Springer Verlag
Pages: 573-584
Number of pages: 12
Volume: 10423 LNAI
ISBN (Electronic): 978-3-319-65340-2
ISBN (Print): 978-3-319-65339-6
DOIs: https://doi.org/10.1007/978-3-319-65340-2_47
Publication status: Published - 2017
Event: 18th EPIA Conference on Artificial Intelligence, EPIA 2017 - Porto, Portugal
Duration: 5 Sep 2017 to 8 Sep 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Volume: 10423 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th EPIA Conference on Artificial Intelligence, EPIA 2017
Country: Portugal
City: Porto
Period: 5/09/17 to 8/09/17

Fingerprint

MATLAB
Self organizing maps
Self-organizing Map
Modularity
Business Intelligence
Competitive intelligence
Computer programming languages
Repository
Quantification
Programming Languages
Manipulation
Language

Keywords

  • Business intelligence
  • Concern metrics
  • Concern mining
  • MATLAB
  • Modularity
  • Self-organizing maps
  • Token-based technique

Cite this

Monteiro, M. P., Marques, N. C., Silva, B., Palma, B., & Cardoso, J. (2017). Toward a token-based approach to concern detection in MATLAB sources. In Z. Vale, E. Oliveira, J. Gama, & H. L. Cardoso (Eds.), Progress in Artificial Intelligence - 18th EPIA Conference on Artificial Intelligence, EPIA 2017, Proceedings (Vol. 10423 LNAI, pp. 573-584). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10423 LNAI). Cham: Springer Verlag. https://doi.org/10.1007/978-3-319-65340-2_47
Monteiro, Miguel P.; Marques, Nuno C.; Silva, Bruno; Palma, Bruno; Cardoso, João. / Toward a token-based approach to concern detection in MATLAB sources. Progress in Artificial Intelligence - 18th EPIA Conference on Artificial Intelligence, EPIA 2017, Proceedings. editor / Z. Vale; E. Oliveira; J. Gama; H. Lopes Cardoso. Vol. 10423 LNAI. Cham: Springer Verlag, 2017. pp. 573-584 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{41430a1e5b704314937e3a99f8a52eb7,
title = "Toward a token-based approach to concern detection in MATLAB sources",
abstract = "Matrix and data manipulation programming languages are an essential tool for data analysts. However, these languages are often unstructured and lack modularity mechanisms. This paper presents a business intelligence approach for studying how the lack of modularity support manifests itself in languages of this kind. The study focuses on MATLAB as a well-established representative of those languages. We present a technique for the automatic detection and quantification of concerns in MATLAB, as well as their exploration in a code base. Ubiquitous Self Organizing Map (UbiSOM) is applied directly to indicators representing different sets of tokens in the code. UbiSOM is quite effective at detecting patterns of co-occurrence between multiple concerns. To illustrate, a repository comprising over 35,000 MATLAB files is analyzed using the technique and relevant conclusions are drawn.",
keywords = "Business intelligence, Concern metrics, Concern mining, MATLAB, Modularity, Self-organizing maps, Token-based technique",
author = "Monteiro, {Miguel P.} and Marques, {Nuno C.} and Bruno Silva and Bruno Palma and Jo{\~a}o Cardoso",
year = "2017",
doi = "10.1007/978-3-319-65340-2_47",
language = "English",
isbn = "978-3-319-65339-6",
volume = "10423 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "573--584",
editor = "Z. Vale and E. Oliveira and J. Gama and Cardoso, {H. Lopes}",
booktitle = "Progress in Artificial Intelligence - 18th EPIA Conference on Artificial Intelligence, EPIA 2017, Proceedings",
address = "Cham",
}

Monteiro, MP, Marques, NC, Silva, B, Palma, B & Cardoso, J 2017, Toward a token-based approach to concern detection in MATLAB sources. in Z Vale, E Oliveira, J Gama & HL Cardoso (eds), Progress in Artificial Intelligence - 18th EPIA Conference on Artificial Intelligence, EPIA 2017, Proceedings. vol. 10423 LNAI, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10423 LNAI, Springer Verlag, Cham, pp. 573-584, 18th EPIA Conference on Artificial Intelligence, EPIA 2017, Porto, Portugal, 5/09/17. https://doi.org/10.1007/978-3-319-65340-2_47

TY - GEN

T1 - Toward a token-based approach to concern detection in MATLAB sources

AU - Monteiro, Miguel P.

AU - Marques, Nuno C.

AU - Silva, Bruno

AU - Palma, Bruno

AU - Cardoso, João

PY - 2017

Y1 - 2017

N2 - Matrix and data manipulation programming languages are an essential tool for data analysts. However, these languages are often unstructured and lack modularity mechanisms. This paper presents a business intelligence approach for studying how the lack of modularity support manifests itself in languages of this kind. The study focuses on MATLAB as a well-established representative of those languages. We present a technique for the automatic detection and quantification of concerns in MATLAB, as well as their exploration in a code base. Ubiquitous Self Organizing Map (UbiSOM) is applied directly to indicators representing different sets of tokens in the code. UbiSOM is quite effective at detecting patterns of co-occurrence between multiple concerns. To illustrate, a repository comprising over 35,000 MATLAB files is analyzed using the technique and relevant conclusions are drawn.

AB - Matrix and data manipulation programming languages are an essential tool for data analysts. However, these languages are often unstructured and lack modularity mechanisms. This paper presents a business intelligence approach for studying how the lack of modularity support manifests itself in languages of this kind. The study focuses on MATLAB as a well-established representative of those languages. We present a technique for the automatic detection and quantification of concerns in MATLAB, as well as their exploration in a code base. Ubiquitous Self Organizing Map (UbiSOM) is applied directly to indicators representing different sets of tokens in the code. UbiSOM is quite effective at detecting patterns of co-occurrence between multiple concerns. To illustrate, a repository comprising over 35,000 MATLAB files is analyzed using the technique and relevant conclusions are drawn.

KW - Business intelligence

KW - Concern metrics

KW - Concern mining

KW - MATLAB

KW - Modularity

KW - Self-organizing maps

KW - Token-based technique

UR - http://www.scopus.com/inward/record.url?scp=85028979281&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-65340-2_47

DO - 10.1007/978-3-319-65340-2_47

M3 - Conference contribution

SN - 978-3-319-65339-6

VL - 10423 LNAI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 573

EP - 584

BT - Progress in Artificial Intelligence - 18th EPIA Conference on Artificial Intelligence, EPIA 2017, Proceedings

A2 - Vale, Z.

A2 - Oliveira, E.

A2 - Gama, J.

A2 - Cardoso, H. Lopes

PB - Springer Verlag

CY - Cham

ER -

Monteiro MP, Marques NC, Silva B, Palma B, Cardoso J. Toward a token-based approach to concern detection in MATLAB sources. In Vale Z, Oliveira E, Gama J, Cardoso HL, editors, Progress in Artificial Intelligence - 18th EPIA Conference on Artificial Intelligence, EPIA 2017, Proceedings. Vol. 10423 LNAI. Cham: Springer Verlag. 2017. p. 573-584. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-65340-2_47