Abstract
Matrix and data manipulation programming languages are an essential tool for data analysts. However, these languages are often unstructured and lack modularity mechanisms. This article presents a knowledge discovery approach for studying how the lack of modularity support manifests itself in languages of this kind. The study focuses on Matlab, a well-established representative of these languages. We present a technique for automatically detecting and quantifying concerns in Matlab code and for exploring them across a code base. The Ubiquitous Self-Organizing Map (UbiSOM) is used to perform exploratory data analysis over the concerns detected in a possibly changing repository of Matlab files, and it proves effective at detecting patterns of co-occurrence of multiple concerns. To illustrate the technique, a repository comprising over 35,000 Matlab files is analysed. The results show that Token Density metrics, used in conjunction with the UbiSOM, enable the detection of such co-occurrence patterns in m-files.
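As a rough illustration of what a token-based concern metric might look like, the sketch below computes, for a single m-file, the fraction of identifier tokens that belong to each concern. The concern names, their token sets, the crude tokenizer, and this particular definition of token density are all assumptions made for illustration; they are not taken from the article.

```python
import re
from collections import Counter

# Hypothetical concern-to-token mapping, for illustration only.
# The article's actual concerns and token sets are not reproduced here.
CONCERNS = {
    "argument_checking": {"nargin", "nargout", "narginchk", "nargoutchk"},
    "console_messages": {"disp", "fprintf", "warning", "error"},
}

# Identifier-like tokens; numeric literals and operators are ignored.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def strip_comments(source: str) -> str:
    # Crude: drops everything after '%' on each line. It does not handle
    # '%' inside string literals or %{ ... %} block comments.
    return "\n".join(line.split("%", 1)[0] for line in source.splitlines())


def token_density(source: str) -> dict[str, float]:
    # Assumed definition: (tokens belonging to the concern) / (all tokens).
    tokens = TOKEN_RE.findall(strip_comments(source))
    total = len(tokens)
    counts = Counter(tokens)
    return {
        concern: (sum(counts[t] for t in toks) / total if total else 0.0)
        for concern, toks in CONCERNS.items()
    }


example = """function y = f(x)
% simple example
if nargin < 1
    error('missing x');
end
y = 2*x;
disp(y);
"""

# For this input there are 14 tokens in total, so argument_checking = 1/14
# (nargin) and console_messages = 2/14 (error, disp).
result = token_density(example)
print(result)
```

Per-file vectors of this kind are the sort of input an exploratory tool such as a self-organizing map could then cluster; the sketch itself does not implement the UbiSOM.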
| Field | Value |
|---|---|
| Original language | English |
| Article number | e12306 |
| Journal | Expert Systems |
| Volume | 35 |
| Issue number | 4 |
| Publication status | Published - 1 Aug 2018 |
Keywords
- business intelligence
- concern metrics
- concern mining
- Matlab
- modularity
- self-organising maps
- token-based technique