TY - GEN

T1 - kd-SNN: A Metric Data Structure Seconding the Clustering of Spatial Data

AU - Faustino, Bruno Filipe

AU - Moura-Pires, Joao

AU - Santos, Maribel Yasmina

AU - Moreira, Guilherme

PY - 2014

Y1 - 2014

N2 - Large amounts of spatio-temporal data are continuously collected through the use of location devices or sensor technologies. One of the techniques usually used to obtain a first insight on data is clustering. The Shared Nearest Neighbour (SNN) is a clustering algorithm that finds clusters with different densities, shapes and sizes, and also identifies noise in data, making it a good candidate to deal with spatial data. However, its time complexity is, in the worst case, O(n(2)), compromising its scalability. This paper presents the use of a metric data structure, the kd-Tree, to index spatial data and support the SNN in querying for the k-nearest neighbours, improving the time complexity in the average case of the algorithm, when dealing with low dimensional data, to at most O(n x log n). The proposed algorithm, the kd-SNN, was evaluated in terms of performance, showing huge improvements over existing approaches, allowing the identification of the main traffic routes by completely clustering a maritime data set.

AB - Large amounts of spatio-temporal data are continuously collected through the use of location devices or sensor technologies. One of the techniques usually used to obtain a first insight on data is clustering. The Shared Nearest Neighbour (SNN) is a clustering algorithm that finds clusters with different densities, shapes and sizes, and also identifies noise in data, making it a good candidate to deal with spatial data. However, its time complexity is, in the worst case, O(n(2)), compromising its scalability. This paper presents the use of a metric data structure, the kd-Tree, to index spatial data and support the SNN in querying for the k-nearest neighbours, improving the time complexity in the average case of the algorithm, when dealing with low dimensional data, to at most O(n x log n). The proposed algorithm, the kd-SNN, was evaluated in terms of performance, showing huge improvements over existing approaches, allowing the identification of the main traffic routes by completely clustering a maritime data set.

KW - kd-tree

KW - SNN

KW - spatial data

KW - movement data

KW - route identification

M3 - Conference contribution

SN - 978-3-319-09143-3

T3 - Lecture Notes in Computer Science

SP - 312

EP - 327

BT - COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2014, PT 1

A2 - Murgante, B

A2 - Misra, S

A2 - Rocha, AMAC

A2 - Torre, C

A2 - Rocha, JG

A2 - Falcao, MI

A2 - Taniar, D

A2 - Apduhan, BO

A2 - Gervasi, O

PB - SPRINGER-VERLAG BERLIN

T2 - 14th International Conference on Computational Science and Its Applications (ICCSA)

Y2 - 30 June 2014 through 3 July 2014

ER -