Abstract
In this paper we propose a large-scale high-dimensional indexing algorithm based on sparse approximation and inverted indexing. Our goal was to devise a method that smoothly scales to handle databases with over 100 million descriptors on a single machine. To meet this goal, we implemented an inverted indexed based on a sparsifying dictionary with l (0) regression to assign documents to buckets. The sparsifying dictionary is optimized to reduce the data dimensionality, by concentrating the energy of the original vector on a few coefficients of a higher dimensional representation. These descriptors are added to an inverted index explores the locality of the coefficients of sparse representations to enable efficient pruned search. Evaluation on four large-scale datasets with multiple types of features showed that our method compares favorably to state-of-the-art techniques. On a 100 million dataset of SIFT descriptors, our method achieved 47.6 % precision at 50, by inspecting only 1 % of the full dataset, and by using only 1/20 of the time of a linear search.
Original language | English |
---|---|
Pages (from-to) | 24389-24412 |
Number of pages | 24 |
Journal | Multimedia Tools And Applications |
Volume | 76 |
Issue number | 22 |
Publication status | Published - Nov 2017 |
Keywords
- Sparse approximation
- Multimedia indexing
- k-SVD
- l(0) penalty