Balanced search space partitioning for distributed media redundant indexing

Research output: Contribution to conference › Paper › peer-review

1 Citation (Scopus)

Abstract

This paper addresses the problem of balanced, redundant indexing of media information. Our goal is to partition and distribute the search index so that it takes advantage of the properties of distributed systems: balanced load across nodes, redundancy when a node goes down, and efficient node usage under concurrent querying. We follow an information compression approach to this problem and propose to represent data with overcomplete codebooks, where each document is represented by only a few codewords and each indexing node is responsible for several codewords. Quantization algorithms are designed to fit the original data as closely as possible, which biases them towards codewords that fit the principal directions of the data. In this paper, we propose the balanced KSVD (B-KSVD) algorithm, which distributes the allocation of data across a balanced number of codewords, according to the global distribution of the data. Indexing experiments showed that B-KSVD can achieve 38% 1-recall by inspecting only 1% of the full index, distributed over 10 partitions. Traditional methods based on k-means need either larger codebooks or a larger inspected portion of the index to achieve the same retrieval performance.
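The balancing idea in the abstract can be illustrated with a minimal Python sketch. It is not the B-KSVD algorithm, whose details the abstract does not give: it swaps in a simple greedy, capacity-constrained nearest-codeword assignment and assigns each document to a single codeword rather than a few. The function names `balanced_assign` and `codewords_to_nodes`, the fixed toy codebook, and the round-robin node mapping are all assumptions made for illustration.

```python
import numpy as np


def balanced_assign(data, codebook, capacity):
    """Assign each row of `data` to one codeword, at most `capacity` rows each.

    Greedy heuristic (an illustrative stand-in, not B-KSVD): visit all
    (point, codeword) pairs by increasing distance and accept a pair when
    the point is still unassigned and the codeword still has room.
    """
    n, k = data.shape[0], codebook.shape[0]
    if capacity * k < n:
        raise ValueError("capacity * number of codewords must cover all points")
    # Squared Euclidean distances between every point and codeword, shape (n, k).
    dists = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    order = np.argsort(dists, axis=None, kind="stable")
    assignment = np.full(n, -1, dtype=int)
    load = np.zeros(k, dtype=int)
    remaining = n
    for flat in order:
        i, j = divmod(int(flat), k)  # recover (point, codeword) from the flat index
        if assignment[i] == -1 and load[j] < capacity:
            assignment[i] = j
            load[j] += 1
            remaining -= 1
            if remaining == 0:
                break
    return assignment, load


def codewords_to_nodes(num_codewords, num_nodes):
    """Hypothetical round-robin map from codeword index to indexing node."""
    return np.arange(num_codewords) % num_nodes


# Toy data: 90 points near the origin and 10 near (3, 3), so plain
# nearest-codeword assignment is heavily skewed.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(0.0, 0.1, size=(90, 2)),
    rng.normal(3.0, 0.1, size=(10, 2)),
])
codebook = np.array([[0.0, 0.0], [0.1, 0.1], [3.0, 3.0], [3.1, 3.1]])

# Unconstrained baseline: each point goes to its nearest codeword.
nearest = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
nearest_load = np.bincount(nearest, minlength=len(codebook))

capacity = int(np.ceil(len(data) / len(codebook)))
assignment, load = balanced_assign(data, codebook, capacity)
node_of_codeword = codewords_to_nodes(len(codebook), num_nodes=2)

print("nearest-codeword load:", nearest_load)
print("balanced load:        ", load)
print("codeword -> node:     ", node_of_codeword)
```

The capacity cap trades some quantization fidelity for even posting-list sizes: points pushed away from a full codeword land on a farther one, which is the kind of bias-versus-balance trade-off the abstract describes.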
Original language: English
Pages: 142-149
Number of pages: 8
Publication status: Published - 2017

Keywords

  • Approximate nearest neighbor search
  • Clustering
  • Dictionary design
  • Distributed search
  • High-dimensional indexing
  • Search space partitioning
