This paper addresses the problem of balanced, redundant indexing of media information. Our goal is to partition and distribute the search index so as to exploit the properties of distributed systems: balanced load across nodes, redundancy when a node goes down, and efficient node usage under concurrent querying. We follow an information compression approach and propose to represent data with overcomplete codebooks, where each document is represented by only a few codewords and each indexing node is responsible for several codewords. Quantization algorithms are designed to fit the original data as closely as possible, which biases them towards codewords that fit the principal directions of the data. In this paper, we propose the balanced KSVD (B-KSVD) algorithm, which distributes the allocation of data across a balanced number of codewords, according to the global distribution of the data. Indexing experiments showed that B-KSVD can achieve 38% 1-recall by inspecting only 1% of the full index, distributed over 10 partitions. Traditional methods based on k-means need either larger codebooks or a larger inspected portion of the index to achieve the same retrieval performance.
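The indexing scheme described in the abstract can be illustrated with a minimal sketch. It is not the B-KSVD algorithm: the codebook here is random rather than learned, each document is assigned to its `s` most correlated codewords (a simple thresholding stand-in for sparse coding), and codewords are spread over nodes round-robin. All names (`top_codewords`, `build_index`, `search`, `node_load`) and all parameter values are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = sum(a * a for a in v) ** 0.5
    return [a / norm for a in v] if norm > 0 else list(v)


def top_codewords(x, codebook, s):
    """Indices of the s codewords with the largest |<x, d_k>|.

    Assumption: plain thresholding, used here instead of a real
    sparse-coding step.
    """
    scores = [abs(dot(x, d)) for d in codebook]
    return sorted(range(len(codebook)), key=lambda k: -scores[k])[:s]


def build_index(docs, codebook, s, n_nodes):
    """Build one posting list per codeword and map codewords to nodes."""
    postings = defaultdict(list)
    for doc_id, x in enumerate(docs):
        for k in top_codewords(x, codebook, s):
            postings[k].append(doc_id)
    # Assumption: round-robin placement, so each node holds several codewords.
    node_of = {k: k % n_nodes for k in range(len(codebook))}
    return postings, node_of


def search(query, docs, codebook, postings, node_of, s):
    """Inspect only the posting lists of the query's own top codewords."""
    candidates, nodes_hit = set(), set()
    for k in top_codewords(query, codebook, s):
        candidates.update(postings.get(k, []))
        nodes_hit.add(node_of[k])
    best = max(candidates, key=lambda i: dot(query, docs[i]), default=None)
    return best, candidates, nodes_hit


def node_load(postings, node_of, n_nodes):
    """Number of posting entries stored on each node."""
    load = [0] * n_nodes
    for k, doc_ids in postings.items():
        load[node_of[k]] += len(doc_ids)
    return load


# Toy data: 64 codewords in 16 dimensions, i.e. an overcomplete codebook.
rng = random.Random(0)
dim, n_docs, n_codewords, s, n_nodes = 16, 200, 64, 3, 10
docs = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(n_docs)]
codebook = [normalize([rng.gauss(0, 1) for _ in range(dim)])
            for _ in range(n_codewords)]

postings, node_of = build_index(docs, codebook, s, n_nodes)
best, candidates, nodes_hit = search(docs[0], docs, codebook, postings, node_of, s)
load = node_load(postings, node_of, n_nodes)
print(f"inspected {len(candidates)} of {n_docs} documents on {len(nodes_hit)} node(s)")
print("posting entries per node:", load)
```

In this sketch, the spread of `load` across nodes depends entirely on how evenly documents fall onto codewords, which is the quantity a balanced codebook is meant to even out. Storing each document under several codewords is also what leaves a copy reachable when one node is unavailable, provided its codewords are placed on different nodes.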
|Number of pages|8|
|Publication status|Published - 2017|
- Approximate nearest neighbor search
- Dictionary design
- Distributed search
- High-dimensional indexing
- Search space partitioning