Incremental MapReduce Computations

Research output: Chapter in Book/Report/Conference proceedingChapterpeer-review


Distributed processing of large data sets is an area that received much attention from re- searchers and practitioners over the last few years. In this context, several proposals exist that leverage the observation that often data sets evolve over time, and there is substantial overlap between the input to consecutive runs of a data processing job. This allows the programmers of these systems to devise an efficient logic to update the output upon an in- put change. However, most of these systems lack compatibility existing models and require the programmer to implement an application-specific dynamic algorithm, which increases algorithm and code complexity. In this chapter, we describe our previous work on building a platform called Incoop, which allows for running MapReduce computations incrementally and transparently. Incoop detects changes between two files that are used as inputs to consecutive MapReduce jobs, and efficiently propagates those changes until the new output is produced. The design of Incoop is based on memoizing the results of previously run tasks, and reusing these results whenever possible. Doing this efficiently introduces several technical challenges that are overcome with novel concepts, such as a large-scale storage system that efficiently computes deltas between two inputs, a contraction phase to break up the work of the Reduce phase, and an affinity- based scheduling algorithm. This chapter presents the motivation and design of Incoop, as well as a complete evaluation using several application benchmarks. Our results show significant performance improvements without changing a single line of application code.
Original languageEnglish
Title of host publicationLarge Scale and Big Data
Subtitle of host publicationProcessing and Management
EditorsSherif Sakr, Mohamed Medhat Gaber
Place of PublicationBoca Raton, Florida, USA
PublisherCRC Press
ISBN (Print)9781466581500
Publication statusPublished - 2014


Dive into the research topics of 'Incremental MapReduce Computations'. Together they form a unique fingerprint.

Cite this