Abstract
In the era of global-scale services, big data analytical queries are often required to process datasets that span multiple data centers (DCs). In this setting, cross-DC bandwidth is often the scarcest, most volatile, and/or most expensive resource. However, current widely deployed big data analytics frameworks make no attempt to minimize the traffic traversing these links.
In this paper, we present PIXIDA, a scheduler that aims to minimize data movement across resource-constrained links. To achieve this, we introduce a new abstraction called SILO, which is key to modeling PIXIDA's scheduling goals as a graph partitioning problem. Furthermore, we show that existing graph partitioning problem formulations do not map to how big data jobs work, causing their solutions to miss opportunities for avoiding data movement. To address this, we formulate a new graph partitioning problem and propose a novel algorithm to solve it. We integrated PIXIDA in Spark, and our experiments show that, compared to existing schedulers, PIXIDA achieves a significant traffic reduction of up to ~9× on the aforementioned links.
Original language | English |
---|---|
Pages (from-to) | 72-83 |
Number of pages | 12 |
Journal | Proceedings of the VLDB Endowment |
Volume | 9 |
Issue number | 2 |
DOIs | |
Publication status | Published - Oct 2015 |
Keywords
- Algorithms
- Graph theory
- Scheduling