Programming models and runtimes

Georges Da Costa, Alexey L. Lastovetsky, Jorge G. Barbosa, Juan Carlos Díaz-Martín, Juan Luis García-Zapata, Matthias Janetschek, Emmanuel Jeannot, João Leitão, Ravi Reddy Manumachu, Radu Prodan, Juan A. Rico-Gallego, Peter Van Roy, Ali Shoker, Albert Van Der Linde

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

1 Citation (Scopus)

Abstract

Ultrascale computing systems (UCS) will execute several million execution flows, and the tasks of understanding their coherency, for the programmer, and of coordinating them, for the runtime, are formidable. Moreover, given the scale of UCS and its impact on reliability, the current static view of the platform is no longer sufficient. A runtime cannot restart an application because of the failure of a single node, since statistically several nodes will fail every day. Classical failure management by programmers using checkpoint restart is also too limited, because of its overhead at such a scale. The article explores the programming models and runtimes required to ease scaling and extracting performance on continuously evolving platforms, while providing resilience and fault-tolerance mechanisms to tackle the increasing probability of failures throughout the whole software stack.

Original language: English
Title of host publication: Ultrascale Computing Systems
Publisher: Institution of Engineering and Technology
Pages: 9-64
Number of pages: 56
ISBN (Electronic): 9781785618345
ISBN (Print): 9781785618338
Publication status: Published, 1 Jan 2019

Keywords

  • Checkpoint restart
  • Checkpointing
  • Distributed programming
  • Failure management
  • Fault-tolerant mechanisms
  • Programming models
  • Runtimes
  • Software fault tolerance
  • Software stack
  • Ultrascale computing systems
