Abstract
Data analytic applications are characterized by large data sets that are subject to a series of processing phases. Some of these phases are executed sequentially, but others can be executed concurrently or in parallel on clusters, grids or clouds. The MapReduce programming model has been applied to process large data sets in cluster and cloud environments. Developing an application using MapReduce requires installing, configuring and accessing specific frameworks such as Apache Hadoop or Elastic MapReduce in Amazon Cloud. It would be desirable to provide more flexibility in adjusting such configurations according to the application characteristics. Furthermore, the composition of the multiple phases of a data analytic application requires the specification of all the phases and their orchestration. The original MapReduce model and environment lack flexible support for such configuration and composition. Recognizing that scientific workflows have been successfully applied to modeling complex applications, this paper describes our experiments on implementing MapReduce as subworkflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). A text mining data analytic application is modeled as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. As in typical MapReduce environments, the end user only needs to define the application algorithms for input data processing and for the map and reduce functions. In the paper we present experimental results when using the AWARD framework to execute MapReduce workflows deployed over multiple Amazon EC2 (Elastic Compute Cloud) instances.
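As a rough illustration of the three user-defined pieces named in the abstract (input data processing, a map function and a reduce function), the following minimal sketch runs a word count in a single process. It is not the AWARD API and does not reproduce the paper's implementation: every function name here is hypothetical, and the grouping step is a simplified in-memory stand-in for the shuffle phase that a real MapReduce environment distributes across nodes.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable, Iterator

# All names below are hypothetical and chosen for illustration only;
# none of them come from the AWARD framework or from Hadoop.


def split_input(text: str) -> list[str]:
    """Input data processing: produce one record per non-empty line."""
    return [line for line in text.splitlines() if line.strip()]


def map_words(record: str) -> Iterator[tuple[str, int]]:
    """Map function: emit a (word, 1) pair for every word in a record."""
    for word in record.lower().split():
        yield word, 1


def reduce_counts(key: str, values: Iterable[int]) -> tuple[str, int]:
    """Reduce function: sum all counts emitted for one word."""
    return key, sum(values)


def run_mapreduce(
    records: Iterable[str],
    map_fn: Callable[[str], Iterable[tuple[str, int]]],
    reduce_fn: Callable[[str, Iterable[int]], tuple[str, int]],
) -> dict[str, int]:
    """Apply map_fn to each record, group values by key, then reduce each group."""
    groups: dict[str, list[int]] = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # simplified, in-memory shuffle
    return dict(reduce_fn(key, values) for key, values in groups.items())


sample_text = "the cloud\nthe workflow runs in the cloud"
word_counts = run_mapreduce(split_input(sample_text), map_words, reduce_counts)
print(word_counts)
```

In a workflow setting, each of these functions would plausibly be hosted by a separate workflow node, so that the split, map and reduce phases can be configured and composed independently of one another.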
Original language | Unknown |
---|---|
Title of host publication | 2012 4th IEEE International Conference on Cloud Computing Technology and Science |
Pages | 1-8 |
Publication status | Published - 1 Jan 2012 |
Event | 2012 4th IEEE International Conference on Cloud Computing Technology and Science - Duration: 1 Jan 2012 → … |