Exploiting Dynamic Resource Allocation for Efficient Parallel Data Processing
#1

Abstract—
In recent years ad-hoc parallel data processing has emerged to be one of the killer applications for Infrastructure-as-a-Service (IaaS) clouds. Major Cloud computing companies have started to integrate frameworks for parallel data processing in theirproduct portfolio, making it easy for customers to access these services and to deploy their programs. However, the processingframeworks which are currently used have been designed for static, homogeneous cluster setups and disregard the particular natureof a cloud. Consequently, the allocated compute resources may be inadequate for big parts of the submitted job and unnecessarilyincrease processing time and cost. In this paper we discuss the opportunities and challenges for efficient parallel data processingin clouds and present our research project Nephele. Nephele is the first data processing framework to explicitly exploit the dynamicresource allocation offered by today’s IaaS clouds for both, task scheduling and execution. Particular tasks of a processing job can beassigned to different types of virtual machines which are automatically instantiated and terminated during the job execution. Based onthis new framework, we perform extended evaluations of MapReduce-inspired processing jobs on an IaaS cloud system and comparethe results to the popular data processing framework Hadoop.
1 INTRODUCTION
Today a growing number of companies have to processhuge amounts of data in a cost-efficient manner. Classicrepresentatives for these companies are operators ofInternet search engines, like Google, Yahoo, or Microsoft.The vast amount of data they have to deal with everyday has made traditional database solutions prohibitivelyexpensive [5]. Instead, these companies havepopularized an architectural paradigm based on a largenumber of commodity servers. Problems like processingcrawled documents or regenerating a web index are splitinto several independent subtasks, distributed amongthe available nodes, and computed in parallel.In order to simplify the development of distributedapplications on top of such architectures, many of thesecompanies have also built customized data processingframeworks. Examples are Google’s MapReduce [9], Microsoft’sDryad [14], or Yahoo!’s Map-Reduce-Merge [6].They can be classified by terms like high throughputcomputing (HTC) or many-task computing (MTC), dependingon the amount of data and the number oftasks involved in the computation [20]. Although thesesystems differ in design, their programming modelsshare similar objectives, namely hiding the hassle ofparallel programming, fault tolerance, and execution optimizationsfrom the developer. Developers can typicallycontinue to write sequential programs. The processingframework then takes care of distributing the programamong the available nodes and executes each instanceof the program on the appropriate fragment of data. For companies that only have to process large amountsof data occasionally running their own data center isobviously not an option. Instead, Cloud computing hasemerged as a promising approach to rent a large IT infrastructureon a short-term pay-per-usage basis. Operatorsof so-called Infrastructure-as-a-Service (IaaS) clouds,like Amazon EC2 [1], let their customers allocate, access,and control a set of virtual machines (VMs) which runinside their data centers and only charge them for theperiod of time the machines are allocated. The VMs aretypically offered in different types, each type with itsown characteristics (number of CPU cores, amount ofmain memory, etc.) and cost.Since the VM abstraction of IaaS clouds fits the architecturalparadigm assumed by the data processingframeworks described above, projects like Hadoop [25],a popular open source implementation of Google’sMapReduce framework, already have begun to promoteusing their frameworks in the cloud [29]. Only recently,Amazon has integrated Hadoop as one of its core infrastructureservices [2]. However, instead of embracingits dynamic resource allocation, current data processingframeworks rather expect the cloud to imitate the staticnature of the cluster environments they were originallydesigned for. E.g., at the moment the types and numberof VMs allocated at the beginning of a compute jobcannot be changed in the course of processing, althoughthe tasks the job consists of might have completelydifferent demands on the environment. As a result,rented resources may be inadequate for big parts of theprocessing job, which may lower the overall processingperformance and increase the cost.


Download full report
http://dsl.cs.uchicago.edu/TPDS_MTC/pape...1-0012.pdf
Reply
#2
can't get the report... can u give...
Reply
#3



to get information about the topic"Exploiting Dynamic Resource Allocation for Efficient Parallel Data Processing" refer the page link bellow
http://studentbank.in/report-exploiting-...7#pid58007
Reply

Important Note..!

If you are not satisfied with above reply ,..Please

ASK HERE

So that we will collect data for you and will made reply to the request....OR try below "QUICK REPLY" box to add a reply to this page
Popular Searches: system architecture diagram for exploiting dynamic resource allocation, er diagram for exploiting dynamic resource allocation, data flow diagram forexploiting dynamic resource allocation for efficient data processing in the cloud, parallel computing with remot sensing data processing, dfd for dynamic resource allocation, ppt for exployting dynamic resource allocation, e r diagram for exploiting dynamic resource allocation for efficient parallel data in the cloud,

[-]
Quick Reply
Message
Type your reply to this message here.

Image Verification
Please enter the text contained within the image into the text box below it. This process is used to prevent automated spam bots.
Image Verification
(case insensitive)

Possibly Related Threads...
Thread Author Replies Views Last Post
  A Link-Based Cluster Ensemble Approach for Categorical Data Clustering 1 1,062 16-02-2017, 10:51 AM
Last Post: jaseela123d
  Exploiting the Functional and Taxonomic Structure of Genomic Data by Probabilistic To 1 750 14-02-2017, 04:15 PM
Last Post: jaseela123d
  Energy-Aware Autonomic Resource Allocation in Multi tier Virtualized Environments 1 736 14-02-2017, 02:13 PM
Last Post: jaseela123d
  An Efficient Algorithm for Mining Frequent Patterns full report project topics 3 4,714 01-10-2016, 10:02 AM
Last Post: Guest
  Remote Server Monitoring System For Corporate Data Centers smart paper boy 3 2,806 28-03-2016, 02:51 PM
Last Post: dhanabhagya
  Secured Data Hiding and Extractions Using BPCS project report helper 4 3,644 04-02-2016, 12:52 PM
Last Post: seminar report asees
  image processing projects ideas project topics 4 4,997 05-01-2016, 02:22 PM
Last Post: seminar report asees
  Data Hiding in Binary Images for Authentication & Annotation project topics 2 1,812 06-11-2015, 02:27 PM
Last Post: seminar report asees
  Image Processing - Noise Reduction project topics 3 3,751 26-08-2015, 02:55 PM
Last Post: dhivya srinivasan
  DATA LEAKAGE DETECTION project topics 16 13,002 31-07-2015, 02:59 PM
Last Post: seminar report asees

Forum Jump: