04-05-2011, 10:40 AM
Abstract—
In recent years, ad-hoc parallel data processing has emerged as one of the killer applications for Infrastructure-as-a-Service (IaaS) clouds. Major cloud computing companies have started to integrate frameworks for parallel data processing into their product portfolios, making it easy for customers to access these services and to deploy their programs. However, the processing frameworks currently in use were designed for static, homogeneous cluster setups and disregard the particular nature of a cloud. Consequently, the allocated compute resources may be inadequate for big parts of the submitted job and unnecessarily increase processing time and cost. In this paper we discuss the opportunities and challenges for efficient parallel data processing in clouds and present our research project Nephele. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution. Particular tasks of a processing job can be assigned to different types of virtual machines, which are automatically instantiated and terminated during job execution. Based on this new framework, we perform extended evaluations of MapReduce-inspired processing jobs on an IaaS cloud system and compare the results to the popular data processing framework Hadoop.
1 INTRODUCTION
Today a growing number of companies have to process huge amounts of data in a cost-efficient manner. Classic representatives for these companies are operators of Internet search engines, like Google, Yahoo, or Microsoft. The vast amount of data they have to deal with every day has made traditional database solutions prohibitively expensive [5]. Instead, these companies have popularized an architectural paradigm based on a large number of commodity servers. Problems like processing crawled documents or regenerating a web index are split into several independent subtasks, distributed among the available nodes, and computed in parallel.

In order to simplify the development of distributed applications on top of such architectures, many of these companies have also built customized data processing frameworks. Examples are Google's MapReduce [9], Microsoft's Dryad [14], or Yahoo!'s Map-Reduce-Merge [6]. They can be classified by terms like high throughput computing (HTC) or many-task computing (MTC), depending on the amount of data and the number of tasks involved in the computation [20]. Although these systems differ in design, their programming models share similar objectives, namely hiding the hassle of parallel programming, fault tolerance, and execution optimizations from the developer. Developers can typically continue to write sequential programs. The processing framework then takes care of distributing the program among the available nodes and executes each instance of the program on the appropriate fragment of data.

For companies that only have to process large amounts of data occasionally, running their own data center is obviously not an option. Instead, cloud computing has emerged as a promising approach to rent a large IT infrastructure on a short-term, pay-per-usage basis.
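The division of labor described above can be illustrated with a minimal word-count sketch in the MapReduce style: the developer supplies two sequential functions, and a runtime (here simulated by a plain loop, not any framework's actual API) handles grouping and aggregation.

```python
from collections import defaultdict

# The developer writes only these two sequential functions; a real
# framework handles distribution, scheduling, and fault tolerance.

def map_fn(document):
    # Emit a (word, 1) pair for every word in the input fragment.
    for word in document.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Aggregate all partial counts for one key.
    return (word, sum(counts))

def run_job(documents):
    # Sequential stand-in for the distributed runtime: shuffle the
    # intermediate pairs by key, then reduce each group.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

print(run_job(["to be or not to be"]))
# → {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In a distributed deployment, the calls to `map_fn` run in parallel on the nodes holding the input fragments, and the shuffle step moves intermediate pairs across the network.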
Operators of so-called Infrastructure-as-a-Service (IaaS) clouds, like Amazon EC2 [1], let their customers allocate, access, and control a set of virtual machines (VMs) which run inside their data centers, and only charge them for the period of time the machines are allocated. The VMs are typically offered in different types, each type with its own characteristics (number of CPU cores, amount of main memory, etc.) and cost.

Since the VM abstraction of IaaS clouds fits the architectural paradigm assumed by the data processing frameworks described above, projects like Hadoop [25], a popular open source implementation of Google's MapReduce framework, have already begun to promote using their frameworks in the cloud [29]. Only recently, Amazon has integrated Hadoop as one of its core infrastructure services [2]. However, instead of embracing its dynamic resource allocation, current data processing frameworks rather expect the cloud to imitate the static nature of the cluster environments they were originally designed for. E.g., at the moment the types and number of VMs allocated at the beginning of a compute job cannot be changed in the course of processing, although the tasks the job consists of might have completely different demands on the environment. As a result, rented resources may be inadequate for big parts of the processing job, which may lower the overall processing performance and increase the cost.
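The alternative the paper argues for can be sketched as a job description in which each task declares its own VM requirements. This is a hypothetical illustration only (the class, field, and instance-type names are invented for this sketch, not Nephele's real API): a scheduler with such a plan could instantiate each set of VMs on demand and terminate them as soon as the corresponding task completes, instead of holding one fixed pool for the whole job.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    vm_type: str    # e.g. a cheap small instance vs. a compute-heavy one
    instances: int  # degree of parallelism for this task

# A three-stage job whose stages have different resource demands.
job = [
    Task("extract",   vm_type="m1.small",  instances=8),
    Task("transform", vm_type="c1.xlarge", instances=4),
    Task("load",      vm_type="m1.small",  instances=1),
]

def allocation_plan(job):
    # Map each task to the exact set of VMs it needs; a dynamic
    # scheduler would allocate these lazily and free them when the
    # task ends, so no VM type is held longer than necessary.
    return {t.name: [f"{t.vm_type}-{i}" for i in range(t.instances)]
            for t in job}

print(allocation_plan(job))
```

Under the static model criticized above, all three stages would instead run on whatever single pool of identical VMs was allocated at job start.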
Download full report
http://dsl.cs.uchicago.edu/TPDS_MTC/pape...1-0012.pdf