17-06-2009, 11:17 AM
SEMINAR REPORT ON MapReduce: Simplified Data Processing on Large Clusters
Abstract
In the new era of computing, instead of buying software to meet their computing needs, millions of users rely on the Internet. Internet services are already significant forces in searching, retail purchases, music downloads and auctions. But web data sets are large, and computations over them have to be distributed across hundreds or thousands of machines in order to finish in a reasonable amount of time. In this distributed environment, a programmer has to write large amounts of complex code to deal with issues such as parallelization, data distribution and fault tolerance, even when the original computation is simple. As a reaction to this complexity, Google designed a new abstraction that lets programmers express these simple computations while hiding the messy details of parallelization, fault tolerance, data distribution and load balancing in a runtime library. This software tool is called the MapReduce framework. MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
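To make the map/reduce contract concrete, here is a minimal single-process sketch that uses word counting as the example task. It is only an illustration under stated assumptions, not Google's implementation or API. The names `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical, and the sketch runs everything sequentially in one process. It includes none of the parallelization, data distribution or fault tolerance that the runtime library is meant to provide. It does show the shape of the model: map turns one input pair into many intermediate pairs, the values are grouped by intermediate key, and reduce merges each group.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, contents: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical map function: (document name, text) -> (word, 1) pairs."""
    for word in contents.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    """Hypothetical reduce function: merge all values for one key into a total."""
    return sum(counts)


def run_mapreduce(
    inputs: Iterable[Tuple[str, str]],
    mapper: Callable[[str, str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    # Map phase: apply the user's map function to every input record and
    # group the intermediate values by intermediate key. In a real cluster
    # this grouping would happen across machines; here a dict stands in for it.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        for inter_key, inter_value in mapper(key, value):
            grouped[inter_key].append(inter_value)
    # Reduce phase: call the user's reduce function once per distinct
    # intermediate key, visiting the keys in sorted order.
    return {k: reducer(k, vs) for k, vs in sorted(grouped.items())}


if __name__ == "__main__":
    docs = [("a.txt", "the quick brown fox"), ("b.txt", "the lazy dog the end")]
    print(run_mapreduce(docs, map_fn, reduce_fn))
    # {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

The user writes only `map_fn` and `reduce_fn`. Everything inside `run_mapreduce` is the part that a distributed runtime would take over, for example by running many map calls and many reduce calls on different machines.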