13-02-2017, 03:18 PM
Hadoop is a Java-based open source programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.
Hadoop lets you run applications on systems with thousands of commodity hardware nodes and handle thousands of terabytes of data. Its distributed file system enables rapid data transfer between nodes and allows the system to continue operating in the event of a node failure. This approach reduces the risk of catastrophic system failure and unexpected data loss, even if a significant number of nodes become inoperative. Consequently, Hadoop quickly emerged as a foundation for large-scale data processing tasks, such as scientific analytics, business planning and sales, and processing huge volumes of sensor data, including data from Internet-connected sensors.
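To make the file-system part concrete, here is a minimal sketch of writing a file into HDFS through Hadoop's Java FileSystem API. The NameNode address and the file path are placeholders I made up for illustration; your cluster's core-site.xml would normally supply the real fs.defaultFS value. Once the file is written, HDFS splits it into blocks and replicates each block across DataNodes, which is what provides the fault tolerance described above.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        // Configuration normally picks up core-site.xml from the classpath;
        // the address below is a placeholder cluster endpoint.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        FileSystem fs = FileSystem.get(conf);

        // Write a small file. HDFS stores it as replicated blocks
        // (3 copies per block by default) spread over different DataNodes,
        // so the data survives individual node failures.
        Path path = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(path)) {
            out.writeUTF("hello from hdfs");
        }

        fs.close();
    }
}
```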
Hadoop was created by computer scientists Doug Cutting and Mike Cafarella in 2006 to support distribution for the Nutch search engine. It was inspired by Google's MapReduce, a software framework in which an application is divided into numerous small parts. Any of these parts, also called fragments or blocks, can be run on any node in the cluster. After years of development within the open source community, Hadoop 1.0 became publicly available in November 2012 as part of the Apache project sponsored by the Apache Software Foundation.
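A rough sketch of that fragment-based model is the classic word-count job below: the input is split into fragments, each mapper processes one fragment on whatever node it lands on, and the reducers combine the partial results. The class and argument names here are illustrative, not something from the original post.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: each mapper receives one fragment (input split) of the data
    // and emits (word, 1) pairs; many mappers run in parallel across the cluster.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: all counts for the same word arrive at one reducer,
    // which sums them into a single total.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class); // local pre-aggregation per mapper
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

You would typically package this as a JAR and submit it with `hadoop jar wordcount.jar WordCount <input_dir> <output_dir>`; the framework then handles splitting the input, scheduling the fragments onto nodes, and rerunning any tasks whose nodes fail.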