30-03-2011, 09:51 AM
Who Am I?
Hadoop Developer
– Core contributor since Hadoop’s infancy
– Project Lead for Hadoop Distributed File System
• Facebook (Hadoop, Hive, Scribe)
• Yahoo! (Hadoop in Yahoo Search)
• Veritas (San Point Direct, Veritas File System)
• IBM Transarc (Andrew File System)
• UW Computer Science Alumni (Condor Project)
Hadoop, Why?
• Need to process Multi Petabyte Datasets
• Expensive to build reliability in each application.
• Nodes fail every day
– Failure is expected, rather than exceptional.
– The number of nodes in a cluster is not constant.
• Need common infrastructure
– Efficient, reliable, Open Source Apache License
• The above goals are same as Condor, but
– Workloads are IO bound and not CPU bound
Hive, Why?
• Need a Multi Petabyte Warehouse
• Files are insufficient data abstractions
– Need tables, schemas, partitions, indices
• SQL is highly popular
• Need for an open data format
– RDBMS have a closed data format
– Flexible schema
Hive is a Hadoop subproject!
Hadoop & Hive History
• Dec 2004 – Google GFS paper published
• July 2005 – Nutch uses MapReduce
• Feb 2006 – Hadoop becomes a Lucene subproject
• Apr 2007 – Yahoo! on 1000-node cluster
• Jan 2008 – An Apache Top Level Project
• Jul 2008 – A 4000 node test cluster
• Sept 2008 – Hive becomes a Hadoop subproject
Who uses Hadoop?
• Amazon/A9
• IBM
• Joost
• Last.fm
• New York Times
• PowerSet
• Veoh
• Yahoo!
Commodity Hardware
Goals of HDFS
• Very Large Distributed File System
– 10K nodes, 100 million files, 10 PB
• Assumes Commodity Hardware
– Files are replicated to handle hardware failure
– Detects failures and recovers from them
• Optimized for Batch Processing
– Data locations exposed so that computations can move to where data resides
– Provides very high aggregate bandwidth
• User space, runs on heterogeneous OS
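The data-locality goal above can be illustrated with a toy scheduler (illustrative code only, not the Hadoop API): given the DataNodes holding a block's replicas, prefer running the computation on one of those nodes so the data does not have to move.

```python
# Toy illustration of HDFS-style data locality (not the real Hadoop API):
# prefer scheduling a task on a node that already holds a replica of the
# block it reads, so computation moves to the data rather than vice versa.

def pick_task_node(replica_nodes, free_nodes):
    """Return (node, is_local): a free node holding the data if possible,
    otherwise any free node (the data must then travel over the network)."""
    for node in free_nodes:
        if node in replica_nodes:
            return node, True          # data-local assignment
    return (free_nodes[0], False) if free_nodes else (None, False)

replicas = {"node3", "node7", "node9"}     # DataNodes holding the block
node, local = pick_task_node(replicas, ["node1", "node7", "node8"])
```

With the inputs shown, the scheduler picks node7, the only free node that also stores a replica.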
Distributed File System
• Single Namespace for entire cluster
• Data Coherency
– Write-once-read-many access model
– Client can only append to existing files
• Files are broken up into blocks
– Typically 128 MB block size
– Each block replicated on multiple DataNodes
• Intelligent Client
– Client can find location of blocks
– Client accesses data directly from DataNode
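A sketch of the block layout described above, assuming the typical 128 MB block size: a file is split into block-sized ranges, and each block is assigned to multiple DataNodes. The round-robin placement here is a deliberate simplification of HDFS's real rack-aware policy, and all names are illustrative.

```python
# Sketch of HDFS-style block splitting and replica placement (simplified:
# real HDFS placement is rack-aware, this round-robin version is not).

BLOCK_SIZE = 128 * 1024 * 1024   # typical HDFS block size (128 MB)
REPLICATION = 3                  # each block stored on multiple DataNodes

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) ranges covering the file, one per block."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]

def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` DataNodes, round-robin."""
    return [[datanodes[(b + r) % len(datanodes)] for r in range(replication)]
            for b in range(num_blocks)]

blocks = split_into_blocks(300 * 1024 * 1024)            # a 300 MB file
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
```

A 300 MB file yields two full 128 MB blocks plus a 44 MB tail block; an intelligent client would ask the NameNode for `placement` and then read each block directly from one of its DataNodes.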
NameNode Metadata
• Metadata in Memory
– The entire metadata is in main memory
– No demand paging of metadata
• Types of Metadata
– List of files
– List of blocks for each file
– List of DataNodes for each block
– File attributes, e.g., creation time, replication factor
• A Transaction Log
– Records file creations, file deletions, etc.
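The design above can be sketched as a minimal in-memory namespace with a transaction log (toy code, not Hadoop's actual classes): all metadata lives in RAM, every mutation is appended to a log, and the namespace can be rebuilt by replaying that log after a restart.

```python
# Toy NameNode-style namespace: all metadata in memory, mutations logged
# so the in-memory state can be reconstructed from the transaction log.

class Namespace:
    def __init__(self):
        self.files = {}   # path -> {"blocks": [...], "replication": int}
        self.log = []     # transaction log of namespace mutations

    def create(self, path, replication=3):
        self.files[path] = {"blocks": [], "replication": replication}
        self.log.append(("create", path, replication))

    def delete(self, path):
        self.files.pop(path, None)
        self.log.append(("delete", path))

    @classmethod
    def replay(cls, log):
        """Rebuild the in-memory namespace by replaying the log."""
        ns = cls()
        for entry in log:
            if entry[0] == "create":
                ns.files[entry[1]] = {"blocks": [], "replication": entry[2]}
            elif entry[0] == "delete":
                ns.files.pop(entry[1], None)
        return ns

ns = Namespace()
ns.create("/logs/a.txt")
ns.create("/logs/b.txt")
ns.delete("/logs/a.txt")
recovered = Namespace.replay(ns.log)   # only /logs/b.txt survives
```

Because nothing is demand-paged, lookups are pure dictionary accesses; the log is what makes the purely in-memory design durable.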