Apache Hadoop and Hive

Who Am I?
• Hadoop Developer
– Core contributor since Hadoop’s infancy
– Project Lead for the Hadoop Distributed File System
• Facebook (Hadoop, Hive, Scribe)
• Yahoo! (Hadoop in Yahoo Search)
• Veritas (SANPoint Direct, Veritas File System)
• IBM Transarc (Andrew File System)
• UW Computer Science Alumnus (Condor Project)
Hadoop, Why?
• Need to process Multi Petabyte Datasets
• Expensive to build reliability into each application
• Nodes fail every day
– Failure is expected, rather than exceptional.
– The number of nodes in a cluster is not constant.
• Need common infrastructure
– Efficient, reliable, Open Source Apache License
• The above goals are the same as Condor's, but
– Workloads are IO bound and not CPU bound
Hive, Why?
• Need a Multi Petabyte Warehouse
• Files are insufficient data abstractions
– Need tables, schemas, partitions, indices
• SQL is highly popular
• Need for an open data format
– RDBMS have a closed data format
– Flexible schema
• Hive is a Hadoop subproject!
Hadoop & Hive History
• Dec 2004 – Google MapReduce paper published
• July 2005 – Nutch uses MapReduce
• Feb 2006 – Hadoop becomes a Lucene subproject
• Apr 2007 – Yahoo! runs Hadoop on a 1000-node cluster
• Jan 2008 – Hadoop becomes an Apache Top Level Project
• Jul 2008 – A 4000-node test cluster
• Sept 2008 – Hive becomes a Hadoop subproject
Who uses Hadoop?
• Amazon/A9
• Facebook
• Google
• IBM
• Joost
• Last.fm
• New York Times
• PowerSet
• Veoh
• Yahoo!
Commodity Hardware
Goals of HDFS
• Very Large Distributed File System
– 10K nodes, 100 million files, 10 PB
• Assumes Commodity Hardware
– Files are replicated to handle hardware failure
– Detects failures and recovers from them
• Optimized for Batch Processing
– Data locations are exposed so that computations can move to where the data resides
– Provides very high aggregate bandwidth
• Runs in user space on heterogeneous operating systems
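
The batch-processing goal, moving the computation to the data rather than the data to the computation, can be illustrated with a small sketch. This is not Hadoop's scheduler: the function `pick_node`, the node names, and the block-location map are all hypothetical and exist only to show the idea of preferring a node that already holds the block.

```python
from typing import Dict, List, Set


def pick_node(block_id: str,
              block_locations: Dict[str, List[str]],
              free_nodes: Set[str]) -> str:
    """Choose where to run a task that reads block_id.

    Prefer a free node that already stores a replica of the block;
    otherwise fall back to any free node (which implies a remote read).
    """
    if not free_nodes:
        raise ValueError("no free nodes available")
    for node in block_locations.get(block_id, []):
        if node in free_nodes:
            return node  # data-local: the block is read from local disk
    # No replica holder is free; pick deterministically among the rest.
    return sorted(free_nodes)[0]


# Hypothetical cluster state, for illustration only.
locations = {
    "blk_1": ["node-a", "node-b", "node-c"],
    "blk_2": ["node-d"],
}

print(pick_node("blk_1", locations, {"node-b", "node-x"}))  # node-b
print(pick_node("blk_2", locations, {"node-x", "node-y"}))  # node-x
```

Because each block is replicated, a task usually has several candidate nodes, which makes a data-local placement more likely.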
Distributed File System
• Single Namespace for entire cluster
• Data Coherency
– Write-once-read-many access model
– Client can only append to existing files
• Files are broken up into blocks
– Typically 128 MB block size
– Each block replicated on multiple DataNodes
• Intelligent Client
– Client can find location of blocks
– Client accesses data directly from DataNode
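
As a rough illustration of how a file maps onto blocks, the sketch below splits a file size into 128 MB blocks and estimates the raw storage consumed after replication. The helper names are hypothetical, and the replication factor of 3 is an assumption; the slides only say "multiple DataNodes".

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # typical block size quoted in the slides


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE):
    """Return a list of (offset, length) pairs, one per block."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def raw_storage(file_size: int, replication: int = 3) -> int:
    """Bytes consumed across the cluster (replication=3 is an assumption)."""
    return file_size * replication


blocks = split_into_blocks(300 * MB)
print(len(blocks))                # 3 blocks: 128 MB, 128 MB, 44 MB
print(blocks[-1][1] // MB)        # 44
print(raw_storage(300 * MB) // MB)  # 900
```

Only the last block of a file is shorter than the block size, so a 300 MB file occupies three blocks rather than three full 128 MB allocations.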
NameNode Metadata
• Metadata in Memory
– The entire metadata is in main memory
– No demand paging of metadata
• Types of Metadata
– List of files
– List of blocks for each file
– List of DataNodes for each block
– File attributes, e.g. creation time, replication factor
• A Transaction Log
– Records file creations, file deletions, etc.
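
The metadata types listed above can be pictured as a few in-memory maps plus an append-only log. The sketch below is a toy model, not the NameNode's actual implementation: the class `ToyNameNode`, its method names, and the log record format are all hypothetical, and only creations and deletions are logged here.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FileMeta:
    creation_time: float
    replication: int
    blocks: List[str] = field(default_factory=list)  # ordered block ids


class ToyNameNode:
    """Toy in-memory metadata store with a transaction log (illustrative)."""

    def __init__(self) -> None:
        self.files: Dict[str, FileMeta] = {}            # list of files
        self.block_locations: Dict[str, List[str]] = {}  # block -> DataNodes
        self.edit_log: List[Tuple] = []                  # append-only log

    def create(self, path: str, replication: int = 3) -> None:
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = FileMeta(time.time(), replication)
        self.edit_log.append(("create", path, replication))

    def add_block(self, path: str, block_id: str, datanodes: List[str]) -> None:
        self.files[path].blocks.append(block_id)
        self.block_locations[block_id] = list(datanodes)

    def delete(self, path: str) -> None:
        meta = self.files.pop(path)
        for block_id in meta.blocks:
            self.block_locations.pop(block_id, None)
        self.edit_log.append(("delete", path))

    def locate(self, path: str) -> List[Tuple[str, List[str]]]:
        """What a client asks for before reading directly from DataNodes."""
        return [(b, self.block_locations[b]) for b in self.files[path].blocks]


nn = ToyNameNode()
nn.create("/logs/day1")
nn.add_block("/logs/day1", "blk_1", ["dn1", "dn2", "dn3"])
print(nn.locate("/logs/day1"))  # [('blk_1', ['dn1', 'dn2', 'dn3'])]
```

Keeping everything in plain dictionaries mirrors the "entire metadata is in main memory" point: lookups never touch disk, and the log is the only record of changes.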

