CHAPTER 7: Getting to Know Hadoop

Upload: makan

Post on 24-Feb-2016


DESCRIPTION

CHAPTER 7: Getting to Know Hadoop. Outline: What is Hadoop?; Hadoop architecture; HDFS (Hadoop Distributed File System); HBase. What is Hadoop? Hadoop is an Apache project, a distributed computing platform, and a software platform that lets users easily write and run applications that process massive amounts of data.

TRANSCRIPT


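CHAPTER 7: Getting to Know Hadoop

Outline: What is Hadoop?; Hadoop architecture; HDFS (Hadoop Distributed File System); HBase.

What is Hadoop? Hadoop is an Apache project, a distributed computing platform, and a software platform that lets users easily write and run applications that process massive amounts of data.

[Figure: Cloud Applications; MapReduce; HBase; Hadoop Distributed File System (HDFS); A Cluster of Machines]

Hadoop (2002~2004): Doug Cutting; Lucene (Java API); Nutch, built on Lucene.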

Hadoop and Nutch

Papers published by Google:
SOSP 2003: The Google File System
OSDI 2004: MapReduce: Simplified Data Processing on Large Clusters
OSDI 2006: Bigtable: A Distributed Storage System for Structured Data

Hadoop (2004~Now): Drawing on Google's papers, Doug Cutting implemented a distributed file system (NDFS) and MapReduce in Nutch. In 2006, the distributed computing portion of Nutch became Hadoop. Doug Cutting joined Yahoo, and NDFS was renamed the Hadoop Distributed File System (HDFS).

Hadoop: Vast Amounts of Data; Cost Efficiency (PC); Parallel Performance; Robustness.

Google vs. Hadoop

Google            Hadoop
Google            Apache
Google            Yahoo, Amazon
open document     open source
MapReduce         Hadoop MapReduce
GFS               HDFS
Bigtable          HBase
Google            Nutch
Linux             Linux / GPL

Hadoop architecture (1/3): Hadoop Core; HDFS; MapReduce; HBase; Pig; Chukwa; Hive; Avro; ZooKeeper.

Hadoop architecture (2/3): Core; Avro (RPC); MapReduce; HDFS; Pig.

Hadoop architecture (3/3): HBase (row); ZooKeeper; Hive (HDFS, SQL); Chukwa.

What is HDFS? The Hadoop Distributed File System, modeled on the Google File System.

HDFS (1/2): Fault Tolerance; Streaming data access (Throughput over Latency); Large data sets and files (Petabytes); Coherency Model.

HDFS (2/2): Data Locality; Heterogeneous.

HDFS: The NameNode manages the HDFS namespace and file metadata (which Blocks make up each file). DataNodes store the Blocks.

[Figure: HDFS]

HDFS Client: steps (1), (2) metadata, (3).

HDFS read: the client JVM on the client node runs the HDFS Client, DistributedFileSystem, and FSDataInputStream; the cluster has a NameNode and DataNodes.
1: open()
2: get block location (NameNode)
3: read()
4: read() (DataNode)
5: read() (DataNode)
6: close()

HDFS write: the client JVM on the client node runs the HDFS Client, DistributedFileSystem, and FSDataOutputStream; packets pass along a pipeline of DataNodes.
1: create()
2: create file (NameNode)
3: write()
4: write packet (DataNode pipeline)
5: ack packet (DataNode pipeline)
6: close()
7: complete (NameNode)

What is HBase? HBase is column-oriented. In 2008 it became part of Hadoop as Apache HBase.

[Figure: HBase]

HBase users:
Adobe (structured data)
Kalooga http://www.kalooga.com/
Meetup http://www.meetup.com/
Streamy (migrated from MySQL to HBase) http://www.streamy.com/
Trend Micro http://trendmicro.com/
Yahoo! (fingerprint) http://www.yahoo.com/
More - http://wiki.apache.org/hadoop/Hbase/PoweredBy

HBase (1/2): HMaster; HRegionServer slaves; Client; HRegions, which the HMaster assigns to HRegionServers.

HBase (2/2): ZooKeeper; column families; HRegion; HRegionServer; failover.

[Figure: HBase]
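As a rough illustration of the HDFS read sequence listed above (1: open(), 2: get block location, 3 to 5: read(), 6: close()), the following toy model separates the two roles: the NameNode only answers "which blocks, on which DataNodes", while the DataNodes serve the block contents. It is plain Python, not the Hadoop client API. The class names `ToyNameNode`, `ToyDataNode`, and `ToyInputStream`, the block ids, and the file path are all hypothetical.

```python
class ToyDataNode:
    """Stores block contents keyed by block id (stand-in for a DataNode)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def read_block(self, block_id):
        return self.blocks[block_id]


class ToyNameNode:
    """Holds metadata only: which blocks make up a file and where they live."""

    def __init__(self):
        # path -> ordered list of (block_id, [DataNodes holding that block])
        self.block_map = {}

    def get_block_locations(self, path):
        return self.block_map[path]


class ToyInputStream:
    """Plays the role of FSDataInputStream in the read diagram."""

    def __init__(self, locations):
        self.locations = locations
        self.closed = False

    def read(self):
        # Step 3: the client calls read() on the stream.
        if self.closed:
            raise ValueError("stream is closed")
        data = b""
        for block_id, datanodes in self.locations:
            # Steps 4 and 5: fetch each block, in order, from a DataNode
            # that holds it. Here we always pick the first replica.
            data += datanodes[0].read_block(block_id)
        return data

    def close(self):
        # Step 6: the client closes the stream.
        self.closed = True


def open_file(namenode, path):
    # Step 1: open(); step 2: ask the NameNode for block locations.
    return ToyInputStream(namenode.get_block_locations(path))


def demo():
    dn1, dn2 = ToyDataNode("dn1"), ToyDataNode("dn2")
    dn1.blocks["blk_0"] = b"hello "
    dn2.blocks["blk_1"] = b"hdfs"
    namenode = ToyNameNode()
    namenode.block_map["/demo.txt"] = [("blk_0", [dn1]), ("blk_1", [dn2])]
    stream = open_file(namenode, "/demo.txt")
    data = stream.read()
    stream.close()
    return data
```

Calling `demo()` reassembles the file from two blocks held on two different DataNodes. Note that no file data ever passes through the toy NameNode, which mirrors the split between metadata and block storage described in the slides.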

[Figure: HBase Data Model]

Example: Conceptual View; Physical Storage View.

HBase: [Figure: ZooKeeper; -ROOT-; .META. regions 0, 1, ..., n; Regions]
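The last slide names ZooKeeper, -ROOT-, .META., and Regions. In the older HBase releases that used a -ROOT- table, a client found the server for a row in three hops: ZooKeeper gives the location of -ROOT-, -ROOT- points to the right .META. region, and .META. points to the user region. The sketch below is a simplified model of that lookup, assuming each level is a sorted map from region start key to the next hop. The `Catalog` class, the key ranges, and the server names are hypothetical, and real catalog row keys also encode the table name.

```python
from bisect import bisect_right


class Catalog:
    """Sorted map from a region's start key to the next hop in the lookup."""

    def __init__(self, entries):
        # entries: list of (start_key, target); "" is the lowest start key.
        self.entries = sorted(entries, key=lambda entry: entry[0])
        self.start_keys = [start for start, _ in self.entries]

    def locate(self, row_key):
        # The owning region has the greatest start key <= row_key.
        index = bisect_right(self.start_keys, row_key) - 1
        return self.entries[index][1]


def find_region_server(zookeeper, row_key):
    root = zookeeper["root-location"]   # hop 1: ZooKeeper -> -ROOT-
    meta = root.locate(row_key)         # hop 2: -ROOT- -> .META. region
    return meta.locate(row_key)         # hop 3: .META. -> region server


# Two .META. regions, each covering part of the row-key space.
meta0 = Catalog([("", "server-a"), ("g", "server-b")])
meta1 = Catalog([("n", "server-c"), ("t", "server-a")])
root = Catalog([("", meta0), ("n", meta1)])
zookeeper = {"root-location": root}
```

With this layout, `find_region_server(zookeeper, "kiwi")` walks through `meta0` and lands on `server-b`, while `"pear"` walks through `meta1` and lands on `server-c`. Real clients cache these locations so that most requests skip the catalog hops.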