A Look Back at the History of Hadoop Big Data Platforms #cwt2015

Upload: cloudera-japan

Post on 22-Jan-2018

TRANSCRIPT

  • 1 Cloudera, Inc. All rights reserved.

    A Look Back at the History of Hadoop Big Data Platforms

  • 2

    20114ClouderaCloudera

    twitter: @shiumachi

  • 6

    [Diagram: Enterprise Data Hub (EDH) components: Sqoop, Flume; MapReduce, Hive, Pig, Spark; Impala; Solr; SAS, R, Spark, Mahout; NoSQL (HBase); Spark Streaming; HDFS, HBase; YARN, Cloudera Manager, Cloudera Navigator]

  • 8

    DISCLAIMERHadoopEDHDWH

    ClouderaCloudera

    (HA)

    :

  • 12

    (1)

  • 13

    (1)

    [Diagram build: tar.gz]

  • 14

    (1)

    [Diagram build, adding to slide 13: HDFS, tar.gz, HDFS put]

  • 15

    (1)

    [Diagram build, adding to slide 14: HDFS, Avro, MapReduce]

  • 16

    (1)

    [Diagram build, adding to slide 15: Hive, RCFile, HDFS, RCFile]

  • 17

    (1)

    [Diagram build, adding to slide 16: Hive]

  • 18

    (1)

    [Diagram build, adding to slide 17: Flume Source]

  • 19

    (1)

    [Diagram build, adding to slide 18: Flume Sink, HDFS Sink, HDFS, SequenceFile]

  • 20

    2009-2012: Hadoop = MapReduce, Java, Hive, Pig

    Hive, BI

  • 21

    MapReduce, Hadoop

    HDFS, 2012

    Hive, SQL, MapReduce, Pig

    Avro

    RCFile, Parquet

    Flume, Source - Channel - Sink, 3, Source, Sink
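The Source - Channel - Sink structure named for Flume on this slide can be pictured with a minimal Python sketch. This is not Flume code and does not use Flume's API: the names `Channel`, `run_source`, and `run_sink` are hypothetical, and the in-memory queue is only an assumed stand-in for a real channel. It shows the general idea that a source writes events into a channel and a sink drains them toward a destination such as HDFS.

```python
from queue import Queue


class Channel:
    """Hypothetical in-memory buffer sitting between a source and a sink."""

    def __init__(self, capacity=100):
        # Assumption: a bounded queue; put() would block if it ever filled up.
        self._queue = Queue(maxsize=capacity)

    def put(self, event):
        self._queue.put(event)

    def take(self):
        # Return the next event, or None when the channel is empty.
        return None if self._queue.empty() else self._queue.get()


def run_source(lines, channel):
    """Source side: wrap each input line as an event and hand it to the channel."""
    for line in lines:
        channel.put({"body": line})


def run_sink(channel, destination):
    """Sink side: drain the channel into the destination list; return the count."""
    delivered = 0
    while True:
        event = channel.take()
        if event is None:
            break
        destination.append(event["body"])
        delivered += 1
    return delivered


if __name__ == "__main__":
    channel = Channel()
    stored = []  # stands in for a destination such as files on HDFS
    run_source(["log line 1", "log line 2"], channel)
    print(run_sink(channel, stored), stored)
```

Keeping the channel as a separate piece is what lets the source and the sink run at different speeds; in this sketch they simply run one after the other.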

  • 22

    (2) BI

    [Diagram: same labels as slide 19]

  • 23

    (2) BI

    [Diagram build, adding to slide 22: Hive, Parquet, HDFS, Parquet]

  • 24

    (2) BI

    [Diagram build, adding to slide 23: HBase (x3), get/put API]

  • 25

    (2) BI

    [Diagram build, adding to slide 24: Impala, BI]

  • 26

    (2) BI

    2012: Impala, BI, Hadoop, Parquet, HBase, BI, HBase + Parquet

    (HBase)

  • 27

    BI

    Impala, October 2012, Hadoop, SQL, Hadoop, MapReduce

    Parquet, Cloudera, Twitter

    HBase, NoSQL, HBase, 2009, Impala

  • 28

    Parquet

    HBase

    Parquet + HBase

    (Parquet)

    (HBase)

    (HBase)

    (HBase)

  • 29

    (3) Spark, EDH

    [Diagram: same labels as slide 25]

  • 30

    (3) Spark, EDH

    [Diagram build, adding to slide 29: Solr, Flume Sink, NRT]

  • 31

    (3) Spark, EDH

    [Diagram build, adding to slide 30: Lily HBase Indexer, NRT]

  • 32

    (3) Spark, EDH

    [Diagram build, adding to slide 31: Solr]

  • 33

    (3) Spark, EDH

    [Diagram build, adding to slide 32: Spark]

  • 34

    (3) Spark, EDH

    2013: Cloudera Search, Hadoop, Spark, SQL

  • 35

    Spark, EDH

    Solr, OSS, Solr, Hadoop, Cloudera, Solr, Cloudera Search, OSS

    Lily HBase Indexer, HBase, Solr

    Spark, MapReduce, API

  • 36

    (4) Kafka

    [Diagram: same labels as slide 33]

  • 37

    (4) Kafka

    [Diagram build, adding to slide 36: Kafka Broker, Flume Source, Kafka Source]

  • 38

    (4) Kafka

    [Diagram build, adding to slide 37: Kafka Producer, Producer API]

  • 39

    (4) Kafka

    [Diagram build, adding to slide 38: Flume Sink, HBase Sink]

  • 40

    (4) Kafka

    [Diagram build, adding to slide 39: Spark Streaming]

  • 41

    (4) Kafka

    2015: Kafka, Spark Streaming, end-to-end

  • 42

    Kafka

    Kafka, Flume, Kafka, 1

    Spark Streaming, Spark

  • 43

    SLA

  • 44

    SLA1SLA()

  • 45

    SLA1: SLAImpala51

    2: SLAHadoopHadoopHadoop

    3: SLA

  • 46

    end-to-endSLA

    Hadoop

    SLA

    ImpalaParquetFlume(Parquet)

    Impala

    HBase()

    Impala()

    Impala

    HadoopHadoop

    Hadoopend-to-end

  • 47

    Parquet, Impala

    HBase

    Spark, MapReduce

  • 49

    [Diagram: the complete architecture built up on slides 13-40. Labels: tar.gz; HDFS put; HDFS (tar.gz, Avro, RCFile, SequenceFile, Parquet); MapReduce; Hive; Flume Source; Kafka Source; Flume Sink; HDFS Sink; HBase Sink; Kafka Producer; Producer API; Kafka Broker; HBase; get/put API; Lily HBase Indexer; NRT; Solr; Impala; BI; Spark; Spark Streaming]

  • 50

    SLASLAend-to-endSLA

    SLA

  • 51

    We are hiring!

    [email protected]

  • 52

    Thank you