Trying out Tez on EMR

Developer Day: Trying out Tez on EMR (session G-1). Satoshi Noto (能登 諭), Classmethod, Inc. © Classmethod, Inc. 2015-03-29 #cmdevio2015G

Post on 16-Jul-2015

TRANSCRIPT

  • Developer Day

    Tez on EMR

    1

    G-1

    Classmethod, Inc.

    20150329

    #cmdevio2015G

  • Classmethod, Inc.

    Twitter: @n3104
    Hadoop
    AWS, EMR

    2

  • Classmethod, Inc.

    EMR
    Hive
    Tez
    Tez on EMR

    3

  • EMR

    4

  • Classmethod, Inc.

    Amazon Elastic MapReduce (EMR)
    http://aws.amazon.com/jp/elasticmapreduce

    A Hadoop service provided by AWS

    5

    http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-what-is-emr.html

  • Classmethod, Inc.

    Apache Hadoop
    http://hadoop.apache.org/

    Built on HDFS and MapReduce
    Hadoop 2 adds YARN

    6

  • Classmethod, Inc.

    Hadoop Distributed File System (HDFS)
    Master-slave architecture

    7

    https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html

  • Classmethod, Inc.

    MapReduce MapReduce

    Data node

    8

    http://insights.wired.com/profiles/blogs/a-beginner-s-guide-to-cloud-apache-hadoop-and-mapreduce-paradigm

  • Classmethod, Inc.

    EMR S3 AWS

    9

    http://aws.amazon.com/jp/elasticmapreduce/details/

  • Classmethod, Inc.

    Hadoop Master node

    AWS

    10

  • Hive

    11

  • Classmethod, Inc.

    Apache Hive
    https://hive.apache.org/

    Translates HiveQL, a SQL-like language, into MapReduce jobs

    12

    https://cwiki.apache.org/confluence/display/Hive/Design

  • Classmethod, Inc.

    HiveQL is converted to MR

    hive> EXPLAIN SELECT COUNT(*) FROM impressions;
    OK
    STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-0 is a root stage

    STAGE PLANS:
      Stage: Stage-1
        Map Reduce
          Map Operator Tree:
              TableScan
                alias: impressions
                ...

          Reduce Operator Tree:
            Group By Operator
              aggregations: count(VALUE._col0)
              mode: mergepartial
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
              Select Operator
                ...

      Stage: Stage-0
        Fetch Operator
          limit: -1

    13

  • Classmethod, Inc.

    SQL on Hadoop

    14

    http://datadotz.com/sql-on-hadoop/

  • Classmethod, Inc.

    15

    2013/05 https://amplab.cs.berkeley.edu/benchmark/v1/
    2014/01 http://hortonworks.com/blog/benchmarking-apache-hive-13-enterprise-hadoop/
    2014/01 http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed
    2014/02 https://amplab.cs.berkeley.edu/benchmark/
    2014/05 http://blog.cloudera.com/blog/2014/05/new-sql-choices-in-the-apache-hadoop-ecosystem-why-impala-continues-to-lead/
    2014/09 http://blog.cloudera.com/blog/2014/09/new-benchmarks-for-sql-on-hadoop-impala-1-4-widens-the-performance-gap/
    2014/11 http://www.slideshare.net/hadoopxnttdata/sql-on-hadoop-201411

  • Classmethod, Inc.

    SequenceFile Avro RCFile Parquet

    16

  • Classmethod, Inc.

    GZIP BZIP2 LZO Snappy

    17

  • Classmethod, Inc.

    Hive

    Spark SQL Impala Presto Drill

    18

  • Classmethod, Inc.

    EMR

    19

    Hive: Application
      http://docs.aws.amazon.com/ja_jp/ElasticMapReduce/latest/DeveloperGuide/emr-hive.html
      AMI Version 3.6.0: Hive 0.13.1

    Impala: Application
      http://docs.aws.amazon.com/ja_jp/ElasticMapReduce/latest/DeveloperGuide/emr-impala.html
      AMI Version 3.6.0: Impala 1.2.4

    Spark: Bootstrap Action
      https://github.com/awslabs/emr-bootstrap-actions/tree/master/spark
      Spark 1.3.0 (1.3.0.d) with EMR AMI 3.5.x and later

    Presto: Bootstrap Action
      https://github.com/awslabs/emr-bootstrap-actions/tree/master/presto

    Drill: Bootstrap Action
      https://github.com/awslabs/emr-bootstrap-actions/tree/master/drill

    Phoenix: Bootstrap Action
      https://github.com/awslabs/emr-bootstrap-actions/tree/master/phoenix

    Tez: Bootstrap Action
      https://forums.aws.amazon.com/thread.jspa?threadID=170560

    GitHub

  • Tez

    20

  • Classmethod, Inc.

    Apache Tez
    http://tez.apache.org/

    A framework on YARN that runs jobs expressed as a DAG (directed acyclic graph)

    21

    http://www.slideshare.net/ydn/tez-bikas/2

  • Classmethod, Inc.

    DAG API DAGAPI

    22

    http://hortonworks.com/blog/expressing-data-processing-in-apache-tez/

  • Classmethod, Inc.

    MapReduce MapReduceMapReduce

    23

    http://hortonworks.com/blog/apache-hadoop-2-is-ga/

  • Classmethod, Inc.

    YARN Application

    HDFS

    24

    [Figure: TezClient on the client machine, TezTask inside the Node Managers, Tez Lib 1 and Tez Lib 2 on HDFS]

    http://www.slideshare.net/ydn/tez-bikas/8

  • Classmethod, Inc.

    Hive on Tez
    https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez

    Switches Hive's execution engine from MR to Tez

    25

    http://tm.durusau.net/?p=48476

  • Classmethod, Inc.

    Hive on MR DAG

    26

    http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey/9

  • Tez on EMR

    27

  • Classmethod, Inc.

    AMI Version 3.6.0
      Hadoop 2.4.0
      Hive 0.13.1

    Tez 0.4.1-incubating
      Tez 0.5 goes with Hive 0.14, so Tez 0.4.1 is used here

    28

  • Classmethod, Inc.

    SSH into the EMR master node

    Work as the hadoop user

    29

  • Classmethod, Inc.

    Step 1

    # Install Maven and Protocol Buffers, which are needed to build Tez
    sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
    sudo sed -i s/\$releasever/6/g /etc/yum.repos.d/epel-apache-maven.repo
    sudo yum install -y apache-maven
    sudo yum install -y protobuf-devel

    # Build Tez
    wget http://archive.apache.org/dist/incubator/tez/tez-0.4.1-incubating/tez-0.4.1-incubating-src.tar.gz
    tar xzf tez-0.4.1-incubating-src.tar.gz
    cd tez-0.4.1-incubating-src
    mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true

    30

  • Classmethod, Inc.

    Step 2

    # Edit mapred-site.xml (for Hive)
    cd
    sed -i -e 's:<name>mapreduce.framework.name</name><value>yarn</value>:<name>mapreduce.framework.name</name><value>yarn-tez</value>:g' conf/mapred-site.xml

    # Create the tez directory and copy the build output into it
    mkdir tez tez/conf tez/jars
    cp -r tez-0.4.1-incubating-src/tez-dist/target/tez-0.4.1-incubating/tez-0.4.1-incubating/* tez/jars

    # Upload the Tez libraries to HDFS
    hadoop fs -mkdir -p /apps/tez-0.4.1
    hadoop fs -copyFromLocal tez/jars/* /apps/tez-0.4.1/
    hadoop fs -copyFromLocal share/hadoop/mapreduce /apps/tez-0.4.1/

    31
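
After the sed edit in step 2, it can help to confirm which value mapred-site.xml now holds. The following is a minimal sketch, not part of the original slides: `get_property` is a hypothetical helper, and it assumes the property's `<name>` and `<value>` elements sit on the same line of the file.

```shell
# Print the <value> of a Hadoop property from a *-site.xml file.
# Assumption: <name> and <value> are on one line; get_property is a hypothetical helper.
get_property() (
  file="$1"
  # Escape dots so they match literally in the sed pattern.
  name="$(printf '%s' "$2" | sed 's/\./\\./g')"
  sed -n "s:.*<name>${name}</name><value>\([^<]*\)</value>.*:\1:p" "$file"
)

# Usage on the master node (path relative to the hadoop user's home, as in step 2):
#   get_property ~/conf/mapred-site.xml mapreduce.framework.name
```

If the substitution took effect, the usage line should print yarn-tez; an unchanged file would still print yarn.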

  • Classmethod, Inc.

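    Step 3

    # Create the Tez configuration file
    cat > tez/conf/tez-site.xml <<'EOF'
    <configuration>
      <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/apps/tez-0.4.1/,${fs.defaultFS}/apps/tez-0.4.1/lib/,${fs.defaultFS}/apps/tez-0.4.1/mapreduce/,${fs.defaultFS}/apps/tez-0.4.1/mapreduce/lib/</value>
      </property>
      <property>
        <name>tez.use.cluster.hadoop-libs</name>
        <value>true</value>
      </property>
    </configuration>
    EOF

    # Create the directory for hiveJars
    hadoop fs -mkdir ./

    # Set the environment variables for running Hive with Tez
    TEZ_CONF_DIR=/home/hadoop/tez/conf/
    TEZ_JARS=/home/hadoop/tez/jars/
    export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*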

    32
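
The tez.lib.uris value in step 3 lists four HDFS directories under one base path, which is easy to mistype by hand. As a sketch only, the value could be generated from the base directory; `tez_lib_uris` is a hypothetical helper, and the four subdirectories are simply the ones that appear in step 3.

```shell
# Build the comma-separated tez.lib.uris value for an HDFS base directory.
# tez_lib_uris is a hypothetical helper; the subdirectory list mirrors step 3.
tez_lib_uris() (
  base="${1%/}"                 # drop one trailing slash, if present
  prefix='${fs.defaultFS}'      # kept literal; Hadoop substitutes it when loading the config
  out=""
  for sub in "" lib/ mapreduce/ mapreduce/lib/; do
    out="${out}${out:+,}${prefix}${base}/${sub}"
  done
  printf '%s\n' "$out"
)

tez_lib_uris /apps/tez-0.4.1
```

The single quotes around `${fs.defaultFS}` matter: the shell must not try to expand it, since the placeholder is meant for Hadoop's own configuration substitution.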

  • Classmethod, Inc.

    Start Hive and select Tez as the execution engine

    hive
    hive> SET hive.execution.engine=tez;

    33

  • Classmethod, Inc.

    Data: CloudFront logs
      Count: 112,019

    EMR cluster
      Master: 1 x m3.xlarge
      Core: 2 x m3.xlarge

    Result
      Hive on MR: 220
      Hive on Tez: 18

    34
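
To put the two figures above in proportion, the ratio can be computed directly. This is a small sketch that assumes both numbers are measured in the same unit; `speedup` is a hypothetical helper name.

```shell
# Ratio of two measurements, rounded to one decimal place.
# speedup is a hypothetical helper; both arguments must use the same unit.
speedup() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

speedup 220 18   # Hive on MR vs. Hive on Tez; prints 12.2
```

On this data set and cluster, Hive on Tez therefore finished roughly 12 times faster than Hive on MR.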

  • Classmethod, Inc.

    CREATE EXTERNAL TABLE IF NOT EXISTS impressions (
      dt STRING,
      tm STRING,
      xedgelocation STRING,
      scbytes INT,
      cip STRING,
      csmethod STRING,
      cshost STRING,
      csuristem STRING,
      csstatus STRING,
      csreferrer STRING,
      csuseragent STRING,
      csuriquery STRING,
      cscookie STRING,
      edgeresulttype STRING,
      edgerequestid STRING,
      hostheader STRING,
      protocol STRING,
      csbytes INT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION 's3://cf-logs/';

    35

  • Classmethod, Inc.

    SQL

    SELECT dt, tm, COUNT(*) AS rps
    FROM `impressions`
    GROUP BY dt, tm
    ORDER BY rps DESC
    LIMIT 10;

    36
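
One way to time this query under both engines is to generate a small HiveQL script per engine and pass it to `hive -f`. The sketch below is an illustration rather than the procedure used in the talk: `hql_for_engine` and the /tmp file names are hypothetical, and the run is skipped when the hive command is not on the PATH. It also assumes HADOOP_CLASSPATH has already been exported as in step 3.

```shell
# Emit a HiveQL script that selects the execution engine and then runs a query.
# hql_for_engine and the /tmp file names are hypothetical.
hql_for_engine() (
  engine="$1"
  query="$2"
  case "$engine" in
    mr|tez) ;;
    *) echo "unknown engine: $engine" >&2; exit 1 ;;
  esac
  printf 'SET hive.execution.engine=%s;\n%s\n' "$engine" "$query"
)

QUERY='SELECT dt, tm, COUNT(*) AS rps FROM impressions GROUP BY dt, tm ORDER BY rps DESC LIMIT 10;'

# Run only where the hive CLI exists, for example on the EMR master node.
if command -v hive >/dev/null 2>&1; then
  for engine in mr tez; do
    hql_for_engine "$engine" "$QUERY" > "/tmp/query_${engine}.hql"
    time hive -f "/tmp/query_${engine}.hql"
  done
fi
```

Each run starts a fresh Hive session, so the measured time includes session startup as well as the query itself.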

  • Classmethod, Inc.

    Even with Hive on MR, changing mapreduce.framework.name from yarn to yarn-tez makes the MR jobs run on Tez

    37

    [Figure: three Hive configurations. hive.execution.engine selects MR or Tez; for MR, mapreduce.framework.name selects whether the MR job runs on YARN or on Tez]
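
The interaction between hive.execution.engine and mapreduce.framework.name on this slide can be written out as a small decision function. This is a sketch of the relationship as described here, not code from Hive or Tez; `hive_runtime` and its output labels are hypothetical.

```shell
# Report which runtime executes a Hive query for a given pair of settings.
# hive_runtime and its labels are hypothetical, for illustration only.
hive_runtime() (
  engine="$1"      # value of hive.execution.engine
  framework="$2"   # value of mapreduce.framework.name
  if [ "$engine" = "tez" ]; then
    echo "Tez"           # Hive submits a Tez DAG directly
  elif [ "$engine" = "mr" ] && [ "$framework" = "yarn-tez" ]; then
    echo "MR on Tez"     # MR jobs are executed through Tez
  elif [ "$engine" = "mr" ] && [ "$framework" = "yarn" ]; then
    echo "MR on YARN"    # plain MapReduce
  else
    echo "unknown" >&2
    exit 1
  fi
)

hive_runtime mr yarn-tez
```

This is why a comparison against plain MR needs mapreduce.framework.name set back to yarn once step 2 has been applied.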

  • Classmethod, Inc.

    tez.use.cluster.hadoop-libs=true

    HadoopEMR

    tez.lib.uris/home/hadoop/share/hadoop/mapreduce yarn.application.classpath/home/hadoop/share/hadoop/mapreduceHDFS

    hiveJars directory on HDFS
    https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.jar.directory

    http://tez.apache.org/install_pre_0_5_0.html

    38

  • Classmethod, Inc.

    EMRTezMR

    EMRTez

    39
