Trying out Tez on EMR
-
Developer Day
Tez on EMR
G-1
Classmethod, Inc.
2015-03-29
#cmdevio2015G
-
Twitter: @n3104
Hadoop
AWS, EMR
-
EMR
Hive
Tez
Tez on EMR
-
EMR
-
Amazon Elastic MapReduce (EMR)
http://aws.amazon.com/jp/elasticmapreduce
A managed Hadoop service provided by AWS.
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-what-is-emr.html
-
Apache Hadoop
http://hadoop.apache.org/
Built from HDFS and MapReduce.
Hadoop 2 adds YARN.
-
HDFS (Hadoop Distributed File System)
Master-slave architecture
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
-
MapReduce
(Figure: MapReduce processing on data nodes)
http://insights.wired.com/profiles/blogs/a-beginner-s-guide-to-cloud-apache-hadoop-and-mapreduce-paradigm
-
EMR S3 AWS
http://aws.amazon.com/jp/elasticmapreduce/details/
-
Hadoop Master node
AWS
-
Hive
-
Apache Hive
https://hive.apache.org/
Runs HiveQL, a SQL-like language, as MapReduce jobs.
https://cwiki.apache.org/confluence/display/Hive/Design
-
HiveQL is compiled into MR jobs

hive> EXPLAIN SELECT COUNT(*) FROM impressions;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: impressions
            ...

      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
          Select Operator
            ...

  Stage: Stage-0
    Fetch Operator
      limit: -1
-
SQL on Hadoop
http://datadotz.com/sql-on-hadoop/
-
2013/05 https://amplab.cs.berkeley.edu/benchmark/v1/
2014/01 http://hortonworks.com/blog/benchmarking-apache-hive-13-enterprise-hadoop/
2014/01 http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed
2014/02 https://amplab.cs.berkeley.edu/benchmark/
2014/05 http://blog.cloudera.com/blog/2014/05/new-sql-choices-in-the-apache-hadoop-ecosystem-why-impala-continues-to-lead/
2014/09 http://blog.cloudera.com/blog/2014/09/new-benchmarks-for-sql-on-hadoop-impala-1-4-widens-the-performance-gap/
2014/11 http://www.slideshare.net/hadoopxnttdata/sql-on-hadoop-201411
-
SequenceFile, Avro, RCFile, Parquet
-
GZIP, BZIP2, LZO, Snappy
-
Hive
Spark SQL, Impala, Presto, Drill
-
EMR
Hive: Application
http://docs.aws.amazon.com/ja_jp/ElasticMapReduce/latest/DeveloperGuide/emr-hive.html
AMI Version 3.6.0: Hive 0.13.1
Impala: Application
http://docs.aws.amazon.com/ja_jp/ElasticMapReduce/latest/DeveloperGuide/emr-impala.html
AMI Version 3.6.0: Impala 1.2.4
Spark: Bootstrap Action
https://github.com/awslabs/emr-bootstrap-actions/tree/master/spark
Spark 1.3.0 (1.3.0.d) with EMR AMI 3.5.x and later
Presto: Bootstrap Action
https://github.com/awslabs/emr-bootstrap-actions/tree/master/presto
Drill: Bootstrap Action
https://github.com/awslabs/emr-bootstrap-actions/tree/master/drill
Phoenix: Bootstrap Action
https://github.com/awslabs/emr-bootstrap-actions/tree/master/phoenix
Tez: Bootstrap Action
https://forums.aws.amazon.com/thread.jspa?threadID=170560
GitHub
-
Tez
-
Apache Tez
http://tez.apache.org/
A framework on YARN for running DAG (directed acyclic graph) processing.
http://www.slideshare.net/ydn/tez-bikas/2
-
DAG API: an API for expressing data processing as a DAG
http://hortonworks.com/blog/expressing-data-processing-in-apache-tez/
-
MapReduce MapReduceMapReduce
http://hortonworks.com/blog/apache-hadoop-2-is-ga/
-
YARN Application
HDFS
(Figure: TezClient on the client machine, TezTask on each Node Manager, Tez Lib 1 and Tez Lib 2 on HDFS)
http://www.slideshare.net/ydn/tez-bikas/8
-
Hive on Tez
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez
Runs Hive on Tez instead of MR.
http://tm.durusau.net/?p=48476
-
Hive on MR DAG
http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey/9
-
Tez on EMR
-
AMI Version 3.6.0
Hadoop 2.4.0
Hive 0.13.1
Tez 0.4.1-incubating

Tez 0.5 pairs with Hive 0.14, so Tez 0.4.1 is used with Hive 0.13.1.
-
SSH into the EMR Master Node
Work as the hadoop user
-
Step 1

# Install Maven and Protocol Buffers, which are needed to build Tez
sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
sudo sed -i s/\$releasever/6/g /etc/yum.repos.d/epel-apache-maven.repo
sudo yum install -y apache-maven
sudo yum install -y protobuf-devel

# Build Tez
wget http://archive.apache.org/dist/incubator/tez/tez-0.4.1-incubating/tez-0.4.1-incubating-src.tar.gz
tar xzf tez-0.4.1-incubating-src.tar.gz
cd tez-0.4.1-incubating-src
mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
-
Step 2

# Edit mapred-site.xml for Hive
cd
sed -i -e 's:<name>mapreduce.framework.name</name><value>yarn</value>:<name>mapreduce.framework.name</name><value>yarn-tez</value>:g' conf/mapred-site.xml

# Set up the tez directory
mkdir tez tez/conf tez/jars
cp -r tez-0.4.1-incubating-src/tez-dist/target/tez-0.4.1-incubating/tez-0.4.1-incubating/* tez/jars

# Copy Tez to HDFS
hadoop fs -mkdir -p /apps/tez-0.4.1
hadoop fs -copyFromLocal tez/jars/* /apps/tez-0.4.1/
hadoop fs -copyFromLocal share/hadoop/mapreduce /apps/tez-0.4.1/
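The sed substitution in this step can be rehearsed on a scratch copy before touching the real file. This is a minimal sketch, assuming mapred-site.xml keeps each property's `<name>` and `<value>` tags adjacent on one line (an assumption about the file layout, since the tags are not visible in this transcript) and that GNU sed is available. The sample file and temporary directory are made up for illustration.

```shell
#!/bin/sh
# Sketch: rehearse the yarn -> yarn-tez substitution on a scratch file.
# Assumption: <name> and <value> sit on the same line with nothing between
# them; the sample content below is hypothetical, not the real EMR file.
workdir=$(mktemp -d)
sample="$workdir/mapred-site.xml"

cat > "$sample" <<'EOF'
<configuration>
<property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>
EOF

# Same substitution as in the step, applied to the scratch copy (GNU sed -i).
sed -i -e 's:<name>mapreduce.framework.name</name><value>yarn</value>:<name>mapreduce.framework.name</name><value>yarn-tez</value>:g' "$sample"

# Show the resulting property line to confirm the change took effect.
framework_line=$(grep 'mapreduce.framework.name' "$sample")
echo "$framework_line"
```

If the printed line still shows `yarn`, the real file probably uses a different layout (for example, tags on separate lines) and the pattern needs adjusting.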
-
Step 3

# Create the Tez configuration file
cat > tez/conf/tez-site.xml <<'EOF'
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/apps/tez-0.4.1/,${fs.defaultFS}/apps/tez-0.4.1/lib/,${fs.defaultFS}/apps/tez-0.4.1/mapreduce/,${fs.defaultFS}/apps/tez-0.4.1/mapreduce/lib/</value>
  </property>
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
  </property>
</configuration>
EOF

# Create the HDFS directory for hiveJars
hadoop fs -mkdir ./

# Make Tez available to Hive
TEZ_CONF_DIR=/home/hadoop/tez/conf/
TEZ_JARS=/home/hadoop/tez/jars/
export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
-
Start Hive and switch the execution engine

hive
hive> SET hive.execution.engine=tez;
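The engine can also be chosen per invocation from the shell, using the Hive CLI's `--hiveconf` and `-e` options instead of an interactive `SET`. The sketch below only prints the two command lines so it can be checked without a cluster; `hive_cmd` is a hypothetical helper, and the query is a placeholder against the `impressions` table used in this deck.

```shell
#!/bin/sh
# Sketch: build the hive command line for a given execution engine.
# hive_cmd is a hypothetical helper; it prints the command rather than
# running it, so no Hive installation is needed to follow along.
hive_cmd() {
  engine=$1   # "mr" or "tez"
  query=$2
  printf 'hive --hiveconf hive.execution.engine=%s -e "%s"\n' "$engine" "$query"
}

query='SELECT COUNT(*) FROM impressions;'
mr_cmd=$(hive_cmd mr "$query")
tez_cmd=$(hive_cmd tez "$query")
echo "$mr_cmd"
echo "$tez_cmd"
```

Running each printed command under `time` on the master node gives a rough side-by-side comparison of the two engines.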
-
CloudFront access logs: 112,019
EMR
Master: 1 x m3.xlarge
Core: 2 x m3.xlarge
Hive on MR: 220
Hive on Tez: 18
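The ratio between the two timings is easier to read as a speedup factor. The sketch below computes it from the two figures on this slide; the unit did not survive in this transcript, but the ratio does not depend on it. The variable names are illustrative.

```shell
#!/bin/sh
# Timings reported on the slide (unit not preserved in this transcript).
mr_time=220
tez_time=18

# awk handles the floating-point division that plain sh arithmetic lacks.
speedup=$(awk -v mr="$mr_time" -v tez="$tez_time" 'BEGIN { printf "%.1f", mr / tez }')
echo "Hive on Tez is about ${speedup}x faster than Hive on MR"
```

This is a single measurement on one query and one small cluster, so the factor is indicative rather than a general benchmark.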
-
CREATE EXTERNAL TABLE IF NOT EXISTS impressions (
  dt STRING,
  tm STRING,
  xedgelocation STRING,
  scbytes INT,
  cip STRING,
  csmethod STRING,
  cshost STRING,
  csuristem STRING,
  csstatus STRING,
  csreferrer STRING,
  csuseragent STRING,
  csuriquery STRING,
  cscookie STRING,
  edgeresulttype STRING,
  edgerequestid STRING,
  hostheader STRING,
  protocol STRING,
  csbytes INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://cf-logs/';
-
SQL

SELECT dt, tm, COUNT(*) AS rps
FROM `impressions`
GROUP BY dt, tm
ORDER BY rps DESC
LIMIT 10;
-
Hive on MR
Changing mapreduce.framework.name from yarn to yarn-tez runs MR jobs on Tez.
(Figure: Hive on MR and Hive on Tez over YARN, switched by mapreduce.framework.name and hive.execution.engine)
-
tez.use.cluster.hadoop-libs=true
Uses the Hadoop libraries of the EMR cluster.
/home/hadoop/share/hadoop/mapreduce is copied to HDFS and listed in tez.lib.uris (related setting: yarn.application.classpath).
The hiveJars directory is created on HDFS.
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.jar.directory
http://tez.apache.org/install_pre_0_5_0.html
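The tez.lib.uris value used in this deck is four comma-separated locations under one HDFS base path, which is easy to mistype by hand. As a rough sketch, the value can be assembled from the base path and the subdirectory list; the variable names are illustrative, and `${fs.defaultFS}` is kept literal on purpose so that Hadoop, not the shell, resolves it.

```shell
#!/bin/sh
# Sketch: assemble the tez.lib.uris value from a base path.
# Single quotes keep ${fs.defaultFS} literal; Hadoop substitutes it later.
base='${fs.defaultFS}/apps/tez-0.4.1'

uris=""
# The empty entry stands for the base directory itself.
for sub in "" lib/ mapreduce/ mapreduce/lib/; do
  # Prepend a comma only when uris already has content.
  uris="${uris:+$uris,}$base/$sub"
done
echo "$uris"
```

The printed string can be pasted into the `<value>` element of the tez.lib.uris property.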
-
EMRTezMR
EMRTez
-
Developer Day
G-1
#cmdevio2015G