Impala Overview, Dogenzaka LT Festival (lightning talks), 2015-03-12 #dogenzakalt

© Cloudera, Inc. All rights reserved. Impala: an analytic engine for Hadoop. Shimauchi, Cloudera K.K.

Upload: cloudera-japan

Post on 15-Jul-2015


TRANSCRIPT

  • Slide 1

    Impala: an analytic engine for Hadoop. Shimauchi, Cloudera K.K.

  • Slide 2

    Joined Cloudera in April 2011

    email: [email protected] twitter: @shiumachi

  • Slide 3

    Hadoop

    BI and SQL

    Hadoop

    Hadoop

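  • Slide 4

    BI / analytics tools

    Data integration: Sqoop, Flume

    Batch processing: MapReduce, Hive, Pig, Spark

    Analytics / machine learning: SAS, R, Spark, Mahout

    Online NoSQL: HBase

    Stream processing: Spark Streaming

    Analytic SQL: Impala

    Search: Solr

    Storage: HDFS, HBase

    Resource, system, and data management: YARN, Cloudera Manager, Cloudera Navigator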

  • Slide 5

    Cloudera Impala

    An MPP SQL query engine for Hadoop: http://impala.io/

    Available from Cloudera, MapR, Amazon, and Oracle; queries data in HDFS and HBase and shares metadata with Hive

    ODBC / JDBC connectivity; Kerberos / LDAP authentication

  • Slide 6

    Impala architecture

    Each of the three nodes in the diagram runs an HDFS DataNode (DN) and HBase next to the Impala daemon (impalad), which contains a Query Planner, a Query Coordinator, and a Query Exec Engine.

    A SQL App connects to Impala over ODBC / JDBC.

    Shared cluster services: Hive Metastore, HDFS NameNode (NN), State Store, Catalogd
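The architecture above runs the same query components on every data node. As a rough illustration of that MPP pattern, here is a minimal Python sketch of a two-phase grouped aggregation: each node pre-aggregates its local rows and the coordinator merges the partial results. The function names and the in-memory per-node lists are hypothetical; Impala itself ships compiled plan fragments to the impalad processes and exchanges rows over the network.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]              # (group key, value)
Partial = Dict[str, Tuple[int, float]]  # key -> (count, sum)


def local_aggregate(rows: Iterable[Row]) -> Partial:
    """Phase 1: a node pre-aggregates only the rows stored on it."""
    partial: Partial = {}
    for key, value in rows:
        count, total = partial.get(key, (0, 0.0))
        partial[key] = (count + 1, total + value)
    return partial


def merge_partials(partials: List[Partial]) -> Partial:
    """Phase 2: the coordinator merges the per-node partial results."""
    merged: Partial = {}
    for partial in partials:
        for key, (count, total) in partial.items():
            c, t = merged.get(key, (0, 0.0))
            merged[key] = (c + count, t + total)
    return merged


def coordinator_query(nodes: List[List[Row]]) -> Partial:
    # In a real cluster the local phase would run in parallel on each node.
    return merge_partials([local_aggregate(rows) for rows in nodes])


if __name__ == "__main__":
    nodes = [
        [("a", 1.0), ("b", 2.0)],
        [("a", 3.0)],
        [("b", 4.0), ("c", 5.0)],
    ]
    print(coordinator_query(nodes))
    # {'a': (2, 4.0), 'b': (2, 6.0), 'c': (1, 5.0)}
```

Only the small per-key partial results travel to the coordinator, which is why pre-aggregating next to the data scales with the number of nodes.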

  • Slide 7

    Impala 1.x

    Impala 1.0 (2013/04): SQL-92 queries on Hadoop data; Parquet, Avro, and SequenceFile formats; Kerberos; ODBC / JDBC

    Impala 1.1: role-based access control (RBAC) with Apache Sentry

    Impala 1.2: UDF / UDAF; JOIN

    Impala 1.3 / CDH 5.0

    Impala 1.4 / CDH 5.1 (2014/07): SQL additions (DECIMAL, ORDER BY without LIMIT, etc.); HDFS

  • Slide 8

    Impala 2.0 (2014/10)

    SQL: SQL:2003 analytic functions; subqueries (EXISTS and IN in the WHERE clause); CHAR / VARCHAR; GRANT / REVOKE (Sentry)

    Hash tables for joins and aggregations can spill to disk
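Spilling a hash table to disk is commonly done by partitioning the input on the hash of the key, so that each partition can be processed on its own. The Python sketch below shows that general technique (a Grace-style partitioned hash aggregation) for a grouped SUM; it is not Impala's implementation. The function name, the partition count, and the JSON-lines spill files are assumptions, and for simplicity it always spills, whereas an engine would spill only once its in-memory hash table exceeds a memory budget.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, Tuple

Row = Tuple[str, int]  # (group key, value)


def spilling_sum(rows: Iterable[Row], num_partitions: int = 4) -> Dict[str, int]:
    """Grouped SUM that partitions input to disk, then aggregates one partition at a time.

    Only one partition's hash table is held in memory at once, so peak memory
    is roughly the size of the largest partition rather than the whole input.
    """
    result: Dict[str, int] = {}
    with tempfile.TemporaryDirectory() as spill_dir:
        paths = [os.path.join(spill_dir, f"part-{i}.jsonl") for i in range(num_partitions)]
        files = [open(p, "w", encoding="utf-8") for p in paths]
        try:
            # Pass 1: route each row to a spill file by hashing its group key.
            for key, value in rows:
                files[hash(key) % num_partitions].write(json.dumps([key, value]) + "\n")
        finally:
            for f in files:
                f.close()
        # Pass 2: all rows with the same key landed in the same file,
        # so each partition can be aggregated independently.
        for path in paths:
            table: Dict[str, int] = {}
            with open(path, encoding="utf-8") as f:
                for line in f:
                    key, value = json.loads(line)
                    table[key] = table.get(key, 0) + value
            result.update(table)
    return result


if __name__ == "__main__":
    print(spilling_sum([("a", 1), ("b", 2), ("a", 3)]) == {"a": 4, "b": 2})  # True
```

Because a key never appears in two partitions, the per-partition results can simply be concatenated without a second merge step.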

  • Slide 9

    SQL-on-Hadoop benchmark (2014/09)

    Systems compared: Impala 1.4.0, Presto 0.74, Stinger phase 3 (Hive 0.13.0), Spark SQL 1.1

    Workload: TPC-DS-based queries; Impala TPC-DS kit: https://github.com/cloudera/impala-tpcds-kit

    SQL-92 JOIN Presto JVM

    http://blog.cloudera.com/blog/2014/09/new-benchmarks-for-sql-on-hadoop-impala-1-4-widens-the-performance-gap/

  • Slide 10

    Impala :

  • Slide 11

    Impala :

  • Slide 12

  • Slide 13

    Analytic (window) functions

    New in Impala 2.0

    RANK() / DENSE_RANK(), FIRST_VALUE() / LAST_VALUE(), LAG() / LEAD(), ROW_NUMBER()
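The three ranking functions differ only in how they treat ties. A minimal Python sketch of their semantics over a single partition whose values are already in ORDER BY order (the helper names are hypothetical, and NULL handling is ignored):

```python
from typing import List, Sequence


def row_number(sorted_vals: Sequence[float]) -> List[int]:
    """ROW_NUMBER(): 1, 2, 3, ... regardless of ties."""
    return list(range(1, len(sorted_vals) + 1))


def rank(sorted_vals: Sequence[float]) -> List[int]:
    """RANK(): ties share a rank; the next distinct value skips ahead by the tie count."""
    ranks: List[int] = []
    for i, v in enumerate(sorted_vals):
        if i > 0 and v == sorted_vals[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(i + 1)
    return ranks


def dense_rank(sorted_vals: Sequence[float]) -> List[int]:
    """DENSE_RANK(): ties share a rank, with no gaps afterwards."""
    ranks: List[int] = []
    for i, v in enumerate(sorted_vals):
        if i == 0:
            ranks.append(1)
        elif v == sorted_vals[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(ranks[-1] + 1)
    return ranks


values = [10, 20, 20, 30]  # already in ORDER BY order
print(row_number(values))  # [1, 2, 3, 4]
print(rank(values))        # [1, 2, 2, 4]
print(dense_rank(values))  # [1, 2, 2, 3]
```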

  • Slide 14

    select stock_symbol, closing_date, closing_price,
      lag(closing_price,1) over (partition by stock_symbol order by closing_date) as "yesterday closing"
    from stock_ticker
    order by closing_date;

    +--------------+---------------------+---------------+-------------------+
    | stock_symbol | closing_date        | closing_price | yesterday closing |
    +--------------+---------------------+---------------+-------------------+
    | JDR          | 2014-09-13 00:00:00 | 12.86         | NULL              |
    | JDR          | 2014-09-14 00:00:00 | 12.89         | 12.86             |
    | JDR          | 2014-09-15 00:00:00 | 12.94         | 12.89             |
    | JDR          | 2014-09-16 00:00:00 | 12.55         | 12.94             |
    | JDR          | 2014-09-17 00:00:00 | 14.03         | 12.55             |
    | JDR          | 2014-09-18 00:00:00 | 14.75         | 14.03             |
    | JDR          | 2014-09-19 00:00:00 | 13.98         | 14.75             |
    +--------------+---------------------+---------------+-------------------+
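To make the LAG() semantics concrete, the sketch below reproduces the query in plain Python on the first three rows of the sample output. It assumes dates are ISO-8601 strings (so string order equals date order); the function name is hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import List, Optional, Tuple

Tick = Tuple[str, str, float]  # (stock_symbol, closing_date, closing_price)
LaggedTick = Tuple[str, str, float, Optional[float]]


def with_lag(rows: List[Tick], offset: int = 1) -> List[LaggedTick]:
    """Emulate LAG(closing_price, offset) OVER (PARTITION BY stock_symbol ORDER BY closing_date)."""
    out: List[LaggedTick] = []
    # PARTITION BY stock_symbol, then ORDER BY closing_date inside each partition.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _, partition in groupby(ordered, key=itemgetter(0)):
        part = list(partition)
        for i, (symbol, date, price) in enumerate(part):
            # No earlier row in the partition -> NULL (None).
            prev = part[i - offset][2] if i >= offset else None
            out.append((symbol, date, price, prev))
    # Final ORDER BY closing_date, as in the query.
    out.sort(key=itemgetter(1))
    return out


sample = [("JDR", "2014-09-13", 12.86), ("JDR", "2014-09-14", 12.89), ("JDR", "2014-09-15", 12.94)]
for row in with_lag(sample):
    print(row)
# ('JDR', '2014-09-13', 12.86, None)
# ('JDR', '2014-09-14', 12.89, 12.86)
# ('JDR', '2014-09-15', 12.94, 12.89)
```

The PARTITION BY clause is what keeps one symbol's previous price from leaking into another symbol's rows.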

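  • Slide 15

    HBase integration: Impala can query HBase tables with SELECT and write to them with INSERT

    Impala + HBase

    HBase use cases: web page views (PV), SNS

    HBase: single-row INSERT ... VALUES

    Data flow: external systems write to HBase with put and Impala reads the data with SELECT * FROM hbase_tbl; Impala writes with INSERT / INSERT ... VALUES and external systems read the data with get and scan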
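The slide maps Impala statements onto the HBase operations put, get, and scan. The toy Python class below models those three operations on an in-memory table kept sorted by row key, mirroring HBase's scan semantics of an inclusive start row and an exclusive stop row. It is a hypothetical illustration, not the HBase client API, and it ignores versions and timestamps (a "family:qualifier" name is just a string here).

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class ToyHBaseTable:
    """Hypothetical in-memory stand-in for an HBase table, sorted by row key."""

    def __init__(self) -> None:
        self._keys: List[str] = []                 # row keys in sorted order
        self._rows: Dict[str, Dict[str, str]] = {}  # row key -> {column: value}

    def put(self, row_key: str, columns: Dict[str, str]) -> None:
        # put inserts a new row or overwrites columns of an existing one.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def get(self, row_key: str) -> Optional[Dict[str, str]]:
        # get is a point lookup by exact row key.
        row = self._rows.get(row_key)
        return dict(row) if row is not None else None

    def scan(self, start: str = "", stop: Optional[str] = None) -> Iterator[Tuple[str, Dict[str, str]]]:
        # scan walks keys in sorted order from start (inclusive) to stop (exclusive).
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and (stop is None or self._keys[i] < stop):
            key = self._keys[i]
            yield key, dict(self._rows[key])
            i += 1


table = ToyHBaseTable()
table.put("page#002", {"cf:pv": "7"})  # e.g. written by an external system
table.put("page#001", {"cf:pv": "3"})
print([key for key, _ in table.scan()])  # ['page#001', 'page#002']
```

A full scan with no start or stop row is the closest analogue to SELECT * FROM hbase_tbl in this toy model.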

  • Slide 16

    impalad

    SPOF (single point of failure)

  • Slide 17

    Resource pools can be configured in two ways: with Cloudera Manager, or with fair-scheduler.xml and llama-site.xml

  • Slide 18

    Example resource pools:

    Group A: 100, 10, 1000 GB

    Group B: 10, 1, 100 GB
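Admission control decides, per pool, whether a new query runs now, waits in a queue, or is rejected, based on limits such as the number of running queries, the queue length, and memory. A simplified Python sketch of that decision, with hypothetical class and field names and made-up limits (it does not read fair-scheduler.xml or llama-site.xml, and it omits queue timeouts and per-query memory estimation):

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Tuple


@dataclass
class Pool:
    """Hypothetical resource pool with running, queued, and memory limits."""
    max_running: int
    max_queued: int
    max_mem_gb: float
    running: int = 0
    mem_in_use_gb: float = 0.0
    queue: Deque[Tuple[str, float]] = field(default_factory=deque)

    def submit(self, query_id: str, mem_gb: float) -> str:
        """Return 'admitted', 'queued', or 'rejected' for a new query."""
        # Admit only if there is a free slot, memory to spare, and nobody waiting ahead (FIFO).
        if (self.running < self.max_running and not self.queue
                and self.mem_in_use_gb + mem_gb <= self.max_mem_gb):
            self.running += 1
            self.mem_in_use_gb += mem_gb
            return "admitted"
        if len(self.queue) < self.max_queued:
            self.queue.append((query_id, mem_gb))
            return "queued"
        return "rejected"

    def finish(self, mem_gb: float) -> None:
        """Release a finished query's slot, then admit queued queries in FIFO order."""
        self.running -= 1
        self.mem_in_use_gb -= mem_gb
        while self.queue:
            _, next_mem = self.queue[0]
            if self.running >= self.max_running or self.mem_in_use_gb + next_mem > self.max_mem_gb:
                break
            self.queue.popleft()
            self.running += 1
            self.mem_in_use_gb += next_mem


pool = Pool(max_running=2, max_queued=1, max_mem_gb=10)
print([pool.submit("q1", 4), pool.submit("q2", 4), pool.submit("q3", 1), pool.submit("q4", 1)])
# ['admitted', 'admitted', 'queued', 'rejected']
pool.finish(4)  # q1 completes; q3 is admitted from the queue
print(pool.running, pool.mem_in_use_gb)  # 2 5.0
```

Giving a larger group higher limits on all three dimensions, as in the Group A / Group B example, lets it run more concurrent work without starving the smaller pool.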

  • Slide 19

    Hue: web UI (included in CDH)

  • Slide 20

    BI tools connect through JDBC / ODBC

    MicroStrategy, QlikView, SAS, Tableau

    Reference: https://zoomdata.zendesk.com/hc/en-us/articles/203813488-Date-and-Time-Formats-Supported-By-Zoomdata

  • Slide 21

    Trying Impala

    Hue demo: http://demo.gethue.com/

    Quick Start VM (VM image): http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms/cdh-5-3-x.html

    Cloudera Live (14-day trial of a 4-node cluster, also available with Tableau or ZoomData): http://www.cloudera.com/content/cloudera/en/products-and-services/cloudera-live.html

    Cloudera Director on AWS: http://www.cloudera.com/content/cloudera/en/downloads/cloudera-director/1-1-0.html

    Amazon EMR: http://docs.aws.amazon.com/ja_jp/ElasticMapReduce/latest/DeveloperGuide/emr-impala.html

  • Slide 22

    Thank you

  • Slide 23

  • Slide 24

    Impala cluster sizing

    http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_cluster_sizing.html

    Example: CPU 12 cores, 64 GB RAM, 2 TB HDD x 12, 10-15 TB, 20, 20
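The example node has 12 HDDs of 2 TB each. A back-of-the-envelope sketch of the resulting storage, assuming HDFS's default replication factor of 3 and ignoring space for the OS, temporary files, and other non-HDFS use (the helper name is hypothetical):

```python
from typing import Tuple


def hdfs_capacity_tb(disks_per_node: int, disk_tb: float, nodes: int = 1,
                     replication: int = 3) -> Tuple[float, float]:
    """Return (raw TB, logical TB after HDFS replication)."""
    raw = disks_per_node * disk_tb * nodes
    return raw, raw / replication


raw, logical = hdfs_capacity_tb(disks_per_node=12, disk_tb=2.0)
print(raw, logical)  # 24.0 8.0
```

So each such node contributes 24 TB of raw disk but only about 8 TB of unreplicated table data.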

  • Slide 25

    Impala

    :

    10http://www.slideshare.net/cloudera/the-impala-cookbook-42530186

    File formats: Parquet; SequenceFile + Snappy for read-once data

  • Slide 26

  • Slide 27

    http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf

  • Slide 28

    http://www.vldb.org/pvldb/vol7/p1295-floratou.pdf