"Overview of the new products Kudu and RecordService" #cwt2015
TRANSCRIPT
-
1 Cloudera, Inc. All rights reserved.
Kudu RecordService +
Amr Awadallah | Cloudera CTO | Twitter: @awadallah
-
2 Cloudera, Inc. All rights reserved.
: =
HDFS
Hive
A
Hive
... ...
RecordService
MapReduce()
RecordService()
Impala
Sentry()
Sentry()
...Impala()
HDFS HDFS
Spark MRSpark()
-
3 Cloudera, Inc. All rights reserved.
Kudo ()
-
4 Cloudera, Inc. All rights reserved.
Kudu ?
-
5 Cloudera, Inc. All rights reserved.
Hadoop
HDFS :
HBase :
Hadoop
-
6 Cloudera, Inc. All rights reserved.
CPURAM
HBase 10~100
I/O
Expressive
Kudu
-
7 Cloudera, Inc. All rights reserved.
HDD SSD
NAND: IOPS: 450k (read), 250k (write); throughput: 2 GB/sec (read), 1.5 GB/sec (write); $3/GB
3D XPoint memory (NAND 1,000RAM)
RAM
64 128 256GB
1 : CPU
CPU
2:
-
8 Cloudera, Inc. All rights reserved.
Kudu
Hadoop
Hadoop
ApacheASF
HDFS
NoSQLHBASE
SQOOP, FLUME, KAFKA
SENTRY
YARN
SQL
SPARK, HIVE, PIG
SPARK IMPALA SOLR SPARK HBASE
KUDU
-
9 Cloudera, Inc. All rights reserved.
Kudu
SQL
HBase/Cassandra : BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP
(possibly-composite primary key)
ALTER TABLE
Java and C++ NoSQL-style API: Insert(), Update(), Delete(), Scan()
MapReduce, Spark, and Impala
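To make the table model on this slide concrete (typed columns, a possibly-composite primary key, and Insert/Update/Delete/Scan operations), here is a minimal in-memory sketch. `ToyTable` and all of its method names are hypothetical and are not the Kudu client API; the sketch only mirrors the shape of the operations listed above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List, Tuple


@dataclass
class ToyTable:
    """Hypothetical stand-in for a typed table with a primary key (not Kudu's API)."""
    schema: Dict[str, type]        # column name -> Python type
    key_columns: Tuple[str, ...]   # possibly-composite primary key
    rows: Dict[tuple, Dict[str, Any]] = field(default_factory=dict)

    def _key(self, row: Dict[str, Any]) -> tuple:
        return tuple(row[c] for c in self.key_columns)

    def _check(self, row: Dict[str, Any]) -> None:
        # Reject unknown columns and values of the wrong type.
        for col, value in row.items():
            if col not in self.schema:
                raise KeyError(f"unknown column: {col}")
            if not isinstance(value, self.schema[col]):
                raise TypeError(f"column {col} expects {self.schema[col].__name__}")

    def insert(self, row: Dict[str, Any]) -> None:
        self._check(row)
        key = self._key(row)
        if key in self.rows:
            raise ValueError("duplicate primary key")
        self.rows[key] = dict(row)

    def update(self, row: Dict[str, Any]) -> None:
        self._check(row)
        key = self._key(row)
        if key not in self.rows:
            raise KeyError("no such row")
        self.rows[key].update(row)

    def delete(self, key: tuple) -> None:
        del self.rows[key]

    def scan(self, columns: List[str],
             predicate: Callable[[Dict[str, Any]], bool] = lambda r: True
             ) -> Iterator[tuple]:
        # Return only the requested columns, in primary-key order.
        for key in sorted(self.rows):
            row = self.rows[key]
            if predicate(row):
                yield tuple(row.get(c) for c in columns)


table = ToyTable({"host": str, "ts": int, "value": float}, ("host", "ts"))
table.insert({"host": "a", "ts": 1, "value": 0.5})
table.insert({"host": "a", "ts": 2, "value": 0.7})
table.update({"host": "a", "ts": 1, "value": 0.9})
table.delete(("a", 2))
result = list(table.scan(["ts", "value"]))
```

The real system stores data column by column and distributes it across tablet servers; the dictionary here only illustrates the typed schema and the key-based operations.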
-
10 Cloudera, Inc. All rights reserved.
Kudu
SQL SQL Impala Spark
HDFS Hadoop
HDFS
HDFSHBase
Cloudera
-
11 Cloudera, Inc. All rights reserved.
Kudu
-
12 Cloudera, Inc. All rights reserved.
Kudu
Kudu READ/WRITE :
: : Insert, Update, Scan, Lookup
: : Insert, Scan, Lookup
: ODS : Insert, Update, Scan, Lookup
-
13 Cloudera, Inc. All rights reserved.
-
14 Cloudera, Inc. All rights reserved.
Hadoop
:
HBase
Parquet File
?
HBase Parquet
Parquet Impala
(
)
Impala on HDFS
-
15 Cloudera, Inc. All rights reserved.
KuduHadoop
Impala on Kudu
()
-
16 Cloudera, Inc. All rights reserved.
-
17 Cloudera, Inc. All rights reserved.
Kudu
(typed)
:
(
Paxos (Raft)
/
-
18 Cloudera, Inc. All rights reserved.
100
Raft N3
MTTR
HDFS
-
19 Cloudera, Inc. All rights reserved.
-
20 Cloudera, Inc. All rights reserved.
Client
Meta Cache [email protected] T(Master)
-
21 Cloudera, Inc. All rights reserved.
[email protected] T(Master)
{Z,Y,X} 2:T1,T2,T3, ...
-
22 Cloudera, Inc. All rights reserved.
[email protected] T(Master)
{Z,Y,X} 2:T1,T2,T3, ...
T1: T2: T3:
-
23 Cloudera, Inc. All rights reserved.
UPDATE [email protected] SET
T1: T2: T3:
[email protected] T(Master)
{Z,Y,X} 2:T1,T2,T3, ...
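The client-side "Meta Cache" shown on these slides can be sketched roughly as follows: the client asks the master which tablet holds a key, remembers the answer, and sends later requests for the same key range directly to the tablet servers. Everything here is an assumption for illustration: `MetaCache`, `fake_master`, the key ranges, and the server sets for T1 and T3 are hypothetical, and only the "tablet 2 on servers {Z,Y,X}" detail comes from the slide.

```python
import bisect
from typing import Callable, List, Optional, Tuple

# (start_key, end_key or None for unbounded, tablet_id, servers)
Entry = Tuple[str, Optional[str], str, Tuple[str, ...]]


class MetaCache:
    """Hypothetical client-side cache of tablet locations, keyed by key range."""

    def __init__(self, ask_master: Callable[[str], Entry]) -> None:
        self._ask_master = ask_master
        self._starts: List[str] = []     # sorted range start keys
        self._entries: List[Entry] = []  # parallel to _starts
        self.master_calls = 0

    def locate(self, key: str) -> Tuple[str, Tuple[str, ...]]:
        # Find the cached range whose start key is the greatest one <= key.
        i = bisect.bisect_right(self._starts, key) - 1
        if i >= 0:
            _start, end, tablet_id, servers = self._entries[i]
            if end is None or key < end:
                return tablet_id, servers
        # Cache miss: ask the master once and remember the answer.
        entry = self._ask_master(key)
        self.master_calls += 1
        j = bisect.bisect_left(self._starts, entry[0])
        self._starts.insert(j, entry[0])
        self._entries.insert(j, entry)
        return entry[2], entry[3]


# Hypothetical master holding three tablets split by key range.
TABLETS: List[Entry] = [
    ("", "h", "T1", ("A", "B", "C")),
    ("h", "p", "T2", ("Z", "Y", "X")),
    ("p", None, "T3", ("D", "E", "F")),
]


def fake_master(key: str) -> Entry:
    for entry in TABLETS:
        if entry[0] <= key and (entry[1] is None or key < entry[1]):
            return entry
    raise KeyError(key)


cache = MetaCache(fake_master)
first = cache.locate("kim")    # miss: goes to the master
second = cache.locate("lee")   # hit: same range, served from the cache
third = cache.locate("zed")    # miss: different tablet
```

A real client would also have to invalidate entries when tablets move or split; that is left out of this sketch.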
-
24 Cloudera, Inc. All rights reserved.
InsertHBase memstore
Apache Parquet
MVCC : in-place
SELECT AS OF
current time READ
(predicate evaluation)
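The MVCC and "SELECT AS OF" points above can be illustrated with a small sketch: each write is kept with its timestamp, and a read at time T sees the newest version written at or before T. `MvccCell` and its methods are hypothetical names; this is a sketch of the general technique, not Kudu's storage format.

```python
import bisect
from typing import Any, List, Optional


class MvccCell:
    """Hypothetical multi-version cell: keeps every write with its timestamp."""

    def __init__(self) -> None:
        self._timestamps: List[int] = []  # sorted write timestamps
        self._values: List[Any] = []      # parallel to _timestamps

    def write(self, ts: int, value: Any) -> None:
        i = bisect.bisect_right(self._timestamps, ts)
        self._timestamps.insert(i, ts)
        self._values.insert(i, value)

    def read_as_of(self, ts: int) -> Optional[Any]:
        # Newest version whose timestamp is <= ts, or None if none exists yet.
        i = bisect.bisect_right(self._timestamps, ts)
        return self._values[i - 1] if i else None


cell = MvccCell()
cell.write(10, "a")
cell.write(20, "b")
snapshot = cell.read_as_of(15)  # sees the write at ts=10, not the one at ts=20
latest = cell.read_as_of(20)    # a read "as of now" sees the newest version
```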
-
25 Cloudera, Inc. All rights reserved.
(Replicated master) * (META ) TS
RAM 80 GetTableLocations RPC:
99th percentile: 68 us, 99.99th percentile: 657 us, CPU 2%
-
26 Cloudera, Inc. All rights reserved.
Kudu
HBase Kudu Bloom
1 READ : 1 (Column groups)
-
27 Cloudera, Inc. All rights reserved.
-
28 Cloudera, Inc. All rights reserved.
TPC-H
75 TS + 1
12 RAM
Kudu 0.5.0, Impala 2.2 with Kudu support, CDH 5.4
TPC-H Scale Factor 100 (100GB)
Query:
SELECT n_name, sum(l_extendedprice * (1 - l_discount)) as revenue
FROM customer, orders, lineitem, supplier, nation, region
WHERE c_custkey = o_custkey
  AND l_orderkey = o_orderkey
  AND l_suppkey = s_suppkey
  AND c_nationkey = s_nationkey
  AND s_nationkey = n_nationkey
  AND n_regionkey = r_regionkey
  AND r_name = 'ASIA'
  AND o_orderdate >= date '1994-01-01'
  AND o_orderdate < '1995-01-01'
GROUP BY n_name
ORDER BY revenue desc;
-
29 Cloudera, Inc. All rights reserved.
- RAM Kudu Parquet 31%
- HDD I/O Parquet Kudu
-
30 Cloudera, Inc. All rights reserved.
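Apache Phoenix
10 (9 1 ), HBase 1.0, Phoenix 4.3, TPC-H LINEITEM ()
[Chart: log-scale axis from 0.01 to 10000; bar labels as extracted, in legend order]
Phoenix: Load 2152, TPCH Q1 219, COUNT(*) 76, COUNT(*) WHERE 131, single-row lookup 0.04
Kudu: Load 1918, TPCH Q1 13.2, COUNT(*) 1.7, COUNT(*) WHERE 0.7, single-row lookup 0.15
Parquet: Load 155, TPCH Q1 9.3, COUNT(*) 1.4, COUNT(*) WHERE 1.5, single-row lookup 1.37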
-
31 Cloudera, Inc. All rights reserved.
NoSQL (YCSB)
YCSB 0.5.0- 10 (9 1 )
HBase 1.0 ops
-
32 Cloudera, Inc. All rights reserved.
Xiaomi
RPC
WRITE
1 50
-
33 Cloudera, Inc. All rights reserved.
Kudu
1~1
1~3
-
34 Cloudera, Inc. All rights reserved.
Kudu
ETL (0~10) ETL
ETL
OLAP
-
35 Cloudera, Inc. All rights reserved.
:
71
CPU: E5-2620 2.1GHz * 24 core
Memory: 64GB
Network: 1Gb
Disk: 12 HDD
Hadoop 2.6 / Impala 2.1 / Kudu
1 26 270 byte/ :17, :5
-
36 Cloudera, Inc. All rights reserved.
1:
[Chart: Q1 to Q6, kudu vs. parquet; bar labels as extracted: 1.4, 2.0, 2.3, 3.1, 1.3, 0.9, 1.3, 2.8, 4.0, 5.7, 7.5, 16.7]
Kudu: 961.1 s, 2.8M records/s, 39.5k records/s
Parquet: 114.6 s, 23.5M records/s, 331k records/s
Impala (INSERT INTO):
:
* HDFS Parquet = 3, kudu = 3
* 5
-
37 Cloudera, Inc. All rights reserved.
-
38 Cloudera, Inc. All rights reserved.
Java C++ API
Impala, MapReduce, Spark
SSD HDD
-
39 Cloudera, Inc. All rights reserved.
:
VM: getkudu.io
: getkudu.io/kudu.pdf
:
: github.com/cloudera/kudu ()
gerrit.cloudera.org ()
issues.cloudera.org (JIRA 2013)
-
40 Cloudera, Inc. All rights reserved.
-
41 Cloudera, Inc. All rights reserved.
Appendix
-
42 Cloudera, Inc. All rights reserved.
Fault tolerance
Transient FOLLOWER failure:
- Leader can still achieve majority
- Restart follower TS within 5 min and it will rejoin transparently
Transient LEADER failure:
- Followers expect to hear a heartbeat from their leader every 1.5 seconds
- 3 missed heartbeats: leader election!
- New LEADER is elected from remaining nodes within a few seconds
- Restart within 5 min and it rejoins as a FOLLOWER
N replicas handle (N-1)/2 failures
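The numbers on this slide can be worked through in a short sketch. The function names are hypothetical, and treating "3 missed heartbeats" as a fixed 4.5-second timeout is a simplifying assumption rather than a description of the actual election timer.

```python
HEARTBEAT_INTERVAL_S = 1.5             # from the slide
MISSED_HEARTBEATS_BEFORE_ELECTION = 3  # from the slide


def majority(n_replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return n_replicas // 2 + 1


def tolerated_failures(n_replicas: int) -> int:
    """N replicas handle (N-1)/2 failures, rounded down."""
    return (n_replicas - 1) // 2


def should_start_election(seconds_since_last_heartbeat: float) -> bool:
    """Assumption: a follower gives up on the leader after 3 full intervals."""
    timeout = HEARTBEAT_INTERVAL_S * MISSED_HEARTBEATS_BEFORE_ELECTION
    return seconds_since_last_heartbeat >= timeout


three = (majority(3), tolerated_failures(3))  # 3 replicas: majority of 2, survives 1 failure
five = (majority(5), tolerated_failures(5))   # 5 replicas: majority of 3, survives 2 failures
```

With 3 replicas, losing one follower still leaves a majority of 2, which is why the leader can keep accepting writes during a transient follower failure.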
-
43 Cloudera, Inc. All rights reserved.
Fault tolerance (2)
Permanent failure:
- Leader notices that a follower has been dead for 5 minutes
- Evicts that follower
- Master selects a new replica
- Leader copies the data over to the new one, which joins as a new FOLLOWER
-
44 Cloudera, Inc. All rights reserved.
LSM vs Kudu
LSM: Log Structured Merge (Cassandra, HBase, etc)
- Inserts and updates all go to an in-memory map (MemStore) and later flush to on-disk files (HFile/SSTable)
- Reads perform an on-the-fly merge of all on-disk HFiles
Kudu
- Shares some traits (memstores, compactions)
- More complex. Slower writes in exchange for faster reads (especially scans)
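The LSM read path described above (an on-the-fly merge of the in-memory map and all on-disk files) can be sketched with a k-way merge. This is a minimal illustration under stated assumptions: `lsm_scan` is a hypothetical name, files are modeled as plain dictionaries, and deletes (tombstones) are not handled.

```python
import heapq
from typing import Any, Dict, Iterator, List, Tuple


def lsm_scan(memstore: Dict[str, Any],
             disk_files: List[Dict[str, Any]]) -> Iterator[Tuple[str, Any]]:
    """Yield (key, value) in key order; the newest version of each key wins.

    disk_files must be ordered newest first. The memstore is newer than all of them.
    """
    runs = [sorted(memstore.items())] + [sorted(f.items()) for f in disk_files]
    # Tag every entry with the age of its run so that, for equal keys,
    # the entry from the newest run sorts first in the merge.
    tagged = [[(key, age, value) for key, value in run]
              for age, run in enumerate(runs)]
    last_key = object()  # sentinel that equals no real key
    for key, _age, value in heapq.merge(*tagged):
        if key != last_key:
            yield key, value
            last_key = key


memstore = {"b": 20}
disk_files = [{"a": 1, "b": 2}, {"b": 0, "c": 3}]  # newest file first
merged = list(lsm_scan(memstore, disk_files))
```

Every scan pays for this merge across all runs, which is the read-side cost the slide contrasts with Kudu's slower writes and faster scans.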
-
45 Cloudera, Inc. All rights reserved.
Kudu storage: Compaction policy
Solves an optimization problem (knapsack problem)
- Minimize height of rowsets for the average key lookup
- Bound on number of seeks for write or random-read
Restrict total IO of any compaction to a budget (128MB)
- No long compactions, ever
- No minor vs major distinction
- Always be compacting or flushing
- Low IO priority maintenance threads
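The idea of choosing compaction work under a fixed IO budget can be sketched as a classic 0/1 knapsack. This is an illustration of the general technique only: `pick_compaction`, the candidate names, their sizes, and the benefit scores are all hypothetical, and the actual policy's benefit measure (reduction in rowset height) is not modeled.

```python
from typing import List, Tuple

Candidate = Tuple[str, int, float]  # (name, IO cost in MB, hypothetical benefit score)


def pick_compaction(candidates: List[Candidate],
                    budget_mb: int = 128) -> Tuple[float, Tuple[str, ...]]:
    """0/1 knapsack: maximize total benefit with total IO cost <= budget_mb."""
    # best[b] = (total benefit, chosen names) achievable with at most b MB of IO.
    best: List[Tuple[float, Tuple[str, ...]]] = [(0.0, ())] * (budget_mb + 1)
    for name, cost_mb, benefit in candidates:
        # Walk the budget downwards so each candidate is used at most once.
        for b in range(budget_mb, cost_mb - 1, -1):
            prev_benefit, prev_names = best[b - cost_mb]
            if prev_benefit + benefit > best[b][0]:
                best[b] = (prev_benefit + benefit, prev_names + (name,))
    return best[budget_mb]


candidates = [("r1", 64, 3.0), ("r2", 64, 2.5), ("r3", 100, 5.0), ("r4", 32, 1.0)]
benefit, chosen = pick_compaction(candidates)
```

Because every selection is capped at the budget, each compaction stays small, which matches the "no long compactions, ever" point on the slide.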