Hive Optimization Strategies

최종욱

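Before the Talk…

• A brief introduction to Stinger

Stinger: the “next-generation Hive developers”

Every Hadoop distribution ships with Stinger's Hive by default

Phase 3, Phase 2, Phase 1

Every Hadoop analytics tool supports Hive out of the box

Hive queries generated automatically from Power BI and Excel

SQL queries converted to Hive automatically

Existing diagrams converted to Hive automatically

Used in explorers, dashboards, and more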

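Optimization Focus by Hive Version

Version   Category                   Detail              Related terms
~0.10     Processing performance     Directory layout    Partitions, buckets
0.11      Processing performance +   Large joins         SMB join
0.12      Storage capacity +         Data compression    ORC, ZLIB
0.13      Processing performance +   Real-time queries   Tez, vectorization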

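0.11: Multiple Join Implementations

• Use case: combining customer information with order information to see which orders each customer placed. Each order refers to a customer by customer number.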

Deep Dive content by Hortonworks, Inc. is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Quick Refresher on Joins


customer                      order
first    last      id         cid     price   quantity
Nick     Toner     11911      4150    10.50   3
Jessie   Simonds   11912      11914   12.25   27
Kasi     Lamers    11913      3491    5.99    5
Rodger   Clayton   11914      2934    39.99   22
Verona   Hollen    11915      11914   40.50   10

SELECT * FROM customer JOIN order ON customer.id = order.cid;

Joins match values from one table against values in another table.

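Hive's Join Strategies

Shuffle join
  Approach: Rows are redistributed (shuffled) by key using map/reduce, and the join is performed on the reduce side.
  Pros: Works regardless of data size or layout.
  Cons: The most resource-intensive and slowest join type.

Broadcast join
  Approach: The small table is loaded into memory on every node, and the mappers scan the large table and join as they go.
  Pros: A very fast single scan through the largest table.
  Cons: The small table must be small enough to fit in memory.

Sort-merge-bucket (SMB) join
  Approach: Mappers exploit the fact that matching keys are stored adjacently to join efficiently.
  Pros: Very fast for tables of any size.
  Cons: The data must be sorted and bucketed ahead of time.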

Shuffle Joins in MapReduce




M  {id: 11911, {first: Nick, last: Toner}}
   {id: 11914, {first: Rodger, last: Clayton}} …

M  {cid: 4150, {price: 10.50, quantity: 3}}
   {cid: 11914, {price: 12.25, quantity: 27}} …

R  {id: 11914, {first: Rodger, last: Clayton}} {cid: 11914, {price: 12.25, quantity: 27}}

R  {id: 11911, {first: Nick, last: Toner}} {cid: 4150, {price: 10.50, quantity: 3}} …

Identical keys shuffled to the same reducer. Join done reduce-side. Expensive from a network utilization standpoint.
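The map, shuffle, and reduce steps above can be sketched in plain Python. This is only an illustration of the idea, not Hive's implementation: the function names and the modulo partitioner are assumptions made for the example, and the rows are the sample rows from the earlier join refresher.

```python
from collections import defaultdict

# Sample rows from the slides: customer(first, last, id), order(cid, price, quantity).
customers = [
    ("Nick", "Toner", 11911),
    ("Jessie", "Simonds", 11912),
    ("Kasi", "Lamers", 11913),
    ("Rodger", "Clayton", 11914),
    ("Verona", "Hollen", 11915),
]
orders = [
    (4150, 10.50, 3),
    (11914, 12.25, 27),
    (3491, 5.99, 5),
    (2934, 39.99, 22),
    (11914, 40.50, 10),
]

def map_phase(customers, orders):
    """Emit (join key, tagged record) pairs, one stream per input table."""
    for first, last, cid in customers:
        yield cid, ("customer", (first, last))
    for cid, price, quantity in orders:
        yield cid, ("order", (price, quantity))

def shuffle(pairs, num_reducers):
    """Send each pair to a reducer picked from its key (hypothetical modulo partitioner)."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for key, record in pairs:
        reducers[key % num_reducers][key].append(record)
    return reducers

def reduce_phase(reducers):
    """Inside each reducer, pair customer and order records that share a key."""
    joined = []
    for groups in reducers:
        for key, records in groups.items():
            custs = [r for tag, r in records if tag == "customer"]
            ords = [r for tag, r in records if tag == "order"]
            for c in custs:
                for o in ords:
                    joined.append((key, c, o))
    return sorted(joined)

shuffle_join_result = reduce_phase(shuffle(map_phase(customers, orders), num_reducers=3))
for row in shuffle_join_result:
    print(row)
```

Every record crosses the `shuffle` step regardless of whether it ends up matching anything, which is where the network cost noted on the slide comes from.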

Broadcast Join

• Star schemas use dimension tables small enough to fit in RAM.
• Small tables held in memory by all nodes.
• Single pass through the large table.
• Used for star-schema type joins common in DW.
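A broadcast join amounts to building an in-memory lookup from the small table and streaming the large table past it once. The sketch below illustrates that in plain Python under the assumption that the customer table is the small side; `broadcast_join` is a hypothetical helper name, not a Hive API.

```python
# Small dimension table, assumed to fit in memory on every node.
customers = [
    ("Nick", "Toner", 11911),
    ("Jessie", "Simonds", 11912),
    ("Kasi", "Lamers", 11913),
    ("Rodger", "Clayton", 11914),
    ("Verona", "Hollen", 11915),
]
# Large fact table; in a real cluster each mapper would see only one slice of it.
orders = [
    (4150, 10.50, 3),
    (11914, 12.25, 27),
    (3491, 5.99, 5),
    (2934, 39.99, 22),
    (11914, 40.50, 10),
]

def broadcast_join(small_rows, large_rows):
    """Join by probing an in-memory hash table built from the small table."""
    # Built once per mapper from the broadcast copy of the small table.
    lookup = {cid: (first, last) for first, last, cid in small_rows}
    # Single pass over the large table; no shuffle and no reduce step.
    for cid, price, quantity in large_rows:
        if cid in lookup:
            yield (cid, *lookup[cid], price, quantity)

broadcast_result = list(broadcast_join(customers, orders))
for row in broadcast_result:
    print(row)
```

The dictionary is the only state a mapper needs, which is why the small table must fit in memory.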

When both tables are too large for memory:


customer                      order
first    last      id         cid     price   quantity
Nick     Toner     11911      4150    10.50   3
Jessie   Simonds   11912      11914   12.25   27
Kasi     Lamers    11913      11914   40.50   10
Rodger   Clayton   11914      12337   39.99   22
Verona   Hollen    11915      15912   40.50   10

CREATE TABLE customer (id int, first string, last string)
CLUSTERED BY(id) SORTED BY(id) INTO 32 BUCKETS;

CREATE TABLE order (cid int, price float, quantity int)
CLUSTERED BY(cid) SORTED BY(cid) INTO 32 BUCKETS;

Cluster and sort by the most common join key.
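To see why clustering both tables into the same number of buckets helps, the following Python sketch assigns rows to 32 buckets and sorts each bucket by key. The bucket function here (key modulo bucket count) is an assumption made for illustration and should not be read as Hive's exact hashing; `bucket_of` and `bucketize` are hypothetical names.

```python
NUM_BUCKETS = 32  # matches INTO 32 BUCKETS in the DDL above

customers = [
    ("Nick", "Toner", 11911),
    ("Jessie", "Simonds", 11912),
    ("Kasi", "Lamers", 11913),
    ("Rodger", "Clayton", 11914),
    ("Verona", "Hollen", 11915),
]
orders = [
    (4150, 10.50, 3),
    (11914, 12.25, 27),
    (11914, 40.50, 10),
    (12337, 39.99, 22),
    (15912, 40.50, 10),
]

def bucket_of(key, num_buckets=NUM_BUCKETS):
    """Assumed bucket function for integer keys: key modulo bucket count."""
    return key % num_buckets

def bucketize(rows, key_index):
    """Group rows by bucket number, then sort each bucket by the join key."""
    buckets = {}
    for row in rows:
        buckets.setdefault(bucket_of(row[key_index]), []).append(row)
    for rows_in_bucket in buckets.values():
        rows_in_bucket.sort(key=lambda r: r[key_index])
    return buckets

customer_buckets = bucketize(customers, key_index=2)
order_buckets = bucketize(orders, key_index=0)

# Customer 11914 and every order with cid 11914 land in the same bucket number.
print(bucket_of(11914), customer_buckets[bucket_of(11914)], order_buckets[bucket_of(11914)])
```

Because both tables use the same key and the same bucket count, bucket n of one table only ever needs to be compared with bucket n of the other.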

Hive's Clustering and Sorting




Observation 1: Sorting by the join key makes joins easy.

All possible matches reside in the same area on disk.



Observation 2: Hash bucketing a join key ensures all matching values reside on the same node.

Equi-joins can then run with no shuffle.
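Once both sides are sorted by the join key, matching rows can be found with a single forward merge, which is the core of a sort-merge-bucket join. The Python sketch below shows that merge for one pair of sorted inputs; `sort_merge_join` is a hypothetical helper written for this illustration, not part of Hive.

```python
def sort_merge_join(left, right, left_key, right_key):
    """Merge two lists already sorted by their join keys, yielding matching pairs."""
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1          # left key too small, advance left
        elif lk > rk:
            j += 1          # right key too small, advance right
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][right_key] == lk:
                j_end += 1
            # Pair every left row carrying this key with that run.
            while i < len(left) and left[i][left_key] == lk:
                for r in right[j:j_end]:
                    yield left[i], r
                i += 1
            j = j_end

# Both inputs are sorted by the join key (id and cid respectively).
customers = [
    ("Nick", "Toner", 11911),
    ("Jessie", "Simonds", 11912),
    ("Kasi", "Lamers", 11913),
    ("Rodger", "Clayton", 11914),
    ("Verona", "Hollen", 11915),
]
orders = [
    (4150, 10.50, 3),
    (11914, 12.25, 27),
    (11914, 40.50, 10),
    (12337, 39.99, 22),
    (15912, 40.50, 10),
]

smb_result = list(sort_merge_join(customers, orders, left_key=2, right_key=0))
for pair in smb_result:
    print(pair)
```

Neither input is held in memory as a hash table and neither is shuffled, so the approach does not depend on table size, at the price of sorting and bucketing the data beforehand.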

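Controlling Data Locality

• Bucketing:
  • Hash partition values into a preset number of buckets.
  • Usually used together with sorting.

• Skew:
  • Store values in separate files.
  • Used when certain values appear frequently.

• Replication Factor:
  • Increase the replication factor to improve read performance.
  • Set at the HDFS level.

• Sorting:
  • Sort values by the given columns.
  • Greatly accelerates queries when used with ORC file filter pushdown.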

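Hive Data Design Guidelines

Table size | Data profile | Query pattern | Recommended strategy
Small | Frequently used | Any | Increase the replication factor
Any | Any | Specific filters | Sort by the column used most often in those specific queries
Large | Any | Joined to another large table | Sort and bucket the tables by the join key
Large | One value accounts for 25% or more of occurrences | Any | Split the frequent value into a separate skew
Large | Any | Queries tend to follow natural boundaries such as dates | Partition the data along the natural boundaries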
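Whether a column crosses the 25% skew guideline can be checked with a simple frequency count before deciding on a layout. The following Python sketch is one way to do that on a sample of key values; `skewed_values` is a hypothetical helper, and the sample is the `cid` column from the order rows shown earlier.

```python
from collections import Counter

def skewed_values(values, threshold=0.25):
    """Return the values whose share of all rows is at least `threshold`."""
    counts = Counter(values)
    total = len(values)
    return sorted(v for v, n in counts.items() if n / total >= threshold)

# cid column of the sample order table.
order_cids = [4150, 11914, 11914, 12337, 15912]

skew_candidates = skewed_values(order_cids)
print(skew_candidates)
```

On real data the check would run against a sample or an aggregate query result rather than a Python list; the threshold is the figure from the guideline table.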

Automatic Join Implementation Selection

Example Workflow: Create a Staging Table

CREATE EXTERNAL TABLE pos_staging (
  txnid STRING,
  txntime STRING,
  givenname STRING,
  lastname STRING,
  postalcode STRING,
  storeid STRING,
  ind1 STRING,
  productid STRING,
  purchaseamount FLOAT,
  creditcard STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LOCATION '/user/hdfs/staging_data/pos_staging';

The raw data is the result of initial loading or the output of a mapreduce or pig job. We create an external table over the results of that job as we only intend to use it to load an optimized table.

Example Workflow: Choose a Partition Scheme

hive> select distinct concat(year(txntime),month(txntime)) as part_dt
      from pos_staging;
…
OK
20121
201210
201211
201212
20122
20123
20124
20125
20126
20127
20128
20129
Time taken: 21.823 seconds, Fetched: 12 row(s)

Execute a query to determine if the partition choice returns a reasonable result. We will use this projection to create partitions for our data set. You want to keep your partitions large enough to be useful in partition pruning and efficient for HDFS storage. Hive has configurable bounds to ensure you do not exceed per node and total partition counts (defaults shown):

hive.exec.max.dynamic.partitions=1000
hive.exec.max.dynamic.partitions.pernode=100
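The partition key produced by `concat(year(txntime),month(txntime))` is not zero-padded, which is why January 2012 shows up as `20121` in the output above. The Python sketch below reproduces that projection and compares the distinct key count to the default total limit. The timestamp format and the sample values are assumptions for the example, since the slides do not show the raw `txntime` data.

```python
from datetime import datetime

MAX_DYNAMIC_PARTITIONS = 1000  # default of hive.exec.max.dynamic.partitions

def part_dt(txntime):
    """Mimic concat(year(txntime), month(txntime)): year and month, month unpadded."""
    # Assumed input format; the real staging data may differ.
    ts = datetime.strptime(txntime, "%Y-%m-%d %H:%M:%S")
    return f"{ts.year}{ts.month}"

# Hypothetical sample timestamps, one per month of 2012.
sample_txntimes = [f"2012-{month:02d}-15 12:00:00" for month in range(1, 13)]

partition_keys = sorted({part_dt(t) for t in sample_txntimes})
print(partition_keys)
print(len(partition_keys) <= MAX_DYNAMIC_PARTITIONS)
```

Monthly partitions for a single year stay far below the limit; a finer key such as one per day or per hour would approach it much faster.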

Example Workflow: Define Optimized Table

CREATE TABLE fact_pos
(
  txnid STRING,
  txntime STRING,
  givenname STRING,
  lastname STRING,
  postalcode STRING,
  storeid STRING,
  ind1 STRING,
  productid STRING,
  purchaseamount FLOAT,
  creditcard STRING
)
PARTITIONED BY (part_dt STRING)
CLUSTERED BY (txnid)
SORTED BY (txnid)
INTO 24 BUCKETS
STORED AS ORC tblproperties ("orc.compress"="SNAPPY");

The part_dt field is defined in the partition by clause and cannot be the same name as any other fields. In this case, we will be performing a modification of txntime to generate a partition key. The cluster and sorted clauses contain the only key we intend to join the table on. We have stored as ORCFile with Snappy compression.

Example Workflow: Load Data Into Optimized Table

set hive.enforce.sorting=true;
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set mapreduce.reduce.input.limit=-1;

FROM pos_staging
INSERT OVERWRITE TABLE fact_pos
PARTITION (part_dt)
SELECT
  txnid,
  txntime,
  givenname,
  lastname,
  postalcode,
  storeid,
  ind1,
  productid,
  purchaseamount,
  creditcard,
  concat(year(txntime),month(txntime)) as part_dt
SORT BY productid;

We use this command to load data from our staging table into our optimized ORCFile format. Note that we are using dynamic partitioning with the projection of the txntime field. This results in a MapReduce job that copies the staging data into a Hive-managed table in ORCFile format.

Example Workflow: Increase Replication Factor

hadoop fs -setrep -R -w 5 /apps/hive/warehouse/fact_pos

Increase the replication factor for the high performance table. This increases the chance for data locality. In this case, the increase in replication factor is not for additional resiliency. This is a trade-off of storage for performance. In fact, to conserve space, you may choose to reduce the replication factor for older data sets or even delete them altogether. With the raw data in place and untouched, you can always recreate the ORCFile high performance tables. Most users place the steps in this example workflow into an Oozie job to automate the work.

Example Workflow: Enabling Short-Circuit Read

In hdfs-site.xml (or your custom Ambari settings for HDFS, restart service after):

dfs.block.local-path-access.user=hdfs
dfs.client.read.shortcircuit=true
dfs.client.read.shortcircuit.skip.checksum=false

Short-circuit reads allow the mappers to bypass the overhead of opening a port to the datanode if the data is local. The permissions for the local block files need to permit hdfs to read them (this should already be the case by default). See HDFS-2246 for more details.

Example Workflow: Execute Your Query

set hive.mapred.reduce.tasks.speculative.execution=false;
set io.sort.mb=300;
set mapreduce.reduce.input.limit=-1;

select productid, ROUND(SUM(purchaseamount),2) as total
from fact_pos
where part_dt between '201210' and '201212'
group by productid
order by total desc
limit 100;

…
OK
20535   3026.87
39079   2959.69
28970   2869.87
45594   2821.15
…
15649   2242.05
47704   2241.22
8140    2238.61
Time taken: 40.087 seconds, Fetched: 100 row(s)

In the case above, we have a simple query executed to test out our table. We have some example parameters set before our query. The good news is that most of the parameters regarding join and engine optimizations are already set for you in Hive 0.11 (HDP). The io.sort.mb is presented as an example of one of the tunable parameters you may want to change for this particular SQL (note this value assumes 2-3GB JVMs for mappers). We are also partition pruning for the holiday shopping season, Oct to Dec.
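Because `part_dt` is a string, the `between '201210' and '201212'` filter is a lexicographic comparison over partition names. The Python sketch below applies the same comparison to the twelve partition values listed earlier to show which ones survive pruning; `prune` is a hypothetical helper, and Python's string ordering is used here as a stand-in for Hive's string comparison.

```python
# Partition values as returned by the partition-scheme query earlier in the workflow.
partitions = [
    "20121", "201210", "201211", "201212", "20122", "20123",
    "20124", "20125", "20126", "20127", "20128", "20129",
]

def prune(partitions, low, high):
    """Keep partitions whose name falls in [low, high] under string comparison."""
    return [p for p in partitions if low <= p <= high]

# The filter used in the example query: October through December.
holiday_season = prune(partitions, "201210", "201212")
print(holiday_season)

# The unpadded month makes other ranges less intuitive: a January-to-March
# range also matches October through December under string ordering.
first_quarter = prune(partitions, "20121", "20123")
print(first_quarter)
```

The October-to-December filter reads only three of the twelve partitions. The second call suggests that a zero-padded month would make range filters safer, though the slides keep the unpadded form.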

Example Workflow: Check Execution Path in Ambari

You can check the execution path in Ambari’s Job viewer. This gives a high level overview of the stages and particular number of map and reduce tasks. With Tez, it also shows task number and execution order. The counts here are small as this is a sample from a single-node HDP Sandbox. For more detailed analysis, you will need to read the query plan via “explain”.

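Hive Fast Query Checklist

• Partition data along natural query boundaries (e.g., date).

• Minimize data shuffle by co-locating the most frequently joined data.

• Use skew to spread out high-frequency values.

• Enable short-circuit read.

• Use ORC files.

• Sort columns so that frequently targeted queries can take advantage of row skipping.

• Check the query plan to confirm that the largest table is read in a single scan.

• Check the query plan to confirm that partition pruning is taking place.

• Use at least one ON clause in every join.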

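0.12: Data Compression

• Adding "STORED AS ORC" to the end of a "CREATE TABLE" statement is all it takes to apply compression.

• Data shrinks to about a quarter of its size on average, so roughly four times as much data can be stored.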

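0.13: Real-Time Queries

• Use Tez.

• Resident Tez processes: eliminates the time wasted on spawning new processes. Introduces techniques such as warm-up and reuse.

• Restructured MapReduce: in classic MapReduce, map and reduce were bound together as a pair, so results had to be written to disk before being handed to the next map. Lifting this constraint removes unnecessary disk I/O.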

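Benchmarking

• https://github.com/cartershanklin/hive-testbench

• An open-source benchmarking tool based on TPC-DS. It generates data in two forms: plain text, and a compressed, optimized format. It can also be used for comparisons with other systems such as Impala.

• When run in the test environment of a large Korean enterprise customer, the results matched those published on the Hortonworks website to the second.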
