Profiling, What-if Analysis and Cost-based Optimization of MapReduce Programs, Oct 7th 2013, Database Lab., Wonseok Choi


Page 1

Profiling, What-if Analysis and Cost-based Optimization of MapReduce Programs

Oct 7th 2013, Database Lab., Wonseok Choi

Page 2

The day before the presentation

Page 3

If I can't present this time, it's all over!!!!

No way I'll get the course credit!!!! + the graduation exam!!

Page 4

Can I finish preparing the presentation in time without dying?

Page 5

Contents

1. Introduction
2. Profiler
3. What-if Engine
4. Cost-based Optimizer
5. Experimental Evaluation
6. Conclusion

Page 6

Introduction

MapReduce has emerged as a viable competitor to database systems in big data analytics.

Three components: Profiler, What-if Engine, Cost-based Optimizer
Profiler: collects detailed statistical information from unmodified MapReduce programs.
What-if Engine: performs fine-grained cost estimation.
Cost-based Optimizer: optimizes configuration parameter settings.

Page 7

Introduction

MapReduce job j = <p, d, r, c>
p: MapReduce program
d: input data, processed through the two functions map(k1, v1) and reduce(k2, list(v2))
r: cluster resources
c: configuration parameter settings
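To make the notation concrete, here is a minimal in-memory sketch of the map(k1, v1) and reduce(k2, list(v2)) contract, using word count as the program p. The JobSpec class, its field names, the "num_reduce_tasks" key, and the word-count functions are hypothetical illustrations rather than anything from the paper, and the shuffle is simulated with a dictionary instead of a real cluster.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

@dataclass
class JobSpec:
    # Hypothetical container mirroring j = <p, d, r, c>; names are illustrative only
    program: Tuple[Callable, Callable]                        # p: (map fn, reduce fn)
    data: List[Tuple[int, str]]                               # d: input (k1, v1) pairs
    resources: Dict[str, int] = field(default_factory=dict)   # r: e.g. {"nodes": 1}
    config: Dict[str, object] = field(default_factory=dict)   # c: parameter settings

def word_map(k1: int, v1: str) -> Iterable[Tuple[str, int]]:
    # map(k1, v1) -> list(k2, v2): emit (word, 1) for each word in the line
    for word in v1.split():
        yield word, 1

def word_reduce(k2: str, values: List[int]) -> Iterable[Tuple[str, int]]:
    # reduce(k2, list(v2)) -> list(k3, v3): sum the counts for one word
    yield k2, sum(values)

def run_locally(job: JobSpec) -> Dict[str, int]:
    map_fn, reduce_fn = job.program
    groups: Dict[str, List[int]] = defaultdict(list)
    for k1, v1 in job.data:                 # map side
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)           # simulated shuffle: group values by k2
    result: Dict[str, int] = {}
    for k2 in sorted(groups):               # reduce side, keys in sorted order
        for k3, v3 in reduce_fn(k2, groups[k2]):
            result[k3] = v3
    return result

job = JobSpec(program=(word_map, word_reduce),
              data=[(0, "a b a"), (1, "b c")],
              resources={"nodes": 1},
              config={"num_reduce_tasks": 1})
print(run_locally(job))  # {'a': 2, 'b': 2, 'c': 1}
```

Only p and d affect the result here; r and c matter for performance, which is exactly what the rest of the talk models.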

Page 8

Introduction

Configuration parameter settings include:
The number of map tasks
The number of reduce tasks
The amount of memory
The settings for multiphase external sorting
Whether the output data from the map (reduce) tasks should be compressed before being written to disk
Whether a program-specified Combiner function should be used to preaggregate map outputs before their transfer to reduce tasks
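The effect of the Combiner setting can be seen in miniature: pre-aggregating each map task's output shrinks the number of records that must be shuffled to the reduce tasks, without changing the final answer. The sketch counts shuffled records with and without a sum combiner; the function names and sample splits are hypothetical, and spills, sorting and compression are ignored.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, int]

def map_output(split: List[str]) -> List[Pair]:
    # One map task: emit (word, 1) for every word in its input split
    return [(w, 1) for line in split for w in line.split()]

def combine(pairs: List[Pair]) -> List[Pair]:
    # Combiner: pre-aggregate values per key within a single map task
    totals: Counter = Counter()
    for k, v in pairs:
        totals[k] += v
    return list(totals.items())

def task_output(split: List[str], use_combiner: bool) -> List[Pair]:
    out = map_output(split)
    return combine(out) if use_combiner else out

def shuffle_size(splits: List[List[str]], use_combiner: bool) -> int:
    # Number of (key, value) records transferred from map tasks to reducers
    return sum(len(task_output(s, use_combiner)) for s in splits)

def final_counts(splits: List[List[str]], use_combiner: bool) -> Dict[str, int]:
    # What the reducers compute: the per-key sum over all transferred records
    totals: Counter = Counter()
    for s in splits:
        for k, v in task_output(s, use_combiner):
            totals[k] += v
    return dict(totals)

splits = [["a a b", "a b"], ["b b c"]]
print(shuffle_size(splits, False), shuffle_size(splits, True))  # 8 4
```

Using a combiner is only safe here because summation is associative and commutative; that is why the setting is "program-specified" rather than something the optimizer can always switch on.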

Page 9

Introduction

Page 10

Introduction

Page 11

Introduction

Cost-based Optimization to Select Configuration Parameter Settings Automatically
perf = F(p, d, r, c), where perf is some performance metric of interest for jobs.
Optimizing the performance of program p for given input data d and cluster resources r requires finding configuration parameter settings c that give near-optimal values of perf.

Page 12

Introduction

MapReduce program optimization poses new challenges compared to conventional database query optimization:
Black-box map and reduce functions
Lack of schema and statistics about the input data
Differences in plan spaces

[Figure: Cost-based Optimizer architecture with Profiler, What-if Engine, Cost-based Optimizer]

Page 13

Profiler

Phases of Map Task Execution: Read, Map, Collect, Spill, Merge

Phases of Reduce Task Execution: Shuffle, Merge, Reduce, Write
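The map-side Collect, Spill and Merge phases can be sketched as follows: map output is collected into a bounded in-memory buffer, each full buffer is sorted and spilled as a run, and the sorted runs are merged into one output at the end. This is a simplified model under stated assumptions: the buffer limit is counted in records (Hadoop 1.x sizes this buffer in megabytes via io.sort.mb), spills are kept as in-memory lists rather than local disk files, and partitioning and combiners are omitted.

```python
import heapq
from typing import Iterable, List, Tuple

Record = Tuple[str, int]

def run_map_task(records: Iterable[Record], buffer_limit: int) -> Tuple[List[Record], int]:
    """Simulate Collect -> Spill -> Merge for one map task.

    Returns the merged, sorted map output and the number of spills.
    """
    buffer: List[Record] = []
    spills: List[List[Record]] = []
    for rec in records:                   # records produced by the Read + Map phases
        buffer.append(rec)                # Collect: add to the in-memory buffer
        if len(buffer) >= buffer_limit:   # buffer full -> Spill
            spills.append(sorted(buffer)) # sort by key (then value) and write a run
            buffer = []
    if buffer:                            # final spill of any leftover records
        spills.append(sorted(buffer))
    merged = list(heapq.merge(*spills))   # Merge: k-way merge of the sorted runs
    return merged, len(spills)

out, n_spills = run_map_task([("c", 1), ("a", 1), ("b", 1), ("a", 1), ("d", 1)], buffer_limit=2)
print(n_spills)  # 3
print(out)       # [('a', 1), ('a', 1), ('b', 1), ('c', 1), ('d', 1)]
```

A smaller buffer means more spills and therefore more merge work, which is one reason the sort-buffer settings appear among the configuration parameters.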

Page 14

Profiler

Job Profile
A MapReduce job profile is a vector in which each field captures some unique aspect of dataflow or cost during job execution at the task level or the phase level within tasks.

Fields fall into four categories: Dataflow fields, Cost fields, Dataflow Statistics fields, Cost Statistics fields
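One way to picture the four categories is a small record type in which Dataflow Statistics are ratios derived from Dataflow fields (for example, map record selectivity = map output records / map input records) and Cost Statistics normalize Cost fields per unit of data. The class and field names below are a hypothetical, heavily reduced subset chosen for illustration; a real profile carries far more fields per phase.

```python
from dataclasses import dataclass

@dataclass
class MapTaskProfile:
    # Dataflow fields: amounts of data flowing through the task
    input_records: int
    input_bytes: int
    output_records: int
    output_bytes: int
    # Cost fields: time spent in phases (seconds)
    read_time: float
    map_time: float

    # Dataflow Statistics: ratios characterizing the program's dataflow
    @property
    def map_record_selectivity(self) -> float:
        return self.output_records / self.input_records

    @property
    def map_size_selectivity(self) -> float:
        return self.output_bytes / self.input_bytes

    # Cost Statistics: per-unit costs characterizing hardware and code
    @property
    def read_cost_per_byte(self) -> float:
        return self.read_time / self.input_bytes

    @property
    def map_cpu_cost_per_record(self) -> float:
        return self.map_time / self.input_records

p = MapTaskProfile(input_records=1000, input_bytes=64_000,
                   output_records=4000, output_bytes=48_000,
                   read_time=0.8, map_time=2.0)
print(p.map_record_selectivity)  # 4.0
print(p.map_size_selectivity)    # 0.75
```

Separating raw fields from normalized statistics is what makes the profile reusable: statistics measured on one run can be applied to a different input size or configuration.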

Page 15

Profiler

Using Profiles to Analyze Job Behavior

Page 16

Profiler

Generating Profiles via Measurement
Job profiles are generated in two distinct ways: by measurement (Profiler) and by estimation (What-if Engine).
Monitoring through dynamic instrumentation
From raw monitoring data to profile fields
Task-level sampling to generate approximate profiles
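The last two points can be sketched together: raw monitoring events are turned into per-task, per-phase durations, and profiling only a random sample of tasks yields an approximate profile by averaging. The (task, phase, start, end) event format stands in for whatever dynamic instrumentation would actually record; that format, the sampling fraction, and plain averaging are assumptions for illustration, not the paper's exact procedure.

```python
import random
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical raw monitoring record: (task_id, phase, start_sec, end_sec)
Event = Tuple[str, str, float, float]

def phase_durations(events: List[Event]) -> Dict[str, Dict[str, float]]:
    """Aggregate raw events into per-task, per-phase durations (profile fields)."""
    per_task: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for task, phase, start, end in events:
        per_task[task][phase] += end - start
    return {t: dict(phases) for t, phases in per_task.items()}

def approximate_profile(events: List[Event], fraction: float, seed: int = 0) -> Dict[str, float]:
    """Profile only a random sample of tasks and average their phase durations."""
    per_task = phase_durations(events)
    tasks = sorted(per_task)
    k = max(1, round(fraction * len(tasks)))
    sample = random.Random(seed).sample(tasks, k)
    phases = {ph for t in sample for ph in per_task[t]}
    return {ph: mean(per_task[t].get(ph, 0.0) for t in sample) for ph in sorted(phases)}

events: List[Event] = []
for i in range(4):  # four map tasks whose Read phase gets slower each time
    events.append((f"m{i}", "read", 0.0, 1.0 + i))
    events.append((f"m{i}", "map", 1.0 + i, 3.0 + i))

print(approximate_profile(events, 1.0))  # {'map': 2.0, 'read': 2.5}
```

Sampling trades accuracy for overhead: with fraction=0.5 only two of the four tasks are instrumented, so the Read estimate depends on which tasks were picked.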

Page 17

What-if Engine

A what-if question has the following form:
Given the profile of a job j = <p, d1, r1, c1> that runs a MapReduce program p over input data d1 and cluster resources r1 using configuration c1, what will the performance of program p be if p is run over input data d2 and cluster resources r2 using configuration c2? That is, how will job j' = <p, d2, r2, c2> perform?

The What-if Engine executes the following two steps to answer a what-if question:
1. Estimating a virtual job profile for the hypothetical job j'.
2. Using the virtual profile to simulate how j' will execute.

We will discuss these steps in turn.

Page 18

What-if Engine

Estimating the Virtual Profile
Estimating Dataflow and Cost fields
Estimating Dataflow Statistics fields
Estimating Cost Statistics fields
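A toy version of the estimation step: assuming dataflow sizes scale proportionally with input size, the Dataflow Statistics (selectivities) of j are reused for j', and assuming r1 and r2 use the same node type, its Cost Statistics (per-unit costs) are reused too; the new Dataflow and Cost fields then follow by scaling with the new input size. The ProfileStats class, the per-MB units, and the even split of d2 across map tasks are hypothetical simplifications of the much richer per-phase analytical models a real What-if Engine uses.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class ProfileStats:
    # Hypothetical subset of statistics measured for the profiled job j
    map_size_selectivity: float  # Dataflow Statistic: map output MB / map input MB
    read_cost_per_mb: float      # Cost Statistic: seconds to read 1 MB of input
    map_cost_per_mb: float       # Cost Statistic: seconds of map() work per input MB

def virtual_map_fields(stats: ProfileStats, input_mb_d2: float,
                       num_map_tasks: int) -> Dict[str, float]:
    """Estimate per-map-task Dataflow and Cost fields for the hypothetical job j'."""
    mb_per_task = input_mb_d2 / num_map_tasks  # assumes d2 is split evenly
    return {
        # Dataflow field: j's selectivity reused (proportionality assumption)
        "map_output_mb": mb_per_task * stats.map_size_selectivity,
        # Cost fields: j's per-unit costs reused (same node type assumed)
        "read_time": mb_per_task * stats.read_cost_per_mb,
        "map_time": mb_per_task * stats.map_cost_per_mb,
    }

stats_j = ProfileStats(map_size_selectivity=0.5, read_cost_per_mb=0.5, map_cost_per_mb=0.25)
print(virtual_map_fields(stats_j, input_mb_d2=1024, num_map_tasks=16))
# {'map_output_mb': 32.0, 'read_time': 32.0, 'map_time': 16.0}
```

If either assumption fails (skewed input, different hardware), these estimates degrade, which is why the statistics are kept separate from the raw fields.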

Page 19

What-if Engine

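Estimating Dataflow and Cost fields: a detailed set of analytical (white-box) models for estimating the Dataflow and Cost fields in the virtual job profile for j'.

Estimating Dataflow Statistics fields: dataflow proportionality assumption (dataflow sizes through the job's phases are assumed proportional to the input data size, so j' keeps the Dataflow Statistics of j).

Estimating Cost Statistics fields: cluster node homogeneity assumption (when the nodes in r1 and r2 are of the same type, j' keeps the Cost Statistics of j).

Simulating the Job Execution: Task Scheduler Simulator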
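For intuition, the simulation step can be approximated by a greedy list scheduler: each task is placed on whichever slot becomes free first, and the estimated phase time is when the last slot finishes. This is a hypothetical simplification (one phase only, a fixed number of slots, no data locality, no overlap between map and shuffle), not the Task Scheduler Simulator described in the paper.

```python
import heapq
from typing import List

def simulate_map_phase(task_times: List[float], num_slots: int) -> float:
    """Return the makespan of running tasks in submission order on num_slots slots."""
    if num_slots < 1:
        raise ValueError("need at least one slot")
    slots = [0.0] * num_slots           # min-heap: time at which each slot frees up
    heapq.heapify(slots)
    for t in task_times:                # tasks are dispatched in submission order
        free_at = heapq.heappop(slots)  # earliest available slot
        heapq.heappush(slots, free_at + t)
    return max(slots)

# 10 tasks of 30 s each on 4 slots -> 3 waves -> 90 s
print(simulate_map_phase([30.0] * 10, 4))  # 90.0
```

Feeding per-task times estimated from a virtual profile into such a simulator is what turns per-task costs into a job-level prediction; note how the partial third wave still costs a full 30 s.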

Page 20

Cost-based Optimizer (CBO)

MapReduce program optimization can be defined as:
Given a MapReduce program p to be run on input data d and cluster resources r, find the setting of configuration parameters

$c_{\mathrm{opt}} = \arg\min_{c \in S} F(p, d, r, c)$

for the cost model F represented by the What-if Engine over the full space S of configuration parameter settings.

The CBO addresses this problem by making what-if calls with settings c of the configuration parameters selected through an enumeration and search over S.

Once a job profile is available as input to the What-if Engine, the CBO uses a two-step process, discussed next.
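In its simplest form, the enumeration and search over S can be pictured as calling a what-if function on every candidate setting and keeping the cheapest. In the sketch, toy_whatif is a made-up stand-in for the What-if Engine (its formula has no empirical basis), and the parameter names reduce_tasks and sort_mb and their candidate values are hypothetical; only the enumerate-and-argmin structure is the point.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, int]

def toy_whatif(c: Config) -> float:
    # Hypothetical cost model: more reducers help until per-task overhead dominates;
    # a bigger sort buffer cuts spills until memory pressure takes over.
    r, mb = c["reduce_tasks"], c["sort_mb"]
    return 1000.0 / r + 5.0 * r + 2000.0 / mb + 0.1 * mb

def enumerate_full_space(space: Dict[str, List[int]],
                         whatif: Callable[[Config], float]
                         ) -> Tuple[Optional[Config], float, int]:
    """Evaluate every point of the Cartesian product S; return argmin, cost, #calls."""
    names = sorted(space)
    best_c: Optional[Config] = None
    best_cost, calls = float("inf"), 0
    for values in itertools.product(*(space[n] for n in names)):
        c = dict(zip(names, values))
        cost = whatif(c)          # one what-if call per candidate setting
        calls += 1
        if cost < best_cost:
            best_c, best_cost = c, cost
    return best_c, best_cost, calls

space = {"reduce_tasks": [5, 10, 15, 20, 25], "sort_mb": [50, 100, 150, 200]}
best, cost, calls = enumerate_full_space(space, toy_whatif)
print(best, calls)  # {'reduce_tasks': 15, 'sort_mb': 150} 20
```

The number of what-if calls is the product of the per-parameter candidate counts, so exhaustive enumeration quickly becomes impractical as parameters are added.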

Page 21

Cost-based Optimizer (CBO)

Subspace Enumeration
A straightforward approach the CBO can take is to apply enumeration and search techniques to the full space of parameter settings S.

More efficient search techniques can be developed if the individual parameters in c can be grouped into clusters that can be optimized separately.

Equation 2 states that the globally optimal setting c_opt can be found using a divide-and-conquer approach by:

breaking the higher-dimensional space S into the lower-dimensional subspaces S(i)

considering an independent optimization problem in each smaller subspace

composing the optimal parameter settings found per subspace to give the setting c_opt
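The divide-and-conquer idea can be checked on a toy cost: optimize each subspace S(i) on its own, compose the per-subspace optima, and compare with the full-space optimum. The sketch assumes the cost is a sum of independent per-cluster terms (a simplifying assumption; real parameter interactions are what make choosing clusters a design decision), and its parameter names and formulas are hypothetical.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]

# Hypothetical per-cluster cost terms (toy formulas, not from the paper)
def map_side_cost(c: Config) -> float:
    return 2000.0 / c["sort_mb"] + 0.1 * c["sort_mb"] + 3.0 * abs(c["sort_factor"] - 10)

def reduce_side_cost(c: Config) -> float:
    return 1000.0 / c["reduce_tasks"] + 5.0 * c["reduce_tasks"]

def total_cost(c: Config) -> float:
    return map_side_cost(c) + reduce_side_cost(c)

def argmin(space: Dict[str, List[int]], cost: Callable[[Config], float]) -> Tuple[Config, int]:
    """Exhaustively search one (sub)space; return best setting and #what-if calls."""
    names = sorted(space)
    candidates = [dict(zip(names, v)) for v in itertools.product(*(space[n] for n in names))]
    return min(candidates, key=cost), len(candidates)

map_space = {"sort_mb": [50, 100, 150, 200], "sort_factor": [5, 10, 20]}
reduce_space = {"reduce_tasks": [5, 10, 15, 20, 25]}

# Full space S: 4 * 3 * 5 = 60 calls
full_best, full_calls = argmin({**map_space, **reduce_space}, total_cost)

# Clustered: optimize each subspace independently, then compose (12 + 5 = 17 calls)
map_best, map_calls = argmin(map_space, map_side_cost)
red_best, red_calls = argmin(reduce_space, reduce_side_cost)
composed = {**map_best, **red_best}

print(full_best == composed, full_calls, map_calls + red_calls)  # True 60 17
```

The saving grows multiplicatively: the full space costs the product of the subspace sizes, while the clustered search costs only their sum.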

Page 22

Cost-based Optimizer (CBO)

Search Strategy within a Subspace
Searching within each enumerated subspace to find the optimal configuration in the subspace:
Gridding (Equispaced or Random)
Recursive Random Search (RRS)

RRS provides probabilistic guarantees on how close the setting it finds is to the optimal setting

RRS is fairly robust to deviations of estimated costs from actual performance

RRS scales to a large number of dimensions
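The RRS idea can be sketched as alternating phases: explore the subspace with uniform random samples, then exploit around the best sample by sampling a box that re-centres on every improvement and shrinks after repeated failures, restarting exploration once the box is tiny. This is a simplified sketch of that idea, not the exact RRS algorithm used by the CBO; the parameter names and defaults (budget, explore, tries, shrink, min_frac) are hypothetical choices.

```python
import random
from typing import Callable, List, Sequence, Tuple

def rrs(f: Callable[[List[float]], float],
        bounds: Sequence[Tuple[float, float]],
        budget: int = 3000, explore: int = 50, tries: int = 20,
        shrink: float = 0.5, min_frac: float = 1e-4,
        seed: int = 0) -> Tuple[List[float], float]:
    """Simplified Recursive Random Search minimizing f over a box-shaped subspace."""
    rng = random.Random(seed)
    low = [b[0] for b in bounds]
    high = [b[1] for b in bounds]
    best_x: List[float] = []
    best_f = float("inf")
    evals = 0

    def evaluate(x: List[float]) -> float:
        # One "what-if call"; also tracks the global best seen so far
        nonlocal evals, best_x, best_f
        evals += 1
        fx = f(x)
        if fx < best_f:
            best_x, best_f = list(x), fx
        return fx

    while evals < budget:
        # Exploration: uniform random samples over the whole subspace
        cx: List[float] = []
        cf = float("inf")
        for _ in range(explore):
            x = [rng.uniform(lo, hi) for lo, hi in zip(low, high)]
            fx = evaluate(x)
            if fx < cf:
                cx, cf = x, fx
        # Exploitation: sample a box around cx, re-centre on improvement,
        # shrink after `tries` consecutive failures, stop when the box is tiny
        frac, fails = 0.1, 0
        while frac > min_frac and evals < budget:
            x = [min(hi, max(lo, c + rng.uniform(-frac, frac) * (hi - lo)))
                 for c, lo, hi in zip(cx, low, high)]
            fx = evaluate(x)
            if fx < cf:
                cx, cf, fails = x, fx, 0
            else:
                fails += 1
                if fails >= tries:
                    frac, fails = frac * shrink, 0
    return best_x, best_f

best_x, best_f = rrs(lambda p: (p[0] - 3) ** 2 + (p[1] + 1) ** 2, [(-10, 10), (-10, 10)])
print(best_x, best_f)
```

Because each step only needs a cost value, the search treats the What-if Engine as a black box, and its cost grows with the evaluation budget rather than exponentially with the number of dimensions.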

Page 23

Cost-based Optimizer (CBO)

There are two choices for subspace enumeration: Full or Clustered, which deal respectively with the full space or with smaller subspaces for map and reduce tasks.

There are three choices for search within a subspace: Gridding (Equispaced), Gridding (Random), and RRS.
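The two gridding variants differ only in how candidate points are generated: equispaced gridding takes k evenly spaced values per dimension and forms their Cartesian product, while random gridding draws the same number of points uniformly at random. Either candidate set would then be passed through what-if calls. A minimal sketch; the function names, k, the toy objective, and the use of Python's random module are illustrative assumptions.

```python
import itertools
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]

def equispaced_grid(bounds: Sequence[Tuple[float, float]], k: int) -> List[Point]:
    """k evenly spaced values per dimension (endpoints included), full product."""
    axes = []
    for lo, hi in bounds:
        step = (hi - lo) / (k - 1) if k > 1 else 0.0
        axes.append([lo + i * step for i in range(k)])
    return list(itertools.product(*axes))

def random_grid(bounds: Sequence[Tuple[float, float]], n: int, seed: int = 0) -> List[Point]:
    """n points drawn uniformly at random inside the bounds."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n)]

def grid_search(f: Callable[[Point], float], points: List[Point]) -> Point:
    # One what-if call per candidate point; keep the cheapest
    return min(points, key=f)

def objective(p: Point) -> float:
    # Toy stand-in for a what-if cost, minimized at (0.5, 30.0)
    return (p[0] - 0.5) ** 2 + ((p[1] - 30.0) / 40.0) ** 2

bounds = [(0.0, 1.0), (10.0, 50.0)]
eq = equispaced_grid(bounds, 3)
rnd = random_grid(bounds, len(eq))
print(len(eq), eq[:3])          # 9 [(0.0, 10.0), (0.0, 30.0), (0.0, 50.0)]
print(grid_search(objective, eq))  # (0.5, 30.0)
```

Equispaced gridding needs k^d points for d parameters, whereas random gridding lets the number of what-if calls be chosen freely, which matters as subspaces grow.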

Page 24

Experimental Evaluation

Page 25

Experimental Evaluation

Page 26

Experimental Evaluation

Page 27

Experimental Evaluation

Page 28

Experimental Evaluation

Page 29

Experimental Evaluation

Page 30

Discussion and Future work

Cost-based Optimizer for simple to arbitrarily complex MapReduce programs.

Several new research challenges arise when we consider the full space of optimization opportunities provided by these higher-level systems.

Proposed a lightweight Profiler to collect detailed statistical information from unmodified MapReduce programs.

Proposed a What-if Engine for the fine-grained cost estimation needed by the Cost-based Optimizer.

Page 31

Q & A

Page 32

Good! I'd say that went pretty well…