Cloud MapReduce: A MapReduce Implementation on top of a Cloud Operating System

Page 1

Cloud MapReduce: A MapReduce Implementation on top of a Cloud Operating System

Huan Liu, Dan Orban
Accenture Technology Labs
2011, 11th IEEE/ACM International Symposium on

9962161 江嘉福
100062228 徐光成
100062229 章博遠

Page 2

OUTLINE

I. Introduction
II. Cloud MapReduce Architecture & Implementation
III. Pros & Cons of Cloud MapReduce
IV. Experimental Evaluation
V. Conclusions & Future Work
VI. References

Page 3

INTRODUCTION

1. What is a Cloud OS?
2. Challenges posed by a cloud OS
3. What is Cloud MapReduce?
4. Advantages of Cloud MapReduce

Page 4

What is a Cloud OS?

1. Manages the low-level cloud resources
2. Presents a high-level interface to the application programmers
3. Key difference from a traditional OS: it must be scalable

Figure 1

Page 5

Challenges posed by a cloud OS

1. Scalability comes at a price.
2. Trade-offs among data consistency, system availability, and tolerance to network partitions.

Figure 2

Page 6

What is Cloud MapReduce?

1. Implements the MapReduce programming model
2. Built for horizontal scaling
3. Works with eventual consistency
4. Overcomes the limitations of the cloud OS
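As a reminder of what the MapReduce programming model looks like, here is a minimal single-process sketch of word count in Python. It is not Cloud MapReduce's code (which the slides say is written in Java); the names `map_fn`, `reduce_fn`, and `run_mapreduce` and the sample input are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: sum all counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, group values by key (the shuffle), then reduce, in one process."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())


# Hypothetical input: (line number, line text) pairs.
lines = [(0, "the cloud is the computer"), (1, "the cloud scales")]
word_counts = run_mapreduce(lines, map_fn, reduce_fn)
print(word_counts)
```

In a distributed implementation the grouping step is where data moves between nodes; this sketch keeps it in a local dictionary.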

Page 7

Advantages of Cloud MapReduce

1. Incremental scalability: Can scale incrementally in the number of computing nodes.
2. Symmetry and decentralization: Every node has the same set of responsibilities.
3. Heterogeneity: Nodes can have varying computation capacity.
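One way to see why heterogeneous nodes need no special handling is to assume that every node pulls its next task from a shared queue whenever it becomes idle. The slide does not describe the mechanism, so the pull-based model, the function name `simulate_pull_scheduling`, and the worker speeds below are assumptions made for this sketch. It is a small deterministic simulation, not a measurement.

```python
import heapq


def simulate_pull_scheduling(num_tasks, worker_speeds):
    """Simulate workers that pull one task at a time from a shared queue.

    worker_speeds maps a worker name to tasks completed per time unit.
    Returns how many tasks each worker ended up processing.
    """
    completed = {name: 0 for name in worker_speeds}
    # All workers are idle at time 0; the heap orders them by when they next become free.
    free_at = [(0.0, name) for name in sorted(worker_speeds)]
    heapq.heapify(free_at)
    for _ in range(num_tasks):
        now, name = heapq.heappop(free_at)  # the next idle worker pulls a task
        completed[name] += 1
        heapq.heappush(free_at, (now + 1.0 / worker_speeds[name], name))
    return completed


# Hypothetical cluster: one node is twice as fast as the other.
counts = simulate_pull_scheduling(num_tasks=90, worker_speeds={"fast": 2.0, "slow": 1.0})
print(counts)
```

Because the two workers run identical code and differ only in speed, the faster one simply pulls more often and ends up with about twice as many tasks. No central scheduler has to know the node capacities.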

Page 8

Cloud MapReduce Architecture and Implementation

1. The architecture
2. Cloud challenges
3. General solution approaches

Page 9

The Architecture

Page 10

Cloud challenges & general solution approaches

1. Long latency
2. Horizontal scaling
3. No way to know when a queue has been created for the first time

Page 11

Cont'd

4. Duplicate messages
5. Potential node failure
6. Nondeterministic eventual consistency windows
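A common way to cope with duplicate messages is to tag every message with a unique ID and have the consumer ignore IDs it has already seen. The sketch below illustrates that general idea only. The slides do not spell out the scheme Cloud MapReduce uses, so the `(task name, sequence number)` ID format and the function name `deduplicate` are assumptions.

```python
def deduplicate(messages):
    """Yield each message once, dropping redeliveries that reuse an ID.

    Each message is a (message_id, payload) pair. The ID scheme is an
    assumption made for this sketch.
    """
    seen = set()
    for message_id, payload in messages:
        if message_id in seen:
            continue  # duplicate delivery: skip it so it is not counted twice
        seen.add(message_id)
        yield message_id, payload


# Hypothetical queue contents: message ("map-1", 0) was delivered twice.
delivered = [
    (("map-1", 0), ("cloud", 1)),
    (("map-1", 1), ("queue", 1)),
    (("map-1", 0), ("cloud", 1)),
    (("map-2", 0), ("cloud", 1)),
]
unique = list(deduplicate(delivered))
total_cloud = sum(count for _, (word, count) in unique if word == "cloud")
print(len(unique), total_cloud)
```

Without the filter, the word "cloud" would be counted three times instead of twice, so a word count job would silently report a wrong result.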

Page 12

Pros

●3,000 lines of Java code (LOC) vs. 285,375 LOC in Hadoop
●Large & reliable FS
●High bandwidth (fast read/write)
●Single point of contact (high throughput)

Page 13

Cons

●Uses only the network (no local storage)
●The network can become a bottleneck

Page 14

Evaluation

Almost twice as fast!

Page 15

Evaluation

●Hadoop: 385 s total, network/CPU underutilized
●CMR: 210 s, more efficient network/CPU usage

Page 16

Evaluation

Wiki Word Count

●With combiner: Hadoop 747 s, CMR 436 s
●Without combiner: Hadoop 1733 s, CMR 1247 s

Page 17

Evaluation

Amazon

●Word Count -> 400 GB using 100 nodes
●Approx. 1 hr
●983,152 requests -> $0.98

●Using SimpleDB?
●3.7 hrs -> $0.52
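The request charge on this slide implies a per-request price, which the short calculation below derives. It uses only the two figures from the slide (983,152 requests and $0.98). The variable names are illustrative, and the result is an implied average rather than a quoted price list entry.

```python
# Figures taken from the slide.
requests = 983_152
cost_usd = 0.98

# Implied average price, expressed per 10,000 requests and per million requests.
cost_per_10k = cost_usd / requests * 10_000
cost_per_million = cost_usd / requests * 1_000_000

print(f"about ${cost_per_10k:.4f} per 10,000 requests")
print(f"about ${cost_per_million:.2f} per million requests")
```

In other words, the charge works out to roughly one cent per 10,000 requests, which is small next to the cost of running 100 nodes for about an hour.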

Page 18

Evaluation

Comparison

●Distributed Grep Word Count -> 13 GB of data
●CMR: 962 seconds
●Hadoop: 1047 seconds

●Results are almost the same. Why?
●The tasks are more CPU-intensive

Page 19

Evaluation

12 GB, 923,670 HTML files

●Hadoop -> 6 hrs+
●CMR -> 297 seconds
●Hadoop: high overhead from task creation
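The timings reported on the evaluation slides can be turned into speedup ratios (Hadoop time divided by CMR time). The sketch below does that arithmetic in Python. The labels are descriptive names chosen here, and the HTML-files entry is only a lower bound because the slide reports Hadoop as "6 hrs+".

```python
# (Hadoop seconds, CMR seconds) as reported on the evaluation slides.
timings = {
    "385 s vs. 210 s run": (385, 210),
    "wiki word count, combiner": (747, 436),
    "wiki word count, no combiner": (1733, 1247),
    "distributed grep, 13 GB": (1047, 962),
}


def speedup(hadoop_seconds, cmr_seconds):
    """Return how many times faster CMR finished than Hadoop."""
    return hadoop_seconds / cmr_seconds


speedups = {name: speedup(h, c) for name, (h, c) in timings.items()}
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")

# Hadoop needed more than 6 hours on the 923,670 HTML files, so this is a lower bound.
html_speedup_lower_bound = speedup(6 * 3600, 297)
print(f"many small HTML files: more than {html_speedup_lower_bound:.0f}x")
```

The ratios range from about 1.1x for the CPU-bound grep job to roughly 1.8x for the 385 s vs. 210 s run, which matches the "almost twice as fast" claim. The many-small-files case is the outlier because Hadoop's task creation overhead dominates.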

Page 20

Conclusion

●Cloud cannot be implemented on any system
●Poor performance

●CMR techniques overcome cloud limitations
●No performance degradation
●The techniques are also useful for other systems

Page 21

REFERENCES

Figure 1: http://techcrunch.com/

Figure 2: http://blog.csdn.net/zouqingfang/article/details/7269920

http://zh.wikipedia.org/

https://code.google.com/p/cloudmapreduce/

http://searchcloudcomputing.techtarget.com/definition/MapReduce

http://myblog-maurice.blogspot.tw/2012/08/nosqlcap.html