Cloud MapReduce: A MapReduce Implementation on top of a Cloud Operating System

Huan Liu, Dan Orban, Accenture Technology Labs

2011, 11th IEEE/ACM International Symposium on

9962161 江嘉福, 100062228 徐光成, 100062229 章博遠


OUTLINE

I. Introduction
II. Cloud MapReduce Architecture & Implementation
III. Pros & Cons of Cloud MapReduce
IV. Experimental Evaluation
V. Conclusions & Future Work
VI. References


INTRODUCTION

1. What is a cloud OS?

2. Challenges posed by a cloud OS

3. What is Cloud MapReduce?

4. Advantages of Cloud MapReduce


What is a cloud OS?

1. Manages the low-level cloud resources

2. Presents a high-level interface to application programmers

3. Key difference from a traditional OS: it is designed to scale

[Figure 1]


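Challenges posed by a cloud OS

1. Scalability comes at a price.

2. A distributed system cannot guarantee data consistency, system availability, and tolerance to network partitions all at the same time (the CAP theorem), so cloud OS services typically settle for eventual consistency.

[Figure 2]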
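To make the trade-off concrete, here is a toy Python model of a store that favors availability over consistency: a write is accepted by one replica right away and reaches the other replicas only later, so a read during that window can return stale data. The class and its methods are hypothetical illustrations; this is not how any particular cloud service is implemented.

```python
class EventuallyConsistentStore:
    """Toy replicated key-value store (hypothetical, for illustration only).

    A write is applied to replica 0 immediately and queued for the other
    replicas; until propagate() runs, those replicas may return stale data.
    """

    def __init__(self, num_replicas=3):
        self.replicas = [{} for _ in range(num_replicas)]
        self.pending = []  # (replica index, key, value) not yet applied

    def put(self, key, value):
        # Stay available: accept the write without waiting for other replicas.
        self.replicas[0][key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def get(self, key, replica):
        # A read is served by whichever replica the client happens to reach.
        return self.replicas[replica].get(key)

    def propagate(self):
        # The consistency window closes: every replica converges.
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()


if __name__ == "__main__":
    store = EventuallyConsistentStore()
    store.put("job-status", "done")
    print(store.get("job-status", replica=0))  # done
    print(store.get("job-status", replica=2))  # None (stale read)
    store.propagate()
    print(store.get("job-status", replica=2))  # done
```

Until `propagate()` runs, clients reading from replicas 1 and 2 see the old state; the system stayed available by not waiting for them.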


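What is Cloud MapReduce?

1. An implementation of the MapReduce programming model on top of a cloud OS

2. Built on cloud services that scale horizontally

3. Works with the eventual consistency of those services

4. Overcomes the limitations these services impose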
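The MapReduce model that Cloud MapReduce implements can be sketched with in-memory queues. This is a minimal sketch under stated assumptions, not the CMR code: Python `deque` objects stand in for cloud queue services, and `map_fn`, `reduce_fn`, `partition` and `NUM_REDUCE_QUEUES` are hypothetical names. Intermediate pairs are spread over several reduce queues by a hash of the key, which is one way to spread load horizontally.

```python
import zlib
from collections import defaultdict, deque

NUM_REDUCE_QUEUES = 4  # hypothetical number of reduce queues


def map_fn(split):
    """Word-count map: emit (word, 1) for every word in one input split."""
    for word in split.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Word-count reduce: sum the partial counts for one word."""
    return key, sum(values)


def partition(key, n=NUM_REDUCE_QUEUES):
    # Deterministic hash, so a given key always goes to the same reduce queue.
    return zlib.crc32(key.encode("utf-8")) % n


def run_job(splits):
    input_queue = deque(splits)  # stand-in for a cloud input queue
    reduce_queues = [deque() for _ in range(NUM_REDUCE_QUEUES)]

    # Map phase: workers dequeue splits and push pairs to reduce queues.
    while input_queue:
        split = input_queue.popleft()
        for key, value in map_fn(split):
            reduce_queues[partition(key)].append((key, value))

    # Reduce phase: each reduce queue is drained and grouped independently.
    output = {}
    for queue in reduce_queues:
        grouped = defaultdict(list)
        while queue:
            key, value = queue.popleft()
            grouped[key].append(value)
        for key, values in grouped.items():
            word, total = reduce_fn(key, values)
            output[word] = total
    return output


if __name__ == "__main__":
    counts = run_job(["the cloud OS", "the MapReduce model on the cloud"])
    print(sorted(counts.items()))
    # [('cloud', 2), ('mapreduce', 1), ('model', 1), ('on', 1), ('os', 1), ('the', 3)]
```

Because every key lands in exactly one reduce queue, each reduce queue can be drained by a different worker without coordinating with the others.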


Advantages of Cloud MapReduce

1. Incremental scalability: can scale incrementally in the number of computing nodes.

2. Symmetry and decentralization: every node has the same set of responsibilities.

3. Heterogeneity: nodes can have varying computation capacity.
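These three properties fit naturally with a pull-based design, in which every node runs the same loop and takes the next task from a shared queue whenever it is idle. The simulation below is a hypothetical sketch, not the CMR scheduler: `simulate_pull` and the per-task times are made-up illustrations. A node that is three times faster ends up doing three times as many tasks, and adding a node just means adding one more entry to the dictionary.

```python
import heapq
from collections import Counter


def simulate_pull(num_tasks, seconds_per_task):
    """Simulate identical workers pulling tasks from one shared queue.

    `seconds_per_task` maps a (hypothetical) worker name to the time it needs
    per task, i.e. its computation capacity. Returns (tasks done per worker,
    time at which the last task finishes).
    """
    done = Counter()
    # Min-heap of (time the worker becomes idle, worker name).
    idle_at = [(0.0, name) for name in seconds_per_task]
    heapq.heapify(idle_at)
    finish = 0.0
    for _ in range(num_tasks):
        t, name = heapq.heappop(idle_at)  # the next idle worker pulls a task
        done[name] += 1
        end = t + seconds_per_task[name]
        finish = max(finish, end)
        heapq.heappush(idle_at, (end, name))
    return done, finish


if __name__ == "__main__":
    done, finish = simulate_pull(12, {"fast": 1.0, "slow": 3.0})
    print(dict(done), finish)
```

With 12 tasks, the fast node completes 9 and the slow node 3, and the job finishes at t = 9; no central component had to know the nodes' speeds in advance.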


Cloud MapReduce Architecture and Implementation

1. The architecture

2. Cloud challenges

3. General solution approaches


The Architecture


Cloud challenges & general solution approaches

1. Long latency

2. Horizontal scaling

3. No way to know when a queue is created for the first time
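For the first challenge, long latency, one general approach is to keep many requests in flight at once so that their round trips overlap. The sketch below simulates this with a thread pool; `fake_request` and `LATENCY_S` are hypothetical stand-ins for a real cloud-service call, and the slide does not say exactly which mechanism CMR uses.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.05  # hypothetical round-trip time of one cloud-service call


def fake_request(i):
    """Stand-in for a slow cloud-service call (e.g. reading from a queue)."""
    time.sleep(LATENCY_S)
    return i


def sequential(n):
    start = time.perf_counter()
    results = [fake_request(i) for i in range(n)]  # one round trip at a time
    return results, time.perf_counter() - start


def parallel(n, threads=16):
    start = time.perf_counter()
    # Up to `threads` requests are in flight at once, so the waits overlap.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(fake_request, range(n)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, t_seq = sequential(32)
    _, t_par = parallel(32)
    print(f"sequential: {t_seq:.2f} s, parallel: {t_par:.2f} s")
```

With 32 requests and 16 threads, the parallel version waits for roughly two round trips instead of 32, while `pool.map` still returns the results in request order.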


Cloud challenges & general solution approaches (cont.)

4. Duplicate messages

5. Potential node failures

6. Nondeterministic eventual consistency windows
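Challenges 4 to 6 can be handled together if every map attempt tags its messages and commits a message count only after it has finished: a reducer then ignores messages from uncommitted (for example, failed) attempts, drops duplicates, and waits until the committed count has arrived. The sketch below illustrates that idea under assumptions: the `Message` fields, the `commits` table and `collect` are hypothetical, the reducer is assumed to run after the map phase has committed, and this is not claimed to be CMR's exact protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    task_id: str     # map task that produced the message
    attempt_id: int  # which execution attempt of that task
    seq: int         # position within this attempt's output to this queue
    key: str
    value: int


def collect(messages, commits):
    """Return the valid messages read from one reduce queue, or None.

    `commits` maps task_id -> (attempt_id, message_count), written by a map
    worker only after all its output to this queue was sent (hypothetical
    status table). None means some committed messages are not visible yet.
    """
    valid = {}
    for m in messages:
        committed = commits.get(m.task_id)
        if committed is None or committed[0] != m.attempt_id:
            continue  # from a failed or uncommitted attempt: skip it
        valid[(m.task_id, m.seq)] = m  # a duplicate delivery maps to the same slot
    # Eventual consistency window: compare what we saw with what was committed.
    for task_id, (_attempt, count) in commits.items():
        seen = sum(1 for (t, _seq) in valid if t == task_id)
        if seen < count:
            return None  # not everything is visible yet; retry later
    return list(valid.values())
```

Keying on `(task_id, seq)` makes duplicate deliveries harmless, and comparing against the committed count turns an unknown consistency window into a simple "not yet, retry" answer.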


Pros

● Small code base: about 3,000 lines of Java code (LOC) vs. 285,375 LOC for Hadoop

● Large & reliable file system

● High bandwidth (fast reads/writes)

● Single point of contact (high throughput)


Cons

● Uses only the network (no local storage)

● The network can become a bottleneck


Evaluation

Cloud MapReduce (CMR) is almost twice as fast as Hadoop!


Evaluation

● Hadoop: 385 s total, network/CPU underutilized

● CMR: 210 s, more efficient network/CPU usage


Evaluation

Wiki word count

● With combiner: Hadoop 747 s, CMR 436 s

● Without combiner: Hadoop 1733 s, CMR 1247 s


Evaluation

Cost on Amazon

● Word count on 400 GB using 100 nodes

● Approx. 1 hr

● 983,152 requests -> $0.98

● SimpleDB: 3.7 hrs -> $0.52
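Dividing the reported costs by the reported usage gives the implied unit prices. This is plain arithmetic on the slide's numbers, assuming the charges scale linearly with usage; it is not a quoted price list, and `implied_unit_price` is a hypothetical helper.

```python
def implied_unit_price(total_cost, units, per=1):
    """Price of `per` units, assuming the charge is linear in usage."""
    return total_cost / units * per


# Usage and cost as reported on the slide (USD).
REQUESTS, REQUEST_COST = 983_152, 0.98
SIMPLEDB_HOURS, SIMPLEDB_COST = 3.7, 0.52

per_10k_requests = implied_unit_price(REQUEST_COST, REQUESTS, per=10_000)
per_simpledb_hour = implied_unit_price(SIMPLEDB_COST, SIMPLEDB_HOURS)

print(f"about ${per_10k_requests:.4f} per 10,000 requests")
print(f"about ${per_simpledb_hour:.3f} per SimpleDB hour")
```

This works out to roughly one cent per 10,000 requests and roughly 14 cents per SimpleDB hour.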


Evaluation

Comparison

● Distributed grep word count on 13 GB of data

● CMR: 962 s

● Hadoop: 1047 s

● Why are the results almost the same? These tasks are more CPU-intensive, so CMR's more efficient network usage matters less.


Evaluation

12 GB of data in 923,670 HTML files

● Hadoop: more than 6 hrs

● CMR: 297 s

● Hadoop suffers high task-creation overhead with this many small files
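The timings on the evaluation slides can be turned into speedup ratios (Hadoop time divided by CMR time). For the HTML-file job the slide only says Hadoop took more than 6 hours, so this sketch uses 6 hours as a lower bound; the benchmark labels are shorthand for the slides' experiments.

```python
def speedup(hadoop_s, cmr_s):
    """How many times faster CMR finished than Hadoop on the same job."""
    return hadoop_s / cmr_s


# (benchmark, Hadoop seconds, CMR seconds), as reported on the slides.
RESULTS = [
    ("385 s vs. 210 s run", 385, 210),
    ("Wiki word count, with combiner", 747, 436),
    ("Wiki word count, without combiner", 1733, 1247),
    ("Distributed grep word count, 13 GB", 1047, 962),
    # Hadoop took "6 hrs+", so 6 h is only a lower bound on its time.
    ("12 GB in 923,670 HTML files (lower bound)", 6 * 3600, 297),
]

for name, hadoop_s, cmr_s in RESULTS:
    print(f"{name}: {speedup(hadoop_s, cmr_s):.2f}x")
```

The first run gives about 1.83x, the "almost twice as fast" result, while the distributed grep job gives only about 1.09x, consistent with it being CPU-bound.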


Conclusion

● Not every system can simply be implemented on a cloud OS: a naive implementation performs poorly

● CMR's techniques overcome the cloud's limitations with no performance degradation

● The same techniques can be used to build other systems


REFERENCES

Figure 1: http://techcrunch.com/

Figure 2: http://blog.csdn.net/zouqingfang/article/details/7269920

http://zh.wikipedia.org/

https://code.google.com/p/cloudmapreduce/

http://searchcloudcomputing.techtarget.com/definition/MapReduce

http://myblog-maurice.blogspot.tw/2012/08/nosqlcap.html
