Cloud MapReduce: A MapReduce Implementation on top of a Cloud Operating System
Huan Liu, Dan Orban
Accenture Technology Labs
2011, 11th IEEE/ACM International Symposium on
9962161 江嘉福, 100062228 徐光成, 100062229 章博遠
OUTLINE
I. Introduction
II. Cloud MapReduce Architecture & Implementation
III. Pros & Cons of Cloud MapReduce
IV. Experimental Evaluation
V. Conclusions & Future Work
VI. References
INTRODUCTION
1. What is a cloud OS?
2. Challenges posed by a cloud OS
3. What is Cloud MapReduce?
4. Advantages of Cloud MapReduce
What is a Cloud OS?
1. Manages the low-level cloud resources
2. Presents a high-level interface to the application programmers
3. Key difference from a traditional OS: scalability
Figure 1 (圖一)
Challenges posed by a cloud OS
1. Scalability comes at a price.
2. Of data consistency, system availability, and tolerance to network partition, only two can be satisfied at the same time (the CAP theorem).
Figure 2 (圖二)
Cloud MapReduce?
1. Implements the MapReduce programming model
2. Built for horizontal scaling
3. Works under eventual consistency
4. Overcomes the limitations of a cloud OS
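The MapReduce programming model named in item 1 can be illustrated with the classic word count. This is a minimal single-process sketch in Python, not Cloud MapReduce's own code; the function names `map_words`, `shuffle`, `reduce_counts`, and `word_count` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit one (word, 1) pair per word in the document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, the step a framework runs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: sum every count emitted for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents in one process."""
    pairs = (pair for doc in documents for pair in map_words(doc))
    return dict(reduce_counts(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cloud scales", "the cloud is eventually consistent"]))
```

In a real deployment the map and reduce calls run on many nodes and the shuffle moves data between them; only the user-supplied map and reduce functions stay the same.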
Advantages of Cloud MapReduce
1. Incremental scalability: can scale incrementally in the number of computing nodes.
2. Symmetry and decentralization: every node has the same set of responsibilities.
3. Heterogeneity: nodes can have varying computation capacity.
Cloud MapReduce Architecture and Implementation
1. The architecture
2. Cloud challenges
3. General solution approaches
The Architecture
Cloud challenges & General solution approaches
1. Long latency
2. Horizontal scaling
3. No way to know when a queue is created for the first time
Cont'd
4. Duplicate messages
5. Potential node failure
6. Nondeterministic eventual consistency windows
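One generic way to tolerate duplicate messages (challenge 4) is to tag each message with a unique ID and have the consumer skip any ID it has already processed. The sketch below shows that general technique only; the slides do not say how Cloud MapReduce itself handles duplicates, and the names `Message` and `DedupConsumer` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    msg_id: str  # hypothetical unique ID assigned by the producer
    key: str
    value: int


class DedupConsumer:
    """Sums values per key while ignoring messages whose ID was already seen."""

    def __init__(self):
        self.seen_ids = set()
        self.totals = {}

    def handle(self, msg):
        """Process one message; return False if it was a duplicate and was dropped."""
        if msg.msg_id in self.seen_ids:
            return False
        self.seen_ids.add(msg.msg_id)
        self.totals[msg.key] = self.totals.get(msg.key, 0) + msg.value
        return True


if __name__ == "__main__":
    # Simulated at-least-once delivery: message m1 arrives twice.
    delivered = [
        Message("m1", "cloud", 1),
        Message("m2", "cloud", 1),
        Message("m1", "cloud", 1),
    ]
    consumer = DedupConsumer()
    for message in delivered:
        consumer.handle(message)
    print(consumer.totals)
```

Because the duplicate is filtered before aggregation, the total for a key does not depend on how many times the queue redelivers a message.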
Pros
● 3,000 lines of Java code (LOC) vs. 285,375 LOC in Hadoop
● Large and reliable file system
● High bandwidth (fast read/write)
● Single point of contact (high throughput)
Cons
● Uses only the network (no local storage)
● This leads to a bottleneck
Evaluation
CMR is almost twice as fast as Hadoop!
Evaluation
● Hadoop: 385 s total, network/CPU underutilized
● CMR: 210 s, more efficient network/CPU usage
Evaluation
Wiki Word Count
● With combiner: Hadoop 747 s, CMR 436 s
● Without combiner: Hadoop 1,733 s, CMR 1,247 s
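Both systems finish sooner with a combiner. A combiner pre-aggregates each mapper's output locally, so fewer intermediate records have to be sent to the reducers, which is one plausible reason for the gap. The Python sketch below illustrates the idea for word count; it is not taken from either system, and the function names are hypothetical.

```python
from collections import Counter


def map_without_combiner(document):
    """Emit one (word, 1) record for every word occurrence."""
    return [(word, 1) for word in document.lower().split()]


def map_with_combiner(document):
    """Emit one (word, local_count) record per distinct word in this document."""
    return list(Counter(document.lower().split()).items())


def reduce_all(pairs):
    """Sum the counts for each word across all intermediate records."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


if __name__ == "__main__":
    doc = "to be or not to be"
    plain = map_without_combiner(doc)
    combined = map_with_combiner(doc)
    print(len(plain), "records without combiner")
    print(len(combined), "records with combiner")
    print(reduce_all(plain) == reduce_all(combined))
```

The final counts are identical either way; only the number of intermediate records changes.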
Evaluation
Amazon
● Word count -> 400 GB using 100 nodes
● Approx. 1 hr
● 983,152 requests -> $0.98
● Using SimpleDB?
● 3.7 hrs -> $0.52
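The $0.98 figure for 983,152 requests is consistent with a flat price of $0.01 per 10,000 requests. That rate is an assumption made for this example and is not stated on the slide; the helper name `request_cost_usd` is hypothetical as well.

```python
def request_cost_usd(requests, usd_per_10k_requests=0.01):
    """Cost of a number of requests at an assumed flat price per 10,000 requests."""
    return requests / 10_000 * usd_per_10k_requests


if __name__ == "__main__":
    # Request count taken from the slide; the price is the assumption above.
    print(f"${request_cost_usd(983_152):.2f}")
```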
Evaluation
Comparison
● Distributed Grep Word Count -> 13 GB of data
● CMR: 962 seconds
● Hadoop: 1,047 seconds
● Results are almost the same. Why?
● The tasks are more CPU-intensive
Evaluation
12 GB: 923,670 HTML files
● Hadoop -> 6+ hrs
● CMR -> 297 seconds
● Hadoop: high overhead from task creation
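The size of the gap can be checked with simple arithmetic. Since the Hadoop run is reported only as more than 6 hours, the result below is a lower bound on the speedup. This is a small Python sketch, and the helper name `speedup` is hypothetical.

```python
def speedup(hadoop_seconds, cmr_seconds):
    """How many times faster CMR is than Hadoop for the same job."""
    return hadoop_seconds / cmr_seconds


if __name__ == "__main__":
    hadoop_lower_bound = 6 * 3600  # "6+ hrs" expressed in seconds, so a lower bound
    print(f"at least {speedup(hadoop_lower_bound, 297):.1f}x faster")
```

The same helper applies to the other timings in this evaluation; for example, `speedup(385, 210)` is about 1.83.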
Conclusion
● Cloud cannot be implemented on any system
● Poor performance
● CMR techniques overcome cloud limitations
● Zero performance degradation
● Good to use for other systems
REFERENCES
Figure 1 (圖一): http://techcrunch.com/
Figure 2 (圖二): http://blog.csdn.net/zouqingfang/article/details/7269920
http://zh.wikipedia.org/
https://code.google.com/p/cloudmapreduce/
http://searchcloudcomputing.techtarget.com/definition/MapReduce
http://myblog-maurice.blogspot.tw/2012/08/nosqlcap.html