
Uploaded by: francis-holland

Posted on 11-Jan-2016


Page 1:

100062108 李智宇, 100062116 林威宏, 100062220 施閔耀

Page 2:

Outline

Introduction

Architecture of Hadoop

HDFS

MapReduce

Comparison

Why Hadoop

Conclusion

Page 3:

What is Hadoop?

Open-source software framework

Processes and stores big data

Easy to use and implement, economical, flexible

Runs on many nodes (servers)

Written in Java

Free license (Apache License 2.0)

Created by Doug Cutting and Mike Cafarella in 2005

Page 4:

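Advantages of an Interpreted Language

Cross-platform (e.g., Windows, Ubuntu, Mac OS X)

Smaller executable program size

Easier to modify during both development and execution

(Java source is compiled to bytecode that the Java virtual machine interprets or JIT-compiles, so the same Hadoop build runs on any platform with a JVM.)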

Page 5:

Architecture of Hadoop

Page 6:

Hadoop in Enterprise

The Dell representation of the Hadoop ecosystem.

Page 7:

Hadoop in Enterprise

Page 8:

Who is using Hadoop?

More than half of the Fortune 50 companies were using Hadoop by 2013

Page 9:

HDFS: Hadoop Distributed File System

Client: the user

Name node: manages and stores the metadata, i.e., the namespace of files

Data node: stores the files (as blocks)

Each data node periodically sends its status to the name node (a heartbeat)
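To make the heartbeat idea concrete, here is a minimal Python sketch of a name node that records the last time each data node reported in and treats a node that stays silent too long as dead. The `NameNode` class, its method names and the 30-second timeout are illustrative assumptions, not HDFS's actual implementation or default settings.

```python
from dataclasses import dataclass, field

# Assumed timeout for this sketch only; it is not an HDFS default.
DEAD_AFTER_SECONDS = 30.0


@dataclass
class NameNode:
    """Toy name node that only tracks data-node heartbeats."""
    last_heartbeat: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, datanode: str, now: float) -> None:
        # Called periodically by every data node to report that it is alive.
        self.last_heartbeat[datanode] = now

    def live_nodes(self, now: float) -> list[str]:
        # Nodes heard from within the timeout window.
        return sorted(dn for dn, t in self.last_heartbeat.items()
                      if now - t <= DEAD_AFTER_SECONDS)

    def dead_nodes(self, now: float) -> list[str]:
        # Nodes that have been silent for longer than the timeout.
        return sorted(dn for dn, t in self.last_heartbeat.items()
                      if now - t > DEAD_AFTER_SECONDS)


nn = NameNode()
nn.heartbeat("dn1", now=0.0)
nn.heartbeat("dn2", now=0.0)
nn.heartbeat("dn1", now=25.0)   # dn1 keeps reporting; dn2 goes silent
print(nn.live_nodes(now=40.0))  # ['dn1']
print(nn.dead_nodes(now=40.0))  # ['dn2']
```

Detecting silent nodes this way is what gives the name node the information it needs to react to node failure.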

Page 10:

HDFS: Writing data in HDFS

Each file is divided into blocks (64 MB or 128 MB in size), and each block has three copies stored on different data nodes.

The client asks the name node for a list of data nodes sorted by distance and sends the data to the nearest one; that data node then forwards it to the remaining nodes in the list.

When this is done, the data nodes send “done” to the name node.
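The write path can be sketched in Python as follows: cut the file into fixed-size blocks, let the name node sort data nodes by distance from the client, and push each block down that list. The function names, the `/datacenter/rack/node` path format and the distance rule (hops from each node up to their closest common ancestor) are assumptions modeled loosely on Hadoop's rack awareness. Real HDFS also deliberately spreads replicas across racks and streams data in small packets, both of which this sketch leaves out.

```python
# Illustrative values: 64 MB blocks (the slide also mentions 128 MB)
# and three copies per block.
BLOCK_SIZE = 64 * 1024 * 1024
REPLICATION = 3


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Sizes of the blocks a file of `file_size` bytes is cut into;
    only the last block may be smaller than `block_size`."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def distance(a: str, b: str) -> int:
    """Simplified rack-aware distance between two /datacenter/rack/node
    paths: hops from each node up to their closest common ancestor."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def choose_pipeline(client: str, datanodes: list[str],
                    replication: int = REPLICATION) -> list[str]:
    """Name node side: data nodes sorted by distance from the client,
    truncated to the number of copies wanted."""
    return sorted(datanodes, key=lambda dn: distance(client, dn))[:replication]


def write_block(block_id: str, pipeline: list[str],
                stores: dict[str, set[str]]) -> str:
    """The client sends the block to pipeline[0]; each data node stores it
    and forwards it to the next node in the list."""
    for dn in pipeline:
        stores.setdefault(dn, set()).add(block_id)
    return "done"  # reported to the name node once every copy is written


client = "/dc1/rack1/client"
nodes = ["/dc1/rack2/dn3", "/dc1/rack1/dn1", "/dc2/rack1/dn5",
         "/dc1/rack1/dn2", "/dc1/rack2/dn4"]
pipeline = choose_pipeline(client, nodes)
print(pipeline)  # ['/dc1/rack1/dn1', '/dc1/rack1/dn2', '/dc1/rack2/dn3']
stores: dict[str, set[str]] = {}
print(write_block("blk_1", pipeline, stores))  # done
```

Forwarding from node to node means the client uploads each block only once, no matter how many copies are kept.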

Page 11:

HDFS: Reading data in HDFS

The client sends the file name to the name node, which replies with the list of the file's blocks and, for each block, the data nodes holding it, sorted by distance.

The client uses this list to read the file from the data nodes.
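A matching Python sketch of the read path, under similar simplifying assumptions: the hypothetical `NAMESPACE`, `BLOCK_LOCATIONS` and `DATANODE_DISKS` tables stand in for the name node's metadata and the data nodes' disks, and "distance" is reduced to same rack versus other rack.

```python
# Hypothetical name-node metadata: file -> ordered block ids.
NAMESPACE = {"/logs/day1.txt": ["blk_1", "blk_2"]}
# Hypothetical block map: block id -> replicas as (rack, data node).
BLOCK_LOCATIONS = {
    "blk_1": [("rack2", "dn3"), ("rack1", "dn1"), ("rack2", "dn4")],
    "blk_2": [("rack2", "dn4"), ("rack1", "dn2"), ("rack2", "dn3")],
}
# Stand-in for what each data node has stored on its local disk.
DATANODE_DISKS = {
    "dn1": {"blk_1": b"hello "},
    "dn2": {"blk_2": b"world"},
    "dn3": {"blk_1": b"hello ", "blk_2": b"world"},
    "dn4": {"blk_1": b"hello ", "blk_2": b"world"},
}


def get_block_locations(filename: str,
                        client_rack: str) -> list[tuple[str, list[tuple[str, str]]]]:
    """Name node: every block of the file with its replicas, nearest first
    (replicas on the client's rack sort before replicas on other racks)."""
    return [(blk, sorted(BLOCK_LOCATIONS[blk], key=lambda loc: loc[0] != client_rack))
            for blk in NAMESPACE[filename]]


def read_file(filename: str, client_rack: str) -> bytes:
    """Client: ask the name node once, then fetch each block from its nearest replica."""
    parts = []
    for blk, replicas in get_block_locations(filename, client_rack):
        _rack, nearest = replicas[0]
        parts.append(DATANODE_DISKS[nearest][blk])
    return b"".join(parts)


print(get_block_locations("/logs/day1.txt", "rack1")[0])
# ('blk_1', [('rack1', 'dn1'), ('rack2', 'dn3'), ('rack2', 'dn4')])
print(read_file("/logs/day1.txt", "rack1"))  # b'hello world'
```

The name node only hands out metadata; the file contents flow directly from the data nodes to the client.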

Page 12:

HDFS: Failure

Node failure

Communication failure

Data corruption

Page 13:

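HDFS: Handling failure

Handling write failure: the name node skips any data node that does not return an ACK.

Handling read failure: recall that when reading a file, the client gets a list of the data nodes containing each block, so if one data node fails, the client reads the block from the next data node in the list.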
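Both rules can be sketched in a few lines of Python. The `send` and `fetch` callables and the `DataNodeDown` exception are hypothetical stand-ins for the real network calls; the sketch only illustrates the skip-on-missing-ACK and try-the-next-replica logic.

```python
from typing import Callable


class DataNodeDown(Exception):
    """Hypothetical error raised when a data node does not answer."""


def write_with_acks(block: str, pipeline: list[str],
                    send: Callable[[str, str], bool]) -> list[str]:
    """Write failure: keep only the data nodes that returned an ACK;
    a node without an ACK is skipped."""
    return [dn for dn in pipeline if send(dn, block)]


def read_with_failover(replicas: list[str],
                       fetch: Callable[[str], bytes]) -> bytes:
    """Read failure: try the replicas in the order the name node gave
    (nearest first) and fall back to the next one when a node fails."""
    for dn in replicas:
        try:
            return fetch(dn)
        except DataNodeDown:
            continue  # this copy is unreachable; try the next data node
    raise OSError("no replica of the block could be read")


disks = {"dn2": b"hello", "dn3": b"hello"}  # dn1 has crashed


def fetch(dn: str) -> bytes:
    if dn not in disks:
        raise DataNodeDown(dn)
    return disks[dn]


print(write_with_acks("blk_1", ["dn1", "dn2", "dn3"], lambda dn, blk: dn in disks))
# ['dn2', 'dn3']
print(read_with_failover(["dn1", "dn2", "dn3"], fetch))  # b'hello'
```

A read only fails outright when every replica in the list is unreachable.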

Page 14:

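HDFS: Handling failure

Handling node failure in the name node: the name node finds out which blocks the failed node held, copies those blocks from the surviving replicas, and stores the new copies on other data nodes.

Note that HDFS cannot guarantee that at least one copy of the data stays alive: if every node holding a block fails before the block is re-replicated, that block is lost.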
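A minimal Python sketch of this recovery step, assuming a target of three copies and a plain dictionary as the name node's block map (both assumptions for illustration, and `rereplicate` is a hypothetical name). It returns the copy orders the name node would issue and the blocks that can no longer be recovered because no copy survived.

```python
REPLICATION = 3  # assumed target number of copies per block


def rereplicate(block_map: dict[str, set[str]], failed: str,
                live: list[str], replication: int = REPLICATION
                ) -> tuple[list[tuple[str, str, str]], list[str]]:
    """React to a failed data node.

    block_map: block id -> data nodes holding a copy (updated in place).
    live: data nodes that are still alive, in the order to try them.
    Returns (copy orders as (block, source, target), lost blocks).
    """
    orders: list[tuple[str, str, str]] = []
    lost: list[str] = []
    for blk in sorted(block_map):
        holders = block_map[blk]
        holders.discard(failed)
        if not holders:
            lost.append(blk)  # every copy is gone; nothing to copy from
            continue
        source = min(holders)  # any surviving replica can serve as the source
        for target in live:
            if len(holders) >= replication:
                break
            if target not in holders:
                holders.add(target)
                orders.append((blk, source, target))
    return orders, lost


block_map = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn1", "dn3", "dn4"},
    "blk_3": {"dn1"},  # under-replicated: its only copy is on dn1
}
orders, lost = rereplicate(block_map, failed="dn1", live=["dn2", "dn3", "dn4", "dn5"])
print(orders)  # [('blk_1', 'dn2', 'dn4'), ('blk_2', 'dn3', 'dn2')]
print(lost)    # ['blk_3']
```

`blk_3` shows why HDFS cannot promise that a copy always survives: once its last holder is gone, there is nothing left to copy from.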

Page 15:

MapReduce

Similar to divide-and-conquer

First, use “Map” to divide the task into smaller pieces

Second, use “Shuffle” to “transfer the data from the mapper nodes to a reducer’s node and decompress if needed.”

Third, use “Reduce” to “execute the user-defined reduce function to produce the final output data.”
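The classic word-count job shows the three phases end to end. The Python sketch below runs everything in one process; the `map_fn`, `partition`, `shuffle` and `reduce_fn` names are hypothetical, and `partition` uses CRC32 as a deterministic stand-in for Hadoop's default hash partitioning, which decides which reducer receives each key.

```python
import zlib
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def partition(key: str, num_reducers: int) -> int:
    """Pick the reducer for a key (deterministic hash, so runs are repeatable)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs: Iterable[tuple[str, int]],
            num_reducers: int) -> list[dict[str, list[int]]]:
    """Shuffle: send every (key, value) to its reducer and group values by key."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        reducers[partition(key, num_reducers)][key].append(value)
    return [dict(groups) for groups in reducers]


def reduce_fn(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: the user-defined function; here it sums the counts."""
    return key, sum(values)


def run_job(lines: Iterable[str], num_reducers: int = 2) -> dict[str, int]:
    mapped = [pair for line in lines for pair in map_fn(line)]
    output: dict[str, int] = {}
    for groups in shuffle(mapped, num_reducers):  # each reducer runs independently
        for key in sorted(groups):                # a reducer sees its keys in sorted order
            word, count = reduce_fn(key, groups[key])
            output[word] = count
    return output


print(sorted(run_job(["the cat", "the dog", "The end"]).items()))
# [('cat', 1), ('dog', 1), ('end', 1), ('the', 3)]
```

Because every occurrence of a word is routed to the same reducer, each reducer can produce its final counts without talking to the others.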

Page 16:

MapReduce: Map

Page 17:

MapReduce: Shuffle

Page 18:

MapReduce: Reduce

Page 19:

MapReduce

Page 20:

Comparison

Page 21:

Comparison

Page 22:

Why Hadoop? (technically)

Comparison of Grep Task Results with Vertica and DBMS-X

Page 23:

Why Hadoop? (technically)

Simple structure vs. optimization

Transaction time is not minimized

Lower performance with the same number of nodes

No compelling technical reason to choose Hadoop

Page 24:

Why Hadoop? (commercially)

Page 25:

Why Hadoop? (commercially)

Cheap (buy more servers to beat a DBMS)

Flexible (in both design and deployment)

Easier to design

Easier to scale out (add more nodes)

Can be combined with other systems to achieve better performance

Page 26:

Conclusion

Hadoop is much easier for users to implement and is more economical

MapReduce advocates should study the techniques used in parallel DBMSs

Hybrid systems are also popular

As its performance improves, we believe Hadoop will lead the trend in big data computing

Page 27:

References

http://hadoop.apache.org/

http://www.runpc.com.tw/content/cloud_content.aspx?id=105318

http://en.wikipedia.org/wiki/Apache_Hadoo

https://www.facebookbrand.com/

http://assets.fontsinuse.com/static/use-media-items/15/14246/full-2048x768/522903b7/Yahoo_Logo.png

http://wiki.apache.org/hadoop/PoweredBy

http://semiaccurate.com/assets/uploads/2011/09/Amazon-logo.jpg

http://www.conceptcupboard.com/blog/wp-content/uploads/2013/09/google.jpg

Page 28:

References

http://datashieldcorp.com/files/2013/11/adobe-LOGO-2.jpg

http://upload.wikimedia.org/wikipedia/commons/7/77/The_New_York_Times_logo.png

http://i.dell.com/sites/content/business/solutions/whitepapers/en/Documents/hadoop-introduction.pdf

http://hadoop.intel.com/pdfs/IntelDistributionReferenceArchitecture.pdf

http://www.classcloud.org/cloud/raw-attachment/wiki/Hinet100402/02.HadoopOverview.pdf

Page 29:

References

http://www.accenture.com/SiteCollectionDocuments/PDF/Accenture-Hadoop-Deployment-Comparison-Study.pdf

http://www.psgtech.edu/yrgcc/attach/MAP%20REDUCE%20PROGRAMMING.ppt

https://www.cs.duke.edu/starfish/files/hadoop-models.pdf

http://dotnetmis91.blogspot.tw/2010/04/hdfs-hadoop-mapreduce.html

http://wiki.apache.org/hadoop/HDFS

http://www.ewdna.com/2013/04/Hadoop-HDFS-Comics.html

Page 30:

References

http://en.wikipedia.org/wiki/Interpreted_language

A Comparison of Approaches to Large-Scale Data Analysis by Sam Madden

http://www.cc.ntu.edu.tw/chinese/epaper/0011/20091220_1106.htm

http://web.cs.wpi.edu/~cs561/s12/Lectures/6/Hadoop.pdf

http://www.mobilemartin.com/mobile/show-me-the-mobile-money.jpg
