Hadoop Data Platform #cwt2013

Hadoop Data Platform, 2013/11/07, Cloudera K.K., Sho Shimauchi (嶋内 翔)

Uploaded by cloudera-japan on 28 May 2015


DESCRIPTION

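#cwt2013 We have published the slides by Shimauchi (@shiumachi) of Cloudera on building and operating a big data platform. They cover everything from how to handle Hive to operating practices for teams of different sizes.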

TRANSCRIPT

  • 1. Hadoop Data Platform, 2013/11/07, Cloudera

  • 2. Speaker profile: 2011/4, Cloudera; email: [email protected]; twitter: @shiumachi
  • 3. Hadoop; 311 (?); Cloudera; Eric Sammer
  • 4. Cloudera Impala (PDF), Cloudera, John Russell; Hadoop, HBase, Hadoop, Hive; Cloudera; Cloudera World Tokyo
  • 5. Hadoop; Hive/Impala; Cloudera Search
  • 6. Hadoop
  • 7. [Diagram: Hadoop, API, API, BI + JDBC/ODBC, Web, SQL, Hadoop, RDBMS, DWH]
  • 8. [Same diagram as slide 7]
  • 9. [Same diagram as slide 7, adding ETL, BI, DWH]
  • 10. [Same diagram as slide 7]
  • 11. (no recoverable text)
  • 12. [Same diagram as slide 7]
  • 13. Flume: [Diagram: three Application Servers, Network Switches, Network Routers, Flume, HDFS]
  • 14. Flume: Web, Flume (×4), syslog; Hadoop, DWH
  • 15. Sqoop: RDBMS/DWH, MapReduce, DWH, DWH, Hadoop, RDBMS
  • 16. HBase, REST API, Flume, CSV, CSV, Flume, API
  • 17. HDFS: FTP-style put/get, zip; CDH4: REST API; CDH5: NFS; Hadoop
  • 18. Hue (Web UI), zip
  • 19. Hive/Impala
  • 20. Hive, Cloudera Impala: Hive, SQL, MapReduce; Cloudera Impala, MapReduce
        SELECT customer.id, customer.name, sum(`order`.cost)
        FROM customer INNER JOIN `order` ON (customer.id = `order`.customer_id)
        WHERE customer.zipcode = '63105'
        GROUP BY customer.id, customer.name;
  • 21. [Diagram: Hive/Impala, JSON, BI, A, a, d, a, c, SequenceFile, B, b, e]
  • 22. [Same diagram as slide 21, adding Hadoop, etc.]
  • 23. [Same diagram as slide 21, adding SerDe (Hive)]
  • 24. Hive: A, Hadoop, CSV, B
  • 25. Hive:
        CREATE EXTERNAL TABLE tweets (
          id BIGINT,
          created_at STRING,
          favorited BOOLEAN,
          retweet_count INT,
          retweeted_status STRUCT<
            text:STRING,
            user:STRUCT<...>>
        )
        PARTITIONED BY (datehour INT)
        ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
        LOCATION '/user/ume/tweets';
  • 26. (Same as slide 25)
  • 27. Hive: A, MapReduce, Impala; CSV, CSV, CSV in /user/sho/super_cool_web_service/access_log
  • 28. (Same as slide 27)
  • 29. (Same DDL as slide 25, with the SerDe called out)
  • 30. SerDe = Serializer/Deserializer; Hive, SerDe, SerDe; RegexSerDe, Java
  • 31. NN (NameNode); SequenceFile; Hive/Impala; RCFile/Parquet; gzip, snappy
  • 32. (no recoverable text)
  • 33. SequenceFile (BLOCK) + gzip:
        CREATE TABLE seq_table (id INT, name STRING, ...) STORED AS SEQUENCEFILE;
        set mapred.output.compression.type = BLOCK;
        set hive.exec.compress.output = true;
        set mapred.output.compression.codec = org.apache.hadoop.io.compress.GzipCodec;
        INSERT INTO TABLE seq_table SELECT * FROM raw_table;
  • 34. (Same as slide 33, annotated: SerDe, SequenceFile)
  • 35. [Same diagram as slide 21]
  • 36. Hadoop (1): Hadoop, Hive = MapReduce, Hadoop
  • 37. Hadoop (1): SELECT ... FROM (SELECT ... FROM (SELECT ... FROM ...)); SELECT ... FROM tmp_table; A, A, A
  • 38. [Diagram: Hive/Impala, JSON, BI, Hadoop, A, PC, a, d, c, SequenceFile, B, a, b, e]
  • 39. Hadoop (2): Hadoop, Hive/Impala, 1GB, PC; R, Python, Excel; Hadoop
  • 40. Oozie: [Same diagram as slide 21, with Oozie]
  • 41. Hue, Oozie
  • 42. Cloudera Search
  • 43. Cloudera Search: [Same diagram as slide 21]
  • 44. Cloudera Search UI: [Same diagram as slide 21]
  • 45. Cloudera Search: Hadoop, Solr
  • 46. (no recoverable text)
  • 47. Cloudera Manager; Cloudera Manager: Cloudera
  • 48. Hadoop (CPU+)
  • 49. cgroup, YARN, Cloudera Manager 5
  • 50. Cloudera Manager
  • 51. Kerberos
  • 52. HDFS, Hive, Kerberos, Apache Sentry, Cloudera Navigator
  • 53. Cloudera Navigator
  • 54. (no recoverable text)
  • 55. [Diagram: A, B, a, b]
  • 56. [Diagram: A, B, a, b]
  • 57. [Diagram: A, a, b]
  • 58. [Diagram: A, a, b]
  • 59. (no recoverable text)
  • 60. [Same diagram as slide 7]
  • 61. We are Hiring! Cloudera, Hadoop, Hadoop; [email protected]
  • 62. We are Hiring!
  • 63. (no recoverable text)
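Slides 25 to 29 declare a Hive table over raw tweet JSON and hand the parsing to a JSON SerDe (`com.cloudera.hive.serde.JSONSerDe`). To illustrate roughly what "deserialize" means there, the Python sketch below maps one JSON line onto the declared columns. It is not Hive's or Cloudera's SerDe. It assumes that missing keys should become NULL (None), that undeclared keys are dropped, and that nested objects stand in for STRUCT values. The helper `deserialize` and the sample record are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional

# Top-level columns of the slide's `tweets` table; the nested STRUCT fields
# under `user` are elided on the slide, so nested objects are kept as dicts.
COLUMNS: List[str] = ["id", "created_at", "favorited", "retweet_count", "retweeted_status"]


def deserialize(line: str) -> Dict[str, Optional[Any]]:
    """Map one JSON text line onto the table's columns (hypothetical helper).

    - keys missing from the JSON become None, i.e. SQL NULL
    - JSON keys that are not declared columns are ignored
    - nested JSON objects stay dicts, standing in for STRUCT values
    """
    record = json.loads(line)
    return {col: record.get(col) for col in COLUMNS}


if __name__ == "__main__":
    sample = ('{"id": 1, "created_at": "Thu Nov 07 10:00:00 +0000 2013", '
              '"favorited": false, "lang": "ja"}')
    row = deserialize(sample)
    print(row["id"], row["favorited"], row["retweet_count"])  # 1 False None
```

This approach is schema-on-read: the files under the table location stay untouched, and the column layout is applied only when a row is read.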
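Slide 30 introduces the SerDe (Serializer/Deserializer) and names RegexSerDe, alongside custom SerDes written in Java. The Python sketch below imitates the regex idea on a single web access log line, such as the files under `/user/sho/super_cool_web_service/access_log` in slide 27. Each column gets one capture group, and the whole line must match. The slides do not give the log format. This sketch assumes Apache common log format, and `LOG_PATTERN`, `COLUMNS` and `parse_line` are hypothetical names. It returns None for a non-matching line, and that choice is the sketch's own, not a claim about Hive's behavior.

```python
import re
from typing import Dict, List, Optional

# Assumed format: Apache common log format, e.g.
# 192.0.2.1 - - [07/Nov/2013:10:00:00 +0900] "GET /index.html HTTP/1.1" 200 1024
LOG_PATTERN = re.compile(
    r'(\S+) \S+ (\S+) \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-)'
)
# One column name per capture group, in order.
COLUMNS: List[str] = ["host", "user", "time", "method", "path", "status", "size"]


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Split one log line into string columns, or return None if it does not match."""
    m = LOG_PATTERN.fullmatch(line)  # whole line must match, like Java's Matcher.matches()
    if m is None:
        return None
    return dict(zip(COLUMNS, m.groups()))


if __name__ == "__main__":
    row = parse_line('192.0.2.1 - - [07/Nov/2013:10:00:00 +0900] "GET /index.html HTTP/1.1" 200 1024')
    print(row["path"], row["status"])  # /index.html 200
```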
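Slide 33 writes the SequenceFile with `mapred.output.compression.type = BLOCK`. A SequenceFile can be uncompressed (NONE), compress each record's value separately (RECORD), or compress batches of records together (BLOCK). Block compression usually wins because the codec sees many similar rows at once. The Python sketch below compares the two styles with the standard `gzip` module on made-up `(id, name)` rows. It models only the idea and does not read or write real SequenceFiles. The row data and function names are hypothetical.

```python
import gzip
from typing import List


def record_compressed_size(records: List[bytes]) -> int:
    """Total size when every record is gzip-compressed on its own (RECORD-style).

    Each gzip.compress call adds its own header and trailer, and the codec
    cannot exploit repetition across records.
    """
    return sum(len(gzip.compress(r)) for r in records)


def block_compressed_size(records: List[bytes]) -> int:
    """Size when the records are concatenated and compressed together (BLOCK-style)."""
    return len(gzip.compress(b"".join(records)))


if __name__ == "__main__":
    # Hypothetical rows shaped like seq_table's (id, name) columns.
    rows = [f"{i},user_{i % 10}\n".encode() for i in range(1000)]
    print("raw bytes:   ", sum(len(r) for r in rows))
    print("record-style:", record_compressed_size(rows))
    print("block-style: ", block_compressed_size(rows))
```

On short rows like these, per-record compression typically ends up larger than the raw data, while compressing the whole batch shrinks it substantially.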
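The `tweets` table in slides 25 to 29 is `PARTITIONED BY (datehour INT)`, but the slides do not show how that value is computed. The Python sketch below assumes, without support from the deck, that it is the tweet's UTC hour written as `YYYYMMDDHH`. It derives the value from Twitter's `created_at` string and builds the partition directory under the table's `LOCATION` using Hive's default `key=value` directory naming. For an external table, the partition still has to be registered with Hive (for example with `ALTER TABLE ... ADD PARTITION`). This sketch only computes names. `datehour` and `partition_dir` are hypothetical helpers.

```python
from datetime import datetime, timezone

TABLE_LOCATION = "/user/ume/tweets"  # LOCATION from the slide's DDL
# Twitter's created_at layout, e.g. "Thu Nov 07 10:00:00 +0000 2013"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def datehour(created_at: str) -> int:
    """Assumed partition value: the tweet's time converted to UTC, as YYYYMMDDHH."""
    ts = datetime.strptime(created_at, TWITTER_TIME_FORMAT).astimezone(timezone.utc)
    return int(ts.strftime("%Y%m%d%H"))


def partition_dir(created_at: str) -> str:
    """Directory for the tweet's partition, following Hive's key=value naming."""
    return f"{TABLE_LOCATION}/datehour={datehour(created_at)}"


if __name__ == "__main__":
    print(partition_dir("Thu Nov 07 19:30:00 +0900 2013"))  # /user/ume/tweets/datehour=2013110710
```

Converting to UTC first keeps tweets posted in different time zones within the same hour in a single partition.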