Engineer event in Kyoto 10-04-17

Uploaded by ryo-katsuma, 18-Jul-2015

TRANSCRIPT

Page 1: Engineer event in Kyoto 10-04-17

Aggressive Use of the Cloud

Cookpad Inc., Ryo Katsuma

COOKPAD: A Look Behind the Scenes of Development, in Kyoto

Page 2

About Me

• Ryo Katsuma
• Since May 2009
• Service development engineer
• @ryo_katsuma

Page 3
Page 4

Kyoto++

Page 5

Today's Menu

• Why the cloud?
• How do we use it?
• What can we do with it?
• Summary

Page 6

Why the Cloud?

Page 7

But before that...

Page 8

How Cookpad Builds Things

Page 9
Page 10
Page 11

Value what users say

Page 12

Value what users say, but don't take it at face value!

Page 13

Understand correctly what users want

What should we build?

Page 14

Understand correctly what users want, from huge volumes of logs

Page 15
Page 16

7,000 hours/year

Page 17

Limits to how well we can understand users

Page 18

The need for an analysis environment built on distributed processing

Page 19

The cloud

Page 20

The cloud

A means of delivering high-value services

Page 21

How Do We Use It?

Page 22

• EC2
• S3
• RDS

Page 23

• EC2: data conversion, Hadoop
• S3: backups, storage for Hadoop
• RDS: Hive

Page 24

EC2

Page 25

Uses

• Log conversion
• Hadoop clusters

Page 26

AMI

• Cloudera AMI
  ‣ Zero Hadoop configuration
  ‣ CDH1
  ‣ + s3cmd and home-grown utility tools
  ‣ We'd like to upgrade the CDH version...

Page 27

Instance count

• m1.large × 10 or more
• Instance launch limit: 500
  ‣ The AWS staff were courteous and easy to work with

Page 28
Page 29

Elastic MapReduce

• Hard to debug
• Extra cost
• A cluster is started and shut down for every run
• We judged it not very practical
• The GUI (lol)

Page 30
Page 31

[Diagram: communication with the EC2 cluster's internal network via hadoop-ec2 + ssh; results come back over standard output]

Page 32

S3

Page 33

Backups

• PV and ad logs
• cron + s3cmd
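The slide names only the tools. A minimal sketch of what a cron-driven s3cmd backup could look like, assuming daily rotated log files: the bucket `s3://example-backup`, the directory `/var/log/app`, the file naming, and `backup_command` are all hypothetical. `s3cmd put FILE s3://BUCKET/PREFIX` is s3cmd's standard upload command.

```ruby
#!/usr/bin/env ruby
# Sketch of a nightly "cron + s3cmd" log backup.
# Bucket, log directory, and file naming are hypothetical placeholders.
require "date"

BACKUP_BUCKET = "s3://example-backup" # hypothetical bucket
LOG_DIR       = "/var/log/app"        # hypothetical log location

# Builds the s3cmd argv that uploads one day's log of the given kind
# (e.g. "pv" or "ad") into a date-partitioned prefix such as
# s3://example-backup/pv/2010/04/16/.
def backup_command(kind, date)
  file   = File.join(LOG_DIR, "#{kind}.log.#{date.strftime('%Y%m%d')}")
  prefix = "#{BACKUP_BUCKET}/#{kind}/#{date.strftime('%Y/%m/%d')}/"
  ["s3cmd", "put", file, prefix]
end

# Hypothetical crontab entry (every day at 04:00):
#   0 4 * * * /usr/bin/ruby /path/to/backup_logs.rb
# where backup_logs.rb would upload yesterday's PV and ad logs with:
#   %w[pv ad].each { |k| system(*backup_command(k, Date.today - 1)) }
```

Passing an argv array to `system` avoids shell quoting issues with file names.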

Page 34

Hadoop storage

• HDFS
  ‣ Loading data into it is the bottleneck
• hadoop-site.xml
  ‣ key: fs.default.name
  ‣ val: s3://path/to/log
  ‣ i.e., S3 is used as Hadoop's default file system

Page 35

Buckets

• Separated by purpose
  ‣ Backup
  ‣ Hadoop
  ‣ Report
• Spares, reserved the way you would reserve domain names

Page 36

RDS

Page 37

Uses

• Storing Hive metadata (Hive is covered later)
• Sharing metadata across multiple clusters
• Small instance
  ‣ Performance isn't a priority

Page 38

Hadoop

Page 39

MapReduce

1. Hadoop Streaming
  ‣ Ruby
2. Hive
  ‣ SQL(-like) wrapper

Note: no development directly in Java

Page 40

Hadoop Streaming

map.rb:

```ruby
#!/usr/bin/ruby
# map.rb: for each CSV log line, emit "<user_id>,1" (user_id = first column)
ARGF.each do |line|
  line.chomp!
  logs = line.split(/,/)
  unless logs.empty?
    printf("%s,1\n", logs[0])
  end
end
```

reduce.rb:

```ruby
#!/usr/bin/ruby
# reduce.rb: count how often each user_id appears, then print
# the number of distinct user_ids
h = Hash.new
ARGF.each do |line|
  line.chomp!
  array = line.split(/,/)
  uid = array[0]
  count = 1
  if h.key? uid
    count = h[uid].to_i + 1
  end
  h.store uid, count
end
puts h.length
```

• Fine-grained control
• Ruby libraries are available
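Hadoop Streaming sorts the mapper output by key before it reaches the reducer. By default the key is the text before the first tab; map.rb's lines contain no tab, so the whole `<user_id>,1` line is the key, which still places identical user IDs on consecutive lines. reduce.rb keeps every user ID in a Hash regardless. A minimal sketch of a constant-memory alternative that relies on the sorted input; the method name `count_distinct_sorted` is hypothetical.

```ruby
#!/usr/bin/ruby
# Sketch: count distinct user IDs from mapper output that Hadoop Streaming
# has already sorted, so equal IDs arrive on consecutive lines.
# Memory use stays constant instead of growing with the number of users.

# lines: anything that yields "<user_id>,1" strings (an Array, ARGF, ...).
def count_distinct_sorted(lines)
  previous = nil
  distinct = 0
  lines.each do |line|
    key = line.chomp.split(",", 2).first
    next if key.nil? || key.empty? # skip blank lines and empty IDs
    if key != previous
      distinct += 1 # a new run of identical IDs starts here
      previous = key
    end
  end
  distinct
end

# As a Streaming reducer script, the entry point would be:
#   puts count_distinct_sorted(ARGF)
```

With several reduce tasks, each one prints the count for its own partition; because the partitioner sends every occurrence of a key to the same reducer, the per-reducer counts add up to the overall distinct count.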

Page 41

Hive

• SQL wrapper for MapReduce
  ‣ Developed by Facebook
  ‣ Splits SELECT statements into tasks
• Almost anything you can do in MySQL works
  ‣ join
  ‣ group by
  ‣ order by

Page 42

[The slide repeats map.rb and reduce.rb from Page 40 side by side with the equivalent Hive query:]

```sql
select count(distinct user_id) from logs;
```
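To check that the Ruby pair and the one-line Hive query answer the same question, the Streaming flow can be replayed in memory, roughly what `cat logs.csv | ruby map.rb | sort | ruby reduce.rb` does on one machine. This is a sketch: the sample log lines are made up, the method names are hypothetical, and treating `user_id` as the first CSV column is an assumption.

```ruby
# Sketch: replay map -> sort (shuffle) -> reduce in memory and compare it
# with what `select count(distinct user_id) from logs;` computes.

# Mapper logic from map.rb: first CSV column -> "<user_id>,1".
def map_lines(lines)
  lines.filter_map do |line|
    fields = line.chomp.split(",")
    "#{fields[0]},1" unless fields.empty?
  end
end

# Reducer logic from reduce.rb: number of distinct user IDs.
def reduce_lines(lines)
  counts = Hash.new(0)
  lines.each { |line| counts[line.split(",").first] += 1 }
  counts.size
end

# COUNT(DISTINCT user_id), assuming user_id is the first CSV column.
def count_distinct_user_id(lines)
  lines.map { |line| line.chomp.split(",").first }.compact.uniq.size
end

logs = [ # hypothetical sample records
  "u1,/recipe/1",
  "u2,/recipe/2",
  "u1,/search?q=curry",
  "u3,/recipe/1"
]
shuffled = map_lines(logs).sort   # the framework's sort step
puts reduce_lines(shuffled)       # prints 3
puts count_distinct_user_id(logs) # prints 3
```

Seeing the two agree makes the trade-off on the slide concrete: Hive generates the map, sort, and reduce stages that the Ruby scripts spell out by hand.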

Page 43

SELECT over 240 GB of CSV data

[Chart: query time in seconds (axis 0 to 1200) vs. number of nodes (1, 3, 5, 10)]

Page 44

When to use which

• Hadoop Streaming ... reporting
• Hive ... batch jobs, on-demand queries

Page 45

Pig is catching on too!

[Chart: Hive vs. Pig comparison]

Page 46

What Has Become Possible?

Page 47

1. Tabemiru (たべみる)

Page 48

Tabemiru

• Search log analysis
• By region, by week, by month
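Breaking search logs down by region and by week or month is a group-by-and-count. A small in-memory sketch, assuming each search record carries a date, a region, and a keyword; the `SearchLog` layout and `keyword_counts` are hypothetical, and at Cookpad's scale this kind of grouping ran on Hadoop rather than in a single process.

```ruby
# Sketch: search-keyword counts per (region, period, keyword).
# The record layout below is an assumption for illustration.
require "date"

SearchLog = Struct.new(:date, :region, :keyword)

# Returns a Hash { [region, period, keyword] => count }, where period is
# "YYYY-MM" for by: :month or an ISO week "YYYY-Www" for by: :week.
def keyword_counts(logs, by: :month)
  logs.each_with_object(Hash.new(0)) do |log, counts|
    period =
      case by
      when :month then log.date.strftime("%Y-%m")
      when :week  then log.date.strftime("%G-W%V") # ISO week-based year and week
      else raise ArgumentError, "unknown period: #{by}"
      end
    counts[[log.region, period, log.keyword]] += 1
  end
end
```

ISO weeks (`%G-W%V`) keep a week that straddles New Year in a single bucket, which a naive "year + week number" key would split.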

Page 49

Tabemiru

Page 50
Page 51

Through 2009

7,000 hours

Analyzed with Hadoop Streaming

Page 52
Page 53

Tabemiru 2009

30 hours

Page 54

Results

• Ingredient and dish trends
• Trends are spotted faster

Page 55

2. Mobile access analysis

Page 56
Page 57

Mobile access analysis

• GA (Google Analytics): slow and unstable
• In-house analysis with MySQL
  ‣ Could only get daily PV and UU, more or less

Analyze much more data with Hive

Page 58
Page 59
Page 60
Page 61

Results

PV/UU (weekly)

PV/UU (monthly)

Repeat users (daily)
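Weekly and monthly PV/UU reduce to two numbers per period: page views (every record) and unique users (distinct IDs). A sketch assuming each access record is a `[Date, user_id]` pair standing for one page view, which is a hypothetical layout; `pv_uu` is likewise a hypothetical name. In Hive the same result would be a `GROUP BY` on the period with `count(*)` and `count(distinct user_id)`.

```ruby
# Sketch: PV and UU per ISO week ("YYYY-Www") or per month ("YYYY-MM").
# Assumes each record is [Date, user_id] and represents one page view.
require "date"

def pv_uu(records, by: :week)
  raise ArgumentError, "unknown period: #{by}" unless %i[week month].include?(by)

  format = by == :week ? "%G-W%V" : "%Y-%m"
  grouped = records.group_by { |date, _uid| date.strftime(format) }
  grouped.transform_values { |rows|
    # PV: every record in the period; UU: distinct user IDs in the period.
    { pv: rows.size, uu: rows.map { |_date, uid| uid }.uniq.size }
  }
end
```

Note that monthly UU cannot be derived by adding up weekly UU, since the same user may appear in several weeks; each period has to be counted from the raw records.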

Page 62

Repeat users

• Does this new feature or improvement actually have an effect?
• Are users becoming "fans"?
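The talk does not define "repeat user". A sketch under one plausible definition, stated here as an assumption: a user counts as a repeat user on day D if they also appear on some earlier day in the data. Records are `[Date, user_id]` pairs from a hypothetical access log, and `daily_repeat_users` is a hypothetical name.

```ruby
# Sketch: daily repeat-user counts.
# Assumed definition (not from the talk): a user is a repeat user on day D
# if they were seen on D and on at least one earlier day in the records.
require "date"
require "set"

def daily_repeat_users(records)
  seen_before = Set.new
  result = {}
  # Process days in chronological order so "earlier" is well defined.
  records.group_by { |date, _uid| date }.sort.each do |date, rows|
    today = rows.map { |_date, uid| uid }.uniq
    result[date] = today.count { |uid| seen_before.include?(uid) }
    seen_before.merge(today)
  end
  result
end
```

Tracking this number before and after a release is one way to approach the slide's question of whether a change "has an effect".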

Page 63

Summary

Page 64

• Using AWS to understand our users
• Hadoop and AWS work very well together
• Going forward: deliver ever higher-value services

Page 65

One more thing

Page 66

Plan your usage wisely

Page 67

Thank you for listening

[Image credits] http://www.iconspedia.com