[rakutentechconf2013] [b-3_2] dwh/hadoop in rakuten ichiba

Post on 07-Nov-2014

DESCRIPTION

Rakuten Technology Conference 2013 "DWH/Hadoop in Rakuten Ichiba" Mitsuo Hangai (Rakuten)

TRANSCRIPT

DWH/Hadoop in Rakuten Ichiba

Vol.01   Oct/26/2013

Mitsuo Hangai

Sendai Development Group, New Service Development Department, Rakuten, Inc.

http://www.rakuten.co.jp/

2

Self introduction

Mitsuo Hangai (半谷 充生)

Rakuten, Inc. Service Development Sendai Group

@bangucs

Chinese (汉语)

3

Agenda

About Sendai Branch

What is our data warehouse

Now and future

4

About Sendai Branch

5

Do you know Sendai?

6

About Sendai

Map source: free blank maps, world maps, and maps of Japan (白地図専門店, "Blank Map Specialty Shop") http://www.freemap.jp/japan/ja_kouiki_japan_big_scale_3.html

Sendai City

7

About Sendai

8

About Sendai branch

9

History of Sendai Development Group

2007. Foundation for Pro-sports.

2008. Started Ichiba Business Support and Infoseek operations.

2009. Grew and started Advertisement development.

2010. Started Marriage operations.

2011. Hit by the huge earthquake… Moved to a new office.

2012. GM changed to Nanjo (he organizes Satellite!).

[Chart: years 2007 to 2013 on the horizontal axis, 0 to 30 on the vertical axis; series: Advertisement, Auction, Marriage, Infoseek, Ichiba, Pro-sports]

10

Current work of Rakuten Sendai

International Ichiba: Development & Operation

Central Data Warehouse: Development & Operation

Development & Operation

System replacement: Development & Operation

Development & Operation

Our team!

11

About the usage of Rakuten Ichiba’s Data

12

How/what we use Rakuten Ichiba’s data

Accounting, granting points

GMS (Gross Merchandise Sales)

Reporting for 500 EC Consultants

Ranking Ichiba

Purchase History

Fraud detection

Marketing department

And so forth….

13

By the way….

Do you know ….

14

How many orders does Rakuten Ichiba receive per day?

15

A: About 2,000,000 transactions (counted per order, not per item) ※

16

How much data does our data warehouse handle per day?

17

A: About 100 GB (this is not all of the data, only what we need)

18

How many items does Rakuten Ichiba have?

19

A: About 1,400,000,000 items (as of 2013/10/08)

20

Some items have long and funny names

21

http://item.rakuten.co.jp/wakamaru/sale-2908-50offcp/

This is the name of this item!!

22

http://item.rakuten.co.jp/pascoshop/4901820354426/

This is the name of this item!!

23

http://item.rakuten.co.jp/e-cha/hd-sakusakuwakame/

This is the name of this item!!

24

We have this much huge (and funny) data.

25

We must finish processing all of this data by morning…

26

Like this (1):

This table has about 200,000,000 records

27

Like this (2):

About 2 meters

Each table has about 200,000,000 records

28

How tough… But it is necessary…

29

A few years ago (until May 2011)

[Diagram. Data sources: Purchase, Shops, ITEM. Components: RDB1, RDB2, Scheduler / Batch Server (SQL, Perl), Old SelectDB. Flow labels: unload, load, interface. Files are shown at each stage.]

107 tables, 378 interface files

30

Problem

The RDBMS had problems such as:
- Poor performance…
- Not enough disk space…
- Difficult to expand…
- Servers are expensive!!

31

Really poor… For example:

32

33

How do we solve it?

34

35

Sweet points of Hadoop

Good performance! For batch processing, it performs extremely well.

Easy to expand! Just add data nodes.

No need for high-performance servers! Commodity servers are enough, so we can reduce costs!

36

Bitter points of Hadoop

MapReduce is not easy… We decided to use Hive (which runs MapReduce jobs via an SQL-like query language called HiveQL).

Hive has no “delete” or “insert into” clause, and HiveQL differs from SQL in many ways… This needs careful thought before development starts.

Hive has high latency… It is suited to batch processing only.
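One common way to live without “delete” is to rewrite the whole table and keep only the rows that should survive. The sketch below only builds such a statement as a string. `INSERT OVERWRITE TABLE ... SELECT` is regular HiveQL, but the helper name, the table `purchase`, and the column `order_status` are hypothetical, the table is assumed to be non-partitioned, and the slides do not say that the team used this pattern.

```python
def delete_by_overwrite(table: str, delete_condition: str) -> str:
    """Build a HiveQL statement that emulates
    DELETE FROM <table> WHERE <delete_condition>.

    The table is overwritten with every row that does not match the
    condition. Rows for which the condition evaluates to NULL are kept
    as well, because a real DELETE would not remove them either.
    """
    return (
        f"INSERT OVERWRITE TABLE {table}\n"
        f"SELECT * FROM {table}\n"
        f"WHERE NOT ({delete_condition}) OR ({delete_condition}) IS NULL"
    )


# Hypothetical table and column names, for illustration only.
statement = delete_by_overwrite("purchase", "order_status = 'cancelled'")
print(statement)
```

The price of this workaround is that every “delete” rewrites the full table, which fits a nightly batch but not interactive use.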

37

So we decided to use Hadoop.

38

Rakuten’s Shared Hadoop Cluster

2009: 15 nodes, 50TB
Used for: Recommend, Ranking, Item data analysis, Behavior analysis

2011: 69 nodes, 300TB
Used for: Recommend, Ranking, Item data analysis, Behavior analysis, Japan Ichiba DWH, Access log analysis, Suggest

2013: 30 nodes, 1PB
Used for: Recommend, Ranking, Item data analysis, Behavior analysis, Japan Ichiba DWH, Access log analysis, Personalize, Granting Point

39

Plan:

[Diagram. Data sources: Purchase, Shops, ITEM. Components: Hadoop cluster (69 nodes), Scheduler / Batch Server (HiveQL, Shell/Java), New SelectDB (called Ichiba DWH). Flow labels: unload, load, interface. Files are shown at each stage.]

107 tables, 378 interface files

40

We had to transfer the data from the old system to Hadoop: 107 tables!!

We had to check every difference between the output of the old system and the new system: 378 files!! (= 378 HiveQL scripts)
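A check like this can be sketched as a comparison of two interface files as multisets of rows. This is only an illustration under assumptions: the function name and the tab-separated sample rows are hypothetical, and row order is ignored on the assumption that the old and new jobs may emit the same rows in a different order.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def diff_rows(old_rows: Iterable[str], new_rows: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (rows only in the old output, rows only in the new output).

    Rows are compared as multisets, so duplicates count and order does not.
    """
    old_counts = Counter(old_rows)
    new_counts = Counter(new_rows)
    only_in_old = sorted((old_counts - new_counts).elements())
    only_in_new = sorted((new_counts - old_counts).elements())
    return only_in_old, only_in_new


# Hypothetical sample rows standing in for one interface file from each system.
old_file = ["1\tshopA\t100", "2\tshopB\t250", "2\tshopB\t250"]
new_file = ["2\tshopB\t250", "1\tshopA\t100", "3\tshopC\t75"]

missing, unexpected = diff_rows(old_file, new_file)
print("only in old:", missing)
print("only in new:", unexpected)
```

Two empty lists mean the two systems produced the same rows for that file. In practice the rows would be read from the real files, for example with `open(path).read().splitlines()`.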

41

The project ran from October 2010 to May 2011

42

http://www.pakutaso.com/20130900245post-3233.html

Can we beat it?

43

Moreover...

44

March 2011

45

We were hit by a huge earthquake on March 11, 2011… The project was at its climax…

A hole in the wall…

46

But we did it.

47

At a temporary office (like a tako-beya, a cramped workers' bunkhouse)

48

May 2011

49

We released the new data warehouse!!

50

Hadoop was great

RDB1 vs. Hadoop

Result, by total processing time:

RDB1: 161:29:38    Hadoop: 99:54:39

Hadoop beat the RDBMS by about 40%!!!!
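The figure of about 40% can be checked from the two totals on the slide. The short calculation below assumes the totals are in hours:minutes:seconds and that 161:29:38 belongs to the RDBMS and 99:54:39 to Hadoop; the helper name is only for illustration.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' total (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


rdb_seconds = to_seconds("161:29:38")    # 581378 seconds
hadoop_seconds = to_seconds("99:54:39")  # 359679 seconds

# Share of the total processing time that was saved by moving to Hadoop.
reduction = 1 - hadoop_seconds / rdb_seconds
print(f"Processing time reduced by {reduction:.1%}")  # 38.1%
```

So the exact reduction is a little over 38%, which the slide rounds to 40%.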

51

No problem at all!!!

52

Detailed architecture of our DWH

[Diagram. Data sources: Purchase, Shops, ITEM. Components: Scheduler / Batch Server (client node of Hadoop; HiveQL, Perl/Shell/Java), Rakuten Shared Hadoop Cluster (Job Tracker / Name Node, Data Nodes), ICHIBA DWH. Flow labels: unload, load, interface. Files are shown at each stage.]

53

Now and future

54

Current situation of our DWH

[Diagram. Labels: ICHIBA DWH; Purchase, Shops, ITEM, Review; Rakuten Shared Hadoop Cluster; Data Nodes; two "New!" markers. Image source: http://model.foto.ne.jp/free/product_info.php/cPath/24_251_243/products_id/302131]

Total processing time: 99:54:39

New! Total processing time: 80:04:04!!

Tables: 202, HiveQLs: 701

It doubled!

Keeps growing!!!
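The numbers on this slide can be put side by side with the ones from the release. The calculation below assumes that 107 tables, 378 HiveQLs, and 99:54:39 describe the DWH at release, and that 202 tables, 701 HiveQLs, and 80:04:04 describe it now; the dictionary keys and helper name are only for illustration.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' total (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


at_release = {"tables": 107, "hiveqls": 378, "total": to_seconds("99:54:39")}
current = {"tables": 202, "hiveqls": 701, "total": to_seconds("80:04:04")}

table_growth = current["tables"] / at_release["tables"]     # about 1.89x
hiveql_growth = current["hiveqls"] / at_release["hiveqls"]  # about 1.85x
time_change = current["total"] / at_release["total"] - 1    # about -20%

# Average processing time per HiveQL script, in seconds.
avg_at_release = at_release["total"] / at_release["hiveqls"]
avg_current = current["total"] / current["hiveqls"]

print(f"tables x{table_growth:.2f}, HiveQLs x{hiveql_growth:.2f}")
print(f"total time {time_change:+.1%}")
print(f"avg per HiveQL: {avg_at_release:.0f}s -> {avg_current:.0f}s")
```

Under these assumptions the workload nearly doubled while the total processing time still fell by about a fifth.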

55

Future

[Diagram. Labels: ICHIBA DWH; Purchase, Shops, ITEM, Review; Rakuten Shared Hadoop Cluster; Data Nodes; Customer Support tool; BI; four "New!" markers. Image source: http://model.foto.ne.jp/free/product_info.php/cPath/24_251_243/products_id/302131]

56

We will expand our service and usage of data!!

Currently, we act like a platform team. But our mission is “analytics”.

We are going to focus more and more on analyzing data!

And we are going to expand and develop other services which use Rakuten Ichiba’s exciting data!

57

Exciting!!

http://www.pakutaso.com/20130926245post-3235.html

58

We are waiting for you!!

59

Join us!

60

Thank you for listening!

Contact me via:

@bangucs

Mitsuo.hangai

mitsuo.hangai@mail.rakuten.com

English is OK, and of course Japanese is OK too
