Hive Tools in NHN Japan, Hadoop Source Code Reading Vol.9, 2012/05/30, @tagomoris (TAGOMORI Satoshi)

Page 1: Hive Tools in NHN Japan #hadoopreading

Hive Tools in NHN Japan

Hadoop Source Code Reading Vol.9

2012/05/30

@tagomoris (TAGOMORI Satoshi)

Wednesday, May 30, 2012

Page 2: Hive Tools in NHN Japan #hadoopreading

@tagomoris

NHN Japan Corp

Web Service Division

Page 3: Hive Tools in NHN Japan #hadoopreading

Hive in NHN Japan

Reporting on access logs (not analysis)

Pageviews and/or unique users?

Accesses under a specified condition?

"Hey, how many accesses did our new features get?"

New bot accesses? Any trouble?

Page 4: Hive Tools in NHN Japan #hadoopreading

    SELECT
      yyyymmdd,
      count(is_pc(pa)) as pc,
      count(is_smartphone(pa)) as smartphone,
      count(is_mobilephone(pa)) as mobilephone
    FROM (
      SELECT yyyymmdd, parse_agent(agent) as pa
      FROM access_log
      WHERE service='__SERVICE__'
        AND (yyyymmdd='__1DAYS_AGO__' OR yyyymmdd='__2DAYS_AGO__')
        AND NOT flag
    ) x
    GROUP BY yyyymmdd
    ORDER BY yyyymmdd LIMIT 2
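The query above contains placeholders (`__SERVICE__`, `__1DAYS_AGO__`, `__2DAYS_AGO__`) that have to be filled in before each run. The slides do not show how this is done, so the following is only a minimal sketch. It assumes plain string substitution and dates in `yyyymmdd` form; the helper name `expand_placeholders` and the generalized `__<N>DAYS_AGO__` pattern are hypothetical.

```python
import re
from datetime import date, timedelta

# Placeholder names come from the slide; the expansion rules below
# (yyyymmdd dates, plain string substitution) are assumptions.
_DAYS_AGO = re.compile(r"__(\d+)DAYS_AGO__")


def expand_placeholders(query, service, today):
    """Replace __SERVICE__ and __<N>DAYS_AGO__ with concrete values."""

    def days_ago(match):
        n = int(match.group(1))
        return (today - timedelta(days=n)).strftime("%Y%m%d")

    expanded = _DAYS_AGO.sub(days_ago, query)
    # No escaping is done here: the service name is assumed to come
    # from a trusted, fixed list rather than from user input.
    return expanded.replace("__SERVICE__", service)


template = (
    "SELECT count(*) FROM access_log "
    "WHERE service='__SERVICE__' "
    "AND (yyyymmdd='__1DAYS_AGO__' OR yyyymmdd='__2DAYS_AGO__')"
)
print(expand_placeholders(template, "blog", date(2012, 5, 30)))
```

Run for 2012/05/30, the two date placeholders become `20120529` and `20120528`, so the same stored query can be scheduled daily without editing.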

Page 6: Hive Tools in NHN Japan #hadoopreading

Today's topic

For Fluentd, see 'Software Design' 2012/06

Page 7: Hive Tools in NHN Japan #hadoopreading

[Architecture diagram. Components: Fluentd Cluster, Hoop Server (HttpFs), Hadoop / HDFS, Hive Server, Shib (Hive Client Web Application), ShibUI (Query Management System), Users (Web Browser). Edge labels: stream, backup stream, realtime monitoring.]

Page 8: Hive Tools in NHN Japan #hadoopreading

Why Hive?

Handmade MapReduce: Noooooooooooooooo

Pig? Hive?

We all love 'xQL' like 'SQL'...

It FORCES us to throw away all queries

"The courage to write processing code and throw it away"

We tend to maintain 'programs' (like Pig scripts)

With changing data, it is BAD to maintain how we handle data

Page 9: Hive Tools in NHN Japan #hadoopreading

Client Tools?

'hive' command sucks

Hue (Beeswax for Hive)?

We want end users to run 'SELECT' only.

We want an HTTP API to work with other systems

Periodic query execution, and graph plotting

Miscellaneous extensions we want (and that are easy to write)

Page 10: Hive Tools in NHN Japan #hadoopreading

Copy&Paste Based Query Management

Unreferenced Queries MUST DIE

Page 13: Hive Tools in NHN Japan #hadoopreading

Shib

https://github.com/tagomoris/shib

Hive Client Web Application

Run 'SELECT' queries only

Store results of queries

Provides HTTP API:

to run queries

to get result data of queries
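The "run 'SELECT' queries only" rule above can be illustrated with a simple guard that runs before a query is handed to Hive. This is a deliberately naive sketch, not Shib's actual validation logic: the function name `is_select_only` and the single-statement rule are assumptions made for illustration.

```python
import re

# Assumption: a simple prefix check, not Shib's real implementation.
_SELECT_PREFIX = re.compile(r"^\s*select\b", re.IGNORECASE)


def is_select_only(query):
    """Return True if the text is a single statement that begins with SELECT."""
    body = query.strip().rstrip(";").strip()
    if ";" in body:
        # More than one statement (or a semicolon inside a literal):
        # reject to stay on the safe side.
        return False
    return bool(_SELECT_PREFIX.match(body))


for q in (
    "SELECT yyyymmdd, count(*) FROM access_log GROUP BY yyyymmdd",
    "DROP TABLE access_log",
    "SELECT 1; DROP TABLE access_log",
):
    print(is_select_only(q), "<-", q)
```

A prefix check like this rejects `DROP`, `INSERT`, and multi-statement input. A production guard would more likely rely on a real parser or on Hive-side permissions.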

Page 15: Hive Tools in NHN Japan #hadoopreading

[Shib architecture diagram. Components: Users (Web Browser), Shib (node.js), Hive Server, Hadoop / HDFS, DataStore (Kyoto Tycoon). Edge labels: HTTP/Ajax, Thrift.]

Page 17: Hive Tools in NHN Japan #hadoopreading

ShibUI (undisclosed application)

Web Front-end of Shib

Daily/Weekly/Monthly Query Management System

Graph plotting of query results

Records view logs to find queries no one views...

Query Builder (for engineers/directors unfamiliar with Hive)

(Under construction)
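ShibUI is not public, so how it uses its view log is unknown. As a rough illustration of the "find queries no one views" idea from the list above, the sketch below compares registered query ids against a view log. The data shapes, the function name `unviewed_queries`, and the 30-day window are all hypothetical.

```python
from datetime import datetime, timedelta


def unviewed_queries(registered_ids, view_log, now, max_age_days=30):
    """Return registered query ids with no view in the last max_age_days days.

    view_log is an iterable of (query_id, viewed_at) pairs; this layout
    is an assumption for the example, not ShibUI's real schema.
    """
    cutoff = now - timedelta(days=max_age_days)
    recently_viewed = {qid for qid, viewed_at in view_log if viewed_at >= cutoff}
    return sorted(set(registered_ids) - recently_viewed)


now = datetime(2012, 5, 30)
log = [
    ("daily_pv", datetime(2012, 5, 29)),
    ("old_report", datetime(2012, 1, 10)),
]
print(unviewed_queries(["daily_pv", "old_report", "never_seen"], log, now))
```

Here `old_report` and `never_seen` are flagged as candidates for removal, while `daily_pv` was viewed the day before and is kept.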

Page 20: Hive Tools in NHN Japan #hadoopreading

[ShibUI architecture diagram. Components: Users (Web Browser), ShibUI (Perl/Plack Web Application: Kossy), Shib (node.js), Hive Server, Hadoop / HDFS, MySQL, HRForecast. Edge labels: HTTP/Ajax, HTTP.]

Page 24: Hive Tools in NHN Japan #hadoopreading

What to do next

MapReduce Job management

check that queries run correctly

kill queries

Huahin Manager by @ryu_kobayashi

Hadoop MapReduce Job Manager over HTTP

http://huahin.github.com/huahin-manager/

Shib upgrade

node.js 0.4 based -> 0.6 based

Page 25: Hive Tools in NHN Japan #hadoopreading

Questions?

Page 26: Hive Tools in NHN Japan #hadoopreading

Thanks!
