mining the social web ch3

Mining The Social Web. NAVER "아키텍트를 꿈꾸는 사람들" (People Who Dream of Becoming Architects). Presenter: 김연기

Upload: scor7910

Post on 08-Jul-2015


Page 1: Mining the social web ch3

Mining The Social Web

NAVER "아키텍트를 꿈꾸는 사람들" (People Who Dream of Becoming Architects)

Presenter: 김연기

Page 2: Mining the social web ch3

Mail Boxes

Who sends mail?

Is there a time of day when replies tend to arrive?

Who sends mail most often?

What are the hot topics these days?

Page 3: Mining the social web ch3

Mbox

    From [email protected] Fri Dec 25 00:06:42 2009
    Message-ID: <[email protected]>
    References: <[email protected]>
    In-Reply-To: <[email protected]>
    Date: Fri, 25 Dec 2001 00:06:42 -0000 (GMT)
    From: St. Nick <[email protected]>
    To: [email protected]
    Subject: RE: FWD: Tonight
    Mime-Version: 1.0
    Content-Type: text/plain; charset=us-ascii
    Content-Transfer-Encoding: 7bit

    Sounds good. See you at the usual location.

    Thanks,
    -S

    -----Original Message-----
    From: Rudolph
    Sent: Friday, December 25, 2009 12:04 AM
    To: Claus, Santa
    Subject: FWD: Tonight

    Santa -

    Running a bit late. Will come grab you shortly. Standby.

    Rudy

    Begin forwarded message:

    > Last batch of toys was just loaded onto sleigh.
    >
    > Please proceed per the norm.
    >
    > Regards,
    > Buddy
    >
    > --
    > Buddy the Elf
    > Chief Elf
    > Workshop Operations
    > North Pole
    > [email protected]

    From [email protected] Fri Dec 25 00:03:34 2009
    Message-ID: <[email protected]>
    Date: Fri, 25 Dec 2001 00:03:34 -0000 (GMT)
    From: Buddy <[email protected]>
    To: [email protected]
    Subject: Tonight
    Mime-Version: 1.0
    Content-Type: text/plain; charset=us-ascii
    Content-Transfer-Encoding: 7bit

    Last batch of toys was just loaded onto sleigh.

    Please proceed per the norm.

    Regards,
    Buddy

    --
    Buddy the Elf
    Chief Elf
    Workshop Operations
    North Pole
    [email protected]
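As a rough illustration of how a file in this format can be read, the sketch below uses the `mailbox` module from the Python 3 standard library. The sample text, the `example.org` addresses, and the helper name `read_headers` are assumptions made for this example; they are not taken from the slides.

```python
import mailbox
import os
import tempfile

# Hypothetical two-message mbox. The addresses are placeholders.
SAMPLE_MBOX = """\
From santa@example.org Fri Dec 25 00:06:42 2009
Message-ID: <2@example.org>
In-Reply-To: <1@example.org>
From: St. Nick <santa@example.org>
To: rudolph@example.org
Subject: RE: FWD: Tonight

Sounds good. See you at the usual location.

From buddy@example.org Fri Dec 25 00:03:34 2009
Message-ID: <1@example.org>
From: Buddy <buddy@example.org>
To: workshop@example.org
Subject: Tonight

Last batch of toys was just loaded onto sleigh.
"""


def read_headers(mbox_text):
    """Write the text to a temporary file and return (From, Subject) per message."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mbox")
        with open(path, "w", newline="\n") as handle:
            handle.write(mbox_text)
        box = mailbox.mbox(path)
        try:
            # Each message starts at a line beginning with "From ".
            return [(msg["From"], msg["Subject"]) for msg in box]
        finally:
            box.close()


headers = read_headers(SAMPLE_MBOX)
for sender, subject in headers:
    print(sender, "|", subject)
```

The `From ` separator line (no colon) marks a message boundary, while the `From:` header inside each message carries the sender.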

Page 6: Mining the social web ch3

Mbox

    [
      {
        "From": "St. Nick <[email protected]>",
        "Content-Transfer-Encoding": "7bit",
        "To": [ "[email protected]" ],
        "parts": [
          {
            "content": "Sounds good. See you at the usual location.\n\nThanks,...",
            "contentType": "text/plain"
          }
        ],
        "References": "<[email protected]>",
        "Mime-Version": "1.0",
        "In-Reply-To": "<[email protected]>",
        "Date": "Fri, 25 Dec 2001 00:06:42 -0000 (GMT)",
        "Message-ID": "<[email protected]>",
        "Content-Type": "text/plain; charset=us-ascii",
        "Subject": "RE: FWD: Tonight"
      },
      {
        "From": "Buddy <[email protected]>",
        "Content-Transfer-Encoding": "7bit",
        "To": [ "[email protected]" ],
        "parts": [
          {
            "content": "Last batch of toys was just loaded onto sleigh. \n\n...",
            "contentType": "text/plain"
          }
        ],
        "Mime-Version": "1.0",
        "Date": "Fri, 25 Dec 2001 00:03:34 -0000 (GMT)",
        "Message-ID": "<[email protected]>",
        "Content-Type": "text/plain; charset=us-ascii",
        "Subject": "Tonight"
      }
    ]
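One possible way to get from a raw message to a JSON document of this shape is sketched below with the Python 3 standard library `email` package. The function name `message_to_doc`, the sample message, and the comma-splitting of the `To` header are assumptions for illustration only; the converter actually used for the slides is not shown in the source.

```python
import email
import json

# Hypothetical single message with placeholder addresses.
RAW_MESSAGE = """\
Message-ID: <1@example.org>
Date: Fri, 25 Dec 2009 00:03:34 -0000
From: Buddy <buddy@example.org>
To: workshop@example.org, santa@example.org
Subject: Tonight
Content-Type: text/plain; charset=us-ascii

Last batch of toys was just loaded onto sleigh.
"""


def message_to_doc(raw):
    """Turn one message into a dict with header fields plus a "parts" list."""
    msg = email.message_from_string(raw)
    doc = dict(msg.items())  # header name -> header value
    # Naive split of the recipient list; it would break on display names
    # that contain commas.
    if "To" in doc:
        doc["To"] = [addr.strip() for addr in doc["To"].split(",")]
    # One entry per non-multipart body part.
    doc["parts"] = [
        {"content": part.get_payload(), "contentType": part.get_content_type()}
        for part in msg.walk()
        if not part.is_multipart()
    ]
    return doc


doc = message_to_doc(RAW_MESSAGE)
print(json.dumps(doc, indent=2))
```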

Page 7: Mining the social web ch3

Mbox + couchDB

Storing the messages in a database makes it possible to compute statistics over them.

CouchDB provides a JSON API.

Page 8: Mining the social web ch3

couchDB

Document-oriented database server

Provides a JSON API

Views

Schema-Free
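To make the "JSON API" and "schema-free" points a little more concrete, here is a minimal Python 3 sketch that only builds the HTTP requests for creating a database and storing two documents. It assumes a CouchDB server at `http://localhost:5984`; the database name `mail`, the document IDs, and the helper names are hypothetical. Nothing is sent unless the commented-out lines at the end are enabled.

```python
import json
import urllib.request

COUCH_URL = "http://localhost:5984"  # assumed address of a local CouchDB


def create_db_request(db_name):
    """PUT /{db} creates a database."""
    return urllib.request.Request("%s/%s" % (COUCH_URL, db_name), method="PUT")


def put_doc_request(db_name, doc_id, doc):
    """PUT /{db}/{docid} stores a JSON document under the given ID."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        "%s/%s/%s" % (COUCH_URL, db_name, doc_id),
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Schema-free: the two documents do not share the same set of fields.
req_a = put_doc_request("mail", "msg-1",
                        {"Subject": "Tonight", "To": ["a@example.org"]})
req_b = put_doc_request("mail", "msg-2",
                        {"Subject": "RE: Tonight", "References": "<1@example.org>"})

for req in (create_db_request("mail"), req_a, req_b):
    print(req.get_method(), req.full_url)

# To send a request against a running server:
# with urllib.request.urlopen(req_a) as resp:
#     print(json.load(resp))
```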

Page 9: Mining the social web ch3

couchDB

Install couchdb on CentOS

    yum install couchdb
    /etc/init.d/couchdb start

Page 10: Mining the social web ch3

couchDB + Python

Install Couchdb Kit (on CentOS)

    curl -O http://peak.telecommunity.com/dist/ez_setup.py
    sudo python ez_setup.py -U setuptools

See also: http://pypi.python.org/pypi/setuptools#rpm-based-systems

Python – Couchdb API http://packages.python.org/CouchDB

Page 11: Mining the social web ch3

couchDB + Python

    # -*- coding: utf-8 -*-
    # Load a JSON-converted mbox file into a CouchDB database named after the file.

    import sys
    import os
    import couchdb

    try:
        import jsonlib2 as json
    except ImportError:
        import json

    JSON_MBOX = sys.argv[1]  # i.e. enron.mbox.json
    DB = os.path.basename(JSON_MBOX).split('.')[0]

    server = couchdb.Server('http://localhost:5984')
    db = server.create(DB)

    with open(JSON_MBOX) as f:
        docs = json.loads(f.read())

    # Insert all documents in a single bulk request
    db.update(docs, all_or_nothing=True)

Page 12: Mining the social web ch3

couchDB - Views

    from couchdb.design import ViewDefinition

    def dateTimeToDocMapper(doc):
        # Note that you need to include imports used by your mapper
        # inside the function definition
        from dateutil.parser import parse
        from datetime import datetime as dt
        if doc.get('Date'):
            # [year, month, day, hour, min, sec]
            _date = list(dt.timetuple(parse(doc['Date']))[:-3])
            yield (_date, doc)

    # Specify an index to back the query. Note that the index won't be
    # created until the first time the query is run.
    # db is the database handle created when the documents were loaded.
    view = ViewDefinition('index', 'by_date_time', dateTimeToDocMapper,
                          language='python')
    view.sync(db)
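The effect of keying a view on `[year, month, day, hour, min, sec]` can be tried out without a CouchDB server. The Python 3 sketch below emulates the view in memory and selects a date range the way a `startkey`/`endkey` query would. It swaps `dateutil` for the standard library's `email.utils.parsedate_to_datetime`, and the documents and function names are made up for the example. Python's list ordering happens to agree with CouchDB's array ordering for integer keys like these.

```python
from email.utils import parsedate_to_datetime


def date_key(date_header):
    """Return [year, month, day, hour, min, sec] for an RFC 2822 Date header."""
    return list(parsedate_to_datetime(date_header).timetuple()[:6])


# Hypothetical documents.
docs = [
    {"_id": "a", "Date": "Fri, 25 Dec 2009 00:03:34 -0000"},
    {"_id": "b", "Date": "Fri, 25 Dec 2009 00:06:42 -0000"},
    {"_id": "c", "Date": "Sat, 26 Dec 2009 09:15:00 -0000"},
    {"_id": "d"},  # no Date header: the mapper emits nothing for it
]

# Emulate the view: one (key, doc) row per dated document, sorted by key.
rows = sorted(
    ((date_key(d["Date"]), d) for d in docs if d.get("Date")),
    key=lambda row: row[0],
)


def query_range(rows, startkey, endkey):
    """Return rows whose key lies between startkey and endkey (inclusive)."""
    return [row for row in rows if startkey <= row[0] <= endkey]


# Everything sent on 25 Dec 2009: a shorter key sorts before any longer
# key that shares its prefix.
in_range = query_range(rows, [2009, 12, 25], [2009, 12, 26])
print([doc["_id"] for _key, doc in in_range])
```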

Page 13: Mining the social web ch3

couchDB – Map/Reduce

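    from couchdb.design import ViewDefinition

    def dateTimeCountMapper(doc):
        from dateutil.parser import parse
        from datetime import datetime as dt
        if doc.get('Date'):
            # [year, month, day, hour, min, sec]
            _date = list(dt.timetuple(parse(doc['Date']))[:-3])
            yield (_date, 1)

    def summingReducer(keys, values, rereduce):
        # Sum the 1s emitted by the mapper to get a message count
        return sum(values)

    # db is the database handle created when the documents were loaded.
    view = ViewDefinition('index', 'doc_count_by_date_time',
                          dateTimeCountMapper, reduce_fun=summingReducer,
                          language='python')
    view.sync(db)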
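What this map/reduce pair computes can be emulated in plain Python 3, which may help when checking the logic before deploying a view. In the sketch below, `run_view` is a hypothetical stand-in for CouchDB's view engine: it groups mapped rows by the first `group_level` elements of the key and passes each group to the reducer. It ignores rereduce and passes `None` for the reducer's `keys` argument, which a real server would fill in. `dateutil` is again replaced by `email.utils.parsedate_to_datetime`, and the documents are invented.

```python
from collections import defaultdict
from email.utils import parsedate_to_datetime


def date_time_count_mapper(doc):
    """Emit ([year, month, day, hour, min, sec], 1) for each dated document."""
    if doc.get("Date"):
        key = list(parsedate_to_datetime(doc["Date"]).timetuple()[:6])
        yield (key, 1)


def summing_reducer(keys, values, rereduce):
    return sum(values)


def run_view(docs, mapper, reducer, group_level):
    """Group mapped rows by a key prefix, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[tuple(key[:group_level])].append(value)
    return {key: reducer(None, values, False)
            for key, values in sorted(groups.items())}


# Hypothetical documents.
docs = [
    {"Date": "Fri, 25 Dec 2009 00:03:34 -0000"},
    {"Date": "Fri, 25 Dec 2009 00:06:42 -0000"},
    {"Date": "Fri, 25 Dec 2009 13:30:00 -0000"},
    {"Date": "Sat, 26 Dec 2009 09:15:00 -0000"},
    {"Subject": "no date"},  # skipped by the mapper
]

per_day = run_view(docs, date_time_count_mapper, summing_reducer, group_level=3)
per_hour = run_view(docs, date_time_count_mapper, summing_reducer, group_level=4)
print(per_day)
print(per_hour)
```

Grouping on a shorter key prefix gives coarser buckets: three elements count messages per day, four count them per hour.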

Page 14: Mining the social web ch3

couchDB – Lucene

A Java-based search engine library

Page 15: Mining the social web ch3

Look Who’s Talking

Query couchdb-lucene for the message IDs that match the search term.

Find all mails that carry those message IDs.

From those mails, extract the unique email addresses.
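The three steps can be sketched over a handful of in-memory documents in Python 3. This is only an illustration of the flow: the search step is a plain substring match standing in for a couchdb-lucene query, "carrying" a message ID is interpreted as either being that message or listing it in `References`, and the documents, the `body` field, and all function names are assumptions.

```python
from email.utils import parseaddr

# Hypothetical documents with placeholder addresses.
docs = [
    {"Message-ID": "<1@example.org>", "From": "Buddy <buddy@example.org>",
     "To": ["workshop@example.org"], "Subject": "Tonight",
     "body": "Last batch of toys was just loaded onto sleigh."},
    {"Message-ID": "<2@example.org>", "From": "Rudolph <rudolph@example.org>",
     "To": ["santa@example.org"], "Subject": "FWD: Tonight",
     "References": "<1@example.org>", "body": "Running a bit late."},
    {"Message-ID": "<3@example.org>", "From": "Santa <santa@example.org>",
     "To": ["rudolph@example.org"], "Subject": "RE: FWD: Tonight",
     "References": "<1@example.org> <2@example.org>", "body": "Sounds good."},
    {"Message-ID": "<4@example.org>", "From": "Cook <cook@example.org>",
     "To": ["santa@example.org"], "Subject": "Dinner",
     "body": "Cookies are ready."},
]


def search_message_ids(docs, term):
    """Step 1 stand-in: substring match instead of a couchdb-lucene query."""
    term = term.lower()
    return {d["Message-ID"] for d in docs if term in d["body"].lower()}


def mails_with_ids(docs, ids):
    """Step 2: mails that are one of the hits or reference one of them."""
    return [d for d in docs
            if d["Message-ID"] in ids
            or ids & set(d.get("References", "").split())]


def unique_addresses(mails):
    """Step 3: unique sender and recipient addresses, sorted."""
    found = set()
    for d in mails:
        for raw in [d["From"]] + d["To"]:
            found.add(parseaddr(raw)[1])
    return sorted(found)


hits = search_message_ids(docs, "sleigh")
thread = mails_with_ids(docs, hits)
participants = unique_addresses(thread)
print(participants)
```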
