mining the social web ch3
Mining The Social Web
NAVER: People Who Dream of Becoming Architects (아키텍트를 꿈꾸는 사람들)
Presenter: 김연기
Mail Boxes
Who sends the mail?
Is there a time of day when replies tend to arrive?
Who sends mail most often?
What are the hot issues these days?
Mbox
From [email protected] Fri Dec 25 00:06:42 2009
Message-ID: <[email protected]>
References: <[email protected]>
In-Reply-To: <[email protected]>
Date: Fri, 25 Dec 2001 00:06:42 -0000 (GMT)
From: St. Nick <[email protected]>
To: [email protected]
Subject: RE: FWD: Tonight
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Sounds good. See you at the usual location.

Thanks,
-S

-----Original Message-----
From: Rudolph
Sent: Friday, December 25, 2009 12:04 AM
To: Claus, Santa
Subject: FWD: Tonight

Santa -

Running a bit late. Will come grab you shortly. Standby.

Rudy

Begin forwarded message:

> Last batch of toys was just loaded onto sleigh.
>
> Please proceed per the norm.
>
> Regards,
> Buddy
>
> --
> Buddy the Elf
> Chief Elf
> Workshop Operations
> North Pole
> [email protected]

From [email protected] Fri Dec 25 00:03:34 2009
Message-ID: <[email protected]>
Date: Fri, 25 Dec 2001 00:03:34 -0000 (GMT)
From: Buddy <[email protected]>
To: [email protected]
Subject: Tonight
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Last batch of toys was just loaded onto sleigh.

Please proceed per the norm.

Regards,
Buddy

--
Buddy the Elf
Chief Elf
Workshop Operations
North Pole
[email protected]
Mbox
[
  {
    "From": "St. Nick <[email protected]>",
    "Content-Transfer-Encoding": "7bit",
    "To": [ "[email protected]" ],
    "parts": [
      {
        "content": "Sounds good. See you at the usual location.\n\nThanks,...",
        "contentType": "text/plain"
      }
    ],
    "References": "<[email protected]>",
    "Mime-Version": "1.0",
    "In-Reply-To": "<[email protected]>",
    "Date": "Fri, 25 Dec 2001 00:06:42 -0000 (GMT)",
    "Message-ID": "<[email protected]>",
    "Content-Type": "text/plain; charset=us-ascii",
    "Subject": "RE: FWD: Tonight"
  },
  {
    "From": "Buddy <[email protected]>",
    "Content-Transfer-Encoding": "7bit",
    "To": [ "[email protected]" ],
    "parts": [
      {
        "content": "Last batch of toys was just loaded onto sleigh. \n\n...",
        "contentType": "text/plain"
      }
    ],
    "Mime-Version": "1.0",
    "Date": "Fri, 25 Dec 2001 00:03:34 -0000 (GMT)",
    "Message-ID": "<[email protected]>",
    "Content-Type": "text/plain; charset=us-ascii",
    "Subject": "Tonight"
  }
]
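The step from the raw mbox text to the JSON form can be sketched with Python's standard `mailbox` module. This is only a minimal sketch, not the converter used in the talk: the function name `mbox_to_docs`, the sample message, and the `example.com` addresses are placeholders chosen for illustration, and a repeated header simply keeps its last value.

```python
import json
import mailbox
import os
import tempfile
from email.utils import getaddresses


def mbox_to_docs(path):
    """Return one dict per message: its headers plus a 'parts' list."""
    box = mailbox.mbox(path)
    docs = []
    try:
        for msg in box:
            # Headers become top-level keys (a repeated header keeps its last value).
            doc = {name: value for name, value in msg.items()}
            # Store the recipients as a list of bare addresses.
            if "To" in doc:
                doc["To"] = [addr for _, addr in getaddresses([doc["To"]])]
            # One entry per non-container MIME part.
            doc["parts"] = [
                {"contentType": part.get_content_type(),
                 "content": part.get_payload()}
                for part in msg.walk()
                if not part.is_multipart()
            ]
            docs.append(doc)
    finally:
        box.close()
    return docs


# Placeholder message in mbox layout (addresses are made up).
SAMPLE_MBOX = (
    "From buddy@example.com Fri Dec 25 00:03:34 2009\n"
    "Message-ID: <msg-1@example.com>\n"
    "From: Buddy <buddy@example.com>\n"
    "To: santa@example.com\n"
    "Subject: Tonight\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "Last batch of toys was just loaded onto sleigh.\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.mbox")
    with open(path, "w", newline="\n") as fh:
        fh.write(SAMPLE_MBOX)
    docs = mbox_to_docs(path)

print(json.dumps(docs, indent=2))
```

Keeping each message as a flat dictionary of headers plus a `parts` list is what makes it easy to load into a document database later.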
Mbox + couchDB
Storing the mail in a DB makes it possible to compute statistics.
Provides a JSON API
couchDB
Document-oriented DB server
Provides a JSON API
Views
Schema-Free
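To make "JSON API" and "Schema-Free" concrete: CouchDB stores a document when it receives an HTTP PUT with a JSON body at `/<database>/<document id>`, and two documents in the same database need not share fields. The sketch below only builds such a request with the standard library and does not send it; the database name `mail`, the document IDs, and the helper name `build_put_doc_request` are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5984"  # assumption: CouchDB on its default local port


def build_put_doc_request(db_name, doc_id, doc, base_url=BASE_URL):
    """Build (but do not send) the HTTP request that stores one JSON document."""
    url = "{}/{}/{}".format(base_url, db_name, urllib.parse.quote(doc_id, safe=""))
    return urllib.request.Request(
        url=url,
        data=json.dumps(doc).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Schema-free: the two documents carry different fields.
req_a = build_put_doc_request("mail", "msg-1", {"Subject": "Tonight"})
req_b = build_put_doc_request("mail", "msg-2", {"Subject": "RE: FWD: Tonight",
                                                "In-Reply-To": "msg-1"})

for req in (req_a, req_b):
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))

# To send a request to a running server:
#     with urllib.request.urlopen(req_a) as resp:
#         print(json.load(resp))
```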
couchDB
Install couchdb on centOS
yum install couchdb
/etc/init.d/couchdb start
couchDB + Python
Install Couchdb Kit (On CentOS)
curl -O http://peak.telecommunity.com/dist/ez_setup.py
http://pypi.python.org/pypi/setuptools#rpm-based-systems
$ sudo python ez_setup.py -U setuptools
Python – Couchdb API http://packages.python.org/CouchDB
couchDB + Python
# -*- coding: utf-8 -*-
import sys
import os
import couchdb
try:
    import jsonlib2 as json
except ImportError:
    import json

JSON_MBOX = sys.argv[1]  # i.e. enron.mbox.json

# Name the database after the input file, e.g. "enron"
DB = os.path.basename(JSON_MBOX).split('.')[0]

server = couchdb.Server('http://localhost:5984')
db = server.create(DB)

# Load every message and insert them all in one bulk update
docs = json.loads(open(JSON_MBOX).read())
db.update(docs, all_or_nothing=True)
couchDB - Views
from couchdb.design import ViewDefinition

# db is the couchdb database handle, e.g. server[DB]

def dateTimeToDocMapper(doc):
    # Note that you need to include imports used by your mapper
    # inside the function definition
    from dateutil.parser import parse
    from datetime import datetime as dt
    if doc.get('Date'):
        # [year, month, day, hour, min, sec]
        _date = list(dt.timetuple(parse(doc['Date']))[:-3])
        yield (_date, doc)

# Specify an index to back the query. Note that the index won't be
# created until the first time the query is run
view = ViewDefinition('index', 'by_date_time', dateTimeToDocMapper,
                      language='python')
view.sync(db)
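The view keys each message by `[year, month, day, hour, min, sec]`. CouchDB compares array keys element by element, so a `startkey`/`endkey` pair selects a time window. The sketch below imitates that behaviour in plain Python so it runs without a server. It swaps `dateutil` for the standard library's `parsedate_to_datetime`, and the helper names `date_key` and `in_range` are illustrative, not part of couchdb-python.

```python
from email.utils import parsedate_to_datetime


def date_key(date_header):
    """Turn an RFC 2822 Date header into [year, month, day, hour, min, sec]."""
    return list(parsedate_to_datetime(date_header).timetuple()[:6])


def in_range(key, startkey, endkey):
    """Imitate a startkey/endkey range over integer array keys.

    Python compares lists element by element, and a shorter list sorts
    before a longer one with the same prefix, as CouchDB does for arrays.
    """
    return startkey <= key <= endkey


docs = [
    {"Subject": "Tonight", "Date": "Fri, 25 Dec 2001 00:03:34 -0000 (GMT)"},
    {"Subject": "RE: FWD: Tonight", "Date": "Fri, 25 Dec 2001 00:06:42 -0000 (GMT)"},
    {"Subject": "No date header"},  # skipped, like `if doc.get('Date')` in the mapper
]

# The "index": (key, value) rows sorted by key.
rows = sorted((date_key(d["Date"]), d["Subject"]) for d in docs if d.get("Date"))

# Everything from 00:05 on 25 Dec 2001 up to the start of 26 Dec 2001.
hits = [subject for key, subject in rows
        if in_range(key, [2001, 12, 25, 0, 5], [2001, 12, 26])]
print(hits)  # ['RE: FWD: Tonight']
```

Against a real database the equivalent query with couchdb-python would be along the lines of `db.view('index/by_date_time', startkey=[...], endkey=[...])`.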
couchDB – Map/Reduce
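from couchdb.design import ViewDefinition

# db is the couchdb database handle, e.g. server[DB]

def dateTimeCountMapper(doc):
    # Imports used by the mapper must live inside the function
    from dateutil.parser import parse
    from datetime import datetime as dt
    if doc.get('Date'):
        # Emit 1 per message, keyed by [year, month, day, hour, min, sec]
        _date = list(dt.timetuple(parse(doc['Date']))[:-3])
        yield (_date, 1)

def summingReducer(keys, values, rereduce):
    # Summing works for both the reduce and the rereduce phase
    return sum(values)

view = ViewDefinition('index', 'doc_count_by_date_time', dateTimeCountMapper,
                      reduce_fun=summingReducer, language='python')
view.sync(db)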
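Because the mapper emits `1` per message and the reducer sums, querying the view with CouchDB's `group_level` parameter gives message counts per year, month, day, and so on, depending on how many leading key elements are kept. The following is a plain-Python imitation of that grouping, a sketch only: `count_by_group_level` is a made-up helper and the third key is invented sample data.

```python
from collections import Counter


def count_by_group_level(keys, group_level):
    """Sum one count per emitted key, grouped on the first `group_level` elements."""
    counts = Counter(tuple(key[:group_level]) for key in keys)
    return sorted((list(group), total) for group, total in counts.items())


# Keys as the mapper would emit them: [year, month, day, hour, min, sec].
emitted_keys = [
    [2001, 12, 25, 0, 3, 34],
    [2001, 12, 25, 0, 6, 42],
    [2001, 12, 26, 9, 15, 0],   # invented extra message
]

by_day = count_by_group_level(emitted_keys, 3)
by_month = count_by_group_level(emitted_keys, 2)

print(by_day)    # [([2001, 12, 25], 2), ([2001, 12, 26], 1)]
print(by_month)  # [([2001, 12], 3)]
```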
couchDB – Lucene
A Java-based search engine library
Look Who’s Talking
Query couchdb-lucene for the message IDs that match the search term.
Find every mail that carries one of those message IDs.
From those mails, extract the unique e-mail addresses.
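The three steps above can be sketched over an in-memory list of message dictionaries. This is an illustration under loud assumptions: a plain substring match stands in for the couchdb-lucene query, and the function names, sample messages, and `example.com` addresses are all invented.

```python
from email.utils import getaddresses, parseaddr


def matching_ids(docs, term):
    """Step 1 (stand-in for couchdb-lucene): IDs of messages mentioning the term."""
    term = term.lower()
    ids = set()
    for doc in docs:
        text = doc.get("Subject", "") + " " + " ".join(
            part.get("content", "") for part in doc.get("parts", []))
        if term in text.lower():
            ids.add(doc["Message-ID"])
    return ids


def thread_docs(docs, ids):
    """Step 2: mails carrying one of the IDs, as their own ID or as a reference."""
    found = []
    for doc in docs:
        linked = {doc.get("Message-ID")}
        linked.update(doc.get("References", "").split())
        linked.update(doc.get("In-Reply-To", "").split())
        if linked & ids:
            found.append(doc)
    return found


def participants(docs):
    """Step 3: unique sender and recipient addresses, lower-cased and sorted."""
    people = set()
    for doc in docs:
        people.add(parseaddr(doc.get("From", ""))[1].lower())
        people.update(addr.lower() for _, addr in getaddresses(doc.get("To", [])))
    people.discard("")
    return sorted(people)


docs = [
    {"Message-ID": "<1@example.com>", "From": "Buddy <buddy@example.com>",
     "To": ["santa@example.com"], "Subject": "Tonight",
     "parts": [{"content": "Last batch of toys was just loaded onto sleigh."}]},
    {"Message-ID": "<2@example.com>", "In-Reply-To": "<1@example.com>",
     "References": "<1@example.com>", "From": "Santa <santa@example.com>",
     "To": ["rudolph@example.com"], "Subject": "RE: FWD: Tonight",
     "parts": [{"content": "Sounds good. See you at the usual location."}]},
    {"Message-ID": "<3@example.com>", "From": "Comet <comet@example.com>",
     "To": ["dasher@example.com"], "Subject": "Lunch",
     "parts": [{"content": "Carrots at noon?"}]},
]

ids = matching_ids(docs, "sleigh")
talkers = participants(thread_docs(docs, ids))
print(talkers)
```

The unrelated "Lunch" message never enters the result because it neither matches the term nor references a matching message.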
Analyzing Mail Data
Getmail
Poplib
Imaplib
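As a rough idea of how `imaplib` can feed the same analysis, the sketch below pulls a few messages from an already logged-in IMAP connection and parses them with the standard `email` package. The function name `fetch_messages`, the folder, and the limit are illustrative choices; the host and credentials in the usage comment are placeholders.

```python
import email
import imaplib


def fetch_messages(conn, folder="INBOX", limit=5):
    """Return up to `limit` parsed messages from a logged-in IMAP connection."""
    conn.select(folder, readonly=True)          # read-only: do not mark mail as seen
    status, data = conn.search(None, "ALL")
    if status != "OK":
        return []
    messages = []
    for num in data[0].split()[:limit]:
        status, parts = conn.fetch(num, "(RFC822)")
        if status != "OK":
            continue
        # Fetched items are (envelope, raw_bytes) tuples; the closing b')' is skipped.
        for part in parts:
            if isinstance(part, tuple):
                messages.append(email.message_from_bytes(part[1]))
    return messages


# Usage against a real server (host and credentials are placeholders):
#     conn = imaplib.IMAP4_SSL("imap.example.com")
#     conn.login("user@example.com", "app-password")
#     for msg in fetch_messages(conn):
#         print(msg["From"], msg["Date"], msg["Subject"])
#     conn.logout()
```

Each returned object exposes headers such as `From` and `Date`, so the messages can be turned into the same JSON documents that were loaded into couchDB.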
Graph Your Inbox
Google Chrome Extension