如何建置關鍵字精靈: How to Build a Keyword Wizard


Post on 13-Jan-2017


Page 1

如何建置關鍵字精靈
How to Build a Keyword Wizard

Page 2

Agenda
● What Is a Keyword?
● Why Do We Need It?
● Word Relation & Word Representation
● How to Build This Wizard
● Live Demo

Page 3

What Is a Keyword?

● Wikipedia: Keyword (computer programming), a word or identifier that has a particular meaning to the programming language

Page 4

Why Do We Need It?

● Advertisement
● Tags
● Related Articles
● Summary

Look at me!

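Page 5

Word Relation Model

Center term: 沖繩 (Okinawa)

Related term groups:
● 琉球潛水, 沖繩潛水 (Ryukyu diving, Okinawa diving)
● 沖繩機場, 那霸機場, 琉球機場 (Okinawa airport, Naha airport, Ryukyu airport)
● 琉球浮潛, 沖繩浮潛 (Ryukyu snorkeling, Okinawa snorkeling)
● 沖繩水族館, 琉球水族館, Okinawa 水族館 (Okinawa aquarium, Ryukyu aquarium)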

Page 6

Word Relation Model

沖繩 (Okinawa)

Page 7

Word Representation - Vector Space Model

Page 8

One-Hot vs. Continuous Values

● One-hot: very high dimension
● Continuous values: better for analysis

Page 9

Word Representation - One-Hot Representation

Word     One-Hot Index
Apple    00000001
How      00000010
Are      00000100
You      00001000
I        00010000
Am       00100000
Fine     01000000
Book     10000000

Sentence: How Are You ? I am Fine . Thank You
TF (Term Frequency): 01111110

You AND I: 00001000 AND 00010000 = 00000000
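The table above can be reproduced in a few lines. This is a minimal sketch, assuming the vocabulary order shown on the slide with the first word on the lowest bit; the function names are made up for illustration. Note that the slide's sentence vector records presence only ("You" occurs twice but still sets a single bit).

```python
# Vocabulary in the order used on the slide; index 0 maps to the lowest bit.
VOCAB = ["apple", "how", "are", "you", "i", "am", "fine", "book"]
INDEX = {word: i for i, word in enumerate(VOCAB)}


def one_hot(word):
    """Return the one-hot code of a word as an int with a single bit set."""
    return 1 << INDEX[word.lower()]


def presence_vector(tokens):
    """OR together the one-hot codes of every in-vocabulary token."""
    bits = 0
    for token in tokens:
        if token.lower() in INDEX:
            bits |= one_hot(token)
    return bits


def as_bits(value):
    """Format an int as a bit string with one digit per vocabulary word."""
    return format(value, "0{}b".format(len(VOCAB)))


sentence = "How Are You ? I am Fine . Thank You".split()
print(as_bits(presence_vector(sentence)))      # 01111110
print(as_bits(one_hot("You") & one_hot("I")))  # 00000000
```

The AND of any two different one-hot codes is always zero, so this representation cannot express that two words are related.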

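Page 10

Word Representation - Context Vector P(Wi|Context)

Word      餐廳    浮潛    美食    旅遊    出國
沖繩      0.1     0.7     0.5     0.9     0.5
好吃      0.6     0.01    0.7     0.01    0.02
Okinawa   0.2     0.5     0.2     0.8     0.7
喔伊西    0.3     0.002   0.8     0.02    0.03

Context columns: 餐廳 = restaurant, 浮潛 = snorkeling, 美食 = fine food, 旅遊 = travel, 出國 = going abroad
Words: 沖繩 = Okinawa, 好吃 = delicious, 喔伊西 = "oishii" (Japanese for delicious, transliterated)

Similar: 沖繩 and Okinawa
Similar: 好吃 and 喔伊西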
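One way to check the "Similar" pairs is to compare the rows numerically. The sketch below uses cosine similarity, which is an assumption here: the slide does not say which measure was used. The numbers are copied from the table.

```python
import math

# Rows of the context-vector table; columns are 餐廳, 浮潛, 美食, 旅遊, 出國.
VECTORS = {
    "沖繩": [0.1, 0.7, 0.5, 0.9, 0.5],
    "好吃": [0.6, 0.01, 0.7, 0.01, 0.02],
    "Okinawa": [0.2, 0.5, 0.2, 0.8, 0.7],
    "喔伊西": [0.3, 0.002, 0.8, 0.02, 0.03],
}


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word):
    """Return the other word whose context vector is closest to `word`."""
    scored = [
        (cosine(VECTORS[word], vec), other)
        for other, vec in VECTORS.items()
        if other != word
    ]
    return max(scored)[1]


for word in VECTORS:
    print(word, "->", most_similar(word))
```

Unlike one-hot codes, these continuous vectors give a graded similarity, so words that appear in the same contexts end up close together.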

Page 11

Word Context Vector

Page 12

Co-occurrence Matrix: Sparse & Large

n ~= 500K

Space ~= n*n
Time ~= n*n

GG!!
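To make the n*n cost concrete, here is a rough sketch that counts co-occurrences sparsely and estimates what a dense matrix would need. The sample sentences, the window size, and the 4 bytes per cell are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def cooccurrence(sentences, window=2):
    """Count how often each word appears within `window` tokens of another."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def dense_size_bytes(vocab_size, bytes_per_cell=4):
    """Memory needed to store a full vocab_size x vocab_size matrix."""
    return vocab_size * vocab_size * bytes_per_cell


# Hypothetical pre-segmented sentences.
sentences = [["沖繩", "潛水", "好玩"], ["沖繩", "浮潛", "好玩"]]
counts = cooccurrence(sentences, window=1)
print(dict(counts["沖繩"]))
print(dense_size_bytes(500000) / 1e12, "TB")
```

With n = 500K the dense matrix has 2.5 * 10^11 cells, which is about 1 TB at 4 bytes each, while the sparse counter only stores pairs that actually occur.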

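Page 13

Word2Vec

A neural network is trained to produce the following model: given the preceding words of a short phrase, it predicts the next word likely to appear.

A by-product of training, the projection layer, is the word vector.

https://www.tensorflow.org/versions/r0.8/tutorials/word2vec/index.html

Example: 我想要去沖繩潛水 (I want to go diving in Okinawa) → 潛水 (diving)

Candidate next words:
● 打球 (playing ball)
● 潛水 (diving)
● 睡覺 (sleeping)
● 洗臉 (washing one's face)
● ...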
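The prediction task can be pictured as turning each sentence into (preceding words, next word) training examples. The sketch below only builds those examples and does not train a network; the segmentation of the sample sentence and the context size of 3 are assumptions.

```python
def next_word_examples(tokens, context_size=3):
    """Yield (preceding_context, next_word) pairs from one tokenized sentence."""
    for i in range(1, len(tokens)):
        start = max(0, i - context_size)
        yield tuple(tokens[start:i]), tokens[i]


# Assumed segmentation of 我想要去沖繩潛水.
tokens = ["我", "想要", "去", "沖繩", "潛水"]
for context, target in next_word_examples(tokens):
    print(context, "->", target)
```

The last example pairs the context (想要, 去, 沖繩) with the target 潛水, which matches the slide's illustration.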

Page 14

Word2Vec
● Released by Google in 2013
● Open Source Project
● Two-Layer Neural Network
● Another Toolkit: Gensim
● pip install --upgrade gensim

https://www.tensorflow.org/versions/r0.8/tutorials/word2vec/index.html

Page 15

How to Get Hands-On?

Page 16

Major Process Flow

● Article Selection
● Content Extraction
● Word Cutting
● Build Model
● Slack Integration
● Search Log
● Word Collection

Word cutting example: 花笠麵很好吃 ("Hanagasa noodles are delicious") → 花笠麵△很△好吃 (△ marks a word boundary)

Page 17

Article Selection

High-quality 500K articles from 2015 Q3-Q4

4.4 Billion

Spam Classifier

Ranking

● pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.8.0-cp27-none-linux_x86_64.whl
● pip install -U scikit-learn
● http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/

Page 18

Content Extraction

Page layout: Top, Side, Content Body, Side, Bottom

HTML: <div><p>沖繩哪裡好玩</p><p>美ら海水族館</p></div>

Extracted text: 沖繩哪裡好玩美ら海水族館 ("Where to have fun in Okinawa", "Churaumi Aquarium")

● pip install beautifulsoup4
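The tag-stripping step can be sketched as follows. The slide installs beautifulsoup4; to stay dependency-free, this example swaps in the standard library's html.parser instead. It only pulls text out of <p> elements and does not attempt to separate the content body from the top, side, and bottom regions.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements, ignoring everything else."""

    def __init__(self):
        super().__init__()
        self._depth = 0  # number of <p> elements currently open
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.paragraphs[-1] += data


def extract_text(html):
    """Return the concatenated paragraph text of an HTML fragment."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return "".join(p.strip() for p in parser.paragraphs)


html = "<div><p>沖繩哪裡好玩</p><p>美ら海水族館</p></div>"
print(extract_text(html))  # 沖繩哪裡好玩美ら海水族館
```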

Page 19

Content Extraction

Page 20

Content Extraction

Page 21

Article Raw Data Preparation

Raw article text:
A1A2A3A4A5A6A7A8A9
B1B2B3B4B4B6B7B8B9
Z1Z2Z3Z4Z5Z6Z7Z8Z9
…..

Prepared data (one article per line, tokens separated by spaces):
A1 A2 A4 A5 A6 A7 A8 A9
B1 B2 B3 B4 B6 B7 B8 B9
Z1 Z2 Z3 Z4 Z5 Z6 Z7 Z8 Z9
…..

Page 22

Build Model - Word2Vec

Page 23

Build Model - Word2Vec
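The code on these slides did not survive in the transcript. As a stand-in, here is a minimal sketch of training with Gensim, assuming gensim 4.x (where the dimension parameter is `vector_size`; older 3.x releases call it `size`) and a corpus file with one article per line and tokens separated by spaces. The class name, function names, hyperparameters, and sample corpus are hypothetical, not the speaker's actual settings.

```python
import os
import tempfile


class LineCorpus:
    """Re-iterable corpus: one article per line, tokens separated by spaces."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                tokens = line.split()
                if tokens:
                    yield tokens


def train_word2vec(path, vector_size=100, window=5, min_count=1):
    """Train a Word2Vec model on the corpus file (requires gensim 4.x)."""
    # Imported here so the corpus reader stays usable without gensim installed.
    from gensim.models import Word2Vec

    return Word2Vec(
        sentences=LineCorpus(path),
        vector_size=vector_size,
        window=window,
        min_count=min_count,
    )


def write_demo_corpus():
    """Write a tiny, made-up corpus file and return its path."""
    handle = tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    )
    with handle:
        handle.write("沖繩 潛水 很 好玩\n沖繩 浮潛 很 好玩\n")
    return handle.name


corpus_path = write_demo_corpus()
print(list(LineCorpus(corpus_path)))
os.remove(corpus_path)
```

With gensim installed, `model = train_word2vec(path)` followed by `model.wv.most_similar("沖繩", topn=10)` would list the nearest terms. The corpus is wrapped in a class rather than a generator because training iterates over the sentences more than once.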

Page 26

Term Database - Search Log

● Search History
● Filter & Counting
● Term Collection

Page 27

Search Log

Keyword              URL                  Date      Click
好吃                 http://xxx.xxx       20160520  33
好吃                 http://zzz.zzz       20160520  22
日本旅遊             http://xxx.xxx       20160521  15
http://xxxx.xx.xxx   http://xxxx.xx.xxx   20160522  12121

Page 28

Term Database - Search Log by Count

Page 29

Term Database - Search Log by Count/Len
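The "Filter & Counting" step and the two rankings can be sketched like this. The rows are copied from the search-log slide; the URL filter rule and the reading of "Count/Len" as total clicks divided by keyword length are assumptions, since the slides do not define them.

```python
from collections import Counter

# Rows from the search-log slide: (keyword, url, date, clicks).
LOG = [
    ("好吃", "http://xxx.xxx", "20160520", 33),
    ("好吃", "http://zzz.zzz", "20160520", 22),
    ("日本旅遊", "http://xxx.xxx", "20160521", 15),
    ("http://xxxx.xx.xxx", "http://xxxx.xx.xxx", "20160522", 12121),
]


def is_term(keyword):
    """Filter step (assumed rule): drop queries that are really URLs."""
    return not keyword.lower().startswith(("http://", "https://"))


def count_clicks(rows):
    """Sum clicks per keyword over all rows that pass the filter."""
    totals = Counter()
    for keyword, _url, _date, clicks in rows:
        if is_term(keyword):
            totals[keyword] += clicks
    return totals


def rank_by_count(totals):
    """Rank terms by total clicks, highest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def rank_by_count_per_len(totals):
    """Rank terms by clicks divided by term length (assumed meaning of Count/Len)."""
    scored = {term: count / len(term) for term, count in totals.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


totals = count_clicks(LOG)
print(rank_by_count(totals))
print(rank_by_count_per_len(totals))
```

Without the filter, the URL query with 12121 clicks would dominate the term list.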

Page 31

Word Cutting
● Word Cut Tool
  ○ Jieba: https://github.com/fxsjy/jieba
  ○ https://github.com/yanyiwu/cppjieba-serve
● C++ Jieba Server: at least a 30x speedup
● pip install jieba
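To show what word cutting does, here is a toy segmenter based on forward maximum matching against a dictionary. This is not Jieba's algorithm; it is a simplified stand-in, and the dictionary contents and `max_len` value are hypothetical.

```python
def cut(text, dictionary, max_len=4):
    """Segment text by greedily taking the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing longer matches.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical term dictionary.
dictionary = {"花笠麵", "好吃", "沖繩", "水族館"}
print("△".join(cut("花笠麵很好吃", dictionary)))  # 花笠麵△很△好吃
```

A domain-specific term such as 花笠麵 is only kept whole when it is in the dictionary; otherwise it falls apart into single characters.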

Page 32

Slack Integration
● Library
  ○ pip install slackbot
  ○ pip install slacker
● Get Bot Token
  ○ https://my.slack.com/services/new/bot

Page 33

Technology Software Stack

● NAS
● Redshift
● BigQuery
● Article DB
● Spark
● Worker, Worker, Worker
● Jieba Server
● Gensim Word2Vec
● Flask
● Jupyter
● ScikitLearn
● TensorFlow
● Slack Bot

Page 34

LIVE Demo

Page 35

Q&A

Page 36

2016 PIXNET HACKATHON

8/13