enlister baidu's recommender system for the biggest chinese q&a website

37
Baidu's Recommender System for the Biggest Chinese Q&A Website Enlister JasonFu | [email protected] 2012-12-10, Monday

Upload: jasonfuoo

Post on 05-Jul-2015

509 views

Category:

Education


0 download

TRANSCRIPT

Page 1: Enlister baidu's recommender system for the biggest chinese q&a website

Baidu's Recommender System for

the Biggest Chinese Q&A Website

Enlister

JasonFu | [email protected]

2012-12-10, Monday

Page 2: Enlister baidu's recommender system for the biggest chinese q&a website

We are leaving the age of information and entering the age of recommendation.

克里斯·安德森,《长尾理论》作者

1/36

Page 3: Enlister baidu's recommender system for the biggest chinese q&a website

ABOUT THE PAPER

2/36

Page 4: Enlister baidu's recommender system for the biggest chinese q&a website

BACKGROUND

3/36

Page 5: Enlister baidu's recommender system for the biggest chinese q&a website

BACKGROUND

4/36

Page 6: Enlister baidu's recommender system for the biggest chinese q&a website

BACKGROUND

5/36

Page 7: Enlister baidu's recommender system for the biggest chinese q&a website

BACKGROUND

Baidu Knows offers a Q&A platform to its users for knowledge

and experience sharing.

Today, Baidu Knows website:

There are over 400 million questions

More than 170 million questions were answered by users

Around 100 million users search for answers every day

Over 12 new questions are posted online every second

Baidu Knows is the biggest Q&A website in China.

6/36

Page 8: Enlister baidu's recommender system for the biggest chinese q&a website

BACKGROUND

An eco-system of knowledge sharing between the website’s users.

As more questions have been answered, the search result quality of

common users will be improved.

“Answer” is contribution, not consumption.

7/36

Page 9: Enlister baidu's recommender system for the biggest chinese q&a website

APPLICATION DESCRIPTION

Baidu Knows builds an intelligent RS, Enlister, to provide the Baidu

Knows users with questions that they may be willing to answer.

Is based on the machine learning technology

Apply a machine learning based CTR prediction methodology to

improve the recommendation accuracy

8/36

Page 10: Enlister baidu's recommender system for the biggest chinese q&a website

APPLICATION DESCRIPTION

Content-based

CTR prediction

Stream computing

Previous Enlister

Typical content-based

Cosine similarity degree

9/36

Page 11: Enlister baidu's recommender system for the biggest chinese q&a website

ALGORITHM

We expect the user models to disclose the nature of our users’

choices of answering questions.

The models have to be simple enough to accommodate massive

calculation and industrial adoption.

10/36

Page 12: Enlister baidu's recommender system for the biggest chinese q&a website

ALGORITHM

1. User Model

2. Click Prediction

3. Diversity Adjustment

11/36

Page 13: Enlister baidu's recommender system for the biggest chinese q&a website

1. User Model

user model

attributes

Age, Gender, Education

Tags

interest

Interest Term Vector

Related Questions

Abstract Interest Vector

12/36

Page 14: Enlister baidu's recommender system for the biggest chinese q&a website

1. User Model

Interest Term Vector

A vector contains weights of terms, which implicate the correlation between

user and the term on sematic level.

Related Questions

Questions are browsed or answered by a user.

Abstract Interest Vector

A vector contains weights of terms, which implicate the correlation between

a user and an abstract concept.

Use PLSA technology to get a conceptual topic model from millions of

question-answer pairs on the Baidu Knows website .

13/36

Page 15: Enlister baidu's recommender system for the biggest chinese q&a website

ALGORITHM

1. User Model

2. Click Prediction

3. Diversity Adjustment

14/36

Page 16: Enlister baidu's recommender system for the biggest chinese q&a website

2. Click Prediction

The click model that we created is a kind of probabilistic

classification model. It is a binary classification model to

calculate the probability of a sample belonging to a class.

15/36

Page 17: Enlister baidu's recommender system for the biggest chinese q&a website

2. Click Prediction

Sample Collection

Positive samples: questions that the user had examined and clicked.

Negative samples: questions that randomly choose from the question pool.

Feature Selection

User Attributes: (1) basic user attributes (2) other features from the statistics

Correlation Degree: the number of matched terms, cosine similarity and bm25

between the user interest term vectors and question vectors

Classification Algorithm: two principles (1) probabilistic classification (2) linear

classifier

16/36

Page 18: Enlister baidu's recommender system for the biggest chinese q&a website

2. Click Prediction

Classification Algorithm

maximum entropy classifier

The probability P of a user u will click the question q can be calculated as

follows:

Global optimization solution:limited-memory Broyden-Fletcher-

Goldfarb-Shanno (L-BFGS) ; Stochastic Gradient Descent (SGD)

17/36

Page 19: Enlister baidu's recommender system for the biggest chinese q&a website

ALGORITHM

1. User Model

2. Click Prediction

3. Diversity Adjustment

18/36

Page 20: Enlister baidu's recommender system for the biggest chinese q&a website

3. Diversity Adjustment

The head part and the tail part of a list garnered most attention from the

users.

For the head part, we apply a loose filtering algorithm, which only deletes

some apparent duplication in the list.

For the tail part, we use a strict filtering algorithm to take out any

questions that have noticeable semantic level similarity to each other in

the list.

19/36

Page 21: Enlister baidu's recommender system for the biggest chinese q&a website

SYSTEM SETUP

The most important concept in the Enlister system design is real-time CTR

prediction. The major data process can be described as follows :

20/36

Page 22: Enlister baidu's recommender system for the biggest chinese q&a website

SYSTEM SETUP

For building the data processing flow, we construct multiple logic queues between

processing nodes.

The processing nodes are grouped into several node groups. Each group

represents a simple logic section.

21/36

Page 23: Enlister baidu's recommender system for the biggest chinese q&a website

22/36

Page 24: Enlister baidu's recommender system for the biggest chinese q&a website

Before login After login

23/36

Page 25: Enlister baidu's recommender system for the biggest chinese q&a website

24/36

Page 26: Enlister baidu's recommender system for the biggest chinese q&a website

25/36

Page 27: Enlister baidu's recommender system for the biggest chinese q&a website

EXPERIMENT & EVALUATION

1. Evaluation Metrics

2. Experiment

3. Online Evaluation

26/36

Page 28: Enlister baidu's recommender system for the biggest chinese q&a website

1. Evaluation Metrics

Confusion Matrix

Precision(查准率):tp/(tp+fp),识别出的真正的正面观点数/所有的识别为正面观点的条数

Recall(查全率):tp/(tp+fn), 识别出的真正的正面观点数/样本中所有的真正正面观点的条数

Accuracy(准确率): (tp + tn)/(tp + fn + fp + tn), 正确识别观点数/所有观点的条数

27/36

Page 29: Enlister baidu's recommender system for the biggest chinese q&a website

EXPERIMENT & EVALUATION

1. Evaluation Metrics

2. Experiment

3. Online Evaluation

28/36

Page 30: Enlister baidu's recommender system for the biggest chinese q&a website

2. Experiment

Sample Selection

100,000 questions that had been viewed and clicked by users are selected from

users’ logs as positive sample, which involves 10 thousands users with 10

records per user on average.

Negative samples: random negative samples

29/36

Page 31: Enlister baidu's recommender system for the biggest chinese q&a website

Sample Proportion

2. Experiment

30/36

Page 32: Enlister baidu's recommender system for the biggest chinese q&a website

Optimization Algorithm

LBFGS algorithms is chosen as the optimization algorithm in the maximum

entropy model training.

2. Experiment

31/36

Page 33: Enlister baidu's recommender system for the biggest chinese q&a website

EXPERIMENT & EVALUATION

1. Evaluation Metrics

2. Experiment

3. Online Evaluation

32/36

Page 34: Enlister baidu's recommender system for the biggest chinese q&a website

3. Online Evaluation

Enlister was released to the Baidu Knows users and an online evaluation

was conducted from Feb. 11th, 2012.

33/36

Page 35: Enlister baidu's recommender system for the biggest chinese q&a website

3. Online Evaluation

34/36

Page 36: Enlister baidu's recommender system for the biggest chinese q&a website

CONCLUSION

① Have successfully built a real-time RS that serves millions of users every

day.

② The algorithm and system design fit the recommendation scenario quite

well.

③ Great improvement had been made on the accuracy and time-sensitive

issues.

④ The number of active users had grown substantially after the system was

officially launched.

The future work : the timing of recommendation and the utilization of

relationships between users

35/36

Page 37: Enlister baidu's recommender system for the biggest chinese q&a website

Thank You !

36/36