

Answer Generating Methods for Community Question and Answering Portals

Tao Haoxiong, Hao Yu, Zhu Xiaoyan

清华大学计算机系 (Department of Computer Science and Technology), Tsinghua University

Outline

• Introduction and Related Work

• List-type Question
  – Answer generating method
  – Method result and analysis
• Solution-type Question
  – Visible list
  – Select the best list
  – Experiment and analysis

• Conclusion

• Future Work

Introduction

• Online community question answering (cQA) portals, such as Soso Wenwen and Baidu Zhidao, have become a popular way to acquire information.
• But they have some limitations:
  – Answers are not available in real time.
  – Many answers are of low quality.

Related Work

• To work around the lack of real-time answers, cQA portals provide a search service.
  – Users must click links to see the full answers.
  – It takes a long time to find useful information.

Related Work

• Approaches to returning high-quality answers:
  – Predict the quality of cQA answers.
    • Using user profile features, text features, etc.
  – Summarize the answers with multi-document summarization.
    • The result is more comprehensive but less readable.
  – To improve answer quality, almost all well-performing systems introduce a question taxonomy.

Related Work

• The question taxonomy proposed by Fan Bu contains 6 question types:

  Type          Proportion
  List          23.8%
  Solution      19.7%
  Reason        18.1%
  Navigation    14.8%
  Fact          14.4%
  Definition     7.5%

• Examples:
  – List-type: "List the Nobel Prize winners of the 1990s."
  – Solution-type: "How to make pizza?"

Research Framework

• We propose answer generating methods for both List-type and Solution-type questions, the two most frequent types in this taxonomy (23.8% + 19.7% = 43.5% of questions).

List-type Question

• Each answer is a single phrase or a list of phrases.

Answer Generating Method

• Two observations about answers:
  – The "Best Answer" often does not contain all answer points.
  – Answer points that are high-quality or relevant to the question often appear in more than one answer.
• We therefore propose a method based on clustering answer points.
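The slides do not spell out the clustering procedure, so the following is only a minimal sketch under stated assumptions. It assumes answer points are obtained by splitting answers on punctuation, compares points by cosine similarity over character bigrams (no word segmentation), merges a point into its most similar existing cluster when the similarity reaches a hypothetical `threshold`, and ranks clusters by how many distinct answers support them, following the observation that good answer points recur across answers. All function names, the delimiter set and the default threshold are illustrative, not the authors' implementation.

```python
import math
import re
from collections import Counter
from typing import List, Optional

# Assumed delimiters between answer points (Chinese and ASCII punctuation, newlines).
SPLIT_PATTERN = re.compile(r"[,，、;；。\n]+")


def split_points(answer: str) -> List[str]:
    """Split one answer into candidate answer points on the assumed delimiters."""
    return [p.strip() for p in SPLIT_PATTERN.split(answer) if p.strip()]


def bigrams(text: str) -> Counter:
    """Character bigrams, usable on Chinese text without word segmentation."""
    if len(text) < 2:
        return Counter([text])
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_answer_points(answers: List[str], threshold: float = 0.5,
                          top_k: Optional[int] = None) -> List[str]:
    """Greedy single-pass clustering; clusters are ranked by how many answers support them."""
    clusters = []  # each entry: {"rep": str, "vec": Counter, "answers": set of answer indices}
    for idx, answer in enumerate(answers):
        for point in split_points(answer):
            vec = bigrams(point)
            best = max(clusters, key=lambda c: cosine(vec, c["vec"]), default=None)
            if best is not None and cosine(vec, best["vec"]) >= threshold:
                best["answers"].add(idx)  # the point repeats an existing cluster
            else:
                clusters.append({"rep": point, "vec": vec, "answers": {idx}})
    # Stable sort: clusters with equal support keep first-seen order.
    clusters.sort(key=lambda c: len(c["answers"]), reverse=True)
    ranked = [c["rep"] for c in clusters]
    return ranked if top_k is None else ranked[:top_k]


answers = ["Alice, Bob, Carol", "Bob, Carol", "Carol, Dave"]
print(cluster_answer_points(answers, top_k=2))  # ['Carol', 'Bob']
```

Because the output is a ranked list, `top_k` is one simple way to control answer length; the threshold trades off merging paraphrases against merging distinct points.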


Example of the Method Result

Method Result and Analysis

• The result contains more answer points than the "Best Answer".
• The outputs are ranked, so the answer length is easy to control.
• Further research is needed on:
  – Splitting answers into answer points.
  – Choosing the clustering threshold.

Solution-type Question

• Visible List


  – We chose 1179 solved Solution-type questions from Baidu Zhidao; for 30% of them, the answers contain visible lists.
  – The average length of the "Best Answer" is over 1400 words, while the average length of a visible list is about 600 words.
  – 55% of the questions with visible lists have more than one visible list, so we propose a method to select the best list.
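The slides do not define how visible lists are detected. As a hedged illustration only, the sketch below assumes a visible list is a run of at least two consecutive lines that start with an enumeration marker ("1.", "1、", "(1)", "一、", "第一，"), and treats the line just before the run as its guide words. The marker pattern, the `VisibleList` record and `find_visible_lists` are hypothetical; the `position` field is what a FirstList-style feature would need.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed enumeration markers: "1." "1、" "1)" "(1)" "（1）" "一、" "第一，" ...
ITEM_MARKER = re.compile(
    r"^\s*(?:\d+[.、)）]|[(（]\d+[)）]|[一二三四五六七八九十]+、|第[一二三四五六七八九十]+[,，、:：])"
)


@dataclass
class VisibleList:
    guide: str          # line just before the items, assumed to hold the guide words
    items: List[str]    # enumerated item lines
    position: int       # 0 for the first list in the answer


def find_visible_lists(answer: str, min_items: int = 2) -> List[VisibleList]:
    """Return runs of at least `min_items` consecutive enumerated lines."""
    lines = [ln.strip() for ln in answer.splitlines() if ln.strip()]
    lists: List[VisibleList] = []
    run: List[str] = []
    start = 0
    for i, line in enumerate(lines + [""]):  # empty sentinel flushes the last run
        if line and ITEM_MARKER.match(line):
            if not run:
                start = i
            run.append(line)
            continue
        if len(run) >= min_items:
            guide = lines[start - 1] if start > 0 else ""
            lists.append(VisibleList(guide, run, len(lists)))
        run = []
    return lists


answer = "慢性咽炎的治疗方法如下\n1. 多喝水\n2. 少吃辛辣\n3. 按时服药\n其他建议\n一、早睡\n二、锻炼"
for vl in find_visible_lists(answer):
    print(vl.position, vl.guide, len(vl.items))
```

Real answers use many more layouts than this pattern covers, so a production detector would need a richer rule set or a learned model.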

Select the Best List

• Features:
  – FirstList
    • 1 if the list is the first list in the answer, otherwise 0.
  – GuideSimilarity
    • Cosine similarity between the list's guide words and the question title, e.g.:
      – Guide words: 列表四:三种方法巧疗慢性咽炎 ("List 4: three clever ways to treat chronic pharyngitis")
      – Question title: 问题:慢性咽炎怎么治疗? ("Question: how to treat chronic pharyngitis?")
  – ContentSimilarity
    • Cosine similarity between the list content and the question.
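A minimal sketch of the three features above, assuming text is represented as a bag of characters with punctuation dropped; the slides do not state the tokenization, and a real system might use segmented words instead. The `PUNCT` set and the helper names are hypothetical.

```python
import math
from collections import Counter

# Assumed punctuation to ignore when building character vectors.
PUNCT = set(":：?？,，.。!！、;； ")


def char_vector(text: str) -> Counter:
    """Bag of characters, ignoring punctuation (assumed representation)."""
    return Counter(ch for ch in text if ch not in PUNCT)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def first_list(position: int) -> float:
    """FirstList: 1 if the list is the first list of the answer, else 0."""
    return 1.0 if position == 0 else 0.0


def guide_similarity(guide_words: str, question_title: str) -> float:
    return cosine(char_vector(guide_words), char_vector(question_title))


def content_similarity(list_content: str, question: str) -> float:
    return cosine(char_vector(list_content), char_vector(question))


# Slide example: 5 shared characters (慢 性 咽 炎 疗) out of 13 and 10 distinct ones.
print(guide_similarity("列表四:三种方法巧疗慢性咽炎", "问题:慢性咽炎怎么治疗?"))  # ≈ 0.4385
```

All three values fall in [0, 1], which matches the requirement on the next slide that every feature be a [0, 1] value.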

Select the Best List

• Features (continued):
  – VPRatio
    • The proportion of words in the list content that are verbs or prepositions.
  – SummaryScore
    • The summarized answer contains N sentences; a visible list that contains k of those N sentences has a summary score of k/N.

• Method:
  – Each feature takes a value in [0, 1]; a Learning to Rank model is used to learn the weight of each feature.
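The remaining two features and the final selection step can be sketched as follows. Assumptions, all hypothetical: the list content has already been POS-tagged into `(word, tag)` pairs with an ICTCLAS-style tag set where verb tags start with `v` and preposition tags with `p`; a summary sentence "is contained" in a list when it occurs verbatim in the list text; and the learned model is linear, so each list is scored as the weighted sum of its features and the highest-scoring list is selected.

```python
from typing import Dict, List, Sequence, Tuple


def vp_ratio(tagged_words: Sequence[Tuple[str, str]]) -> float:
    """VPRatio: share of words tagged as verbs ('v...') or prepositions ('p...')."""
    if not tagged_words:
        return 0.0
    vp = sum(1 for _, tag in tagged_words if tag.startswith(("v", "p")))
    return vp / len(tagged_words)


def summary_score(list_text: str, summary_sentences: Sequence[str]) -> float:
    """SummaryScore = k / N: share of the N summary sentences found in the list."""
    n = len(summary_sentences)
    if n == 0:
        return 0.0
    k = sum(1 for s in summary_sentences if s and s in list_text)
    return k / n


def select_best_list(feature_rows: List[Dict[str, float]], weights: Dict[str, float]) -> int:
    """Index of the list with the highest weighted feature sum (assumes a non-empty input)."""
    def score(row: Dict[str, float]) -> float:
        return sum(weights.get(name, 0.0) * value for name, value in row.items())
    return max(range(len(feature_rows)), key=lambda i: score(feature_rows[i]))


print(vp_ratio([("我", "r"), ("用", "p"), ("盐水", "n"), ("漱口", "v")]))  # 0.5
print(summary_score("多喝水。少吃辛辣。", ["多喝水", "按时服药"]))  # 0.5
```

The weights passed to `select_best_list` would come from the Learning to Rank model; the values used when calling it here are placeholders.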

Experiment and Analysis

• Dataset:
  – 1179 Solution-type questions chosen from Baidu Zhidao; 358 (30%) of them have visible lists.
  – 196 (55% of the 358) questions have more than one visible list.
  – For the 196 questions with more than one visible list, the lists were manually labeled with a score: 1 = high quality, 0 = low quality.
• Two evaluations:
  – Evaluate the method of selecting the best list.
  – Evaluate the quality of the visible list as the answer.
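The slides do not name the Learning to Rank model. Labels of the kind described above (1 = high quality, 0 = low quality per list) are enough to train a pairwise ranker, so the following is a sketch of one standard choice, swapped in for illustration: a linear model trained with a pairwise logistic (RankNet-style) loss, where every high-quality list should outscore every low-quality list of the same question. The data layout, `train_pairwise`, the learning rate and the epoch count are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

# One question = its visible lists, each as (feature vector in [0, 1], label 1 or 0).
Question = List[Tuple[Sequence[float], int]]


def train_pairwise(questions: List[Question], epochs: int = 200, lr: float = 0.1) -> List[float]:
    """Learn linear weights w so that w·x_pos > w·x_neg within each question."""
    dim = len(questions[0][0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for lists in questions:
            pos = [x for x, y in lists if y == 1]
            neg = [x for x, y in lists if y == 0]
            for xp in pos:
                for xn in neg:
                    diff = [a - b for a, b in zip(xp, xn)]
                    margin = sum(wi * di for wi, di in zip(w, diff))
                    # Loss log(1 + exp(-margin)); its negative gradient is sigmoid(-margin) * diff.
                    g = 1.0 / (1.0 + math.exp(min(margin, 50.0)))  # clip to avoid overflow
                    w = [wi + lr * g * di for wi, di in zip(w, diff)]
    return w


# Toy data with features [FirstList, GuideSimilarity].
toy = [
    [([1.0, 0.2], 1), ([0.0, 0.9], 0)],
    [([1.0, 0.1], 1), ([0.0, 0.8], 0), ([0.0, 0.3], 0)],
]
print(train_pairwise(toy))
```

Pairs are formed only within a question, since lists of different questions are never competing for selection.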

Result of Selected Visible Lists

*Random selection: 51.7%
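Only the random-selection figure survives here, and the slides do not say how it was computed. Assuming each list carries the 1/0 label from the dataset slide, one natural reading is: the method's score is the fraction of questions whose selected list is labeled 1, and the random baseline is the expected value of that fraction when one list per question is picked uniformly at random. The sketch below computes both; the function names and toy labels are hypothetical and do not reproduce the reported numbers.

```python
from typing import List, Sequence


def selection_accuracy(labels_per_question: List[Sequence[int]], selected: List[int]) -> float:
    """Fraction of questions whose selected list is labeled high quality (1)."""
    hits = sum(labels[i] for labels, i in zip(labels_per_question, selected))
    return hits / len(labels_per_question)


def random_baseline(labels_per_question: List[Sequence[int]]) -> float:
    """Expected accuracy of choosing one list uniformly at random per question."""
    per_question = [sum(labels) / len(labels) for labels in labels_per_question]
    return sum(per_question) / len(per_question)


labels = [[1, 0], [0, 0, 1], [1, 1, 0, 0]]
print(selection_accuracy(labels, [0, 1, 1]))  # 2 hits out of 3 questions
print(random_baseline(labels))  # mean of 1/2, 1/3, 1/2
```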

Evaluate Visible List as Answer

• Manually compare the quality of the "Best Answer" and the visible list for each question, focusing mainly on:
  – relevance to the question,
  – completeness, and
  – whether it contains redundant information.
• The average length of a visible list is about 600 words, while the average length of the "Best Answer" is more than 1400 words.

Conclusion

• Relying on similar questions and their answers from cQA portals, we propose answer generating methods for List-type and Solution-type questions:
  – List-type questions: based on clustering answer points.
  – Solution-type questions: based on visible lists.

Future Work

• List-type questions:
  – Split answers into answer points more robustly.
• Solution-type questions:
  – Introduce more semantic features to improve the semantic relevance between the selected list and the question.
• Other types of questions:
  – Investigate how to generate high-quality answers.

Thanks