Department of Computer Science, Tsinghua University

Answer Generating Methods for Community Question and Answering Portals

{Tao Haoxiong, Hao Yu, Zhu Xiaoyan}

@Tsinghua University


Outline

• Introduction and Related Work

• List-type Question

– Answer generating method

– Method result and analysis

• Solution-type Question

– Visible list

– Select the best list

– Experiment and analysis

• Conclusion

• Future Work


Introduction

• Online community question answering (cQA) portals, such as Soso Wenwen and Baidu Zhidao, have become a popular way to acquire information.

• But they have some limitations:

– Answers are not available in real time.

– Many answers are of low quality.


Related Work

• To overcome the lack of real-time answers, cQA portals offer a search service.

– Users need to click links to see the full answers.

– Finding useful information takes a long time.



• To return high-quality answers:

– Predict the quality of cQA answers.

• Uses user profile features, text features, etc.

– Use multi-document summarization to summarize answers.

• The result is more comprehensive but less readable.

– To improve answer quality, almost all well-performing systems introduce a question taxonomy.



• The question taxonomy proposed by Fan Bu contains 6 question types:

Type          Proportion
List          23.8%
Solution      19.7%
Reason        18.1%
Navigation    14.8%
Fact          14.4%
Definition    7.5%

• Examples:

– List-type: List Nobel prize winners in 1990s?

– Solution-type: How to make pizzas?


Research Framework

• We propose answer generating methods for both List-type and Solution-type questions.


List-type Question

• Each answer will be a single phrase or a list of phrases.


Answer Generating Method

• Two characteristics of answers:

– The “Best Answer” often does not contain all answer points.

– Answer points that are high-quality or relevant to the question often appear in more than one answer.

• We propose a method based on clustering of answer points.
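The two observations above can be sketched in code. This is only an illustrative sketch, not the authors' implementation: the slides do not specify the similarity measure, the clustering algorithm, or the threshold, so the token-level Jaccard similarity, the greedy single-pass clustering, the default threshold of 0.5, and all function names below are assumptions. Answers are assumed to be already split into answer points.

```python
import re


def tokens(text):
    """Lowercase word tokens (a stand-in for real word segmentation)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_answer_points(answers, threshold=0.5):
    """Greedy single-pass clustering of answer points (assumed algorithm).

    answers: list of answers, each a list of answer-point strings.
    Returns clusters sorted by how many distinct answers support them.
    """
    clusters = []
    for answer_id, points in enumerate(answers):
        for point in points:
            point_tokens = tokens(point)
            for cluster in clusters:
                if jaccard(point_tokens, cluster["rep_tokens"]) >= threshold:
                    cluster["answers"].add(answer_id)
                    break
            else:
                # No similar cluster found: this point starts a new one.
                clusters.append(
                    {"rep": point, "rep_tokens": point_tokens, "answers": {answer_id}}
                )
    # Points repeated across many answers rank first (stable sort keeps ties in order).
    clusters.sort(key=lambda c: len(c["answers"]), reverse=True)
    return clusters


def generate_answer(answers, max_points=5, threshold=0.5):
    """Return the representative point of the top-ranked clusters."""
    ranked = cluster_answer_points(answers, threshold)
    return [c["rep"] for c in ranked[:max_points]]


# Toy data for illustration only.
toy_answers = [
    ["Toni Morrison", "Nelson Mandela"],
    ["nelson mandela", "Seamus Heaney"],
    ["Nelson Mandela"],
]
top_points = generate_answer(toy_answers, max_points=2)
```

Because the clusters are ranked, `max_points` gives direct control over the length of the generated answer.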


Answer Generating Method


Example of the Method Result


Method Result and Analysis

• The result contains more answer points than the “Best Answer”.

• Outputs are ranked, so the answer length is easy to control.

• Further research is needed on:

– Splitting an answer into answer points.

– Choosing the clustering threshold.


Solution-type Question

• Visible List

– We chose 1179 solved Solution-type questions from Baidu Zhidao; for 30% of them, the answers contain visible lists.

– The average length of the “Best Answer” is above 1400 words, while the average length of a visible list is about 600 words.

– 55% of the questions with visible lists have more than one visible list. We propose a method to select the best list.


Select the Best List

• Features:

– FirstList

• 1 if the list is the first list of the answer, otherwise 0.

– GuideSimilarity

• Cosine similarity between the guide words and the question title.

– Guide words: "List 4: Three clever ways to treat chronic pharyngitis" (列表四：三种方法巧疗慢性咽炎)

– Question title: "Question: How is chronic pharyngitis treated?" (问题：慢性咽炎怎么治疗？)

– ContentSimilarity

• Cosine similarity between the list content and the question.
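A minimal sketch of the three features above. The slides do not say how the text is vectorized, so the raw term-frequency vectors used here are an assumption (TF-IDF weights would plug in the same way), as are the function names and the pre-segmented English tokens in the example.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two bags of words (term-frequency vectors)."""
    vec_a, vec_b = Counter(words_a), Counter(words_b)
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def first_list(list_index):
    """FirstList: 1 if this is the first visible list in the answer, else 0."""
    return 1.0 if list_index == 0 else 0.0


# Toy, already-segmented tokens standing in for the guide words and question title.
guide_words = ["three", "methods", "treat", "chronic", "pharyngitis"]
question_title = ["chronic", "pharyngitis", "how", "treat"]
guide_similarity = cosine_similarity(guide_words, question_title)
```

ContentSimilarity would call the same `cosine_similarity` function with the tokens of the list content and of the question.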


Select the Best List

• Features:

– VPRatio

• Ratio of verbs and prepositions among the words in the list content.

– SummaryScore

• If the summarized answer contains N sentences and a visible list contains k of those N sentences, the list gets a summary score of k/N.

• Method:

– Each feature takes a value in [0, 1]; we use a Learning to Rank model to learn the weight of each feature.
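The remaining features and the final selection step can be sketched as follows. This is an illustration under stated assumptions: the part-of-speech tags `"v"` and `"p"`, exact sentence matching for SummaryScore, and a linear weighted sum as the scoring function are all assumptions, and the weights in the example are hypothetical placeholders rather than values learned by a Learning to Rank model.

```python
FEATURES = ("FirstList", "GuideSimilarity", "ContentSimilarity", "VPRatio", "SummaryScore")


def vp_ratio(tagged_words):
    """VPRatio: share of words tagged as verb ('v') or preposition ('p').

    tagged_words: list of (word, pos_tag) pairs from any POS tagger.
    """
    if not tagged_words:
        return 0.0
    hits = sum(1 for _, pos in tagged_words if pos in ("v", "p"))
    return hits / len(tagged_words)


def summary_score(list_sentences, summary_sentences):
    """SummaryScore: k/N, where k of the N summary sentences occur in the list."""
    if not summary_sentences:
        return 0.0
    present = set(list_sentences)
    k = sum(1 for sentence in summary_sentences if sentence in present)
    return k / len(summary_sentences)


def score(features, weights):
    """Weighted sum of the [0, 1] feature values (assumed linear model)."""
    return sum(weights[name] * features[name] for name in FEATURES)


def select_best_list(candidates, weights):
    """Return the index of the highest-scoring visible list."""
    return max(range(len(candidates)), key=lambda i: score(candidates[i], weights))


# Hypothetical weights and feature values, for illustration only.
weights = {name: 1.0 for name in FEATURES}
candidates = [
    {"FirstList": 1.0, "GuideSimilarity": 0.2, "ContentSimilarity": 0.1,
     "VPRatio": 0.3, "SummaryScore": 0.0},
    {"FirstList": 0.0, "GuideSimilarity": 0.9, "ContentSimilarity": 0.8,
     "VPRatio": 0.4, "SummaryScore": 0.5},
]
best_index = select_best_list(candidates, weights)
```

In practice the `weights` dictionary would be replaced by the output of whatever ranking model is trained on the labeled lists.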


Experiment and Analysis

• Dataset:

– We chose 1179 questions from Baidu Zhidao; 358 (30%) of them have visible lists.

– 196 (55%) of those 358 questions have more than one visible list.

– Manually assign quality scores for the 196 questions with more than one visible list:

• 1: high quality; 0: low quality.

• Two evaluations:

– Evaluate the method of selecting the best list.

– Evaluate the quality of the visible list as the answer.
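As a quick check of the dataset figures, the short snippet below recomputes the two percentages from the reported counts. It assumes, as the counts suggest, that the 55% is taken relative to the 358 questions with visible lists rather than to all 1179 questions; the variable names are illustrative.

```python
total_questions = 1179      # Solution-type questions collected
with_visible_list = 358     # questions whose answers contain a visible list
with_multiple_lists = 196   # questions with more than one visible list

# Share of all questions that have a visible list.
share_with_list = with_visible_list / total_questions

# Share of visible-list questions that have several lists.
share_multiple = with_multiple_lists / with_visible_list

print(f"{share_with_list:.1%} of questions have a visible list")
print(f"{share_multiple:.1%} of those have more than one list")
```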


Result of Selected Visible-lists

* Random selection: 51.7%
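The slides do not state exactly how the selection result is scored. One plausible reading, sketched below as an assumption, is that the metric is the fraction of questions whose selected list is labeled high quality (1), and that the random baseline is the expected value of that fraction when a list is picked uniformly at random. The labels in the example are toy data, not the experiment's data.

```python
def selection_precision(labels_per_question, selected):
    """Fraction of questions whose selected list is labeled 1 (high quality).

    labels_per_question: one list of 0/1 labels per question, one label per visible list.
    selected: index of the chosen list for each question.
    """
    hits = sum(labels[i] for labels, i in zip(labels_per_question, selected))
    return hits / len(labels_per_question)


def random_baseline(labels_per_question):
    """Expected precision when one list per question is picked uniformly at random."""
    per_question = [sum(labels) / len(labels) for labels in labels_per_question]
    return sum(per_question) / len(per_question)


# Toy labels for three questions, for illustration only.
toy_labels = [[1, 0], [0, 1, 1], [0, 0]]
toy_selected = [0, 1, 0]
precision = selection_precision(toy_labels, toy_selected)
baseline = random_baseline(toy_labels)
```

Comparing `precision` against `baseline` on the same labeled questions shows how much the learned selection improves over picking a list at random.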


Evaluate Visible List as Answer

• Manually compare the quality of the “Best Answer” and the visible list for each question:

– The comparison focuses mainly on relevance to the question, completeness, and whether the text contains redundant information.

• The average length of a visible list is about 600 words, while the average length of the “Best Answer” is more than 1400 words.


Conclusion

• Relying on similar questions and their answers from cQA portals, we propose answer generating methods for List-type and Solution-type questions.

– List-type questions: based on the clustering of answer points.

– Solution-type questions: based on visible lists.


Future Work

• List-type questions:

– Do further research to split the answer into answer points more robustly.

• Solution-type questions:

– Introduce more semantic features to improve the semantic relevance between the selected list and the question.

• Other types of questions:

– Do further research to generate high-quality answers.


Thanks
