Intelligent Database Systems Lab
N.Y.U.S.T. I. M.
Evaluation of novelty metrics for
sentence-level novelty mining
Presenter : Lin, Shu-Han
Authors : Flora S. Tsai, Wenyin Tang, Kap Luk Chan
Information Sciences, InS (2010)
Outline
Introduction
Motivation
Objectives
Methodology
Comparative study
Experiments
Conclusions
Comments
Introduction
What is novelty? Novelty is the opposite of “similarity” or “redundancy”.
Novelty mining: given the set of relevant sentences in all documents, identify all novel sentences.
How to identify novel sentences? With a novelty score, measured by a novelty metric.
Motivation
Sentence 1: U.S. stocks set for big sell-off
Sentence 2 (incoming sentence): U.S. stocks
*S2 is covered by S1
Novelty(S1, S2) = 1 – similarity(S1, S2)
The similarity between S1 and S2 is low, so is S2 novel?
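The question above can be made concrete with a small worked computation. This is only a sketch: it assumes cosine similarity over raw term counts and whitespace tokenization, neither of which the slide specifies, and the function name `cosine_similarity` is illustrative.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity of two sentences using raw term counts (assumed weighting)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)  # Counter returns 0 for missing words
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

s1 = "U.S. stocks set for big sell-off"
s2 = "U.S. stocks"  # incoming sentence, fully covered by s1

# Symmetric novelty: the same value whichever sentence is treated as incoming.
novelty = 1 - cosine_similarity(s1, s2)
print(round(novelty, 3))
```

Under these assumptions the score is well above zero even though every word of S2 already appears in S1, which is the weakness of a purely symmetric metric that the slide points at.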
Objectives
How to choose a novelty metric? How to set a suitable threshold automatically?
Methodology - Novelty Metrics
Symmetric (1 – similarity):
S1 is novel to S2
S2 is novel to S1
Asymmetric:
S1 is not novel to S2
S2 is novel to S1
Methodology - Symmetric metrics
Cosine similarity
Jaccard Similarity
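The formulas for these two metrics did not survive in the transcript, so the following is a minimal sketch using the standard textbook definitions. Taking the maximum similarity over all earlier sentences is an assumption, as are the names `cosine_sim`, `jaccard_sim` and `symmetric_novelty`.

```python
import math
from collections import Counter

def cosine_sim(a, b):
    """Cosine similarity between two token lists (raw term counts)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def jaccard_sim(a, b):
    """Jaccard similarity: shared distinct words over all distinct words."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def symmetric_novelty(incoming, history, sim):
    """1 minus the highest similarity to any earlier sentence (assumed aggregation)."""
    if not history:
        return 1.0  # nothing seen yet, so the sentence is fully novel
    return 1.0 - max(sim(incoming, h) for h in history)

history = [["stocks", "fall", "sharply"], ["oil", "prices", "rise"]]
incoming = ["stocks", "fall", "again"]
cos_nov = symmetric_novelty(incoming, history, cosine_sim)
jac_nov = symmetric_novelty(incoming, history, jaccard_sim)
print(round(cos_nov, 3), round(jac_nov, 3))
```

Both functions give the same value when their two arguments are swapped, which is what makes them symmetric.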
Methodology - Asymmetric metrics
Overlap metric
New word count metric
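The slide's formulas for these two metrics are missing from the transcript. The sketch below shows one plausible set-based reading of each, reusing the two sentences from the Motivation slide. The exact definitions and the names `new_word_count` and `overlap_novelty` are assumptions, not the authors' formulas.

```python
def new_word_count(incoming, history):
    """Number of distinct words in the incoming sentence that no earlier sentence contains."""
    seen = set().union(*history)  # all words seen so far (empty set if no history)
    return len(set(incoming) - seen)

def overlap_novelty(incoming, earlier):
    """1 minus the share of the incoming sentence's distinct words already in `earlier`."""
    words = set(incoming)
    if not words:
        return 0.0
    return 1.0 - len(words & set(earlier)) / len(words)

s1 = "u.s. stocks set for big sell-off".split()
s2 = "u.s. stocks".split()

# Direction matters: the scores depend on which sentence is the incoming one.
print(new_word_count(s2, [s1]), new_word_count(s1, [s2]))
print(overlap_novelty(s2, s1), round(overlap_novelty(s1, s2), 3))
```

With S2 as the incoming sentence both scores are zero, because S1 already covers it. With the roles reversed, S1 contributes four new words. A symmetric metric cannot make this distinction.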
Comparative study
Performance requirements (trade-off): high recall / high precision / high F-score
Data distribution: high / medium / low novelty ratio
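The quantities named above can be computed as follows. This is a hedged sketch: it assumes a sentence is predicted novel when its score reaches a threshold, and it reads "novelty ratio" as the share of sentences labeled novel in the data. The scores and labels are made-up illustration values, not results from the paper.

```python
def evaluate(scores, is_novel, threshold):
    """Precision, recall and F-score of 'score >= threshold' against gold novelty labels."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, a in zip(predicted, is_novel) if p and a)
    n_pred, n_true = sum(predicted), sum(is_novel)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score

def novelty_ratio(is_novel):
    """Share of sentences labeled novel (assumed meaning of 'novelty ratio')."""
    return sum(is_novel) / len(is_novel) if is_novel else 0.0

scores = [0.9, 0.2, 0.6, 0.4]          # hypothetical novelty scores
is_novel = [True, False, True, True]   # hypothetical gold labels
precision, recall, f_score = evaluate(scores, is_novel, threshold=0.5)
print(precision, round(recall, 3), round(f_score, 3), novelty_ratio(is_novel))
```

Raising the threshold tends to favor precision and lowering it tends to favor recall, which is the trade-off the slide refers to.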
Comparative study – Performance requirements
[Figure panels: F-Score/precision; F-Score/recall]
Methodology – A new framework
Combine symmetric and asymmetric metrics. Two problems:
The scaling problem: the metrics must be comparable and consistent
The combining strategy
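The two problems can be illustrated with a small sketch. It is not the authors' framework: scaling the new word count by sentence length and combining with a weighted average are both assumptions, and `alpha`, `new_word_ratio` and `mixed_novelty` are hypothetical names.

```python
def jaccard_novelty(incoming, history):
    """Symmetric part: 1 minus the highest Jaccard similarity to an earlier sentence."""
    words = set(incoming)
    best = 0.0
    for h in history:
        union = words | set(h)
        if union:
            best = max(best, len(words & set(h)) / len(union))
    return 1.0 - best

def new_word_ratio(incoming, history):
    """Asymmetric part: new word count divided by distinct words, so it lies in [0, 1]."""
    words = set(incoming)
    if not words:
        return 0.0
    seen = set().union(*history)
    return len(words - seen) / len(words)

def mixed_novelty(incoming, history, alpha=0.5):
    """Weighted average of the two scaled scores (assumed combining strategy)."""
    return alpha * jaccard_novelty(incoming, history) + \
           (1 - alpha) * new_word_ratio(incoming, history)

history = ["u.s. stocks set for big sell-off".split()]
incoming = "u.s. stocks".split()
mixed = mixed_novelty(incoming, history)
print(round(jaccard_novelty(incoming, history), 3),
      new_word_ratio(incoming, history),
      round(mixed, 3))
```

Scaling puts both scores on the same 0 to 1 range so that averaging them is meaningful. In this example the asymmetric part pulls the mixed score down for a sentence that is fully covered by the history.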
Experiments – Mixed metrics vs. individual metrics
M3 (jacc+new)
tf.isf
Experiments – Mixed metric M3 vs. individual metrics for novelty ratio
Experiments – Mixed metric M3 vs. mixture of two symmetric metrics vs. mixture of two asymmetric metrics vs. mixture of all metrics for novelty ratio
Conclusions
Comparative study of different types of novelty metrics
Symmetric: cosine / Jaccard
Asymmetric: new word count / overlap
Observes the strengths of each type
Introduces a mixture of the two types of novelty metrics
More stable than using an individual metric