

Intelligent Database Systems Lab
N.Y.U.S.T. I.M.

Evaluation of novelty metrics for sentence-level novelty mining

Presenter: Lin, Shu-Han
Authors: Flora S. Tsai, Wenyin Tang, Kap Luk Chan
Information Sciences (InS), 2010

Outline

- Introduction
- Motivation
- Objectives
- Methodology
- Comparative study
- Experiments
- Conclusions
- Comments

Introduction

What is novelty? Novelty is the opposite of “similarity” or “redundancy”.

Novelty mining task: given the set of relevant sentences in all documents, identify all novel sentences.

How are novel sentences identified? Each sentence is assigned a novelty score measured by a novelty metric; sentences whose score exceeds a threshold are labeled novel.
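A minimal sketch of the scoring loop described above, under stated assumptions: sentences arrive in order, each incoming sentence is compared with every earlier one, its novelty score is the minimum pairwise novelty over that history, and it is labeled novel if the score exceeds a threshold. The helper names (`tokenize`, `novelty_score`, `mine_novel`, `jaccard_novelty`) and the whitespace tokenizer are hypothetical illustrations, not the authors' code.

```python
from typing import Callable, List, Sequence

Tokens = List[str]
# A pairwise metric: novelty of `new` with respect to one earlier sentence.
Metric = Callable[[Tokens, Tokens], float]


def tokenize(sentence: str) -> Tokens:
    # Hypothetical preprocessing: lowercase and split on whitespace only.
    return sentence.lower().split()


def jaccard_novelty(new: Tokens, old: Tokens) -> float:
    # Example metric for the demo: 1 - Jaccard similarity of the term sets.
    a, b = set(new), set(old)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def novelty_score(new: Tokens, history: Sequence[Tokens], metric: Metric) -> float:
    # Assumption: a sentence is only as novel as it is with respect to its
    # most similar predecessor, so take the minimum over the history.
    if not history:
        return 1.0  # the first relevant sentence is treated as novel
    return min(metric(new, old) for old in history)


def mine_novel(sentences: Sequence[str], metric: Metric, threshold: float) -> List[bool]:
    # Process sentences in arrival order; label each one, then add it to the history.
    history: List[Tokens] = []
    labels: List[bool] = []
    for s in sentences:
        toks = tokenize(s)
        labels.append(novelty_score(toks, history, metric) > threshold)
        history.append(toks)
    return labels
```

For example, `mine_novel(["U.S. stocks set for big sell-off", "U.S. stocks set for big sell-off", "Oil prices rise"], jaccard_novelty, 0.5)` labels the repeated headline as not novel and the other two as novel. Any pairwise metric can be passed as `metric`; the threshold is left as a free parameter here.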

Motivation

Sentence 1 (S1): U.S. Stocks set for big sell-off
Sentence 2 (S2, the incoming sentence): U.S. Stocks

S2 is covered by S1: it contains nothing that S1 does not already say.

Novelty(S1, S2) = 1 - similarity(S1, S2)

The similarity between S1 and S2 is low, so this score rates S2 as novel. Is S2 really novel?
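Worked numbers for the example, assuming lowercase whitespace tokenization with no stopword removal, so that S1 = {u.s., stocks, set, for, big, sell-off} and S2 = {u.s., stocks}, and binary term vectors for cosine:

```latex
\begin{aligned}
1 - \mathrm{Jaccard}(S_1, S_2) &= 1 - \frac{|S_1 \cap S_2|}{|S_1 \cup S_2|} = 1 - \frac{2}{6} = \frac{2}{3} \\
1 - \cos(S_1, S_2) &= 1 - \frac{2}{\sqrt{6}\,\sqrt{2}} = 1 - \frac{1}{\sqrt{3}} \approx 0.42 \\
|S_2 \setminus S_1| &= 0 \quad \text{(words of } S_2 \text{ absent from } S_1\text{)}
\end{aligned}
```

Both 1 - similarity scores are well above zero, so S2 looks fairly novel, while the count of words in S2 that S1 lacks is zero, which matches the intuition that S2 is covered by S1. Other tokenization choices (for instance dropping "for" as a stopword) change the exact values but not this contrast.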

Objectives

- How should a novelty metric be chosen?
- How can a suitable novelty threshold be set automatically?

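Methodology – Novelty metrics

Symmetric metrics (novelty = 1 - similarity): novelty is the same in both directions, so if S1 is novel with respect to S2, then S2 is also novel with respect to S1.

Asymmetric metrics: novelty depends on direction, so one sentence can be novel with respect to the other but not vice versa. In the motivating example, S1 is novel with respect to S2, but S2 is not novel with respect to S1.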
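Stated formally, writing N(A | B) for the novelty of sentence A with respect to sentence B (notation introduced here for illustration):

```latex
\begin{aligned}
\text{symmetric:} \quad & N(S_1 \mid S_2) = N(S_2 \mid S_1) \quad \text{for all } S_1, S_2 \\
\text{asymmetric:} \quad & N(S_1 \mid S_2) \neq N(S_2 \mid S_1) \quad \text{in general}
\end{aligned}
```

For the motivating example, counting the words absent from the other sentence gives N(S2 | S1) = 0 but N(S1 | S2) = 4 (set, for, big, sell-off), while 1 - Jaccard gives 2/3 in both directions.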

Methodology – Symmetric metrics

- Cosine similarity
- Jaccard similarity
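The formulas on this slide did not survive extraction. A minimal sketch of the two symmetric novelty scores as 1 - similarity, assuming sentences are token lists and that cosine uses raw term-frequency vectors (the paper may use a different term weighting). Function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def cosine_novelty(new: List[str], old: List[str]) -> float:
    # 1 - cosine similarity of raw term-frequency vectors (assumed weighting).
    a, b = Counter(new), Counter(old)
    if not a or not b:
        # An empty sentence shares nothing; two empty sentences count as identical.
        return 0.0 if a == b else 1.0
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm


def jaccard_novelty(new: List[str], old: List[str]) -> float:
    # 1 - |A ∩ B| / |A ∪ B| on the sets of distinct terms.
    a, b = set(new), set(old)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0
```

On the motivating example (S2 = `["u.s.", "stocks"]`, S1 = the six tokens of the headline) these give about 0.67 for Jaccard and 0.42 for cosine, and swapping the two arguments returns the same value, which is what makes both metrics symmetric.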

Methodology – Asymmetric metrics

- Overlap metric
- New word count metric
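The asymmetric formulas are also missing from the slides. The sketch below assumes common formulations: the overlap score is the fraction of the incoming sentence's distinct terms not found in one earlier sentence, and the new word count is the number of the incoming sentence's distinct terms that appear in no earlier sentence. Both look only from the incoming sentence's side, which is what makes them asymmetric. Function names are hypothetical.

```python
from typing import List, Sequence


def overlap_novelty(new: List[str], old: List[str]) -> float:
    # Assumed form: 1 - |new ∩ old| / |new|. Normalizing by the incoming
    # sentence only (not by the union) makes the score direction-dependent.
    a = set(new)
    if not a:
        return 0.0
    return 1.0 - len(a & set(old)) / len(a)


def new_word_count(new: List[str], history: Sequence[List[str]]) -> int:
    # Assumed form: number of distinct terms in `new` that occur in no
    # earlier sentence; compared against the whole history at once.
    seen = set().union(*history)
    return len(set(new) - seen)
```

With the motivating example, `overlap_novelty(S2, S1)` is 0.0 and `new_word_count(S2, [S1])` is 0, while in the other direction the scores are about 0.67 and 4. Note that the new word count is an unbounded integer rather than a value in [0, 1], so its scores are not directly comparable with the other metrics.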

Comparative study

Two factors are considered:
- Performance requirement (a trade-off): high recall, high precision, or high F-score
- Data distribution: high, medium, or low novelty ratio
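A small sketch of the quantities behind these two factors, assuming boolean per-sentence labels (True = novel): precision, recall and F-score of the predicted labels against gold labels, and the novelty ratio taken as the fraction of relevant sentences that are novel. The cut-offs separating high, medium and low novelty ratios are not given on the slide and are not modeled here.

```python
from typing import Sequence, Tuple


def precision_recall_f(pred: Sequence[bool], gold: Sequence[bool]) -> Tuple[float, float, float]:
    # Treat "novel" as the positive class.
    tp = sum(p and g for p, g in zip(pred, gold))
    n_pred, n_gold = sum(pred), sum(gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    # F-score as the harmonic mean of precision and recall (F1).
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def novelty_ratio(gold: Sequence[bool]) -> float:
    # Fraction of relevant sentences whose gold label is "novel".
    return sum(gold) / len(gold) if gold else 0.0
```

Raising the novelty threshold generally trades recall for precision, which is why the preferred metric can depend on which of the two a task requires.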

Comparative study – Performance requirement

[Figures: F-score vs. precision; F-score vs. recall]

Comparative study – Prior probability

Methodology – A new framework

Combine symmetric and asymmetric metrics. Doing so raises two problems:
- The scaling problem: the metrics' scores must be made comparable and consistent
- The combining strategy: how the scaled scores are merged into one novelty score
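One simple way to address both problems, sketched here as an assumption rather than the paper's exact method: min-max scale each metric's scores over a batch of sentences into [0, 1], then take a weighted average. The weights and the dictionary-based interface are hypothetical.

```python
from typing import Dict, List, Sequence


def min_max_scale(scores: Sequence[float]) -> List[float]:
    # Map scores linearly onto [0, 1]. A constant column carries no
    # ranking information, so it is mapped to all zeros (a design choice).
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def combine(scores: Dict[str, Sequence[float]], weights: Dict[str, float]) -> List[float]:
    # scores[name][i] is metric `name`'s novelty score for sentence i.
    # Scale each metric separately (the scaling problem), then take a
    # weighted average (one possible combining strategy). Weights must be positive.
    scaled = {name: min_max_scale(vals) for name, vals in scores.items()}
    total = sum(weights[name] for name in scaled)
    n = len(next(iter(scaled.values())))
    return [
        sum(weights[name] * scaled[name][i] for name in scaled) / total
        for i in range(n)
    ]
```

For instance, combining Jaccard novelty scores `[0.25, 0.5, 0.75]` with new word counts `[0, 2, 4]` at equal weights yields `[0.0, 0.5, 1.0]`. Batch min-max scaling needs every sentence's score up front, which does not fit a strictly online setting; fixing each metric's range in advance would be one alternative.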

Experiments – Mixed metrics vs. individual metrics

[Figure legend: M3 (Jaccard + new word count); tf.isf]
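The legend also mentions tf.isf (term frequency × inverse sentence frequency), the sentence-level analogue of tf.idf. A minimal sketch, assuming weight(t, s) = tf(t, s) · log(N / n_t), where N is the number of sentences and n_t the number of sentences containing t; the log base, any smoothing, and which sentences the statistics are drawn from are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tf_isf_vectors(sentences: Sequence[List[str]]) -> List[Dict[str, float]]:
    # Sentence frequency: in how many sentences each term occurs.
    n = len(sentences)
    sf = Counter(t for s in sentences for t in set(s))
    # Weight = term frequency in the sentence * log(N / sentence frequency).
    return [{t: tf * math.log(n / sf[t]) for t, tf in Counter(s).items()} for s in sentences]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    # Cosine similarity of two sparse weight vectors; 0.0 if either is all-zero.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

A term that occurs in every sentence gets weight log(1) = 0, so shared common words stop inflating the similarity; for example, `[["a", "b"], ["a", "c"]]` produces vectors whose cosine is 0.0.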

Experiments – Mixed metric M3 vs. individual metrics across novelty ratios

Experiments – Mixed metric M3 vs. a mixture of the two symmetric metrics vs. a mixture of the two asymmetric metrics vs. a mixture of all metrics, across novelty ratios

Experiments – Weight

Conclusions

- A comparative study of different types of novelty metrics:
  - Symmetric: cosine, Jaccard
  - Asymmetric: new word count, overlap
  - Observes the strengths of each type
- Introduces a mixture of the two types of novelty metrics, which is more stable than using an individual metric

Comments

- Advantages:
  - A comparative study
  - Mixture of metrics
  - Intuitive
- Drawback: …
- Application: novelty mining