
Last Week: Visualization Design II

Chart Junk, Vis Lies

Sensory representation
• Understand without learning
• Sensory immediacy
• Cross-cultural validity

Arbitrary representation
• Hard to learn
• Easy to forget
• Embedded in culture and apps

Examples: Chinese characters (汉字) 一, 二, 三 (one, two, three) and 人 (person); 人, 山, 森 (person, mountain, forest); "dog"; "Antidisestablishmentarianism"

Euler diagram: circle for boundary. Language: need to learn.

Last Week: Data Model and Explorative Visual Analytics

• 1-D (Linear, Set and Sequences): SeeSoft, Info Mural
• 2-D (Map): GIS, ArcView, PageMaker
• 3-D (Shape, the World): CAD, Medical, Architecture
• n-D (Relational): Spotfire, Tableau
• Temporal: LifeLines, Palantir
• Tree (Hierarchy): Cone/Cam/Hyperbolic
• Network (Graph): Pajek, JUNG


Text Visualization I: IV Course, Spring '14

Graduate Course of UCAS

Mar. 28th, 2014

InfoVis Pipeline

(Figure: visualization pipeline with Data, Visualization, User)

• Text Visualization
• Text Data Model
• Potential Users?
• Tasks

Outline
• Text visualization background
  – Examples
  – User, tasks and text visualization pipeline
• Text visualization approaches
  – Information retrieval purpose
  – Overview and sense-making purpose
• Text analytics basics
  – Word/sentence-level
  – Corpus-level

Text is Everywhere
• We use documents as the primary information artifact in our lives
• Our access to documents has grown tremendously in recent years due to the Internet
  – WWW
  – Digital libraries
  – Web 2.0
  – ...

Text Visualization Examples


Big Questions
• What can information visualization provide to help users understand and gather information from text and document collections? (Task)
• Who will be interested in, and benefit from, text visualization? (User)
• ...

Tasks & Goals
• Which documents contain text on topic XYZ?
• Which documents are of interest to me?
• Are there other documents that are similar to this one (so they are worthwhile)?
• How are different words used in a document or a document collection?
• What are the main themes and ideas in a document or a collection?
• Which documents have an angry tone?
• How are certain words or themes distributed through a document?
• Identify "hidden" messages or stories in this document collection.
• How does one set of documents differ from another set?
• Quickly gain an understanding of a document or collection in order to subsequently do XYZ.
• Understand the history of changes in a document.
• Find connections between documents.

Another Task: Ask Better Questions on Text Collections

Users of Text Visualization
• Government intelligence analysts?
• Literature researchers? Artists? ...
• ???? [To be answered in Assignment II]

Potential User: Parents
http://www.babynamewizard.com/voyager#prefix=&sw=both&exact=false

Text Visualization Pipeline

Text Visualization for Information Retrieval
• Which documents contain text on topic XYZ?
• Which documents are of interest to me?
• Are there other documents that are similar to this one (so they are worthwhile)?
• ...


TileBar
• Search engine query results do not show:
  – How strong the match is
  – How frequent each term is
  – How each term is distributed in the document
  – Overlap between terms
  – Length of document
• Document ranking is opaque
• Inability to compare results with one another
• Input limits term relationships

(Figure: search terms and the TileBar query result visualization)
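A minimal Python sketch of the TileBar idea: each query term set gets one row, each text segment one column, and a darker tile means more hits of that term set in that segment. Fixed-size word windows are an assumption made here for simplicity, and the function names, `segment_size`, and the ASCII shading are hypothetical.

```python
import re

SHADES = " .:#"  # light to dark: more hits in a segment -> darker tile


def tilebar(text, term_sets, segment_size=20):
    """Count hits of each term set in consecutive fixed-size word segments."""
    words = re.findall(r"[a-z]+", text.lower())
    segments = [words[i:i + segment_size] for i in range(0, len(words), segment_size)]
    rows = []
    for terms in term_sets:
        wanted = {t.lower() for t in terms}
        rows.append([sum(w in wanted for w in seg) for seg in segments])
    return rows


def render(rows):
    """Draw one bar per term set, scaling shades to the busiest tile."""
    peak = max((c for row in rows for c in row), default=0) or 1
    top = len(SHADES) - 1
    return "\n".join(
        "|" + "".join(SHADES[round(c / peak * top)] for c in row) + "|" for row in rows
    )


doc = "cat dog cat fish bird bird"
print(render(tilebar(doc, [["cat"], ["bird"]], segment_size=2)))
```

Reading across a row shows where in the document a term set is concentrated; comparing rows shows whether the terms overlap in the same passages, which plain ranked lists hide.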


More Text Visualization for IR
• Visualize one query ...

(Figure: query distance, document)

More Text Visualization for IR
• Multiple queries ...


Comparing Search Results
• Color represents different search engines

Text Visualization for Sensemaking
• How are different words used in a document or a document collection?
• What are the main themes and ideas in a document or a collection?
• Which documents have an angry tone?
• How are certain words or themes distributed through a document?
• Identify "hidden" messages or stories in this document collection.
• How does one set of documents differ from another set?
• Quickly gain an understanding of a document or collection in order to subsequently do XYZ.
• Understand the history of changes in a document.
• Find connections between documents.
• ...

Text Visualization Method Taxonomy
• Document-level visualization: document distribution & summarization
• Text content-level visualization: overview & navigation
  – Keyword frequency
  – Associated facet: time, topic, sentiment, etc.
• Text entities in context: keyword occurrence
• Text entity relationship and/or internal text structure
• ...

Document Visualization
• InfoSky & SPIRE: 2D projection of document vectors by PCA/MDS/etc. ...

(Figures: InfoSky, SPIRE)
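The projection step behind InfoSky/SPIRE-style maps can be sketched with plain PCA: center the document vectors, find the top two eigenvectors of their covariance matrix by power iteration, and use the projections onto them as x/y coordinates. This is a dependency-free illustration under those assumptions, not how either system is implemented; in practice a linear-algebra library's SVD would do this step, and MDS is the other option the slide names.

```python
import math


def pca_2d(vectors, iters=200):
    """Project equal-length row vectors onto their top two principal components."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # d x d covariance matrix of the centered data
    cov = [[sum(x[a] * x[b] for x in centered) / n for b in range(d)] for a in range(d)]
    components = []
    for _ in range(2):
        vec = [1.0 / (j + 1) for j in range(d)]  # fixed, deterministic start vector
        for _ in range(iters):
            w = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
            for c in components:  # stay orthogonal to components already found
                dot = sum(wi * ci for wi, ci in zip(w, c))
                w = [wi - dot * ci for wi, ci in zip(w, c)]
            norm = math.sqrt(sum(wi * wi for wi in w)) or 1.0
            vec = [wi / norm for wi in w]
        components.append(vec)
    return [[sum(x[j] * c[j] for j in range(d)) for c in components] for x in centered]


# Term-frequency vectors for four toy documents over a 4-word vocabulary.
docs = [[5, 1, 0, 0], [4, 2, 0, 0], [0, 0, 5, 1], [0, 0, 4, 2]]
for point in pca_2d(docs):
    print([round(v, 2) for v in point])
```

Documents that share vocabulary land close together, so topical clusters appear as clumps of points on the 2D map.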

Document Visualization
• Exemplar-based document visualization ...

Figure: Visualization of documents in 20 Newsgroups (18,864 documents, 20 topics) by EV. Each point represents a document; each colored shape represents a news topic; and the corresponding large colored shape indicates the mean of each newsgroup.

Document Visualization
• Document Card (figure: InfoVis ’08 Proceedings)

Text Content Visualization: Keywords

Bubble Chart

Text Content Visualization: Keywords

Tag Cloud
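A tag cloud encodes word frequency as font size. A common way to compute the sizes, sketched here under two assumptions: logarithmic scaling (so a few very frequent words do not dwarf the rest) and alphabetical placement order, one common convention. `min_pt`, `max_pt`, and `top` are hypothetical parameters.

```python
import math
from collections import Counter


def tag_cloud(words, min_pt=10, max_pt=40, top=50):
    """Map the `top` most frequent words to font sizes on a log scale."""
    counts = Counter(words).most_common(top)
    if not counts:
        return []
    lo = math.log(counts[-1][1])  # most_common sorts by count, descending
    hi = math.log(counts[0][1])
    span = (hi - lo) or 1.0  # avoid dividing by zero when all counts are equal
    sizes = {w: round(min_pt + (math.log(c) - lo) / span * (max_pt - min_pt))
             for w, c in counts}
    return sorted(sizes.items())  # alphabetical layout order (an assumed convention)


print(tag_cloud("dream dream dream nation nation freedom".split()))
```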

Text Content Visualization: Keywords

Ordered Tag Cloud

Text Content Visualization: Keywords

Bi-gram
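A bi-gram view is built from counts of adjacent word pairs rather than single words. A minimal counting sketch (the function name is hypothetical); a real tool would usually also drop stop words and respect sentence boundaries.

```python
import re
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs; pairs may span sentence breaks in this sketch."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(words, words[1:]))


text = "the cat sat on the mat and the cat slept"
for pair, n in bigram_counts(text).most_common(3):
    print(" ".join(pair), n)
```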

Text Content Visualization: Keywords

Wordle

Text Content Visualization: Keywords

Manipulating Wordle
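Wordle-style layouts can be approximated by greedy placement: take words from largest to smallest and move each one outward along a spiral until it no longer overlaps anything already placed. This sketch assumes rectangular word boxes about `0.6 * font_size` wide per character, a deterministic Archimedean spiral, and hypothetical names; the actual Wordle algorithm is more elaborate (randomized starting points, finer-grained collision tests on glyph shapes).

```python
import math


def overlaps(a, b):
    """True if two axis-aligned boxes (x, y, w, h) intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def spiral_layout(sized_words, step=0.1, spacing=2.0, max_steps=20000):
    """Greedily place (word, font_size) pairs, largest first, along a spiral."""
    placed = []  # (word, x, y, w, h)
    for word, size in sorted(sized_words, key=lambda ws: -ws[1]):
        box_w, box_h = 0.6 * size * len(word), float(size)
        for i in range(max_steps):
            t = i * step
            r = spacing * t  # Archimedean spiral: radius grows linearly with angle
            x = r * math.cos(t) - box_w / 2
            y = r * math.sin(t) - box_h / 2
            if not any(overlaps((x, y, box_w, box_h), p[1:]) for p in placed):
                placed.append((word, x, y, box_w, box_h))
                break  # words that never fit within max_steps are skipped
    return placed


for word, x, y, w, h in spiral_layout([("text", 40), ("vis", 30), ("word", 20), ("cloud", 20)]):
    print(f"{word:6} x={x:7.1f} y={y:7.1f}")
```

Placing the largest words first keeps them near the center, which is what gives these clouds their dense, rounded shape.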

Text Content Visualization with Facets
• TIARA & ThemeRiver & Context-Preserving Tag Cloud & Parallel TagCloud
  – Temporal/topical/facet extension of TagCloud/Wordle
  – Provide more interactions to drill down to small document portions

(Figures: TIARA, ThemeRiver, Context-Preserving TagCloud, Parallel TagCloud)
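ThemeRiver draws each theme as a current whose width at a time point is that theme's strength, with the whole stack centered on a horizontal axis. A minimal sketch of that stacking step, assuming per-time-bin theme counts as input (names hypothetical); the smooth curves ThemeRiver interpolates between time points are left out.

```python
def theme_river(bins):
    """bins: list of {theme: count}, one per time step.

    Returns {theme: [(bottom, top), ...]}: layers stacked so that each
    time step is centered on y = 0.
    """
    themes = sorted({t for b in bins for t in b})
    layers = {t: [] for t in themes}
    for b in bins:
        y = -sum(b.values()) / 2  # symmetric baseline: half the total below zero
        for t in themes:
            h = b.get(t, 0)
            layers[t].append((y, y + h))
            y += h
    return layers


monthly = [{"economy": 3, "sports": 1}, {"economy": 1, "sports": 4, "weather": 2}]
for theme, spans in theme_river(monthly).items():
    print(theme, spans)
```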

Text Content Visualization with Facets
• Parallel Tag Cloud

Text Entities: Keyword in Context
• TAKMI & FeatureLens & TileBar
  – Visualizing entity/feature/concept within the content
  – Visualizing occurrence patterns within the content: temporal, topical, correlational
  – "Keyword + context" paradigm for details

(Figures: FeatureLens, TileBar, TAKMI)
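The "keyword + context" paradigm is essentially a keyword-in-context (KWIC) concordance: every occurrence of a term is listed with a few words on either side. A minimal sketch; the function name and `width` parameter are hypothetical, and punctuation handling is deliberately crude.

```python
import re


def kwic(text, keyword, width=3):
    """Return one aligned line per occurrence of `keyword`, with context."""
    words = re.findall(r"\S+", text)
    target = keyword.lower()
    lines = []
    for i, w in enumerate(words):
        if w.lower().strip(".,;:!?\"'") == target:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{w}] {right}")  # right-align the left context
    return lines


speech = "I have a dream that one day this nation will rise up. I have a dream today."
print("\n".join(kwic(speech, "dream")))
```

Right-aligning the left context puts every occurrence in one column, so recurring phrases around the keyword line up visually.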

Visual Readability Analysis


Text Entity Relationship
• Jigsaw & WordTree
  – Visualizing entity relationships
  – Extract natural relationships: co-occurrence, sequential
  – Support navigation with focus redirection

(Figures: Jigsaw, Word Tree)
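A Word Tree shows the continuations of a chosen root word as a branching tree, with frequent branches drawn larger. The underlying structure can be sketched as a prefix tree (trie) of the word sequences that follow each occurrence of the root; `depth` and the dict layout are assumptions for illustration.

```python
import re


def word_tree(text, root, depth=3):
    """Trie of the word sequences that follow `root`; nodes count occurrences."""
    words = re.findall(r"[a-z']+", text.lower())
    tree = {"count": 0, "children": {}}
    for i, w in enumerate(words):
        if w == root.lower():
            node = tree
            node["count"] += 1
            for nxt in words[i + 1:i + 1 + depth]:
                node = node["children"].setdefault(nxt, {"count": 0, "children": {}})
                node["count"] += 1
    return tree


def show(node, label, indent=0):
    """Print the tree, one branch per line, most frequent branch first."""
    print("  " * indent + f"{label} ({node['count']})")
    for word, child in sorted(node["children"].items(), key=lambda kv: -kv[1]["count"]):
        show(child, word, indent + 1)


show(word_tree("I have a dream. I have a plan. I am here.", "i", depth=2), "i")
```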

Text Entity Relationship
• PhraseNet & FacetAtlas
  – Visualizing entity relationships with advanced analytics: WordNet, intermediate word, multi-faceted relationships
  – Start from a "search item": relationship item or concept item
  – Only visualization, little navigation

(Figures: DocuBurst, PhraseNet, FacetAtlas)
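PhraseNet links two words when they appear in a chosen pattern such as "X and Y", where the intermediate word defines the relationship. A rough sketch of the simplest, pattern-matching version (names hypothetical); it ignores punctuation, so unlike a parser-based extraction it can pair words across a comma.

```python
import re
from collections import Counter


def phrase_net_edges(text, connector="and"):
    """Count (X, Y) word pairs that match the pattern 'X <connector> Y'."""
    words = re.findall(r"[a-z']+", text.lower())
    edges = Counter()
    for x, mid, y in zip(words, words[1:], words[2:]):
        if mid == connector and x != connector and y != connector:
            edges[(x, y)] += 1
    return edges


text = "bread and butter, salt and pepper, bread and butter"
for (x, y), n in phrase_net_edges(text).most_common():
    print(f"{x} -> {y} ({n})")
```

The resulting weighted pairs form a graph: nodes are words, edge weights are pattern counts, which is the structure a PhraseNet-style view lays out.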

Text Analytics Basics: Text Mining

• Text pre-processing (parsing)
  – Remove stop words
  – Keyword stemming
• Text feature extraction
  – Keyword frequency
  – Topic modeling
• Text feature measurement
  – Similarity
  – Text clustering

Text Parsing

"I have a dream that one day this nation will rise up and live out the true meaning of its creed: 'We hold these truths to be self-evident, that all men are created equal.'"

Stop word removal: a, the, that, etc.

Keyword stemming: men->man, truths->truth

Parsing result: I, dream, one, day, nation, rise, up, live, out, true, meaning, creed, hold, truth, be, self-evident, all, man, created, equal
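The parsing steps on this slide can be reproduced with a short sketch. The stop list is hypothetical, chosen so the output matches the slide's result (real stop lists are much longer and would usually also drop words such as "I" and "be"). The stemmer is a toy: it strips a plural "s" and looks up irregular forms such as men->man, which strictly speaking is lemmatization rather than suffix stemming.

```python
import re

# Hypothetical stop list, chosen so the output matches the slide's result.
STOP_WORDS = {"a", "and", "are", "have", "its", "of", "that", "the",
              "these", "this", "to", "we", "will"}
IRREGULAR = {"men": "man"}  # irregular plurals need a lookup table


def stem(word):
    """Toy stemmer: irregular lookup, else strip a plural 's'."""
    lower = word.lower()
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    if len(lower) > 3 and lower.endswith("s") and not lower.endswith("ss"):
        return word[:-1]
    return word


def parse(text):
    """Tokenize (keeping hyphenated words), drop stop words, then stem."""
    tokens = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)
    return [stem(t) for t in tokens if t.lower() not in STOP_WORDS]


quote = ('I have a dream that one day this nation will rise up and live out '
         'the true meaning of its creed: "We hold these truths to be '
         'self-evident, that all men are created equal."')
print(", ".join(parse(quote)))
```

Run on the quote, this prints the same 20 tokens as the slide's parsing result.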

Basic Text Modeling

• Bag-of-words model: vector representation

  Word       I   dream   color   skin   nation   slave   injustice   owner
  Frequency  4   4       1       1      2        2       1           1

• Text similarity: cosine similarity between two term-frequency vectors
• TF-IDF weighting: term frequency × inverse document frequency
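The three ideas on this slide fit in a few lines: a bag of words is a word-to-count mapping, cosine similarity compares two such vectors independently of document length, and TF-IDF down-weights words that occur in many documents. The sketch uses one common TF-IDF variant, raw count × log(N / df); other weightings (log-scaled tf, smoothed idf) are equally common, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Word -> frequency vector, stored sparsely as a Counter."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (dicts word -> weight)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def tf_idf(vectors):
    """Reweight each bag of words by tf * log(N / df)."""
    n = len(vectors)
    df = Counter(w for vec in vectors for w in vec)  # documents containing each word
    return [{w: tf * math.log(n / df[w]) for w, tf in vec.items()} for vec in vectors]


docs = [bag_of_words(s) for s in ("I have a dream", "a dream of freedom", "the slave owner")]
weighted = tf_idf(docs)
print(round(cosine(docs[0], docs[1]), 3), round(cosine(weighted[0], weighted[1]), 3))
```

The two printed scores differ because TF-IDF shrinks the weight of "a" and "dream", which both compared documents share.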

Topic Modeling

• Popular methods:
  – Latent Semantic Indexing
  – pLSI, LDA
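Of the methods listed, LDA is often fit with collapsed Gibbs sampling, which repeatedly resamples each token's topic from the counts of all other assignments. A compact, unoptimized sketch of that sampler; `alpha`, `beta`, `iters`, and `seed` are illustrative defaults rather than tuned values, and tokens are assumed to be pre-processed as in the text parsing step.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to token lists by collapsed Gibbs sampling.

    Returns (doc_topic, topic_words): per-document topic proportions and
    the five highest-count words of each topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = [[0] * k for _ in docs]            # topic counts per document
    nkw = [[0] * n_vocab for _ in range(k)]  # word counts per topic
    nk = [0] * k                             # total tokens per topic
    z = []                                   # current topic of every token

    def update(d, t, v, delta):
        ndk[d][t] += delta
        nkw[t][v] += delta
        nk[t] += delta

    for d, doc in enumerate(docs):  # random initial assignment
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            update(d, t, index[w], 1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = index[w]
                update(d, z[d][i], v, -1)  # take the token out of the counts
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + n_vocab * beta)
                           for j in range(k)]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                update(d, z[d][i], v, 1)  # add it back under the sampled topic

    doc_topic = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
                 for d, doc in enumerate(docs)]
    topic_words = [[vocab[u] for u in sorted(range(n_vocab), key=lambda u: -nkw[j][u])[:5]]
                   for j in range(k)]
    return doc_topic, topic_words


docs = [["ball", "goal", "team", "goal"], ["vote", "law", "party", "law"],
        ["team", "ball", "match"], ["party", "vote", "election"]]
doc_topic, topic_words = lda_gibbs(docs, k=2)
print(topic_words)
```

Each row of `doc_topic` is a document's topic mixture and sums to 1; the per-topic word lists are what a topic-based text visualization would typically label its themes with.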

Summary
• Background
  – Examples
  – User, tasks and text visualization pipeline
• Text visualization methods
  – IR purpose
  – Overview and sense-making: 5 categories
• Text analytics basics
  – Text parsing, measurement and topic modeling

Questions?

What’s Next -- Lecture 8: Text Visualization II
