Last Week: Visualization Design II
Chart Junk, Vis Lies

Last Week: Visualization Design II
Sensory representation
• Understand without learning
• Sensory immediacy
• Cross-cultural validity
Arbitrary representation
• Hard to learn
• Easy to forget
• Embedded in culture and apps
Examples: Chinese characters (汉字): 一, 二, 三, 人, 山, 森; English words: dog, Antidisestablishmentarianism
Euler diagram: circle for boundary
Language: need to learn

Last Week: Data Model and Explorative Visual Analytics
• 1-D (Linear, Set and Sequences): SeeSoft, Info Mural
• 2-D (Map): GIS, ArcView, PageMaker
• 3-D (Shape, the World): CAD, Medical, Architecture
• n-D (Relational): Spotfire, Tableau
• Temporal: LifeLines, Palantir
• Tree (Hierarchy): Cone/Cam/Hyperbolic
• Network (Graph): Pajek, JUNG

Text Visualization I
IV Course, Spring ’14
Graduate Course of UCAS
Mar. 28th, 2014

InfoVis Pipeline
• Data: Text Data Model
• Visualization: Text Visualization
• User: Potential Users? Tasks

Outline
• Text visualization background
– Examples
– User, tasks and text visualization pipeline
• Text visualization approaches
– Information Retrieval purpose
– Overview and sense-making purpose
• Text analytics basics
– Word/sentence-level
– Corpus-level

Text is Everywhere
• We use documents as the primary information artifact in our lives
• Our access to documents has grown tremendously in recent years due to the Internet
– WWW
– Digital libraries
– Web 2.0
– ...

Text Visualization Examples
[Slides 9-17: example images]

Big Questions
• What can information visualization provide to help users in understanding and gathering information from text and document collections? (Task)
• Who will be interested in and benefit from text visualization? (User)
...

Tasks & Goals
• Which documents contain text on topic XYZ?
• Which documents are of interest to me?
• Are there other documents that are similar to this one (so they are worthwhile)?
• How are different words used in a document or a document collection?
• What are the main themes and ideas in a document or a collection?
• Which documents have an angry tone?
• How are certain words or themes distributed through a document?
• Identify “hidden” messages or stories in this document collection.
• How does one set of documents differ from another set?
• Quickly gain an understanding of a document or collection in order to subsequently do XYZ.
• Understand the history of changes in a document.
• Find connections between documents.

Another Task: Ask Better Questions on Text Collections

Users of Text Visualization
• Government Intelligence Analysts?
• Literature researcher? Artist? ...
• ???? [To be answered in Assignment II]

Potential User: Parents
http://www.babynamewizard.com/voyager#prefix=&sw=both&exact=false

Text Visualization Pipeline

Text Visualization for Information Retrieval
• Which documents contain text on topic XYZ?
• Which documents are of interest to me?
• Are there other documents that are similar to this one (so they are worthwhile)?
...

TileBar
• Search engine query results do not include:
– How strong the match is
– How frequent each term is
– How each term is distributed in the document
– Overlap between terms
– Length of document
• Document ranking is opaque
• Inability to compare between results
• Input limits term relationships
[Figure: TileBar, labeled "Search Terms" and "Query Result Visualization"]

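The TileBar idea (show how often and where each query term occurs in a document) can be sketched in a few lines. This is only an illustration under stated assumptions, not the original TileBars implementation: the function names `tilebar_grid` and `render` are hypothetical, the document is cut into equal-width segments rather than topical passages, and shading is approximated with text characters.

```python
import re

def tilebar_grid(text, term_sets, n_segments=8):
    """Return one row per term set; each cell counts term hits in one segment.

    Assumption: segments are equal-width slices of the token stream.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return [[0] * n_segments for _ in term_sets]
    grid = []
    for terms in term_sets:
        wanted = {t.lower() for t in terms}
        row = [0] * n_segments
        for i, tok in enumerate(tokens):
            if tok in wanted:
                # Map the token position to a segment index.
                row[i * n_segments // len(tokens)] += 1
        grid.append(row)
    return grid

def render(grid, shades=" .:#"):
    """Turn counts into characters; a denser character means more hits."""
    top = max((c for row in grid for c in row), default=0) or 1
    last = len(shades) - 1
    return [
        # Ceiling division so that any non-zero count gets a visible shade.
        "".join(shades[min(last, (c * last + top - 1) // top)] for c in row)
        for row in grid
    ]
```

Each row of the grid corresponds to one set of search terms and each column to one part of the document, so term frequency, distribution and overlap between terms can be read off at a glance; the number of columns could also be made proportional to document length.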
More Text Visualization for IR
• Visualize one query ...
[Figure labels: query, distance, document]
• Multiple queries ...

Comparing Search Results
Color represents different search engines

Text Visualization for Sensemaking
• How are different words used in a document or a document collection?
• What are the main themes and ideas in a document or a collection?
• Which documents have an angry tone?
• How are certain words or themes distributed through a document?
• Identify “hidden” messages or stories in this document collection.
• How does one set of documents differ from another set?
• Quickly gain an understanding of a document or collection in order to subsequently do XYZ.
• Understand the history of changes in a document.
• Find connections between documents.
...

Text Visualization Method Taxonomy
• Document-level visualization: document distribution & summarization
• Text content-level visualization: overview & navigation
– Keyword frequency
– Associated facet: time, topic, sentiment, etc.
• Text entities in context: keyword occurrence
• Text entity relationship and/or internal text structure
...

Document Visualization
• InfoSky & SPIRE: 2D projection of document vectors by PCA/MDS/etc.
...
[Figures: InfoSky, SPIRE]

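As a rough illustration of projecting document vectors to 2D with PCA, the sketch below uses only the Python standard library. It is not how InfoSky or SPIRE are implemented; the function names are hypothetical, the eigenvectors are approximated by power iteration with deflation, and it assumes at least two documents whose two leading variances are distinct and non-zero.

```python
import math

def center(rows):
    """Subtract the per-dimension mean from every vector."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered vectors (needs n > 1)."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenpair(matrix, iterations=500):
    """Power iteration: approximate the dominant eigenvalue and eigenvector."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]  # fixed, non-uniform start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(a * b for a, b in zip(v, mv)), v

def project_2d(rows):
    """Project each vector onto the two leading principal axes."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    axes = []
    for _ in range(2):
        value, vec = top_eigenpair(cov)
        axes.append(vec)
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return [tuple(sum(r[j] * a[j] for j in range(d)) for a in axes)
            for r in centered]
```

In practice the rows would be bag-of-words or TF-IDF vectors, and a numerical library would normally replace the hand-written eigen-solver; the resulting (x, y) pairs are what a scatter-plot style document map draws.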
Document Visualization
• Exemplar-based document visualization ...
Visualization of documents in 20 Newsgroups (18,864 documents, 20 topics) by EV. Each point represents a document; each colored shape represents a news topic; and the corresponding large colored shape indicates the mean of a news group.

Document Visualization
• Document Card (InfoVis ’08 Proceedings)

Text Content Visualization: Keywords
• Bubble Chart
• Tag Cloud
• Ordered Tag Cloud
• Bi-gram
• Wordle
• Manipulating Wordle

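The keyword views above all start from the same data: keyword frequencies mapped to a visual size. A minimal sketch follows, assuming a linear mapping from frequency to font size; the function name `tag_cloud_sizes` and the point-size range are illustrative choices, and real tools such as Wordle may use different scaling and layout rules.

```python
import re
from collections import Counter

def tag_cloud_sizes(text, stop_words=(), top_n=10, min_pt=10, max_pt=40):
    """Return (word, count, font_size) for the top_n most frequent words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    top = counts.most_common(top_n)
    if not top:
        return []
    hi, lo = top[0][1], top[-1][1]
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return [
        (w, c, round(min_pt + (c - lo) * (max_pt - min_pt) / span))
        for w, c in top
    ]
```

Sorting the returned list alphabetically gives the input for an ordered tag cloud, and counting adjacent word pairs instead of single words gives the bi-gram variant.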
Text Content Visualization with Facets
• TIARA & ThemeRiver & Context-Preserving Tag Cloud & Parallel Tag Cloud
– Temporal/topical/facet extension of TagCloud/Wordle
– Provide more interactions to drill down to small document portions
[Figures: TIARA, ThemeRiver, Context-Preserving Tag Cloud, Parallel Tag Cloud]

Text Content Visualization with Facets
• Parallel Tag Cloud

Text Entities: Keyword in Context
• TAKMI & FeatureLens & TileBar
– Visualizing entity/feature/concept within the content
– Visualizing occurrence patterns within the content: temporal, topical, correlational
– “Keyword + context” paradigm for details
[Figures: FeatureLens, TileBar, TAKMI]

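The “keyword + context” paradigm can be illustrated with a small keyword-in-context (KWIC) extractor. This is a sketch under simple assumptions, not the method of TAKMI or FeatureLens: the function name `keyword_in_context` is hypothetical, tokens are plain word characters, and the context is a fixed number of neighboring tokens.

```python
import re

def keyword_in_context(text, keyword, window=3):
    """Return (position, left context, match, right context) for each hit."""
    tokens = re.findall(r"\w+", text)
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((i, left, tok, right))
    return hits
```

The token positions show where in the document the keyword occurs (the occurrence pattern), while the left and right contexts supply the details a reader needs to interpret each hit.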
Visual Readability Analysis

Text Entity Relationship
• Jigsaw & WordTree
– Visualizing entity relationships
– Extract natural relationships: co-occurrence, sequential
– Support navigation with focus redirection
[Figures: Jigsaw, Word Tree]

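The two “natural” relationships named above, co-occurrence and sequence, can be extracted with very little code. The sketch below is illustrative only: the function names are hypothetical, every word is treated as an entity (a system like Jigsaw works with extracted entities such as people and places), and the sequential structure is a nested dictionary of the words that follow a root word, in the spirit of a word tree.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence(sentences):
    """Count how many sentences each unordered word pair appears in together."""
    pairs = Counter()
    for s in sentences:
        words = sorted(set(re.findall(r"[a-z]+", s.lower())))
        pairs.update(combinations(words, 2))
    return pairs

def word_tree(sentences, root, depth=3):
    """Nested dict of the words that follow `root`, up to `depth` words deep."""
    tree = {}
    for s in sentences:
        words = re.findall(r"[a-z]+", s.lower())
        for i, w in enumerate(words):
            if w == root:
                node = tree
                for nxt in words[i + 1:i + 1 + depth]:
                    node = node.setdefault(nxt, {})
    return tree
```

Choosing a different root word and rebuilding the tree is a simple form of the focus redirection mentioned above.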
Text Entity Relationship
• PhraseNet & FacetAtlas
– Visualizing entity relationships with advanced analytics: WordNet, intermediate word, multi-faceted relationships
– Start from a “search item”: relationship item or concept item
– Visualization only, little navigation support
[Figures: DocuBurst, PhraseNet, FacetAtlas]

Text Analytics Basics: Text Mining
• Text pre-processing (parsing)
– Remove stop words
– Keyword stemming
• Text feature extraction
– Keyword frequency
– Topic modeling
• Text feature measurement
– Similarity
– Text clustering

Text Parsing
"I have a dream that one day this nation will rise up and live out the true meaning of its creed: "We hold these truths to be self-evident, that all men are created equal."
Stop word removal: a, the, that, etc.
Keyword stemming: men->man, truths->truth
Parsing result: I, dream, one, day, nation, rise, up, live, out, true, meaning, creed, hold, truth, be, self-evident, all, man, created, equal

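The parsing steps on this slide can be reproduced with a short script. This is a sketch, not the lecturer's tool: `STOP_WORDS` and `STEMS` are hypothetical tables filled in by hand so that the slide's example works (the slide only lists "a, the, that, etc."), and a real pipeline would use a full stop-word list and a stemmer or lemmatizer instead of a lookup table.

```python
import re

# Assumed stop-word list: the slide names "a, the, that, etc."; the rest
# are illustrative additions chosen for this example sentence.
STOP_WORDS = {"a", "the", "that", "have", "this", "will", "and",
              "of", "its", "we", "these", "to", "are"}

# Assumed stem table covering only the two mappings shown on the slide.
STEMS = {"men": "man", "truths": "truth"}

def parse(text):
    """Tokenize, drop stop words, then map remaining words to their stems."""
    # Keep hyphenated words such as "self-evident" as one token.
    tokens = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)
    kept = [t for t in tokens if t.lower() not in STOP_WORDS]
    return [STEMS.get(t.lower(), t) for t in kept]
```

With these two tables, calling `parse` on the quoted sentence yields the same word list as the parsing result shown on the slide.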
Basic Text Modeling
• Bag-of-words model: vector representation
Word       I  dream  color  skin  nation  slave  injustice  owner
Frequency  4  4      1      1     2       2      1          1
• Text similarity: cosine similarity between the word-frequency vectors of two texts
• TF-IDF weighting: term frequency * inverse document frequency

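The three ideas on this slide fit in one small example. It is a sketch under stated assumptions: the function names are hypothetical, documents are already tokenized, and the inverse document frequency uses log(N / df), which is one common variant among several.

```python
import math
from collections import Counter

def bag_of_words(tokens, vocabulary):
    """Frequency vector of `tokens` over a fixed, ordered vocabulary."""
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def tf_idf(docs):
    """Per-document weights: term frequency * log(N / document frequency)."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return weights
```

A word that appears in every document gets weight 0 under this variant, which is the point of the IDF factor: frequent but undiscriminating words stop dominating the similarity.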
Topic Modeling
• Popular methods:
– Latent Semantic Indexing
– pLSI, LDA

Summary
• Background
– Examples
– User, tasks and text visualization pipeline
• Text visualization methods
– IR purpose
– Overview and sense-making: 5 categories
• Text analytics basics
– Text parsing, measurement and topic modeling

Questions?
What’s Next -- Lecture 8: Text Visualization II