Weaving Stories from Data / Storytelling with Data
TRANSCRIPT
Krist Wongsuphasawat / @kristw
First, an introduction
Computer Engineer, Chulalongkorn University
PhD in Computer Science (Information Visualization), Univ. of Maryland
IBM, Microsoft
Data Scientist, Twitter
Data
Fishing
400
Collecting data
Time   Location  Type
12:00  Paragon   Magikarp
12:05  Siam Dis  Magikarp
12:40  CTW       Magikarp
…      …         …
[Chart: number of fish over time. x-axis: time of day (00:00, 6:00, 12:00, 18:00, 00:00); y-axis: number of fish]
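As a rough illustration of how a catch log like the one above could be turned into a fish-per-hour chart, here is a minimal Python sketch. The rows reuse the three sample rows from the slide, and the function name `count_by_hour` is hypothetical, not something from the talk.

```python
from collections import Counter

# Catch log in the slide's (time, location, type) layout.
catches = [
    ("12:00", "Paragon", "Magikarp"),
    ("12:05", "Siam Dis", "Magikarp"),
    ("12:40", "CTW", "Magikarp"),
]

def count_by_hour(rows):
    """Count catches per hour of day (0-23) from 'HH:MM' time strings."""
    hours = Counter(int(time.split(":")[0]) for time, _location, _type in rows)
    # Fill in empty hours so the result can be drawn as a 24-bar histogram.
    return [hours.get(h, 0) for h in range(24)]

histogram = count_by_hour(catches)
```

Each entry of `histogram` is the height of one bar; any charting tool can draw it from there.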
DATA VISUALIZATION: turning data into images
History
data
Number of Napoleon's troops, distance, temperature, latitude and longitude, direction of travel, location (relative to specific dates)
2 dimensions, 6 types of data
DATA VISUALIZATION
Explanatory: Communicate known information
Exploratory: Explore data to reveal insights
Where do data come from?
DATA SOURCES
Open data: Publicly available
Private data: Owned by an organization, not available to the public
Self-collected data: Manual, site scraping, etc.
Combination of the above
OPEN DATA
You can also collect it yourself
Data at Twitter
Tweets: Text, Time, Location, Media
User information: Age, Country, etc.
Follows
User interactions: Navigation, Views
MANY FORMS OF DATA
Standalone files: txt, csv, tsv, json, excel, Google Docs, …, pdf*
APIs: better quality with more overhead
Databases: doesn’t necessarily mean they are organized
Big data: bigger pain
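To make the "many forms" point concrete, here is a small sketch of loading the same records from csv, tsv, or json text into one common shape (a list of dicts). It uses only the Python standard library. The helper name `load_records` and the sample row are illustrative assumptions.

```python
import csv
import io
import json

def load_records(text, fmt):
    """Parse csv, tsv, or json text into a list of dicts with the same keys."""
    if fmt == "json":
        return json.loads(text)
    delimiter = {"csv": ",", "tsv": "\t"}[fmt]
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter=delimiter)]

csv_text = "time,location,type\n12:00,Paragon,Magikarp\n"
tsv_text = "time\tlocation\ttype\n12:00\tParagon\tMagikarp\n"
json_text = '[{"time": "12:00", "location": "Paragon", "type": "Magikarp"}]'

records = [
    load_records(csv_text, "csv"),
    load_records(tsv_text, "tsv"),
    load_records(json_text, "json"),
]
```

Once everything is in one shape, the rest of the analysis does not need to care where the data came from.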
HAVING ALL TWEETS
How people think I feel. How I really feel.
CHALLENGES
Get relevant Tweets: hashtag: #oscars, keywords: “goal” (football)
Too big: Need to aggregate & reduce size
Slow: Long processing time (hours)
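The first two challenges (keeping only relevant Tweets, then aggregating to reduce size) can be sketched on a tiny scale as below. This is only a toy stand-in for the real cluster job. The timestamps and texts are made up, and `is_relevant` and `tweets_per_minute` are hypothetical helper names.

```python
from collections import Counter

def is_relevant(text, hashtags=("#oscars",), keywords=("goal",)):
    """Return True if the tweet text contains a tracked hashtag or keyword."""
    words = text.lower().split()
    # Strip common trailing punctuation so "goal!" still matches "goal".
    cleaned = [w.strip(".,!?:;\"'") for w in words]
    return any(w in hashtags or w in keywords for w in cleaned)

def tweets_per_minute(tweets):
    """Reduce (timestamp, text) pairs to a count of relevant tweets per minute."""
    counts = Counter()
    for timestamp, text in tweets:
        if is_relevant(text):
            counts[timestamp[:16]] += 1  # keep 'YYYY-MM-DD HH:MM'
    return counts

# Made-up sample rows, for illustration only.
sample = [
    ("2016-01-01 20:15:03", "What a GOAL!"),
    ("2016-01-01 20:15:40", "Watching the #Oscars tonight"),
    ("2016-01-01 20:16:10", "Lunch was great"),
]
per_minute = tweets_per_minute(sample)
```

The output is one small row per minute instead of one row per Tweet, which is the kind of reduction that makes the data fit on a laptop.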
GETTING BIG DATA
Hadoop Cluster: Data Storage → Tool: Pig / Scalding (slow) → Smaller dataset
Your laptop: Smaller dataset → Tool: node.js / python / excel (fast) → Final dataset
What do we do with the data?
APPLICATIONS OF DATA
Personal analytics: Anyone
Product analytics: Product Manager, Engineer
Data Journalism: News, Magazine, Company’s Public Relations
…
NEW YORK TIMES GRAPHICS
http://www.nytimes.com/interactive/2014/08/13/upshot/where-people-in-each-state-were-born.html?abt=0002&abg=0#New_York
THE GUARDIAN
NEWS
New York Times
The Guardian
Washington Post
Wall Street Journal
FiveThirtyEight
etc.
GOOGLE TRENDS
https://www.google.com/trends/story/US_cu_XRyhKlcBAACrtM_en
UBER
https://newsroom.uber.com/a-day-in-the-life-of-uber/
Example work
Tweet about what?
Most talked-about Pokémon
Tweet when?
Tweets per minute
Tweet where?
LOCATION
flickr.com/photos/twitteroffice/8798020541
San Francisco
Low density
High density
by Miguel Rios
Rebuild the world based on tweet density
twitter.github.io/interactive/andes/
by Nicolas Garcia Belmonte
Tweet about what? Where? When?
HAPPY NEW YEAR
New Year 2013
twitter.github.io/interactive/newyear2014/
Where are the USERS?
USER + LOCATION : FAN MAP
interactive.twitter.com/nfl_followers2014
USER + LOCATION : FAN MAP
interactive.twitter.com/nba_followers
USER + LOCATION : FAN MAP
interactive.twitter.com/premierleague
interactive.twitter.com
What are the steps?
Data analysis steps
Collect
Clean
Explore*
Analyze
Present*
CASE STUDY: GAME OF THRONES
Problem is coming.
CHAPTER I
“Problem first, not solution backward”
— Brian Caffo (via Ron Brookmeyer)
“If all you have is a hammer, everything looks like a nail.”
— Abraham Maslow
Problem
Want to know what the audience says about a TV show, from Tweets
HBO’s Game of Thrones
Based on a book series, “A Song of Ice and Fire”
Medieval Fantasy. Knights, magic and dragons.
Brief Story
A King dies.
A lot of contenders wage a war to reclaim the throne.
Minor characters with no claim to the throne set their own plans in action to gain power when all the major characters end up killing each other.
Brave/Honest/Honorable characters die.
Intelligent but shady characters and characters who know nothing continue to live.
While humans are busy killing each other, ice zombies (“White Walkers”) are invading from the North.
The only group that seems to care about this is a neutral group called the Night’s Watch.
Many characters. Anybody can die.
6 seasons (57 episodes) so far
Multiple storylines in each episode
Problem
Want to know what the audience says about a TV show, from Tweets
Ideas
Common words: Too much noise
Characters: How often was each character mentioned?
I demand a trial by prototyping.
CHAPTER II
Prototyping
Pull sample data from Twitter API
Character recognition and counting: naive approach
Sample Tweet
List of names
Daenerys Targaryen,Khaleesi
Jon Snow
Sansa Stark
Tyrion Lannister
Arya Stark
Cersei Lannister
Khal Drogo
Gregor Clegane,Mountain
Margaery Tyrell
Joffrey Baratheon
Bran Stark
Theon Greyjoy
Jaime Lannister
Brienne
Eddard Stark,Ned Stark
Ramsay Bolton
Sandor Clegane,Hound
Ygritte
Stannis Baratheon
Petyr Baelish,Little Finger
Robb Stark
Bronn
Varys
Catelyn Stark
Oberyn Martell
Daario Naharis
Davos Seaworth
Jorah Mormont
Melisandre
Myrcella Baratheon
Tywin Lannister
Tommen Baratheon
Grey Worm
Tyene Sand
Rickon Stark
Missandei
Roose Bolton
Robert Baratheon
Jojen Reed
Jeor Mormont
Tormund Giantsbane
Lysa Arryn
Yara Greyjoy,Asha Greyjoy
Samwell Tarly,Sam
Hodor
Victarion Greyjoy
High Sparrow
Dragon
Winter
Dothraki
Sample data
Character    Count
Hodor        10000
Jon Snow     5000
Daenerys     4000
Bran Stark   3000
…            …
*These numbers are made up for presentation, not real data.
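A naive recognize-and-count pass over the name list could look roughly like the sketch below. It assumes each line of the list is "canonical name,alias,…", as in the list above, and matches aliases by plain substring search. The function names and sample Tweets are made up for illustration; this is not the actual project code.

```python
from collections import Counter

# A few entries from the name list; each line is "canonical name,alias,...".
NAME_LIST = """Daenerys Targaryen,Khaleesi
Jon Snow
Eddard Stark,Ned Stark
Hodor"""

def build_alias_table(name_list):
    """Map every lower-cased alias to the first (canonical) name on its line."""
    table = {}
    for line in name_list.splitlines():
        aliases = [a.strip() for a in line.split(",")]
        for alias in aliases:
            table[alias.lower()] = aliases[0]
    return table

def count_mentions(tweets, alias_table):
    """Count tweets mentioning each character (at most once per tweet)."""
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        mentioned = {name for alias, name in alias_table.items() if alias in lowered}
        counts.update(mentioned)
    return counts

# Made-up Tweets, for illustration only.
tweets = ["Hodor! Hodor!", "Khaleesi and Jon Snow finally meet", "I miss Ned Stark"]
counts = count_mentions(tweets, build_alias_table(NAME_LIST))
```

Substring matching is what makes this naive: a short alias such as "Sam" would also match "same", so a real pass would likely need word-boundary checks.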
When you play the game of vis, you iterate or you die.
CHAPTER III
Where to go from here?
+ emotion
+ connections
Gain insights from a single episode: emotion & connections
Sample data
INDIVIDUALS (+ top emojis)
Character    Count
Hodor        10000
Jon Snow     5000
Daenerys     4000
…            …
CONNECTIONS (+ top emojis)
Character         Count
Jon Snow+Sansa    1000
Tormund+Brienne   500
Bran Stark+Hodor  300
…                 …
*These numbers are made up for presentation, not real data.
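One plausible way to derive the CONNECTIONS counts and the top emojis is to count character pairs that appear in the same Tweet and tally emojis per character. The sketch below assumes the characters in each Tweet have already been recognized. The tracked emoji set, helper names, and sample Tweets are all placeholders.

```python
from collections import Counter, defaultdict
from itertools import combinations

EMOJIS = {"😭", "😍", "🔥"}  # placeholder set of tracked emojis

def count_connections(mentions_per_tweet):
    """Count how often each pair of characters appears in the same tweet."""
    pairs = Counter()
    for names in mentions_per_tweet:
        # Sort so (A, B) and (B, A) are counted as the same pair.
        for a, b in combinations(sorted(set(names)), 2):
            pairs[(a, b)] += 1
    return pairs

def top_emojis(tweets, emojis=EMOJIS, k=3):
    """Return the k most frequent tracked emojis for each character."""
    per_character = defaultdict(Counter)
    for names, text in tweets:
        found = [ch for ch in text if ch in emojis]
        for name in set(names):
            per_character[name].update(found)
    return {name: [e for e, _ in c.most_common(k)] for name, c in per_character.items()}

# (recognized characters, tweet text) pairs, made up for illustration.
tweets = [
    (["Jon Snow", "Sansa Stark"], "reunion 😭😭"),
    (["Sansa Stark", "Jon Snow"], "finally 😍"),
    (["Hodor"], "hold the door 😭"),
]
pairs = count_connections([names for names, _ in tweets])
emoji_top = top_emojis(tweets)
```

Scanning character by character only catches single-code-point emojis; sequences such as flags or skin-tone variants would need a proper emoji tokenizer.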
Graph
NODES: individuals (character counts + top emojis)
EDGES: connections (character-pair counts + top emojis)
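The mapping from the two count tables to a graph can be sketched as follows. The `nodes`/`links` key names are an assumption, chosen because that shape is commonly fed to node-link layouts; the talk does not say what format was used. The numbers are placeholders in the same spirit as the slide's.

```python
import json

def build_graph(individual_counts, connection_counts):
    """Turn the two count tables into a nodes-and-links structure."""
    nodes = [{"id": name, "count": n} for name, n in individual_counts.items()]
    known = set(individual_counts)
    links = [
        {"source": a, "target": b, "count": n}
        for (a, b), n in connection_counts.items()
        if a in known and b in known  # skip edges with a missing endpoint
    ]
    return {"nodes": nodes, "links": links}

# Placeholder numbers, made up like the ones on the slide.
individuals = {"Hodor": 10000, "Jon Snow": 5000, "Bran Stark": 3000}
connections = {("Bran Stark", "Hodor"): 300, ("Jon Snow", "Sansa"): 1000}

graph = build_graph(individuals, connections)
graph_json = json.dumps(graph, indent=2)
```

Here the Jon Snow+Sansa edge is dropped because "Sansa" has no node of her own, a reminder that names must be normalized the same way in both tables.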
Network Visualization
Node-link diagram
Force-directed layout
http://blockbuilder.org/kristw/762b680690e4b2b2666dfec15838a384
+ Collision Detection
http://blockbuilder.org/kristw/2850f65d6329c5fef6d5c9118f1de6e6
+ Collision Detection (with clusters)
https://bl.ocks.org/mbostock/7881887
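To give a feel for what collision detection adds to a node-link layout, here is a simplified, self-contained sketch that pushes overlapping circles apart until none overlap. It is a stand-in written for illustration, not the code behind the linked examples, which use a force simulation. The `x`, `y`, `r` field names are assumptions.

```python
import math

def resolve_collisions(circles, passes=10):
    """Push overlapping circles apart along the line joining their centers.

    Each circle is a dict with keys x, y, r. Positions are modified in place.
    """
    for _ in range(passes):
        moved = False
        for i in range(len(circles)):
            for j in range(i + 1, len(circles)):
                a, b = circles[i], circles[j]
                dx, dy = b["x"] - a["x"], b["y"] - a["y"]
                dist = math.hypot(dx, dy)
                overlap = a["r"] + b["r"] - dist
                if overlap <= 0:
                    continue  # already separated
                if dist == 0:
                    ux, uy = 1.0, 0.0  # identical centers: pick a direction
                else:
                    ux, uy = dx / dist, dy / dist
                # Move each circle half of the overlap in opposite directions.
                a["x"] -= ux * overlap / 2
                a["y"] -= uy * overlap / 2
                b["x"] += ux * overlap / 2
                b["y"] += uy * overlap / 2
                moved = True
        if not moved:
            break
    return circles

nodes = [{"x": 0.0, "y": 0.0, "r": 10.0}, {"x": 6.0, "y": 0.0, "r": 10.0}]
resolve_collisions(nodes)
```

With many nodes, separating one pair can create a new overlap elsewhere, which is why the function repeats for several passes rather than stopping after one.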
Let’s get other episodes.
(More) data are coming.
CHAPTER IV
More data
1 episode (1 day) => all episodes (6 years)
Rewrite the scripts to get archived data
How much data do we need?
Whole week?
5 days?
2 days?
A day?
etc.
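One way to reason about the window size, which may not be how it was decided for this project, is to measure what share of a week's Tweets each shorter window would capture. The sketch below does this for made-up dates; `coverage_by_window` is a hypothetical helper.

```python
from datetime import date

def coverage_by_window(tweet_dates, air_date, max_days=7):
    """For windows of 1..max_days days from air_date, return the share of tweets captured."""
    in_range = [d for d in tweet_dates if 0 <= (d - air_date).days < max_days]
    total = len(in_range)
    coverage = {}
    for n in range(1, max_days + 1):
        captured = sum(1 for d in in_range if (d - air_date).days < n)
        coverage[n] = captured / total if total else 0.0
    return coverage

# Made-up example: 10 tweets spread over the week after a placeholder air date.
air = date(2016, 1, 3)
dates = (
    [date(2016, 1, 3)] * 6    # air day
    + [date(2016, 1, 4)] * 2  # next day
    + [date(2016, 1, 6)]      # three days later
    + [date(2016, 1, 9)]      # six days later
)
coverage = coverage_by_window(dates, air)
```

If a one- or two-day window already captures most of the week's volume, the longer and slower pulls add little.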
Hold the vis.
CHAPTER V
The vis is not enough.
Legend
Navigation
Top 3
Adjust threshold
Recap
Filtered Recap
Tooltip
Demo
https://interactive.twitter.com/game-of-thrones
Mobile Support
A visualizer always evaluates his work.
CHAPTER VI
“Feedback is the breakfast of champions.”
— Ken Blanchard
Self & Peer
Does it solve the problem?
Tormund + Brienne
Google Analytics
Pageviews
Visitors
Actions
Referrals: Sites/Social
Feedback
Summary
Data are around us and come from many sources. Open data are valuable.
Telling stories from data is one possible application: news, magazines, company PR.
It takes time and iterations, with much trial and error.
Start with a problem, collect the data, explore, find a story and present it.
Krist Wongsuphasawat / @kristw
kristw.yellowpigz.com
The Reading Room 2 Silom Soi 19,
Bangkok, Thailand 10500
Thank you