Weaving Stories from Data / Storytelling with Data

Post on 12-Apr-2017

Category: Data & Analytics

TRANSCRIPT

Krist Wongsuphasawat / @kristw

STORYTELLING WITH DATA (Weaving Stories from Data)

First, an introduction

Computer Engineer, Chulalongkorn University

PhD in Computer Science (Information Visualization), Univ. of Maryland

IBM, Microsoft

Data Scientist, Twitter

Data

Fishing

400

Collecting data

Time   Location  Type
12:00  Paragon   Magikarp
12:05  Siam Dis  Magikarp
12:40  CTW       Magikarp
…      …         …

[Chart: number of fish (y-axis) over time of day (x-axis: 00:00, 6:00, 12:00, 18:00, 00:00)]
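A minimal sketch of how records like the three collected rows could be tallied by hour to feed a number-of-fish-over-time chart. It assumes Python and tuples of (time, location, type); the function name `count_by_hour` is made up for illustration.

```python
from collections import Counter

# The three example rows from the collected-data table: (time, location, type).
sightings = [
    ("12:00", "Paragon", "Magikarp"),
    ("12:05", "Siam Dis", "Magikarp"),
    ("12:40", "CTW", "Magikarp"),
]


def count_by_hour(records):
    """Return a Counter mapping hour of day (0-23) to the number of records."""
    counts = Counter()
    for time_str, _location, _kind in records:
        hour = int(time_str.split(":")[0])
        counts[hour] += 1
    return counts


hourly = count_by_hour(sightings)

# Crude text histogram: one row per hour that has at least one record.
for hour in sorted(hourly):
    print(f"{hour:02d}:00 {'#' * hourly[hour]}")
```

Grouping by hour is only one possible bin size; finer or coarser bins would change the shape of the chart.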

DATA VISUALIZATION: turning data into pictures

History

data

Number of Napoleon's troops, Distance, Temperature, Latitude and Longitude, Direction of travel, Location (relative to specific dates)

2 dimensions, 6 types of data

DATA VISUALIZATION

Explanatory: Communicate known information

Exploratory: Explore data to reveal insights

Where do the data come from?

DATA SOURCES

Open data: Publicly available

Private data: Owned by an organization, not available to the public

Self-collected data: Manual, site scraping, etc.

Combination of the above

OPEN DATA

You can also collect it yourself

Data at Twitter

Tweets: Text, Time, Location, Media

User information: Age, Country, etc.

Follows

User interactions: Navigation, Views

MANY FORMS OF DATA

Standalone files: txt, csv, tsv, json, excel, Google Docs, …, pdf*

APIs: better quality, with more overhead

Databases: doesn’t necessarily mean they are organized

Big data: bigger pain

HAVING ALL TWEETS

How people think I feel. How I really feel.

CHALLENGES

Get relevant Tweets: hashtag: #oscars; keywords: “goal” (football)

Too big: Need to aggregate & reduce size

Slow: Long processing time (hours)
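One way the "get relevant Tweets" step could look, sketched in Python under the assumption that each Tweet is available as a plain text string. The function `is_relevant` and its parameters are hypothetical and are not Twitter's actual pipeline.

```python
import re


def is_relevant(text, hashtags=(), keywords=()):
    """Return True if the text contains any given hashtag or whole-word keyword.

    Matching is case-insensitive. Keywords match whole words only, so
    "goal" does not match "goals".
    """
    lowered = text.lower()

    # Hashtags present in the text, without the leading "#".
    tags_in_text = set(re.findall(r"#(\w+)", lowered))
    if any(tag.lower().lstrip("#") in tags_in_text for tag in hashtags):
        return True

    # Individual words present in the text.
    words_in_text = set(re.findall(r"\w+", lowered))
    return any(keyword.lower() in words_in_text for keyword in keywords)


print(is_relevant("Watching the #Oscars tonight", hashtags=["#oscars"]))
print(is_relevant("What a GOAL!", keywords=["goal"]))
```

A keyword such as “goal” is ambiguous outside football, which is part of why filtering is listed as a challenge; a real filter would likely need more context than this sketch uses.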

GETTING BIG DATA

Hadoop Cluster: Data Storage → Tool: Pig / Scalding (slow) → Smaller dataset

Your laptop: Smaller dataset → Tool: node.js / python / excel (fast) → Final dataset
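A sketch of the laptop-side step, assuming the smaller dataset arrives as a tab-separated file of (ISO timestamp, text) rows. That file layout, the function name `tweets_per_minute`, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def tweets_per_minute(tsv_file):
    """Stream TSV rows of (ISO timestamp, text) and count rows per minute."""
    counts = Counter()
    for timestamp, _text in csv.reader(tsv_file, delimiter="\t"):
        # Truncate the timestamp to the start of its minute.
        minute = datetime.fromisoformat(timestamp).replace(second=0, microsecond=0)
        counts[minute] += 1
    return counts


# Stand-in for the smaller dataset exported from the cluster (made-up rows).
smaller_dataset = io.StringIO(
    "2016-06-10T20:00:05\tkickoff\n"
    "2016-06-10T20:00:40\twhat a start\n"
    "2016-06-10T20:01:10\tgoal\n"
)

final_dataset = tweets_per_minute(smaller_dataset)
for minute, count in sorted(final_dataset.items()):
    print(minute.isoformat(), count)
```

Because the rows are read one at a time, the input never has to fit in memory; only the per-minute counts do.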

What do we do with the data?

APPLICATIONS OF DATA

Personal analytics: Anyone

Product analytics: Product Manager, Engineer

Data Journalism: News, Magazine, Company’s Public Relations

NEW YORK TIMES GRAPHICS

http://www.nytimes.com/interactive/2014/08/13/upshot/where-people-in-each-state-were-born.html?abt=0002&abg=0#New_York

THE GUARDIAN

NEWS

New York Times

The Guardian

Washington Post

Wall Street Journal

FiveThirtyEight

etc.

GOOGLE TRENDS

https://www.google.com/trends/story/US_cu_XRyhKlcBAACrtM_en

UBER

https://newsroom.uber.com/a-day-in-the-life-of-uber/

Example projects

What do people tweet?

Most-mentioned Pokémon

When do people tweet?

Tweets per minute

interactive.twitter.com/euro2016

Where do people tweet?

LOCATION

Low density

High density

by Miguel Rios

flickr.com/photos/twitteroffice/8798020541

San Francisco

Rebuild the world based on tweet density

twitter.github.io/interactive/andes/

by Nicolas Garcia Belmonte

What do people tweet? Where? When?

HAPPY NEW YEAR

New Year 2013

twitter.github.io/interactive/newyear2014/

Where are the users?

USER + LOCATION : FAN MAP

interactive.twitter.com/nfl_followers2014

USER + LOCATION : FAN MAP

interactive.twitter.com/nba_followers

USER + LOCATION : FAN MAP

interactive.twitter.com/premierleague

interactive.twitter.com

What are the steps?

Data analysis steps

Collect

Clean

Explore*

Analyze

Present*

CASE STUDY: GAME OF THRONES

Problem is coming.

CHAPTER I

“Problem first, not solution backward”

— Brian Caffo (via Ron Brookmeyer)

“If all you have is a hammer, everything looks like a nail.”

— Abraham Maslow

Problem

Want to know what the audience says about a TV show, from Tweets

HBO’s Game of Thrones

Based on the book series “A Song of Ice and Fire”. Medieval fantasy: knights, magic and dragons.

Brief Story

A King dies.

A lot of contenders wage a war to reclaim the throne.

Minor characters with no claim to the throne set their own plans in action to gain power when all the major characters end up killing each other.

Brave/Honest/Honorable characters die.

Intelligent but shady characters and characters who know nothing continue to live.

While humans are busy killing each other, ice zombies (“White Walkers”) are invading from the North.

The only group that seems to care about this is a neutral group called the Night’s Watch.

HBO’s Game of Thrones

Based on the book series “A Song of Ice and Fire”. Medieval fantasy: knights, magic and dragons.

Many characters. Anybody can die.

6 seasons (57 episodes) so far

Multiple storylines in each episode

Problem

Want to know what the audience says about a TV show, from Tweets

Ideas

Common words: Too much noise

Characters: How often was each character mentioned?

I demand a trial by prototyping.

CHAPTER II

Prototyping

Pull sample data from Twitter API

Character recognition and counting: naive approach

Sample Tweet

List of names

Daenerys Targaryen,Khaleesi

Jon Snow

Sansa Stark

Tyrion Lannister

Arya Stark

Cersei Lannister

Khal Drogo

Gregor Clegane,Mountain

Margaery Tyrell

Joffrey Baratheon

Bran Stark

Theon Greyjoy

Jaime Lannister

Brienne

Eddard Stark,Ned Stark

Ramsay Bolton

Sandor Clegane,Hound

Ygritte

Stannis Baratheon

Petyr Baelish,Little Finger

Robb Stark

Bronn

Varys

Catelyn Stark

Oberyn Martell

Daario Naharis

Davos Seaworth

Jorah Mormont

Melisandre

Myrcella Baratheon

Tywin Lannister

Tommen Baratheon

Grey Worm

Tyene Sand

Rickon Stark

Missandei

Roose Bolton

Robert Baratheon

Jojen Reed

Jeor Mormont

Tormund Giantsbane

Lysa Arryn

Yara Greyjoy,Asha Greyjoy

Samwell Tarly,Sam

Hodor

Victarion Greyjoy

High Sparrow

Dragon

Winter

Dothraki

Sample data

Character Count

Hodor 10000

Jon Snow 5000

Daenerys 4000

Bran Stark 3000

… …

*These numbers are made up for presentation, not real data.
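The naive recognize-and-count approach could be sketched as follows in Python. The alias table holds only a few entries from the list of names, the choice of which aliases to search for is an assumption, and the function names and sample Tweets are made up for illustration.

```python
import re
from collections import Counter

# Canonical name -> aliases to search for. Only a few entries from the list
# of names are included; the alias choices are assumptions for this sketch.
ALIASES = {
    "Daenerys Targaryen": ["Daenerys", "Khaleesi"],
    "Jon Snow": ["Jon Snow"],
    "Eddard Stark": ["Eddard", "Ned Stark"],
    "Hodor": ["Hodor"],
}


def mentioned_characters(text):
    """Return the set of canonical names whose aliases appear in the text."""
    lowered = text.lower()
    found = set()
    for name, aliases in ALIASES.items():
        for alias in aliases:
            # \b keeps an alias from matching inside a longer word.
            if re.search(r"\b" + re.escape(alias.lower()) + r"\b", lowered):
                found.add(name)
                break
    return found


def count_characters(tweets):
    """Count, per character, the number of Tweets that mention them."""
    counts = Counter()
    for text in tweets:
        counts.update(mentioned_characters(text))
    return counts


sample_tweets = ["Hodor! Hodor!", "Khaleesi and Jon Snow", "hold the door #hodor"]
for name, count in count_characters(sample_tweets).most_common():
    print(name, count)
```

Each Tweet counts at most once per character, no matter how many times the name is repeated. Plain string matching like this misses misspellings and nicknames that are not in the alias table, which is what makes the approach naive.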

When you play the game of vis, you iterate or you die.

CHAPTER III

Where to go from here?

+ emotion

+ connections

Gain insights from a single episode: emotion & connections

Sample data

INDIVIDUALS (+ top emojis)

Character Count

Hodor 10000

Jon Snow 5000

Daenerys 4000

… …

CONNECTIONS (+ top emojis)

Character Count

Jon Snow+Sansa 1000

Tormund+Brienne 500

Bran Stark+Hodor 300

… …

*These numbers are made up for presentation, not real data.
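A sketch of how connection counts could be derived, assuming a connection means that two characters are mentioned in the same Tweet. That definition is an assumption, as are the function name `count_connections` and the sample input.

```python
from collections import Counter
from itertools import combinations


def count_connections(mentions_per_tweet):
    """Count how many Tweets mention each unordered pair of characters.

    `mentions_per_tweet` is an iterable with one collection of character
    names per Tweet.
    """
    pairs = Counter()
    for mentioned in mentions_per_tweet:
        # Sorting makes (A, B) and (B, A) share one key.
        for first, second in combinations(sorted(set(mentioned)), 2):
            pairs[(first, second)] += 1
    return pairs


# Made-up input: the characters recognized in each of three Tweets.
mentions = [
    {"Jon Snow", "Sansa"},
    {"Sansa", "Jon Snow", "Hodor"},
    {"Hodor"},
]
for (first, second), count in count_connections(mentions).most_common():
    print(f"{first}+{second}", count)
```

A Tweet that mentions a single character contributes to the individual counts only, never to a connection.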

Graph

NODES (+ top emojis)

Character Count

Hodor 10000

Jon Snow 5000

Daenerys 4000

… …

EDGES (+ top emojis)

Character Count

Jon Snow+Sansa 1000

Tormund+Brienne 500

Bran Stark+Hodor 300

… …

*These numbers are made up for presentation, not real data.
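A sketch of turning the two count tables into a graph structure. The `{"nodes": [...], "links": [...]}` shape is an assumption, chosen because force-layout examples commonly consume it. The numbers are the made-up ones from the slides, and `build_graph` and `min_count` are hypothetical names.

```python
import json

# Made-up numbers from the slides, not real data.
individuals = {"Hodor": 10000, "Jon Snow": 5000, "Daenerys": 4000, "Bran Stark": 3000}
connections = {
    ("Jon Snow", "Sansa"): 1000,
    ("Tormund", "Brienne"): 500,
    ("Bran Stark", "Hodor"): 300,
}


def build_graph(individuals, connections, min_count=0):
    """Build nodes from character counts and links from pair counts.

    Links below `min_count`, or with an endpoint missing from the node
    table, are skipped so every link refers to an existing node.
    """
    nodes = [{"id": name, "count": count} for name, count in individuals.items()]
    known = set(individuals)
    links = [
        {"source": first, "target": second, "count": count}
        for (first, second), count in connections.items()
        if count >= min_count and first in known and second in known
    ]
    return {"nodes": nodes, "links": links}


graph = build_graph(individuals, connections, min_count=100)
print(json.dumps(graph, indent=2))
```

With this truncated sample only the Bran Stark+Hodor link survives, because the other pairs involve characters that are absent from the shortened node table.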

Network Visualization

Node-link diagram

Force-directed layout

http://blockbuilder.org/kristw/762b680690e4b2b2666dfec15838a384

+ Collision Detection

http://blockbuilder.org/kristw/2850f65d6329c5fef6d5c9118f1de6e6

+ Community Detection

https://github.com/upphiminn/jLouvain

+ Collision Detection (with clusters)

https://bl.ocks.org/mbostock/7881887

Let’s get other episodes.

(More) data are coming.

CHAPTER IV

More data

1 episode (1 day) => all episodes (6 years)

Rewrite the scripts to get archived data

How much data do we need?

Whole week?

5 days?

2 days?

A day?

etc.
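One possible way to approach the window question, sketched in Python: measure what fraction of an episode's Tweets fall within each candidate window after it airs. The slides do not say how the choice was actually made, so the method, the function name `coverage_by_window`, and the sample timestamps are all assumptions.

```python
from datetime import datetime, timedelta


def coverage_by_window(tweet_times, air_time, windows_in_days=(1, 2, 5, 7)):
    """For each window length in days, return the fraction of post-airing
    Tweets that were posted within that many days of the air time."""
    after_airing = [t for t in tweet_times if t >= air_time]
    total = len(after_airing)
    coverage = {}
    for days in windows_in_days:
        cutoff = air_time + timedelta(days=days)
        inside = sum(1 for t in after_airing if t < cutoff)
        coverage[days] = inside / total if total else 0.0
    return coverage


# Made-up example: five Tweets posted 1, 2, 30, 100 and 200 hours after airing.
air_time = datetime(2016, 5, 22, 21, 0)
tweet_times = [air_time + timedelta(hours=h) for h in (1, 2, 30, 100, 200)]

for days, fraction in coverage_by_window(tweet_times, air_time).items():
    print(f"{days} day(s): {fraction:.0%}")
```

If the fraction stops growing much beyond a certain window, collecting for longer adds cost without adding much data.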

Hold the vis.

CHAPTER V

The vis is not enough.

Legend

Navigation

Top 3

Adjust threshold

Recap

Filtered Recap

Tooltip

Demo

https://interactive.twitter.com/game-of-thrones

Mobile Support

A visualizer always evaluates his work.

CHAPTER VI

“Feedback is the breakfast of champions.”

— Ken Blanchard

Self & Peer

Does it solve the problem?

Tormund + Brienne

Google Analytics

Pageviews

Visitors

Actions

Referrals: Sites/Social

Feedback

Summary

Data are around us and come from many sources. Open data are valuable.

Telling stories from data is one possible application: news, magazines, company PR.

It takes time and iterations, with much trial and error.

Start with a problem, collect the data, explore, find a story and present it.

Krist Wongsuphasawat / @kristw

kristw.yellowpigz.com

The Reading Room 2 Silom Soi 19,

Bangkok, Thailand 10500

Thank you
