Week 04 • Data Journalism: Data Processing
Joonhwan Lee, hci+d lab. (human-computer interaction + design lab.), hcid-courses.github.io

Post on 21-May-2020


Today's Topics

• Data Processing Process
• CSV import
• Fix Data Type
• Understand Data through Exploration
• Data Filtering
• Add Key (Column) to the Data

Data Processing

Data Analysis Process

Question → Wrangling → Explore → Predict → Communication

Data Analysis Process

✦ Question Phase
  ✦ Characteristics of students who finish MOOC lectures
  ✦ Age and gender distribution of people who spend money in the Gangnam area

Data Analysis Process

✦ Wrangling Phase
  ✦ Data acquisition: where to get the data needed to answer the questions
  ✦ Data cleaning: in most cases the data needs to be cleaned, and this is where we spend most of our time (80~90%)
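As a minimal sketch of what cleaning can involve, the Python snippet below normalizes a few hand-made rows: it trims stray whitespace, turns empty strings into `None`, and drops rows that become exact duplicates. The rows, the column names (`user_id`, `city`), and the helper `clean_rows` are invented for illustration and are not taken from the course data.

```python
def clean_rows(rows):
    """Trim whitespace, map empty strings to None, and drop exact duplicates."""
    cleaned = []
    seen = set()
    for row in rows:
        # Normalize every value in the row.
        fixed = {}
        for key, value in row.items():
            value = value.strip()
            fixed[key] = value if value != "" else None
        # Use the key-sorted items as a hashable fingerprint of the row.
        fingerprint = tuple(sorted(fixed.items(), key=lambda kv: kv[0]))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(fixed)
    return cleaned


# Hypothetical raw rows, as they might come out of a messy CSV file.
raw = [
    {"user_id": " 1 ", "city": "Seoul"},
    {"user_id": "2", "city": ""},
    {"user_id": "1", "city": " Seoul"},  # same as the first row once trimmed
]

cleaned = clean_rows(raw)
print(cleaned)
```

Real cleaning rules depend on the data set; which values count as "missing" or "duplicate" is a decision to make after looking at the data.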

Data Analysis Process

✦ Explore Phase
  ✦ Build intuition through exploratory data analysis
  ✦ Information visualization
  ✦ Find patterns
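One lightweight way to start building intuition, before any plotting, is to look at counts and simple summary statistics. The sketch below uses only Python's standard library on made-up records; the field names (`gender`, `minutes`) are assumptions for illustration.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical records; in practice these would come from the cleaned data.
records = [
    {"gender": "F", "minutes": 30},
    {"gender": "M", "minutes": 10},
    {"gender": "F", "minutes": 50},
    {"gender": "F", "minutes": 0},
]

# How many records fall into each category?
gender_counts = Counter(r["gender"] for r in records)

# Basic summary statistics for a numeric column.
minutes = [r["minutes"] for r in records]
summary = {
    "min": min(minutes),
    "max": max(minutes),
    "mean": mean(minutes),
    "median": median(minutes),
}

print(gender_counts)
print(summary)
```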

Data Analysis Process

✦ Prediction Phase
  ✦ Predict the answer to our question
  ✦ e.g. Age and gender distribution of people who spend money in the Gangnam area ⇒ according to our data analysis, women aged 20-30 spend more money in this area ⇒ marketing insights
  ✦ Usually requires statistics or machine learning

Data Analysis Process

✦ Communication Phase
  ✦ Data journalism
  ✦ Blog posts
  ✦ Data visualizations
  ✦ Papers

Data Acquisition

✦ Downloading files
✦ Accessing an API
✦ Scraping a web page

➝ We will do these later.

Data Format

✦ CSV: Comma Separated Values
  ✦ Data columns are separated by commas
  ✦ Plain-text file format (xls is a binary format) ➝ can be read in a text editor
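Because CSV is plain text, a file's contents can be written out directly. The sketch below parses a small made-up CSV string with Python's `csv` module; the sample also includes a value that itself contains a comma, which is why that value is wrapped in double quotes. The column names and values are hypothetical.

```python
import csv
import io

# A tiny, made-up CSV file held in a string: one header line, two data lines.
text = 'name,city\n"Hong, Gildong",Seoul\nKim,Busan\n'

# csv.reader splits each line on commas while respecting quoted fields.
rows = list(csv.reader(io.StringIO(text)))

header, data = rows[0], rows[1:]
print(header)
print(data)
```

Note that every value comes back as a string, including anything that looks like a number or a date.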

What are we going to do today?

✦ CSV import
✦ Fix Data Type
✦ Understand Data through Exploration
✦ Data Filtering
✦ Add Key (Column) to the Data
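The five steps listed above can be sketched end to end with the standard library. This is only an illustration on invented data, not the course's own code: the columns `user_id`, `login_date`, and `minutes` and the helper `parse_row` are hypothetical.

```python
import csv
import io
from datetime import datetime

# Made-up CSV content standing in for a real file on disk.
text = (
    "user_id,login_date,minutes\n"
    "1,2020-03-01,30\n"
    "2,2020-03-01,0\n"
    "1,2020-03-08,45\n"
)

# 1. CSV import: DictReader yields one dict per row, with every value a string.
rows = list(csv.DictReader(io.StringIO(text)))


# 2. Fix data types: convert the strings to int and date objects.
def parse_row(row):
    return {
        "user_id": int(row["user_id"]),
        "login_date": datetime.strptime(row["login_date"], "%Y-%m-%d").date(),
        "minutes": int(row["minutes"]),
    }


logins = [parse_row(r) for r in rows]

# 3. Explore: how many rows and how many distinct users are there?
unique_users = {r["user_id"] for r in logins}

# 4. Filter: keep only logins with some recorded activity.
active = [r for r in logins if r["minutes"] > 0]

# 5. Add a key (column): derive the day of the week (Monday is 0, Sunday is 6).
for r in active:
    r["weekday"] = r["login_date"].weekday()

print(len(logins), len(unique_users), len(active))
print(active[0])
```

With a real file, `io.StringIO(text)` would be replaced by an opened file object, for example `open(path, newline="")`, where `path` is whatever file you downloaded.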

Data & Code

✦ Modified from the “Introduction to Data Analysis” course at Udacity.
  ✦ Uses their login data.
  ✦ Data description included.

Questions?