Geosocial big data analysis using python and FOSS4G with the case study of Korean data Ilyoung Hong Namseoul Univ Dep of GIS engineering

Post on 14-Jan-2016


Page 1

Geosocial big data analysis using Python and FOSS4G with the case study of Korean data

Ilyoung Hong

Namseoul Univ

Dep of GIS engineering

Page 2

Geosocial data

• Social media: Twitter and Facebook are the killer apps for smartphones

• Smartphones with GPS generate large amounts of geotagged social data

• Geotagged social data is called geosocial data
  • Examples: GeoTweets (geotagged tweets), Foursquare (4sq) venues

Page 3

Geosocial Data Research

• Fujita, Hideyuki. "Geo-tagged Twitter collection and visualization system." Cartography and Geographic Information Science 40.3 (2013): 183-191.
  => Computational method, data collection

• Jung, Jin‐Kyu. "Code clouds: Qualitative geovisualization of geotweets." The Canadian Geographer/Le Géographe canadien 59.1 (2015): 52-68.
  => Qualitative approach, with content analysis

• Li, Linna, Michael F. Goodchild, and Bo Xu. "Spatial, temporal, and socioeconomic patterns in the use of Twitter and Flickr." Cartography and Geographic Information Science 40.2 (2013): 61-77.
  => Spatial statistical analysis with geodemographic data

• Mitchell, Lewis, et al. "The geography of happiness: Connecting twitter sentiment and expression, demographics, and objective characteristics of place." PLoS ONE 8.5 (2013): e64417.
  => Sentiment analysis, computational linguistics approach

Page 4

Multidisciplinary Aspects of geosocial data analysis

[Diagram: GeoSocial Data at the center]

• Process steps: Data Collection, Data Management, Data Analyzing (Quantitative Analysis, Qualitative Analysis), Data Visualization

• Related fields: Web programming; Database Management; Geography, Cartography, GIS; Statistics, Linguistics, Text Mining; Sociology, Journalism, Media

Page 5

Challenges of geosocial research

• Different data sources and formats
  • Twitter, Foursquare, Facebook

• Different analysis environments and software
  • Java, PHP, Python, C, R, ArcGIS, web programming, database programming, statistics, geovisualization

• Different domain knowledge and multidisciplinary research methods
  • Computation, geography, sociology, psychology, statistics, linguistics, media, journalism

Interdisciplinary cooperation is needed. Is there a way to integrate these methods?

Page 6

Why Python/FOSS4G for Geosocial Big Data?

• Integrated analysis environment of software and libraries
  • Python is free and open.
  • Object-oriented programming (OOP) in Python
  • WinPython, Anaconda (SciPy, IPython), Enthought Canopy for Python 2.7

• Large number of libraries supporting different domain knowledge
  • PyPI, the Python Package Index, currently 66086 packages

• Simple coding environment
  • Quick to learn and to code
  • Readability: the syntax of Python is readable and clear.

Page 7

Research Purpose

• Introduce an integrated platform for analyzing geosocial data using Python & FOSS4G

• Data collection, management

• Data analysis, qualitative & quantitative methods

• Sentiment analysis

• Geovisualization

• Present a case study with Korean geosocial data

• GeoTweet distribution

• Spatial patterns of Foursquare venues

• Sentiment analysis of Korean GeoTweets

Page 8

Architecture, at the beginning

[Diagram: Social Media (Twitter/Foursquare API) => JSON => Excel/CSV => Shape => ArcGIS]

Page 9

Data Collection

• Python Streaming API, tweepy
  • Rate limits apply per user
  • Twitter restricts data collection: calls to the Twitter API are limited to 350 per hour for one authorized developer account
  • Switch to another user ID when the limit is reached

• Unnecessary data: filtering
  • Geotweet data is just 1% of total tweets
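The filtering step can be sketched as follows. This is a minimal illustration, not the collection program used in the study: it assumes the stream has been saved as one JSON object per line and that each tweet follows the Twitter v1.1 layout, where a geotagged tweet has a `coordinates` field holding a GeoJSON point in `[longitude, latitude]` order. The function name `iter_geotagged` and the sample lines are hypothetical.

```python
import json


def iter_geotagged(lines):
    """Yield (tweet, lon, lat) for tweets that carry point coordinates."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the saved stream
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed records
        if not isinstance(tweet, dict):
            continue
        coords = tweet.get("coordinates")
        # Assumed v1.1 layout: {"type": "Point", "coordinates": [lon, lat]}
        if isinstance(coords, dict) and coords.get("type") == "Point":
            lon, lat = coords["coordinates"]
            yield tweet, lon, lat


# Hypothetical sample: only the second record is geotagged.
sample_lines = [
    '{"id": 1, "text": "hello", "coordinates": null}',
    '{"id": 2, "text": "in Seoul", "coordinates": '
    '{"type": "Point", "coordinates": [126.978, 37.566]}}',
    "",
]
geotweets = list(iter_geotagged(sample_lines))
```

Because geotagged tweets are a small share of the stream, filtering at read time keeps the stored data set small.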

Page 10

Columns from Tweet

● Tweet text
  => qualitative approach, text mining, keyword filter, sentiment analysis

● Tweet ID; User ID; Destination user ID (only for tweets with “@user ID”); User profile (including location name input by user)
  => behavioral features, heavy user features, social network

● Location coordinates (only for tweets tagged with the location coordinates)
  => geovisualization, spatial analysis using GIS

● Date and time
  => temporal analysis
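As a rough sketch, the columns listed above can be pulled out of one tweet record like this. The field names (`in_reply_to_user_id`, `user.location`, `created_at`, `coordinates`) and the timestamp format are assumptions based on the Twitter v1.1 tweet object; the function `tweet_to_row` and the sample tweet are hypothetical.

```python
from datetime import datetime

# Assumed v1.1 timestamp layout, e.g. "Sat Nov 01 14:05:00 +0000 2014"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweet_to_row(tweet):
    """Flatten one tweet dict into the columns used for analysis."""
    user = tweet.get("user") or {}
    coords = tweet.get("coordinates") or {}
    lon, lat = coords.get("coordinates") or (None, None)
    return {
        "tweet_id": tweet.get("id"),
        "user_id": user.get("id"),
        "dest_user_id": tweet.get("in_reply_to_user_id"),  # only for replies
        "profile_location": user.get("location"),  # free text typed by the user
        "text": tweet.get("text"),
        "lon": lon,
        "lat": lat,
        "created_at": datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT),
    }


sample_tweet = {
    "id": 2,
    "text": "Happy Sunday :)",
    "created_at": "Sat Nov 01 14:05:00 +0000 2014",
    "in_reply_to_user_id": None,
    "user": {"id": 77, "location": "Seoul"},
    "coordinates": {"type": "Point", "coordinates": [126.978, 37.566]},
}
row = tweet_to_row(sample_tweet)
```

The parsed timestamp is timezone-aware (UTC in this layout), so it would still need converting to local time before any hour-of-day analysis.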

Page 11

Two studies completed so far

• Spatial Analysis of Location-Based Social Networks in Seoul, Korea, Journal of Geographic Information System, 2015, 7, 259-265

• Spatial Distribution of Korean Geotweets, Journal of the Korean Cartographic Association, 2015, 15(2), 93-101

Page 12

Spatial Analysis of Location-Based Social Networks in Seoul, Korea

• The purpose of this study is to analyze the spatial patterns of location-based social network (LBSN) data in Seoul using the spatial analysis techniques of geographic information system (GIS). The study explores the applications of LBSN data by analyzing the association between Seoul’s Foursquare venues data created based on user participation and the city’s characteristics. The data regarding Foursquare venues were compiled with a program we created based on Foursquare’s Python API. The compiled information was converted into GIS data, which in turn was depicted as a heat map. Cluster analysis was then performed based on hotspots and the correlation with census variables was analyzed for each administrative unit using geographically weighted regression (GWR). Based on analytical results, we were able to identify venue clusters around city centers, as well as differences in hotspots for various venue categories and correlations with census variables.

Page 13

About 230,000 venue records were collected for analysis between March 15 and 21, 2015

Page 14

Page 15

Spatial Distribution of Korean Geotweets

In this study, we analyzed the distribution of Korean geotweets collected in November 2014 through the Twitter Streaming API. Python programs were used to analyze the collected data and convert it to GIS data. Twitter use is concentrated in Seoul and the metropolitan areas, and a few heavy users created a large number of tweets. Time series analysis showed that tweet volume peaks on weekends and, within a day, at 14:00. Text analysis also identified a high percentage of retweets and regional differences in content.

Key Words: Twitter API, Geotweet, Spatial distribution
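The temporal and heavy-user summaries described in this abstract can be approximated with simple counting. The sketch below is illustrative only: the `rows` list is invented sample data, timestamps are assumed to be already converted to local Korean time, and the standard-library `Counter` stands in for whatever tooling the study actually used.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical (user_id, local timestamp) pairs.
rows = [
    ("u1", datetime(2014, 11, 1, 14, 5)),   # Saturday
    ("u1", datetime(2014, 11, 1, 14, 40)),  # Saturday
    ("u2", datetime(2014, 11, 2, 9, 15)),   # Sunday
    ("u1", datetime(2014, 11, 3, 14, 0)),   # Monday
]

by_hour = Counter(ts.hour for _, ts in rows)
by_weekday = Counter(WEEKDAYS[ts.weekday()] for _, ts in rows)
by_user = Counter(user for user, _ in rows)

peak_hour, peak_hour_count = by_hour.most_common(1)[0]
heaviest_user, heaviest_count = by_user.most_common(1)[0]

print("peak hour:", peak_hour, "tweets:", peak_hour_count)
print("tweets per weekday:", dict(by_weekday))
print("heaviest user:", heaviest_user, "tweets:", heaviest_count)
```

Comparing `heaviest_count` with the total number of rows gives a quick measure of how much a few heavy users dominate the data set.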

Page 16

Spatial Distribution of geotweets

Distribution of geotweets, Nov 2014

Daily distribution of geotweets, Nov 2014

• In November 2014, over 2 million tweets were collected.

Page 17

Text analysis

• High percentage of retweets

• Some keywords that represent regional features

• PyTag, Word_cloud
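A word cloud is driven by term frequencies, so the two text measures on this slide can be sketched with plain counting. This is a simplified illustration: it assumes retweets are marked by a leading "RT @", uses a whitespace-style tokenizer that would be too crude for Korean text (which would normally need a morphological analyzer), and the function names and sample texts are hypothetical.

```python
import re
from collections import Counter

NOISE = re.compile(r"https?://\S+|@\w+")  # URLs and @mentions
TOKEN = re.compile(r"#?\w+")              # words and hashtags


def retweet_share(texts):
    """Fraction of tweets that start with the conventional 'RT @' prefix."""
    if not texts:
        return 0.0
    return sum(t.startswith("RT @") for t in texts) / len(texts)


def keyword_counts(texts, stopwords=frozenset({"rt"})):
    """Count tokens after removing URLs, mentions, and stopwords."""
    counts = Counter()
    for text in texts:
        cleaned = NOISE.sub(" ", text.lower())
        for token in TOKEN.findall(cleaned):
            if token not in stopwords:
                counts[token] += 1
    return counts


# Hypothetical tweets from two districts.
texts = [
    "RT @a: Hongdae cafe",
    "Hongdae night #seoul",
    "Jongro palace walk",
    "RT @b: Jongro palace",
]
share = retweet_share(texts)
counts = keyword_counts(texts)
```

The resulting counts could be passed to a word cloud library as term weights; running the count separately per region is one way to surface regional keywords.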

Page 18

Problems

• Exploratory statistical analysis involves repeated work, but the process is not automated

• Takes time and introduces data errors

• As time goes by, the data becomes too big to handle.
  • It needs to be managed in a database, not as a text file

• Data and software should be compatible within the same environment for automated analysis

Page 19

Python & FOSS4G

• Integrated analysis environment

• Large number of libraries supporting different domain knowledge

• Create automated scripts for analysis

Page 20

[Diagram: integrated architecture]

• Social Media Server: Twitter API (Tweepy)

• GIS Data Server: Spatialite, accessed through pyspatialite

• Analysis Client: Data Collection, Data Parsing, Data Conversion (Shape/Text), Sentiment Analysis (Python NLTK), pandas for Data Analysis, Statistical Analysis (PySAL)

• Visualize Client: Geovisualization (Quantum GIS), Word Cloud (pytagcloud)

Page 21

Analysis Process

[Diagram: decision flow from social media data to analysis and visualization]

• Inputs: Social Media Data, GIS Database

• Decision points: Geotagged? Data type? Analysis method? Visualizing method?

• Branches: Quantitative, Qualitative

• Methods: Text Mining, Sentiment Analysis, Statistical Analysis, Spatial Analysis, GWR

• Outputs: Word Clouds, Heat Map, Thematic Mapping, Hotspot

Page 22

Why the Spatialite database?

- Standalone, file-based database: easy to handle

- Compatibility and interoperability: Python, QGIS, ArcGIS, export/import to other formats

- Easy to use, GUI

pyspatialite
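To illustrate the file-based database idea, the sketch below stores geotweets with Python's built-in `sqlite3` module, since Spatialite is an extension of SQLite. It deliberately does not load Spatialite or use pyspatialite, because how the extension is loaded depends on the installation; plain `lon`/`lat` columns and a bounding-box query stand in for a real geometry column and spatial index. The table name, sample rows, and the rough Seoul bounding box are assumptions for illustration.

```python
import sqlite3

# ":memory:" keeps the example self-contained; a file path such as
# "geotweet.sqlite" would give the standalone database file described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE geotweet (
        tweet_id   INTEGER PRIMARY KEY,
        user_id    INTEGER,
        text       TEXT,
        lon        REAL,
        lat        REAL,
        created_at TEXT
    )
    """
)

# Hypothetical sample rows: one in Seoul, one in Busan.
rows = [
    (1, 77, "Happy Sunday :)", 126.978, 37.566, "2014-11-02T14:05:00"),
    (2, 78, "Haeundae beach", 129.160, 35.159, "2014-11-02T15:10:00"),
]
conn.executemany("INSERT INTO geotweet VALUES (?, ?, ?, ?, ?, ?)", rows)
conn.commit()

# Illustrative bounding box (min_lon, min_lat, max_lon, max_lat) around Seoul.
bbox = (126.76, 37.41, 127.18, 37.70)
in_seoul = conn.execute(
    """
    SELECT tweet_id FROM geotweet
    WHERE lon BETWEEN ? AND ? AND lat BETWEEN ? AND ?
    ORDER BY tweet_id
    """,
    (bbox[0], bbox[2], bbox[1], bbox[3]),
).fetchall()
```

With Spatialite loaded, the same table could carry a point geometry column so that QGIS can open the file directly as a layer.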

Page 23

Sentiment Analysis with Python NLTK Text Classification

• Sentiment analysis using NLTK

• Tweet text => POS, NEU, NEG values
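The slide does not show how the POS, NEU, and NEG values are computed. To make the shape of the output concrete, here is a toy lexicon-based scorer. It is not NLTK and not the classifier used in the study: the word lists and the function `score_sentiment` are invented for illustration, and a trained text classifier would replace them in practice.

```python
import re

# Tiny illustrative word lists (assumptions, not a real sentiment lexicon).
POSITIVE = {"happy", "love", "best", "beautiful", "good"}
NEGATIVE = {"sad", "bad", "worst", "hate", "angry"}


def score_sentiment(text):
    """Return the share of positive, neutral, and negative tokens in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {"pos": 0.0, "neu": 1.0, "neg": 0.0}
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = len(tokens)
    return {
        "pos": pos / total,
        "neu": (total - pos - neg) / total,
        "neg": neg / total,
    }


example = score_sentiment("Happy Sunday")
empty = score_sentiment("")
```

Storing the three values as extra columns next to each tweet's coordinates is what makes it possible to map sentiment later.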

Page 24

Heatmap using Quantum GIS

Geotweets, July 2015
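The heat map itself was produced in Quantum GIS. As a rough way to preview the same density pattern in Python, the sketch below counts points per grid cell. Grid binning is a simpler technique than the kernel density surface a GIS heat map typically draws; the cell size, function name, and sample points are assumptions for illustration.

```python
import math
from collections import Counter


def grid_counts(points, cell_deg=0.01):
    """Count (lon, lat) points per square grid cell of cell_deg degrees."""
    counts = Counter()
    for lon, lat in points:
        cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        counts[cell] += 1
    return counts


# Hypothetical geotweet locations: two near Seoul City Hall, one in Gangnam.
points = [
    (126.9780, 37.5660),
    (126.9785, 37.5668),
    (127.0270, 37.4980),
]
counts = grid_counts(points)
densest_cell, densest_count = counts.most_common(1)[0]
print("densest cell index:", densest_cell, "points:", densest_count)
```

Cells with the highest counts are candidate hotspots; a cell index can be turned back into coordinates by multiplying it by `cell_deg`.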

Page 25

Hot, Best Positive Places

Jongro

HongDae

Youngsan

Page 26

Word Cloud

Jongro

HongDae

Youngsan

Page 27

Best Positive Tweets

• Happy Pride from Kat! #seoul #gaypride #kqcf2015 #korea #hugagaytoday @ Seoul City Hall Korea https://t.co/81TiNdqCMH

• #seoulgayprideparade HAPPY PRIDE DAY KOREA!!!! #rainbow #lgbt #love #happy #seoul #korea @ Seoul Plaza https://t.co/FUCkHxmIsc

• Good times and more Korean BBQ with the Samsung team #MobLabs #GangnamStyle @ Gangnam, Seoul, Korea https://t.co/NyIa440NZ3

• Happy Sunday :) @ Myeongdong Cathedral https://t.co/TezVZTVtDH

• We go by the zoo via the "Elephant Train" to the museum @ Seoul Grand Park Zoo https://t.co/imXCgPrcBG

• Korean food is the best food #korea #food #nofiilter @ Seoul ,Korea https://t.co/MqVDHqqoEy

• Have a beautiful and fruitful week IG fam! #MondayLook #mamichoux @ Hongdae Seoul https://t.co/lVM5NdLJyp

• Happy the 4th of July to all my American friends! (@ Thursday Party in Seoul) https://t.co/CG27beaCQl

• And with Elizaveta from Russia :) @ Trickeye Museum https://t.co/7NCrGUYOF1

• Quick tour of a Korean apartment @ Hongdae Seoul South Korea https://t.co/yTy8mAVCZk

..

Page 28

Conclusion and Future Work

• Analysis of geosocial data is a complex, multidisciplinary process

• This research presents an integrated architecture using Python & FOSS4G

• Future work
  • Automated processing with Python scripts
  • More work is needed on QGIS and PySAL for more advanced analysis and visualization