Citihub Open Source and Cloud approach to Social Media Listening

www.citihub.com Template V4.01 April 2015 Open Data Analytics Open, flexible technology solutions for Social Listening Prepared by Ian Tivey, Associate Partner

Upload: chris-allison

Post on 11-Aug-2015

Page 1: Citihub Open Source and Cloud approach to Social Media Listening

www.citihub.com

Template V4.01

April 2015

Open Data Analytics

Open, flexible technology solutions for Social Listening

Prepared by Ian Tivey, Associate Partner

Page 2: Citihub Open Source and Cloud approach to Social Media Listening

Citihub at a glance

Tier 1 IT Consultancy

>200 consultants in 3 regions

Client base

Strong heritage with leading Financial Services firms:

• Investment Banking

• Hedge Fund & Asset Management

• Retail Financial Services

• Service Providers

Growing footprint in other sectors, e.g.:

• Government

• Education

• Legal

• eCommerce

Locations: New York, Toronto, London, Zurich, Hong Kong, Singapore

Page 3: Citihub Open Source and Cloud approach to Social Media Listening

Social listening

“Listening to the conversations that are going on in social media channels and using the information gleaned to gain insights in areas like customer sentiment”

According to Forrester [1], many marketing leaders are missing the social data opportunity:

• Few marketing leaders convert available social data into social intelligence

• Most marketers don’t have the ability to analyze social data

• Agencies’ use of social data remains inconsistent

• Most listening platforms are ill equipped to inform marketing strategy

[1] Forrester report: Use Social Data To Improve Your Social Marketing Maturity, Jan 5 2015

Page 4: Citihub Open Source and Cloud approach to Social Media Listening

3rd party software solutions

The “Enterprise Listening” or “Social Listening” platform market is 10 years old, with market leaders including Radian6, Synthesio and Sprinklr

Key differentiators:

• Data quality

• Sentiment engine intelligence

• Integration (e.g. with CRM)

Challenges

• Platforms can cost tens of thousands of dollars per month

• Different algorithms (and hence different tools) can give varying results when applied to different problems

• Most listening platforms are ill equipped to inform marketing strategy

Forrester Wave™: Enterprise Listening Platforms 2014

Page 5: Citihub Open Source and Cloud approach to Social Media Listening

Alternative platform from open source and public cloud

A flexible process is needed for approaching data analysis problems, allowing for discovery, iteration and experimentation

Using open source tools running in public cloud complements this requirement for flexibility, allowing the problem to define the tools rather than retrofitting the approach to the tools available

• Best results: Choose the stack and algorithms that best fit the problem

• Time to market: Suited to rapid, cost-effective prototyping while searching for business value. No need to build out internal infrastructure or negotiate license costs

• Cost: Lowest cost model with open source and public cloud

• Future proofing: No lock-in to expensive 3rd party licensed platforms. The industry will undoubtedly go through expansion and consolidation

• Scale: Use of public cloud allows capacity to scale on demand for large data sources and intensive analytics

• Correlation: Simpler integration with internal data sources

Page 6: Citihub Open Source and Cloud approach to Social Media Listening

Case study 1

Automated Categorisation and Clustering using Machine Learning Techniques

Page 7: Citihub Open Source and Cloud approach to Social Media Listening

Case study 1

Example business goal

• Sentiment analysis from social media, blogs, public websites, etc.

• What is customer perception of our products and services or those of our competitors?

• How is that trending and can we correlate to sales figures and pipeline?

Technical challenges

• Understanding nuances of good vs bad sentiment, especially when dealing with slang and abbreviated text (in this example from Facebook)

• Language support: how to deal with multiple languages e.g. Chinese

Solution

• Open source stack including NLP (Natural Language Processing) software from Stanford University and Machine Learning software from the Apache Software Foundation, running on the AWS platform
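To illustrate the language-support challenge, the sketch below splits unspaced Chinese text into words by forward maximum matching against a dictionary. This is a deliberately simple stand-in, not the Stanford segmenter named on the slide (which is a trained statistical model); the function name `segment`, the `max_word_len` parameter and the toy vocabulary are all assumptions made for illustration.

```python
def segment(text, vocabulary, max_word_len=4):
    """Split unspaced text into words by forward maximum matching.

    At each position, take the longest dictionary word that matches;
    fall back to a single character when nothing matches.
    """
    words = []
    i = 0
    while i < len(text):
        longest = min(max_word_len, len(text) - i)
        for length in range(longest, 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in vocabulary:
                words.append(candidate)
                i += length
                break
    return words


# Toy vocabulary (hypothetical); real use needs a large lexicon or a trained model.
VOCAB = {"海洋", "公园", "海洋公园", "排队", "时间", "太长"}

if __name__ == "__main__":
    # "The queueing time at Ocean Park is too long"
    print(segment("海洋公园排队时间太长了", VOCAB))
```

Dictionary matching is easy to reason about but cannot resolve ambiguous splits or unknown words, which is one reason a statistical segmenter is the more usual choice for noisy social media text.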

Page 8: Citihub Open Source and Cloud approach to Social Media Listening

Case study 1 – sample output

Sample visualisation of negative sentiment from Ocean Park public Facebook page

Page 9: Citihub Open Source and Cloud approach to Social Media Listening

Case Study 1 - Architecture Diagram

Components shown in the diagram:

• Graph API

• EC2

• EMR Cluster: Master Node and three Slave Nodes, each Slave Node with HDFS

• Scripts: cluster, visualise

• Java App

• Chinese Word Segmentation

• NLP App

• Unsupervised Clustering App

• Machine Learning

Citihub Facebook Data Structure:

• Comments: post_id, user_id, created_time, content, number_of_likes

• Likes: post_id, user_id

• Posts: page_id, post_id, post_message

• Users: user_id, name, gender, birthday, email

Page 10: Citihub Open Source and Cloud approach to Social Media Listening

Case Study 1 - Workflow Diagram

• Input Facebook page ID into webpage

• Node.js scripts pull data via Facebook API

• Results stored into S3 bucket. Analyse with unsupervised clustering tool for early insight

• Create training data set. Manually categorise first 200 comments into +ve, -ve, questions and “noise”

• Train machine learning tool for supervised classification of comments based on training data set using Naïve Bayes algorithm, producing a classification model

• Classify full data set in machine learning tool using classification model

• Pre-process text using NLP tools – here, we segment Chinese comments into “words”

• Analyse results with unsupervised clustering tool

Tools shown in the diagram: Unsupervised Clustering App, NLP App, Machine Learning
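The supervised step in this workflow can be sketched as a small multinomial Naïve Bayes classifier. This is a minimal, self-contained illustration under stated assumptions, not the Apache machine learning tool used in the case study: the class name `NaiveBayes`, the smoothing parameter `alpha`, the label names and the tiny English training set are all hypothetical, and comments are assumed to be already tokenised.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            denom = (sum(self.word_counts[label].values())
                     + self.alpha * len(self.vocab))
            for tok in tokens:
                if tok in self.vocab:  # ignore tokens never seen in training
                    count = self.word_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labelled comments standing in for the manual training set.
TRAIN = [
    (["great", "show", "love", "it"], "positive"),
    (["love", "the", "dolphins"], "positive"),
    (["queue", "too", "long", "terrible"], "negative"),
    (["terrible", "service", "rude"], "negative"),
    (["what", "time", "do", "you", "open"], "question"),
    (["how", "much", "is", "a", "ticket"], "question"),
    (["lol", "tag", "friend"], "noise"),
    (["follow", "me", "lol"], "noise"),
]

model = NaiveBayes().fit([d for d, _ in TRAIN], [l for _, l in TRAIN])

if __name__ == "__main__":
    print(model.predict(["love", "the", "show"]))
```

Because the classifier only sees token counts, the same code works on segmented Chinese comments or on any other labelled text, which matches the point that the technique is not tied to one language or to sentiment alone.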

Page 11: Citihub Open Source and Cloud approach to Social Media Listening

Case Study 1 – Lessons

• Open Source tools and technology exist which, used in conjunction with the public cloud ecosystem, provide powerful yet cost-effective capabilities for data analytics

• The techniques used to create classification models are language independent and are not limited to classifying sentiment or to social media alone

• For language-based analysis, NLP pre-processing produces stronger models

• Social media channels like Facebook and Twitter contain a significant amount of noise. Cutting through the noise to find useful insights requires:

- a good understanding of the business goals

- an understanding of how to break down the problem into various tasks and workflows

- an iterative process allowing experimentation and improvement

- tool selection appropriate to the task

Page 12: Citihub Open Source and Cloud approach to Social Media Listening

Case Study 2

Twitter workflow

Page 13: Citihub Open Source and Cloud approach to Social Media Listening

Case study 2

Example business goal

• Geographical sentiment trending on social media (in this example Twitter)

Technical challenges

• Geographical data needs to be traced via multiple techniques

• Handling the sheer volume of real-time data

Solution

• AWS Kinesis can be used to buffer Twitter firehose (real-time feed)

• Hadoop Cluster to extract, transform and load data

- enrich non-geotagged data with geo-coordinates

- standardise table format and load into data warehouse (Redshift)

• Visualisation tools (e.g. CartoDB) load data from data warehouse
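The enrichment of non-geotagged tweets can be sketched as a chain of fallbacks. The slides do not name the techniques used, so the three below (exact geotag, centre of the tagged place, profile location looked up in a gazetteer) are assumptions; the field names are modelled loosely on Twitter's tweet JSON, and the `locate` function and `GAZETTEER` table with its approximate coordinates are hypothetical.

```python
def locate(tweet, gazetteer):
    """Return (lat, lon, method) for a tweet, or None if it cannot be placed."""
    # 1. Exact point attached by the client (GeoJSON order is [lon, lat]).
    point = tweet.get("coordinates")
    if point:
        lon, lat = point["coordinates"]
        return (lat, lon, "geotag")

    # 2. Centre of the bounding box of a tagged place.
    place = tweet.get("place")
    if place:
        corners = place["bounding_box"]["coordinates"][0]
        lon = sum(c[0] for c in corners) / len(corners)
        lat = sum(c[1] for c in corners) / len(corners)
        return (lat, lon, "place")

    # 3. Free-text profile location matched against a gazetteer.
    profile = (tweet.get("user") or {}).get("location") or ""
    key = profile.strip().lower()
    if key in gazetteer:
        lat, lon = gazetteer[key]
        return (lat, lon, "profile")

    return None


# Illustrative gazetteer with approximate city coordinates (hypothetical data).
GAZETTEER = {
    "hong kong": (22.3193, 114.1694),
    "london": (51.5074, -0.1278),
}

if __name__ == "__main__":
    print(locate({"user": {"location": "Hong Kong"}}, GAZETTEER))
```

Returning the method alongside the coordinates lets downstream queries weight or filter by how reliable each location is, since a profile string is far less precise than a device geotag.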

Page 14: Citihub Open Source and Cloud approach to Social Media Listening

Case study 2 – example output

Time-series geo-visualisation of tweets

• Recorded demo: https://www.youtube.com/watch?v=pqOxq5G9lkE

Page 15: Citihub Open Source and Cloud approach to Social Media Listening

Case Study 2 – Twitter Visualisation Workflow

Capture & visualisation of historical tweets

Services shown in the diagram: Kinesis, EMR, Redshift

• Capture Twitter stream using Java API

• Queue Twitter stream in Kinesis

• ETL using EMR, experimenting with Hive and Impala running on Hadoop, and Spark

• Capture data in Redshift data warehouse

• Time-series geo-visualisation of tweets

• Sentiment scoring based on word valence

• Other visualisations
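Sentiment scoring based on word valence can be sketched as a lookup-and-sum over a word list. The lexicon below is a hypothetical toy in the style of published integer valence lists such as AFINN; the words, their scores and the `valence_score` function are assumptions for illustration, not the scoring used in the case study.

```python
import re

# Hypothetical valence lexicon: positive words score > 0, negative words < 0.
VALENCE = {
    "love": 3, "great": 3, "good": 2, "happy": 2,
    "bad": -2, "hate": -3, "terrible": -3, "awful": -3,
}


def valence_score(text, lexicon=VALENCE):
    """Sum the valence of every known word in the text; unknown words score 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)


if __name__ == "__main__":
    for tweet in ("I love this great city", "Terrible traffic, I hate it"):
        print(tweet, "->", valence_score(tweet))
```

A plain sum ignores negation and sarcasm, so it suits coarse trending across many tweets better than judging any single message.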

Page 16: Citihub Open Source and Cloud approach to Social Media Listening

Case Study 3

Customer insights based on Facebook page comments & likes

Page 17: Citihub Open Source and Cloud approach to Social Media Listening

Case study 3

Example business goal

• Customer profiling using social media (Facebook in this example)

• How do I learn more about my customers' likes and habits?

• Can I correlate to internal CRM systems?

Technical challenges

• Normalising Facebook Graph API model into a relational model for analysis

• Facebook “opt-in” permissioning system for gaining access to customer data

Solution

• Facebook data is normalised through a Node.js script into Cloud Storage (AWS S3)

• Use of Hive in a Hadoop cluster to analyse the normalised data with a SQL-like interface
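Normalising a nested Graph API response into relational rows can be sketched as below. The case study used a Node.js script; Python is used here only for illustration. The output column names follow the Citihub Facebook Data Structure shown in the slides, while the nested input shape (`comments.data`, `likes.data`, `from`, `like_count`), the `normalise_page` function and the sample data are assumptions about what the API returns.

```python
def normalise_page(page_id, posts):
    """Flatten nested post JSON into posts, comments, likes and users tables."""
    tables = {"posts": [], "comments": [], "likes": [], "users": {}}

    def remember_user(user):
        # Keep one row per user, keyed by user_id.
        tables["users"].setdefault(
            user["id"], {"user_id": user["id"], "name": user.get("name")}
        )

    for post in posts:
        post_id = post["id"]
        tables["posts"].append({
            "page_id": page_id,
            "post_id": post_id,
            "post_message": post.get("message", ""),
        })
        for comment in post.get("comments", {}).get("data", []):
            remember_user(comment["from"])
            tables["comments"].append({
                "post_id": post_id,
                "user_id": comment["from"]["id"],
                "created_time": comment.get("created_time"),
                "content": comment.get("message", ""),
                "number_of_likes": comment.get("like_count", 0),
            })
        for like in post.get("likes", {}).get("data", []):
            remember_user(like)
            tables["likes"].append({"post_id": post_id, "user_id": like["id"]})
    return tables


# Hypothetical sample shaped like a nested API response.
SAMPLE_POSTS = [{
    "id": "p1",
    "message": "Hello",
    "comments": {"data": [{
        "from": {"id": "u1", "name": "Ann"},
        "message": "Nice",
        "created_time": "2015-04-01T10:00:00+0000",
        "like_count": 2,
    }]},
    "likes": {"data": [{"id": "u2", "name": "Bob"}, {"id": "u1", "name": "Ann"}]},
}]

if __name__ == "__main__":
    print(normalise_page("page9", SAMPLE_POSTS)["comments"])
```

Each table is a flat list of rows sharing `post_id` and `user_id` keys, so it can be written out as delimited files and joined later with SQL-like queries.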

Page 18: Citihub Open Source and Cloud approach to Social Media Listening

Case study 3 – example Facebook page

Illustration using an example Facebook page against which to get deeper information about ‘friends’, i.e. customers

Offers page gives customers the option to opt into giving access to deeper personal information

Page 19: Citihub Open Source and Cloud approach to Social Media Listening

Case Study 3 - Architecture

Components shown in the diagram:

• Graph API

• EC2

• EMR Cluster: Master Node and three Slave Nodes, each Slave Node with HDFS

• Scripts

• App

Citihub Facebook Data Structure:

• Comments: post_id, user_id, created_time, content, number_of_likes

• Likes: post_id, user_id

• Posts: page_id, post_id, post_message

• Users: user_id, name, gender, birthday, email

• Users Like: user_id, category, page_id, page_name

• Liked Pages: page_id, name, category, sub-categories, about, description, general_info, products, num_likes, city, country

Page 20: Citihub Open Source and Cloud approach to Social Media Listening

Case Study 3 – Analysis Workflow

Log in to Facebook scraping app and click on “scrap pages I administer”

Node.js scripts pull data via Facebook API

Results stored into S3 bucket

Issue SQL-like commands to load data from S3 into EMR/Hadoop HDFS

Issue SQL-like commands to analyse the data using Hadoop running on EMR

• show me the list of Facebook page categories that commenters on my Facebook page like, and the number of pages in each category

• search for keywords in the feeds of commenters on my Facebook page and return the names and demographics of those users
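The first of these questions can be sketched as a single SQL query. To keep the example runnable without a cluster, it uses Python's built-in `sqlite3` as a stand-in for Hive on EMR; a similar query could be written in HiveQL, but that is not shown in the slides. The table name `users_like`, the trimmed column lists and the sample rows are assumptions modelled on the data structure above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (post_id TEXT, user_id TEXT, content TEXT);
CREATE TABLE users_like (user_id TEXT, category TEXT, page_id TEXT, page_name TEXT);
""")

# Hypothetical sample data: u1 and u2 commented on our page, u3 did not.
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", [
    ("p1", "u1", "Nice"),
    ("p1", "u2", "Too crowded"),
    ("p2", "u1", "When do you open?"),
])
conn.executemany("INSERT INTO users_like VALUES (?, ?, ?, ?)", [
    ("u1", "Theme Park", "pg1", "Park A"),
    ("u1", "Restaurant", "pg2", "Cafe B"),
    ("u2", "Theme Park", "pg1", "Park A"),
    ("u2", "Theme Park", "pg3", "Park C"),
    ("u3", "Airline", "pg4", "Air D"),
])

# Categories of pages liked by commenters, with distinct pages per category.
QUERY = """
SELECT ul.category, COUNT(DISTINCT ul.page_id) AS num_pages
FROM users_like AS ul
WHERE ul.user_id IN (SELECT DISTINCT user_id FROM comments)
GROUP BY ul.category
ORDER BY num_pages DESC, ul.category
"""

rows = conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    for category, num_pages in rows:
        print(category, num_pages)
```

Counting distinct `page_id` values avoids double counting a page that several commenters like, and users who never commented (u3 here) are excluded by the subquery.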

Page 21: Citihub Open Source and Cloud approach to Social Media Listening

Case study 3 - Example output

Other ideas

• Analyse active users' interactions with us and other friends, and link to internal data about that person

• Search for specific keywords in friends’ personal timelines

• Use toolsets to perform unsupervised page categorisation based on the page descriptions, rather than Facebook categories

• Create our own social graph to reveal clusters of friends, then analyse the clusters

Example social graph

Page 22: Citihub Open Source and Cloud approach to Social Media Listening

Working with Citihub Consulting

Proofs of concept

• Citihub can work with your business, marketing group or technology departments to establish rapid proofs of concept where you are looking to prove the value of analytics in the social media space

Mobilise and integrate

• We can work with your technology teams to establish an internal capability that can adapt to your business needs and social / technology trends

• We can help you with bespoke integration of internal and external data sources for correlation of data, e.g. social sentiment analysis correlated with regional sales; client profiling on Facebook correlated with internal CRM systems

Service-based analytics

• We can provide managed services where Citihub runs analytics and data trending for you in the public cloud

Page 23: Citihub Open Source and Cloud approach to Social Media Listening

www.citihub.com

Keith Maitland

Managing Partner

757 3rd Avenue, 20th Floor

New York

NY 10017

+1 212 878 8840

Chris Allison

Managing Director

12F ICC

1 Austin Road, Kowloon

Hong Kong

+852 8108 2777