Real-time Interactive Big Data Analysis Using In-Memory Computing
Mike Joyce – Manager Software Engineer, iCrossing
Shawn Nguyen – Lead Software Engineer, iCrossing


Post on 15-Aug-2015


TRANSCRIPT

Page 1: IMCSummit 2015 - Day 2 IT Business Track - Real-time Interactive Big Data Analysis Using In-Memory Computing


Page 2

CONNECTED MARKETING PLATFORM (TECHNOLOGY)

Bid Management / Trading Desk
Data Management Platform (Core Audience)

STRATEGY & PLANNING
Market Research
Analytics
Strategy & Planning

PROGRAM DESIGN
Media Planning & Buying
Creative & Experience Design
Content Creation & Management

AUDIENCE ENGAGEMENT
Search Marketing Programs
Social Media / Mobile
Technology & App Development
Measurement & Optimization

Page 3

DIGITAL AGENCY INSIDE A CONTENT EMPIRE

Leveraging audience insights:

• 20+ brands
• 30+ TV networks
• 50+ newspapers
• 300+ magazines

Page 4

Big Data - Cookies!

300+ million unique cookies
• Subscribers
• Visitors
• International
• Multiple devices

Page 5

DMP Audience Data

Attributes
• Geographic
• Demographic
• Behavioral
• Psychographic

11,000+ Unique Attributes

Page 6

Cookies + Audience Attributes = Super Big Data!

90M+ cookies
Average user: 800+ attributes
72B+ attribute-user pairs (90M cookies × 800 attributes)

Example attributes:
• Male
• Age 20 - 35
• Sports Enthusiasts
• Iowa
• High Income
• iPad, iPhone
• Drives Mini Van
• Foodie

Page 7

Audiences – Targeting vs Discovering

• Who you are targeting

• How do you connect with them?

• What describes them?

Page 8

Data Scientists

Discovering Audience Attributes
1. Define an audience using attributes
2. Identify all attributes of cookies in audience
3. Calculate highly indexing attributes

Page 9

1) Define the Audience

Population: 90M Cookies

Audience: 300K Cookies
• Age: 20-35
• US > North Dakota
• Gender: Male

Page 10

2) Audience Attributes

Audience: 300K Cookies

Attributes of Cookies in Audience
• Interest: Sports Enthusiast
• Interest: Moose Hunting
• Intent: Auto Purchase > Truck
• US > North Dakota > Fargo
• Pet Supplies > Dog Food

Page 11

3) Index the Attributes

Attribute                        Audience Frequency   Population Frequency
Interest: Sports Enthusiast      24%                  27%
Interest: Moose Hunting          40%                  6%
Intent: Auto Purchase > Truck    17%                  4%
US > North Dakota > Fargo        30%                  2%
Pet Supplies > Dog Food          6%                   9%
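The slide does not spell out how the index is calculated. A common convention in audience analytics, assumed here, is index = audience frequency / population frequency × 100, so that 100 means the attribute is exactly as common in the audience as in the whole population. A minimal Python sketch using the frequencies from the slide (the names `index_score` and `rank_by_index` are hypothetical):

```python
# Frequencies copied from the slide: attribute -> (audience %, population %).
FREQUENCIES = {
    "Interest: Sports Enthusiast": (24, 27),
    "Interest: Moose Hunting": (40, 6),
    "Intent: Auto Purchase > Truck": (17, 4),
    "US > North Dakota > Fargo": (30, 2),
    "Pet Supplies > Dog Food": (6, 9),
}


def index_score(audience_pct: float, population_pct: float) -> float:
    """Assumed convention: 100 means 'as common as in the population'."""
    return 100.0 * audience_pct / population_pct


def rank_by_index(frequencies):
    """Return (attribute, index) pairs, highest-indexing first."""
    scored = [(name, index_score(a, p)) for name, (a, p) in frequencies.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_by_index(FREQUENCIES):
        print(f"{name:32s} {score:7.0f}")
```

Under this assumed convention, Fargo indexes highest (30% against 2%, an index of 1500), while Sports Enthusiast, although common in the audience, indexes below 100 because it is slightly more common in the population.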

Page 12

Data Scientists

Development Ask
1. Make it accessible to “normals”
2. Exportable visualizations & calculations
3. Reduce query time from 1 hr to 1 sec

Page 13

Why is this Hard?

90M+ cookies × 800+ attributes per average user = 72B+ attribute-user pairs

Algorithm
1. Check whether every cookie satisfies the audience criteria
2. Collect all attributes for every audience cookie
3. Calculate percentages & index

Within 1 sec!
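The three steps can be sketched as a single in-memory pass. This is a minimal Python illustration, not the speakers' implementation (the talk describes theirs only as Java code). The data layout (a dict from cookie ID to a set of attribute strings), the function name `discover`, and the index convention (audience share divided by population share, times 100) are all assumptions.

```python
from collections import Counter


def discover(cookies, criteria):
    """Scan every cookie once and index the attributes of the audience.

    cookies:  dict mapping cookie id -> set of attribute strings (assumed layout)
    criteria: set of attributes a cookie must all have to be in the audience
    Returns {attribute: (audience_share, population_share, index)}.
    """
    population_counts = Counter()
    audience_counts = Counter()
    audience_size = 0

    for attributes in cookies.values():
        population_counts.update(attributes)
        # Step 1: does this cookie satisfy the audience criteria?
        if criteria <= attributes:
            audience_size += 1
            # Step 2: collect every attribute of the audience cookie.
            audience_counts.update(attributes)

    # Step 3: percentages and index (assumed convention, see lead-in).
    results = {}
    if audience_size == 0:
        return results
    population_size = len(cookies)
    for attribute, count in audience_counts.items():
        audience_share = count / audience_size
        population_share = population_counts[attribute] / population_size
        results[attribute] = (
            audience_share,
            population_share,
            100.0 * audience_share / population_share,
        )
    return results


if __name__ == "__main__":
    toy = {
        "c1": {"Gender: Male", "Age: 20-35", "Interest: Moose Hunting"},
        "c2": {"Gender: Male", "Age: 20-35", "Pet Supplies > Dog Food"},
        "c3": {"Gender: Female", "Interest: Moose Hunting"},
        "c4": {"Gender: Male", "Pet Supplies > Dog Food"},
    }
    result = discover(toy, {"Gender: Male", "Age: 20-35"})
    for attribute, row in sorted(result.items()):
        print(attribute, row)
```

At the scale on the slide, a loop like this touches on the order of 72 billion attribute-user pairs per query, which illustrates why a per-cookie round trip to an external store cannot fit in a one-second budget.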

Page 14

The Answer – Audience Discovery Tool

“Making science accessible”

• Audience discovery
  – Cookie Attributes
  – Frequency vs Population

• Built for non-technical users
  – Strategy
  – Sales / Account
  – Anyone

• Flexible
  – Research tool
  – In-meeting, iterative discovery

• Approachable
  – Real-time
  – Results in seconds
  – Simple, elegant interface
  – Multiple export formats

Page 15

Data Processing R&D

Page 16

Traditional Relational Databases

• Long load time
• Complex queries resulting in long query times
• Rigid data model

Page 17

Non Traditional Databases

• Lack of complex query features
• Large memory footprint requirement
• Aggregation queries overshot the target by many tens of seconds

Page 18

The Low Hanging Fruit

• In-memory cache
• Customizable query using Java code
• Relatively low data loading time

Page 19

The Vertical Problem

Page 20

Distributed Computing Ecosystem

• Not production ready
• Data import fails without explanation
• Aggregation fails to impress

Page 21

Back to Basics

• Pure Java code solution
• Data and logic must exist in the same memory space
• Capable of advanced filtering

Page 22

• Distributed computing, low overhead
• Data locality
• Minimal code migration
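One way to read "data locality" is that each node aggregates only the cookies it already holds in memory and sends back small count tables instead of raw data. The sketch below simulates that with in-process partitions. It is a hypothetical Python illustration, not the platform or API used in the talk; `local_counts`, `merge`, and the list-of-sets partition layout are made-up names and assumptions.

```python
from collections import Counter


def local_counts(partition, criteria):
    """Runs where the data lives: summarize one node's cookies.

    partition: list of attribute sets held by one node (assumed layout)
    Returns (population_size, audience_size, population_counts, audience_counts).
    """
    population_counts, audience_counts = Counter(), Counter()
    audience_size = 0
    for attributes in partition:
        population_counts.update(attributes)
        if criteria <= attributes:  # cookie belongs to the audience
            audience_size += 1
            audience_counts.update(attributes)
    return len(partition), audience_size, population_counts, audience_counts


def merge(partials):
    """Runs on the caller: add up the small per-node summaries."""
    total_pop = total_aud = 0
    pop_counts, aud_counts = Counter(), Counter()
    for pop_size, aud_size, pop_c, aud_c in partials:
        total_pop += pop_size
        total_aud += aud_size
        pop_counts += pop_c
        aud_counts += aud_c
    return total_pop, total_aud, pop_counts, aud_counts


if __name__ == "__main__":
    partitions = [
        [{"Gender: Male", "Interest: Moose Hunting"}, {"Gender: Female"}],
        [{"Gender: Male", "Pet Supplies > Dog Food"}],
    ]
    criteria = {"Gender: Male"}
    print(merge(local_counts(p, criteria) for p in partitions))
```

Because the counts are additive, the merged summary matches what a single scan over all cookies would produce, and only the count tables need to cross the network.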

Page 23

The Distributed Solution

Page 24

The Challenges

• Tedious manual data distribution
• GAR building and deployment issues
• Development challenges

Page 25

What We Learned

• Indexed data requiring minor calculations: databases (relational & NoSQL) work great

• Large non-indexed data: the data & processing need to live in the same (memory) space