Uploaded by mongodb on 12-May-2015

Page 1: MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregation Framework and Hadoop

Solution Architect

Jay Runkel

@jayrunkel

Time Series Data: Aggregations in Action

Agenda

• Review Traffic Use Case

• Review Schema Design

• Document Retention Model

• Aggregation Queries

• MapReduce

• Hadoop

Use Case Review

We need to prepare for this

Develop a nationwide traffic monitoring system

Traffic sensors to monitor interstate conditions

• 16,000 sensors

• Measure at one-minute intervals:

  • Speed

  • Travel time

  • Weather, pavement, and traffic conditions

• Support desktop, mobile, and car navigation systems
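To get a feel for the load these figures imply, they can be multiplied out. This is a minimal back-of-the-envelope sketch; the hourly-document figure assumes one document per sensor per hour, as in the schema reviewed later, and the helper name dailyVolumes is hypothetical.

```javascript
// Daily volumes implied by 16,000 sensors each reporting once per minute.
const SENSORS = 16000;
const MINUTES_PER_DAY = 24 * 60;

function dailyVolumes(sensors) {
  return {
    readingsPerDay: sensors * MINUTES_PER_DAY, // one reading per sensor per minute
    readingsPerSecond: sensors / 60,           // average arrival rate
    hourlyDocsPerDay: sensors * 24,            // assumes one document per sensor per hour
  };
}

const volumes = dailyVolumes(SENSORS);
console.log(volumes);
```

That works out to 23,040,000 readings a day (about 267 per second on average), but only 384,000 new hourly documents a day; under that schema every other reading modifies an existing document.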

What we want from our data

Charting and Trending

What we want from our data

Historical & Predictive Analysis

What we want from our data

Real Time Traffic Dashboard

Review Schema Design

Document Structure

{ _id: ObjectId("5382ccdd58db8b81730344e2"),
  linkId: 900006,
  date: ISODate("2014-03-12T17:00:00Z"),
  data: [
    { speed: NaN, time: NaN },
    { speed: NaN, time: NaN },
    { speed: NaN, time: NaN },
    ...
  ],
  conditions: {
    status: "Snow / Ice Conditions",
    pavement: "Icy Spots",
    weather: "Light Snow"
  }
}

Sample Document Structure

A compound, unique index on linkId and date identifies the individual document.

Saves an extra index: folding the link and the hour into _id ("900006:14031217" is link 900006 on 2014-03-12, hour 17) lets the built-in _id index do the job of the compound index.

{ _id: "900006:14031217",
  data: [
    { speed: NaN, time: NaN },
    { speed: NaN, time: NaN },
    { speed: NaN, time: NaN },
    ...
  ],
  conditions: {
    status: "Snow / Ice Conditions",
    pavement: "Icy Spots",
    weather: "Light Snow"
  }
}
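Building such an _id from a link and a timestamp is a small formatting exercise. A minimal sketch, assuming the "linkId:YYMMDDHH" layout inferred from matching "900006:14031217" against the date 2014-03-12T17:00:00Z in the earlier document, and assuming UTC hours; hourlyId and pad2 are hypothetical helpers.

```javascript
// Zero-pads a number to two digits.
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Builds "<linkId>:<YYMMDDHH>" for the hour containing `date` (UTC).
function hourlyId(linkId, date) {
  const yy = pad2(date.getUTCFullYear() % 100);
  const mm = pad2(date.getUTCMonth() + 1); // getUTCMonth() is zero-based
  const dd = pad2(date.getUTCDate());
  const hh = pad2(date.getUTCHours());
  return `${linkId}:${yy}${mm}${dd}${hh}`;
}

console.log(hourlyId(900006, new Date("2014-03-12T17:00:00Z"))); // 900006:14031217
```

Because the timestamp part is fixed-width, all ids for one link sort in time order as plain strings, which is what makes the prefix range queries on _id work.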

Range queries: /^900006:1403/ selects every hourly document for link 900006 in March 2014.

The regex must be left-anchored and case-sensitive so that MongoDB can use the _id index efficiently.
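Why the anchoring matters: a case-sensitive regex that starts with ^ followed by a literal string is a pure prefix test, which MongoDB can turn into a bounded scan of the index. The sketch below shows that such a regex and a half-open string range select the same ids. It assumes ASCII keys compared by plain code-unit order; prefixRange, escapeRegex, and the sample ids are hypothetical.

```javascript
// Half-open range [prefix, prefix with its last character incremented).
// Fine for digit/colon keys; would need care if the last char were the maximum.
function prefixRange(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { $gte: prefix, $lt: upper };
}

// Escapes regex metacharacters so the prefix is matched literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const prefix = "900006:1403"; // link 900006, March 2014
const re = new RegExp("^" + escapeRegex(prefix));
const range = prefixRange(prefix);

const ids = ["900006:14022823", "900006:14030100", "900006:14033123", "900006:14040100", "9000061:14030100"];
const byRegex = ids.filter((id) => re.test(id));
const byRange = ids.filter((id) => id >= range.$gte && id < range.$lt);
console.log(byRegex, byRange, range);
```

Both filters keep only the two March ids; the colon in the prefix also stops link 900006 from matching link 9000061.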

Pre-allocated, 60-element array of per-minute data (one slot for each minute of the hour)
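Creating a document with every minute slot already present means later readings overwrite values instead of growing the array; that was the usual motivation for pre-allocation, particularly with the MMAPv1 storage engine of the time. A minimal sketch of building such a document; preallocatedHour is hypothetical, it reuses the "linkId:YYMMDDHH" _id layout assumed from the sample, and it leaves out the conditions sub-document.

```javascript
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Hypothetical helper: new hourly document for one link and hour, with every
// per-minute slot pre-filled with NaN placeholders.
function preallocatedHour(linkId, hourStart) {
  const stamp =
    pad2(hourStart.getUTCFullYear() % 100) +
    pad2(hourStart.getUTCMonth() + 1) +
    pad2(hourStart.getUTCDate()) +
    pad2(hourStart.getUTCHours());
  return {
    _id: `${linkId}:${stamp}`,
    // Array.from builds 60 separate objects; fill({...}) would share one.
    data: Array.from({ length: 60 }, () => ({ speed: NaN, time: NaN })),
  };
}

const doc = preallocatedHour(900006, new Date("2014-03-12T17:00:00Z"));
console.log(doc._id, doc.data.length); // 900006:14031217 60
```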

Advantages

1. In-place updates are efficient

2. Dashboard queries are simple
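An in-place update targets a single array slot by position using a dotted path such as "data.42.speed". A minimal sketch that builds that update document; minuteUpdate is hypothetical and the minute index is assumed to come from the UTC minute of the reading.

```javascript
// Builds a $set that writes one reading into its pre-allocated minute slot.
function minuteUpdate(date, speed, time) {
  const minute = date.getUTCMinutes(); // 0..59 selects the array slot
  return {
    $set: {
      [`data.${minute}.speed`]: speed,
      [`data.${minute}.time`]: time,
    },
  };
}

const update = minuteUpdate(new Date("2014-03-12T17:42:00Z"), 52, 237);
console.log(JSON.stringify(update)); // {"$set":{"data.42.speed":52,"data.42.time":237}}
// In the mongo shell this could be applied with something like:
//   db.linkData.updateOne({ _id: "900006:14031217" }, update)
```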

Dashboards

[Chart: per-minute data for one road segment, Mon Mar 10 2014 04:57 to Tue Mar 11 2014 21:59 (PDT); y-axis 0 to 70]

db.linkData.find({_id : /^20484087:2014031/})

Supporting Queries From Navigation Systems

Navigation System Queries

What is the average speed for the last 10 minutes on 50 upcoming road segments?

Current Real-Time Conditions

Last ten minutes of speeds and times

{ _id: "I-87:10656",
  description: "NYS Thruway Harriman Section Exits 14A - 16",
  update: ISODate("2013-10-10T23:06:37.000Z"),
  speeds: [ 52, 49, 45, 51, ... ],
  times: [ 237, 224, 246, 233, ... ],
  pavement: "Wet Spots",
  status: "Wet Conditions",
  weather: "Light Rain",
  averageSpeed: 50.23,
  averageTime: 234,
  maxSafeSpeed: 53.1,
  location: {
    "type": "LineString",
    "coordinates": [
      [ -74.056, 41.098 ],
      [ -74.077, 41.104 ]
    ]
  }
}

Pre-aggregated metrics: averageSpeed, averageTime, and maxSafeSpeed are stored on the segment document, ready to read.
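The averages are plain means over the recent-readings arrays, so whoever writes the document can store them and readers never recompute. A minimal sketch: the slide does not show where or how these fields are computed, maxSafeSpeed is not derived here, and withAverages and routeAverage are hypothetical helpers. Only the four readings visible in the sample are used, so the results differ from the stored 50.23 and 234.

```javascript
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Recomputes the pre-aggregated averages from a segment's recent-readings arrays.
function withAverages(segment) {
  return { ...segment, averageSpeed: mean(segment.speeds), averageTime: mean(segment.times) };
}

// Unweighted mean of the stored averages over a list of upcoming segments,
// e.g. for the navigation query.
function routeAverage(segments) {
  return mean(segments.map((s) => s.averageSpeed));
}

const seg = withAverages({ _id: "I-87:10656", speeds: [52, 49, 45, 51], times: [237, 224, 246, 233] });
console.log(seg.averageSpeed, seg.averageTime); // 49.25 235
```

For the navigation question, a client would read averageSpeed for each of the upcoming segments; whether to weight segments by length is a design choice the sketch leaves out.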

Geospatially indexed road segment: location stores the segment as a GeoJSON LineString.
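With a geospatial index on location, a navigation client can ask for segments near its position. A minimal sketch that builds such a filter using MongoDB's $near and $geometry operators; it assumes the current-conditions documents live in the linksAvg collection used by the update below, with a 2dsphere index on location. nearbySegmentsQuery and the 2,000 m radius are hypothetical.

```javascript
// Builds a $near filter for road segments close to a position.
// GeoJSON coordinates are [longitude, latitude]; with a 2dsphere index,
// $maxDistance is measured in meters.
function nearbySegmentsQuery(lng, lat, maxMeters) {
  return {
    location: {
      $near: {
        $geometry: { type: "Point", coordinates: [lng, lat] },
        $maxDistance: maxMeters,
      },
    },
  };
}

const q = nearbySegmentsQuery(-74.06, 41.1, 2000);
console.log(JSON.stringify(q));
// In the mongo shell (assuming a 2dsphere index on location):
//   db.linksAvg.createIndex({ location: "2dsphere" })
//   db.linksAvg.find(q)
```

$near returns results sorted from nearest to farthest, which suits a "what is ahead of me" query.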

Maintaining the current conditions

db.linksAvg.update(
  { "_id": linkId },
  { "$set": { "update": date },
    "$push": {
      "times": { "$each": [ time ], "$slice": -10 },
      "speeds": { "$each": [ speed ], "$slice": -10 }
    }
  }
)

Each update appends the new value to the end of each array; "$slice": -10 then trims the oldest element so that only the last ten readings remain.
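The $each/$slice combination behaves like "append, then keep a window". A minimal plain-array emulation of those semantics (not the server implementation); pushSlice and the ten sample speeds are hypothetical.

```javascript
// Emulates { $push: { field: { $each: values, $slice: n } } } on a plain array:
// append the values, then keep the last |n| elements (n < 0) or the first n (n >= 0).
function pushSlice(arr, values, n) {
  const appended = arr.concat(values); // does not mutate the input
  return n < 0 ? appended.slice(n) : appended.slice(0, n);
}

let speeds = [52, 49, 45, 51, 50, 48, 47, 53, 52, 50]; // ten hypothetical readings
speeds = pushSlice(speeds, [46], -10);
console.log(speeds.join(", ")); // 49, 45, 51, 50, 48, 47, 53, 52, 50, 46
```

The oldest reading (52) falls off the front; while the array holds fewer than ten values, nothing is trimmed.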

Document Retention

Document retention

• Doc per hour: retained for 2 weeks

• Doc per day: retained for 2 months

• Doc per month: retained for 1 year
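The tiers bound how many documents each road segment keeps and decide which granularity can answer a query about a given age of data. A minimal sketch; the tier table mirrors the list above, with "2 months" approximated as 60 days and "1 year" as 365 days (both assumptions), and pickTier is a hypothetical helper.

```javascript
// Hypothetical tier table following the retention list.
const TIERS = [
  { granularity: "hour", retentionDays: 14, docsPerLink: 14 * 24 },
  { granularity: "day", retentionDays: 60, docsPerLink: 60 },
  { granularity: "month", retentionDays: 365, docsPerLink: 12 },
];

// Finest granularity still retained for data that is `ageDays` old.
function pickTier(ageDays) {
  const tier = TIERS.find((t) => ageDays <= t.retentionDays);
  return tier ? tier.granularity : null;
}

// Documents kept per road segment across all tiers, and for 16,000 sensors.
const docsPerLink = TIERS.reduce((n, t) => n + t.docsPerLink, 0);
console.log(docsPerLink, docsPerLink * 16000); // 408 6528000
```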

Rollup – 1 day

// daily document
// retained for 2 months
{ _id: "link:date",
  // 24 element array
  hourly: [
    { speed: { sum: , count: }, time: { sum: , count: } },
    { speed: { sum: , count: }, time: { sum: , count: } }
  ]
}
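A rollup turns each hourly document's per-minute readings into one { sum, count } entry of the daily document. A minimal sketch, assuming NaN marks minutes with no reading and that entries are indexed by UTC hour; sumCount, rollupHour, rollupDay, and the daily id "900006:140312" are hypothetical.

```javascript
// Sum and count of the non-NaN values (NaN marks a minute with no reading).
function sumCount(values) {
  const valid = values.filter((v) => !Number.isNaN(v));
  return { sum: valid.reduce((a, b) => a + b, 0), count: valid.length };
}

// One entry of the daily "hourly" array, built from an hourly document.
function rollupHour(hourDoc) {
  return {
    speed: sumCount(hourDoc.data.map((d) => d.speed)),
    time: sumCount(hourDoc.data.map((d) => d.time)),
  };
}

// Daily document: 24 entries indexed by UTC hour; hours without a source
// document stay at { sum: 0, count: 0 }.
function rollupDay(id, hourDocs) {
  const hourly = Array.from({ length: 24 }, () => ({
    speed: { sum: 0, count: 0 },
    time: { sum: 0, count: 0 },
  }));
  for (const { hour, doc } of hourDocs) hourly[hour] = rollupHour(doc);
  return { _id: id, hourly };
}

const hourDoc = { data: [{ speed: 50, time: 200 }, { speed: NaN, time: NaN }, { speed: 40, time: 240 }] };
const day = rollupDay("900006:140312", [{ hour: 17, doc: hourDoc }]);
console.log(day.hourly[17]); // { speed: { sum: 90, count: 2 }, time: { sum: 440, count: 2 } }
```

Keeping sum and count, rather than an average, lets daily entries be rolled up again into monthly documents exactly; the average is always sum / count.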

Analysis With The Aggregation Framework

Pipelining operations

grep | sort | uniq

Piping command line operations

Pipelining operations

$match | $group | $sort

Piping aggregation operations

A stream of documents goes in; a result document comes out.

What is the average speed for a given road segment?

> db.linkData.aggregate([
    { $match: { "_id": /^20484097:/ } },
    { $project: { "data.speed": 1, linkId: 1 } },
    { $unwind: "$data" },
    { $group: { _id: "$linkId", ave: { $avg: "$data.speed" } } }
  ]);
{ "_id" : 20484097, "ave" : 47.067650676506766 }

How the pipeline works:

• $match: select documents on the target segment

• $project: keep only the fields we really need

• $unwind: loop over the array of data points

• $group: use the handy $avg operator
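The same stages can be traced by hand on a few in-memory documents, which makes each step's effect visible. A minimal sketch of the pipeline above, not the server's implementation; it assumes each document carries both the string _id and a numeric linkId field, and it skips NaN placeholders so unfilled minutes do not turn the JavaScript average into NaN. averageSpeed and the sample documents are hypothetical.

```javascript
function averageSpeed(docs, linkId) {
  const prefix = `${linkId}:`;
  // $match: left-anchored _id prefix
  const matched = docs.filter((d) => d._id.startsWith(prefix));
  // $project + $unwind: one element per data point, keeping only linkId and speed
  const unwound = matched.flatMap((d) => d.data.map((p) => ({ linkId: d.linkId, speed: p.speed })));
  // $group + $avg: $match left a single link, so there is one group
  const speeds = unwound.map((u) => u.speed).filter((s) => !Number.isNaN(s));
  return { _id: linkId, ave: speeds.reduce((a, b) => a + b, 0) / speeds.length };
}

const docs = [
  { _id: "20484097:14031200", linkId: 20484097, data: [{ speed: 40 }, { speed: 50 }] },
  { _id: "20484097:14031201", linkId: 20484097, data: [{ speed: 60 }, { speed: NaN }] },
  { _id: "20484098:14031200", linkId: 20484098, data: [{ speed: 10 }] },
];
console.log(averageSpeed(docs, 20484097)); // { _id: 20484097, ave: 50 }
```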

More Sophisticated Pipelines: average speed with variance

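// assumes an earlier stage produced speeds (an array) and meanSpd (its mean)
{ $project: {
    mean: "$meanSpd",
    spdDiffSqrd: {
      $map: {
        // outer $map: square each deviation
        input: {
          $map: {
            // inner $map: deviation of each sample from the mean
            input: "$speeds",
            as: "samp",
            in: { $subtract: [ "$$samp", "$meanSpd" ] }
          }
        },
        as: "df",
        in: { $multiply: [ "$$df", "$$df" ] }
    } }
} },
{ $unwind: "$spdDiffSqrd" },
{ $group: { _id: "$_id", mean: { $first: "$mean" }, variance: { $avg: "$spdDiffSqrd" } } }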
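The pipeline computes a population variance: the mean of the squared deviations from the mean. The same arithmetic in plain JavaScript, as a minimal sketch using the four speeds visible in the current-conditions sample; meanAndVariance is a hypothetical helper.

```javascript
// Population variance: average of squared deviations from the mean.
function meanAndVariance(speeds) {
  const mean = speeds.reduce((a, b) => a + b, 0) / speeds.length;
  const sqDiffs = speeds.map((s) => (s - mean) * (s - mean)); // the two $map stages
  const variance = sqDiffs.reduce((a, b) => a + b, 0) / sqDiffs.length; // $unwind + $avg
  return { mean, variance };
}

console.log(meanAndVariance([52, 49, 45, 51])); // { mean: 49.25, variance: 7.1875 }
```

The deviations are 2.75, -0.25, -4.25, and 1.75; their squares sum to 28.75, and 28.75 / 4 = 7.1875.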

Analysis With MapReduce

Historic Analysis

How do weather and road conditions affect traffic?

The ask: what is the average speed for each weather, status, and pavement condition?

MapReduce

function map() {
  // One emit per minute under each condition key (weather, status, pavement).
  // The emitted value has the same shape that reduce() returns, which MongoDB
  // requires because reduce() may be called again on its own output.
  for (var i = 0; i < this.data.length; i++) {
    var value = { count: 1, speedSum: this.data[i].speed };
    emit(this.conditions.weather, value);
    emit(this.conditions.status, value);
    emit(this.conditions.pavement, value);
  }
}

For a minute with a speed of 34, map emits that reading once under each condition key, for example "Snow", "Icy spots", and "Delays".

Values emitted with the same key are grouped before reduce runs; for the key "Rain":

Weather: "Rain", speed: 44

Weather: "Rain", speed: 39

Weather: "Rain", speed: 46

function reduce(key, values) {
  // Start from zero and add each value's count and speedSum. A value may be a
  // single reading or an earlier reduce() result, so none is assumed to be one reading.
  var result = { count: 0, speedSum: 0 };
  values.forEach(function (v) {
    result.count += v.count;
    result.speedSum += v.speedSum;
  });
  return result;
}
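Whether map and reduce agree on the value shape can be checked outside the database. A minimal in-memory harness: runMapReduce is hypothetical, not a MongoDB API, and it passes emit in as a parameter where the server provides it as a global; the sample documents are also hypothetical.

```javascript
// Runs map() over each document, groups emitted values by key, then reduces.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // `this` is the current document
  return Array.from(groups, ([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

function map(emit) {
  for (const minute of this.data) {
    const value = { count: 1, speedSum: minute.speed };
    emit(this.conditions.weather, value);
    emit(this.conditions.status, value);
    emit(this.conditions.pavement, value);
  }
}

function reduce(key, values) {
  const result = { count: 0, speedSum: 0 };
  for (const v of values) {
    result.count += v.count;
    result.speedSum += v.speedSum;
  }
  return result;
}

const docs = [
  { conditions: { weather: "Light Snow", status: "Snow / Ice Conditions", pavement: "Icy Spots" },
    data: [{ speed: 30 }, { speed: 40 }] },
  { conditions: { weather: "Light Snow", status: "Wet Conditions", pavement: "Wet Spots" },
    data: [{ speed: 50 }] },
];
const out = runMapReduce(docs, map, reduce);
console.log(out);

// Re-reduce check: reducing partial results must match a single pass.
const vals = [44, 39, 46].map((s) => ({ count: 1, speedSum: s }));
const onePass = reduce("Rain", vals);
const twoPass = reduce("Rain", [reduce("Rain", vals.slice(0, 2)), reduce("Rain", vals.slice(2))]);
console.log(JSON.stringify(onePass) === JSON.stringify(twoPass)); // true
```

Both passes give { count: 3, speedSum: 129 } for "Rain", the property the server relies on when it re-reduces partial results.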

Results

results: [
  { "_id" : "Generally Clear and Dry Conditions", "value" : { "count" : 902, "speedSum" : 45100 } },
  { "_id" : "Icy Spots", "value" : { "count" : 242, "speedSum" : 9438 } },
  { "_id" : "Light Snow", "value" : { "count" : 122, "speedSum" : 7686 } },
  { "_id" : "No Report", "value" : { "count" : 782, "speedSum" : NaN } }
]
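The results carry sums and counts; the averages the question asks for are one division away. mapReduce accepts a finalize function, called as (key, reducedValue) after reduction, which is a natural place for that step. A minimal sketch applying such a function locally to the results shown; the avgSpeed field name is hypothetical. The NaN for "No Report" presumably comes from unfilled NaN slots being added into speedSum.

```javascript
// finalize-style step: derive the average from each reduced value.
function finalize(key, value) {
  return { count: value.count, speedSum: value.speedSum, avgSpeed: value.speedSum / value.count };
}

const results = [
  { _id: "Generally Clear and Dry Conditions", value: { count: 902, speedSum: 45100 } },
  { _id: "Icy Spots", value: { count: 242, speedSum: 9438 } },
  { _id: "Light Snow", value: { count: 122, speedSum: 7686 } },
  { _id: "No Report", value: { count: 782, speedSum: NaN } },
];

for (const r of results) {
  console.log(r._id, finalize(r._id, r.value).avgSpeed);
}
// Generally Clear and Dry Conditions 50
// Icy Spots 39
// Light Snow 63
// No Report NaN
```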

Analysis With Hadoop (using the MongoDB Connector)

Processing Large Data Sets

• Need to break data into smaller pieces

• Process data across multiple nodes
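Breaking data into pieces only pays off if the partial results can be merged exactly. Sums and counts can; averages of averages generally cannot. A minimal in-process sketch of the split, process, and combine pattern; it does not use the Hadoop connector's API, and split, partial, combine, and the sample data are hypothetical.

```javascript
// Splits items into roughly equal pieces.
function split(items, pieces) {
  const size = Math.max(1, Math.ceil(items.length / pieces));
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Partial { count, speedSum } for one piece (work an independent node could do).
function partial(docs) {
  let count = 0;
  let speedSum = 0;
  for (const doc of docs) {
    for (const { speed } of doc.data) {
      if (!Number.isNaN(speed)) { // skip unfilled (NaN) minutes
        count += 1;
        speedSum += speed;
      }
    }
  }
  return { count, speedSum };
}

// Merges partials into one result.
function combine(parts) {
  return parts.reduce(
    (acc, p) => ({ count: acc.count + p.count, speedSum: acc.speedSum + p.speedSum }),
    { count: 0, speedSum: 0 }
  );
}

// Five documents, each with one filled and one empty minute.
const docs = [10, 20, 30, 40, 50].map((s) => ({ data: [{ speed: s }, { speed: NaN }] }));
const total = combine(split(docs, 2).map(partial));
console.log(total, total.speedSum / total.count); // { count: 5, speedSum: 150 } 30
```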

Benefits of the Hadoop Connector

• Increased parallelism

• Access to analytics libraries

• Separation of concerns

• Integrates with existing tool chains

MongoDB Hadoop Connector

Operational:

• Online, real-time

• High concurrency & HA

• Live analytics

Post processing:

• Multi-source analytics

• Interactive & batch

• Data lake

MongoDB Connector for Hadoop

Page 54: MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregation Framework and Hadoop

Questions?

Part 3 - July 16th, 2:00 PM EST

Sign up for our “Path to Proof” Program and get expert advice on implementation, architecture, and configuration.

www.mongodb.com/lp/contact/path-proof-program

HVDF: https://github.com/10gen-labs/hvdf

Hadoop Connector: https://github.com/mongodb/mongo-hadoop

Consulting Engineer, MongoDB Inc.

Bryan Reinero

Thank You