(MBL305) You Have the Data from Your Devices, Now What? Getting Value from the IoT

Upload: amazon-web-services

Post on 16-Apr-2017

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Michael Hanisch, Solutions Architect

October 9, 2015

You Have the Data from Your Devices – Now What?

Getting Value from the IoT

MBL305

What to Expect from the Session

• Understand different kinds of data relevant to the IoT

• Learn how the AWS platform can help turn data into insights & actions

• Get ideas & advice on how to integrate various AWS services with the Internet of Things

The Data Driven IoT

Rapid Growth from 1B to 50B Connectable “Things”

Chart: 1B (2010), 50B (2020), 1T (2035)

Source: McKinsey & Company 2013

All these “Things” generate data:

• Status information

• Sensor readings

• User interactions

• State changes

• Operational events

• …

One of the big challenges with the IoT is to collect, analyze, and act on data from devices to generate insights.

Three Ways to Analyze Data

Past Data: Retrospective analysis and reporting

Present Data: Here-and-now real-time processing and dashboards

“Future Data”: Predictions to enable smart applications

Amazon Kinesis

AWS Lambda

Amazon DynamoDB

Amazon EC2

Amazon Redshift

Amazon RDS

Amazon S3

Amazon EMR

Amazon Machine Learning

What is special about IoT?

IoT Requires Quick Processing

- Discover patterns in live sensor data

- Correlate events as they happen

- Enrich live data with additional info

Why?

- Trigger quick reactions

- Adapt to usage of Things

- Users want quick reaction & feedback

Here-and-now

real-time processing

and dashboards

IoT Requires Past Context

- Provide context for current events

- Keep information of past events to determine long-term trends

Why?

- Enable learning from past data

- Enable reporting & explorative analysis to understand usage

- Usage monitoring and billing (long-term storage of usage & billing metrics)

Retrospective

analysis and

reporting

Predictions

to enable smart

applications

IoT Benefits from “Smart” Devices

- Detect patterns in event data

- Learn 'rules' / distributions in the data

Why?

- Predict future events

- Problems that are likely to appear

- Anticipate user actions (or desired outcomes)

- Actionable predictions: what to do next

Today’s (simple) example

Indoor Temperature / Climate Sensors

• Fleet of indoor air conditioning units with 3 sensors each

• Deliver updates on temperature, humidity & pressure every couple of seconds

• Connected to the cloud

• “Semi-reliable”

Sample Message As Sent By Device

{
  "temperature" : "100",
  "humidity" : "92",
  "pressure" : "8"
}

• Each device uses certificate authentication

• Send messages via MQTT

• One topic per device: rooms/ac/${deviceID}

Diagram: DeviceID 1, DeviceID 2, and DeviceID 3 each publish to their own MQTT topic on the AWS IoT Service (Pub/Sub Broker).
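Because every device publishes to its own topic, the device ID can be recovered from the topic name itself. The sketch below is a local illustration of that idea; `TopicSegments` and `segment` are hypothetical names, not part of any AWS SDK. It mirrors the 1-based `topic(3)` lookup used in the rules later in this deck.

```java
import java.util.Optional;

/** Local illustration only; not part of any AWS SDK. */
final class TopicSegments {
    private TopicSegments() {}

    /** Returns the 1-based segment of a slash-separated topic, or empty if out of range. */
    static Optional<String> segment(String topic, int index) {
        String[] parts = topic.split("/", -1);
        if (index < 1 || index > parts.length) {
            return Optional.empty();
        }
        return Optional.of(parts[index - 1]);
    }

    public static void main(String[] args) {
        // For a topic of the form rooms/ac/${deviceID}, segment 3 is the device ID.
        System.out.println(segment("rooms/ac/device-17", 3).orElse("<none>"));
    }
}
```

With the topic layout above, segment 3 is always the device ID, so a consumer does not need the device to repeat its ID in the payload.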

Questions we might ask our Example Data:

- How many sensors are connected right now?

- Is the current temperature in line with yesterday's / last year's data?

- How did temperatures change over time?

- What is the relationship between pressure and temp?

- Are our sensor readings plausible?

- How can we tell a broken sensor from a good one?

- Do I have to wear a sweater to work?

Background: AWS IoT

Diagram: a highly scalable Pub Sub Broker (MQTT) between publishers and subscribers

AWS IoT: Securely Connect Devices

Secure by Default: Connect securely via X.509 certs and TLS v1.2 client mutual auth

Multi-protocol Message Gateway: Millions of devices and apps can connect over MQTT or HTTP.

Elastic Pub Sub Broker: Go from 1 to 1 billion long-lived connections with zero provisioning

AWS IoT: Front Door to AWS

Device Registry: Cloud alter-ego of a physical device. Persists metadata about the device.

Rules and Actions: Match patterns and take actions to send data to other AWS services or republish

Device Shadows: Apps and devices can access “RESTful” Shadow (state) that is in sync with the device

Diagram labels: Device (Thing Name; Sensor: Temp; Actuator: Servo; GetTemp(); Output: LED), Rules Engine, Shadow, Registry, S3, Lambda, Kinesis, Kinesis Firehose, DynamoDB, SNS, Mobile App

AWS IoT Rules Engine

Rules Engine evaluates inbound messages published into AWS IoT, then transforms and delivers them to the appropriate endpoint based on business rules.

External endpoints can be reached via AWS Lambda and Amazon Simple Notification Service (SNS).

Actions

• Invoke a Lambda function

• Put object in an S3 bucket

• Insert, Update, Read from a DynamoDB table

• Publish to an SNS Topic or Endpoint

• Publish to a Kinesis stream / Amazon Kinesis Firehose

• Republish to AWS IoT

Flexibility of Rules – An Example

Rule anatomy: SQL-like syntax, WHERE operators, inline functions, actions

{
  "sql": "SELECT *, clientId() as MQTTClientId FROM 'rooms/ac/+' WHERE temperature > 85",
  "actions": [
    {
      "sns": {
        "roleArn": "arn:aws:iam::123456789012:role/SNSPutRole",
        "topicArn": "arn:aws:sns:us-east-1:123456789012:TempWarningNotification"
      }
    }
  ]
}
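To make the selection logic of such a rule concrete, here is a minimal local model in Java. It is a sketch under stated assumptions, not the AWS IoT Rules Engine: `RuleSketch`, `topicMatches`, and `shouldNotify` are hypothetical names, the payload is assumed to be already parsed into numeric values, and only the single-level `+` wildcard is handled (the multi-level `#` wildcard is left out).

```java
import java.util.Map;

/** Simplified local model of how a rule selects messages; not the AWS IoT Rules Engine. */
final class RuleSketch {
    private RuleSketch() {}

    /** Matches a topic against a filter in which '+' stands for exactly one level. */
    static boolean topicMatches(String filter, String topic) {
        String[] f = filter.split("/", -1);
        String[] t = topic.split("/", -1);
        if (f.length != t.length) {
            return false;
        }
        for (int i = 0; i < f.length; i++) {
            if (!f[i].equals("+") && !f[i].equals(t[i])) {
                return false;
            }
        }
        return true;
    }

    /** True when the topic matches rooms/ac/+ and the temperature exceeds the threshold. */
    static boolean shouldNotify(String topic, Map<String, Double> payload, double threshold) {
        Double temperature = payload.get("temperature");
        return topicMatches("rooms/ac/+", topic)
                && temperature != null
                && temperature > threshold;
    }
}
```

A message on `rooms/ac/7` with a temperature of 100 would pass both the topic filter and the `> 85` condition; the same reading on `rooms/heater/7` would not.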

Processing Data for

Retrospective Analysis

Example: Receiving & Storing Data

- Devices set up as Things in Device Registry

- Each device sends data as JSON via MQTT

- One MQTT topic per device: rooms/ac/{deviceID}

- Each device has a certificate and access rights to use its topic (already set up)

Our Goal:

• Move all (?) incoming data into permanent storage

• Make data available for later analysis:

- Reporting

- Billing / metering

- Explorative analysis

- Machine Learning

Our Approach:

1. Set up a Rule to Capture & Transform Incoming Data

2. Define an Action to Store the Data

3. Query & Analyze the Stored Data

1) Set up a Rule to capture all sensor readings

{ "ruleName" : "Capture sensor readings",
  "topicRulePayload" : {
    "sql" : "SELECT *, clientId() as MQTTClientId FROM 'rooms/ac/+'",
    "description": "capture data from all sensors",
    "actions" : [What goes here?],
    "ruleDisabled" : false
  }
}

2) Define an Action to Store the Data

But where should we store it?

Storage Options

Amazon S3 Amazon Redshift Amazon RDS Amazon DynamoDB

Storage Options: Amazon S3

Amazon S3

• Actions can directly write into (JSON) files on S3

• Very simple to configure, just provide bucket name

• Results in 1 file per event

• Lots of small files can be hard to handle

• Inefficient when processing with Hadoop / Amazon EMR or when importing into Redshift

• Useful when you have a very low frequency of events, e.g. when you only want to log outliers to S3

Storage Options: Amazon S3 (cont'd)

Amazon S3

• Buffer data using Amazon Kinesis or Amazon Kinesis Firehose to get fewer, larger files

• Buffering, compression & output to S3 is built into Firehose – no other infrastructure needed!

• Kinesis Connector Library can be extended to transform, filter, or serialize data

• Additional control over buffering & output formats

• Added complexity: requires Amazon EC2 workers running Kinesis Connector Library

Amazon Kinesis Firehose
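The idea of buffering many small events into fewer, larger files can be sketched locally as follows. This is a toy model, not how Firehose is implemented: `BatchingBuffer` is a hypothetical class, each flushed string stands in for one output file, and only a size threshold is modelled (a managed service would typically also flush on a time interval).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Toy buffer that groups small records into larger batches; not the Firehose implementation. */
final class BatchingBuffer {
    private final int maxBytes;
    private final List<String> pending = new ArrayList<>();
    private final List<String> flushed = new ArrayList<>(); // each entry stands in for one output file
    private int pendingBytes = 0;

    BatchingBuffer(int maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Adds one record, flushing first if the record would push the batch over maxBytes. */
    void add(String jsonRecord) {
        int size = jsonRecord.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for the newline
        if (!pending.isEmpty() && pendingBytes + size > maxBytes) {
            flush();
        }
        pending.add(jsonRecord);
        pendingBytes += size;
    }

    /** Writes the pending records as one newline-delimited batch. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flushed.add(String.join("\n", pending) + "\n");
        pending.clear();
        pendingBytes = 0;
    }

    List<String> batches() {
        return flushed;
    }
}
```

Newline-delimited JSON is used here because it keeps one event per line inside a larger file; that output format is an assumption of the sketch.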

Storage Options: Amazon Redshift

• Actions can forward data to Amazon Kinesis Firehose

• Buffering & output to Redshift is built into Firehose

• Very easy to set up

• Fully managed

• Use Amazon Kinesis as an alternative

• More control: use Kinesis Connector Library to transform, filter, or serialize data

• Added complexity: requires Kinesis Connector Library etc. to execute on Amazon EC2

Amazon Kinesis Firehose, Amazon Redshift

Storage Options: Amazon DynamoDB

• Actions can directly write into Amazon DynamoDB

• Creates one row per event, can define:

• Hash Key, Range Key and attributes to store

• E.g. Hash Key = deviceID, range key=timestamp…

• Very simple to configure, just provide table & field names

• Adding GSIs and LSIs provides additional flexibility and enables different queries

• SELECTs can read from DynamoDB for fast lookups

Amazon DynamoDB
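Why a hash key of deviceID plus a range key of timestamp is a useful layout can be seen in a small in-memory stand-in. This sketch does not use the DynamoDB API; `ReadingsByDevice` and its methods are hypothetical names, and a single temperature value stands in for the stored attributes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** In-memory stand-in for a table keyed by (deviceID, timestamp); not the DynamoDB API. */
final class ReadingsByDevice {
    // The hash key (deviceID) selects a partition; the range key (timestamp) orders items within it.
    private final Map<String, TreeMap<Long, Double>> partitions = new HashMap<>();

    void put(String deviceId, long timestampMillis, double temperature) {
        partitions.computeIfAbsent(deviceId, id -> new TreeMap<>())
                  .put(timestampMillis, temperature);
    }

    /** Returns readings for one device with fromMillis <= timestamp < toMillis, in time order. */
    SortedMap<Long, Double> query(String deviceId, long fromMillis, long toMillis) {
        TreeMap<Long, Double> partition = partitions.get(deviceId);
        if (partition == null) {
            return Collections.emptySortedMap();
        }
        return Collections.unmodifiableSortedMap(partition.subMap(fromMillis, toMillis));
    }
}
```

With this layout, "all readings of one device in a time range" is a single ordered lookup, which is the kind of query the key choice above is meant to make cheap.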

Storage Options: Amazon DynamoDB

{ "sql": "SELECT * FROM 'rooms/ac/+'",
  "ruleDisabled": false,
  "actions": [{
    "dynamoDB": {
      "tableName": "my-dynamodb-table",
      "roleArn": "arn:aws:iam::X:role/mbl305-demo-role",
      "hashKeyField": "roomID",
      "hashKeyValue": "${topic(3)}",
      "rangeKeyField": "timestamp",
      "rangeKeyValue": "${timestamp()}",
      "payloadField" :
  } }]}

Amazon DynamoDB

Storage Options: Amazon DynamoDB (cont'd)

• AWS Lambda function provides additional flexibility:

• Transform data

• Write into different/multiple tables

• Enrich data with contextual information pulled in from other sources

• Only able to process one event at a time! (i.e., AWS Lambda, when called from AWS IoT, cannot aggregate events before writing to DynamoDB)

Amazon DynamoDB, AWS Lambda

3) Query & Analyze the Stored Data

How can we query the data?

Diagram: Amazon DynamoDB, Amazon S3, and Amazon Redshift as data stores; Amazon EMR (Hive / SparkSQL / Presto); COPY; JDBC / ODBC; Amazon QuickSight

Recommendations

Want to run a lot of queries constantly?

Use Kinesis Firehose to write into Amazon Redshift

Need fast lookups, e.g., in Rules or Lambda functions?

Write into DynamoDB, add indexes if necessary

Have a need for heavy queries but not always-on?

Use Kinesis Firehose & S3, process with Amazon EMR.

Back to our Example!

1) Set up a Rule to capture all sensor readings

{ "sql" : "SELECT *, topic(3) as deviceID, timestamp() as reading_time, clientId() as MQTTClientId FROM 'rooms/ac/+'",
  "description": "Forward sensor data to Firehose",
  "actions" : [{
    "firehose" : {
      "deliveryStreamName": "sensors-firehose",
      "roleArn": "string"
    }
  }],
  "ruleDisabled" : false }

2) Pump Data through Firehose into Redshift

Diagram: sensors/devices in a farm sending (Temp, Pressure, Humidity) publish to AWS IoT as a Thing/Device, using a private key & certificate, a policy, and the SDK. A rule (SELECT * FROM 'rooms/ac/+') with an IAM role and policy triggers actions that store data from all the field sensors in a database, via Amazon Kinesis Firehose and Amazon Redshift.

3) Analyze Data using Amazon QuickSight

Diagram: Thing/Device (private key & certificate, policy, SDK), AWS IoT rule (IAM role, policy), Amazon Kinesis Firehose, Amazon Redshift, Amazon QuickSight

DEMO TIME!

Real-time Metrics & Reactions

Our Goal:

• Alert on big temperature changes

• Collect & visualize metrics on current sensor readings

1) Set up Rule to react to relevant sensor data

{ "ruleName" : "Notify on high temperatures",
  "topicRulePayload" : {
    "sql" : "SELECT *, clientId() as MQTTClientId FROM 'rooms/ac/+' WHERE temperature > 95",
    "description": "Notify when temp exceeds 95",
    "actions" : [What goes here?],
    "ruleDisabled" : false
  }
}

1) Set up Rule to react to relevant sensor data

AWS IoT Rules

• only have access to the current event

• cannot take contextual information into account

Consider passing all the data to the Action for evaluation.

2) Process the Data

What's the best way to process this data?

Processing Options

AWS Lambda, Amazon Kinesis, Amazon SNS, Amazon SQS, external web service / webhooks, worker

Processing Options

AWS Lambda

• Processes a single event at a time (no batching)

• Enrich data with context information from other sources

• Perform transformations

• Run any Node.js / Java function

• No infrastructure to manage!

Processing Options

Amazon SNS

• Great for alerts: Sends push notifications, emails and SMS

• Call other systems via HTTP POST / webhooks (on AWS or on-premises)

• SNS Topics support multiple subscribers, incl. AWS Lambda and Amazon SQS

Processing Options

Amazon SQS

• Great when events arrive with varying frequency

• Buffer data for asynchronous processing

• Ensure that no event data is lost

• SNS Topics support multiple subscribers, incl. AWS Lambda and Amazon SQS

• Easily deploy SQS workers on AWS Elastic Beanstalk (or Amazon EC2)

Processing Options

Amazon Kinesis

• Provides access to a "rolling window" of event data

• Scalable, can consume events from a multitude of different rules / topics / devices

• Supports many independent, concurrent readers (& writers)

• Multiple processing options: KCL application, AWS Lambda

Processing Options

Amazon Kinesis

• Scalable way to connect many different systems to the stream of events, e.g., custom KCL code, Complex Event Processing (CEP) products

• Amazon Kinesis is a hub for all stream processing needs

Example:

1. Read last N events from stream

2. Determine maximum and rate of increase since beginning

3. Decide if alert should be sent
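The three steps above can be sketched as a small rolling window over the most recent readings. This is a minimal sketch, not a Kinesis consumer: `TemperatureWindow` is a hypothetical class, the readings are assumed to have been read from the stream already, and both thresholds are illustrative values chosen by the caller.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Keeps the last N temperature readings and decides whether an alert is due. */
final class TemperatureWindow {
    private final int capacity;
    private final Deque<Double> readings = new ArrayDeque<>();

    TemperatureWindow(int capacity) {
        if (capacity < 2) {
            throw new IllegalArgumentException("capacity must be at least 2");
        }
        this.capacity = capacity;
    }

    /** Step 1: keep only the last N events. */
    void add(double temperature) {
        if (readings.size() == capacity) {
            readings.removeFirst();
        }
        readings.addLast(temperature);
    }

    /** Step 2a: maximum over the window (negative infinity if the window is empty). */
    double max() {
        double max = Double.NEGATIVE_INFINITY;
        for (double reading : readings) {
            max = Math.max(max, reading);
        }
        return max;
    }

    /** Step 2b: average change per reading between the oldest and newest value in the window. */
    double rateOfIncrease() {
        if (readings.size() < 2) {
            return 0.0;
        }
        return (readings.peekLast() - readings.peekFirst()) / (readings.size() - 1);
    }

    /** Step 3: alert when either the maximum or the rate of increase exceeds its threshold. */
    boolean shouldAlert(double maxThreshold, double rateThreshold) {
        return max() > maxThreshold || rateOfIncrease() > rateThreshold;
    }
}
```

A per-event rule cannot make this decision because it sees only the current message; the window is what supplies the context.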

Amazon Kinesis

Recommendations

Only care about individual events?

Invoke an AWS Lambda Function via Rule / Action

For sliding window analysis and more flexibility

Stream into Kinesis and Run AWS Lambda function

Use Amazon Kinesis as a Hub for all incoming events.

3) Visualize the Current Metrics

Amazon Elasticsearch Service

• Managed Elasticsearch as a service

• Easy & fast indexing of data; well suited for lookups on streaming data

• Easy-to-use visualization / dashboards using Kibana

DEMO TIME!

Predictions & Smart Applications

Machine learning and smart devices

Machine learning is the technology that automatically finds patterns in your data and uses them to make predictions for new data points as they become available

Your devices + machine learning = smart devices

IoT Use Cases for Machine Learning

- Find potential problems by looking for patterns

- Identify engines that are about to break down

- Predict when supplies will run out

- Spot sensors that report implausible data

- Predict next movement / direction of a connected vehicle

- Based on driving parameters & observations from other cars

- Predict traffic jams before they occur

Amazon Machine Learning

• Real-time predictions (and batch)

• Training & evaluation of machine learning models

• Picks the right model & parameters, helps build training data

Basic Approach

1. Collect / build training data

- Take past data for sensor readings (temperature, humidity, pressure), but not the deviceID or timestamp, as input

- Target: we define which readings are 'correct' or incorrect and add the target variable's value to the training data.

Amazon S3, Amazon Redshift

Basic Approach

2. Train a Machine Learning Model

Amazon

Machine Learning

Basic Approach

3. Create a real-time prediction endpoint for the model

Amazon

Machine Learning

Basic Approach

4. Get predictions for events as they come in

Diagram: AWS IoT, Amazon Kinesis, AWS Lambda, Amazon Machine Learning, Prediction

Basic Approach

1. Collect / build training data

- Determine input variables & target

- Evaluate the data to pick the target value for each set of inputs in the data

2. Train a Machine Learning Model

- Builds a model based on the information in the training data

3. Create a real-time prediction endpoint for the model

- Outputs a prediction based on the input variables provided

4. Get predictions for events as they come in
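Step 1 amounts to producing rows of input variables plus a target value. The sketch below writes such rows as CSV. It is an illustration under stated assumptions: `TrainingData` and `LabeledReading` are hypothetical names, the column names are made up for this example, and encoding the target as 1 or 0 is a choice of this sketch rather than a documented requirement.

```java
import java.util.List;
import java.util.Locale;

/** Builds a CSV training set from manually reviewed sensor readings (illustrative only). */
final class TrainingData {
    private TrainingData() {}

    /** One reviewed reading; "plausible" is the human-assigned target value. */
    static final class LabeledReading {
        final double temperature;
        final double humidity;
        final double pressure;
        final boolean plausible;

        LabeledReading(double temperature, double humidity, double pressure, boolean plausible) {
            this.temperature = temperature;
            this.humidity = humidity;
            this.pressure = pressure;
            this.plausible = plausible;
        }
    }

    /** Inputs are the three sensor values; deviceID and timestamp are deliberately left out. */
    static String toCsv(List<LabeledReading> rows) {
        StringBuilder csv = new StringBuilder("temperature,humidity,pressure,plausible\n");
        for (LabeledReading row : rows) {
            csv.append(String.format(Locale.ROOT, "%.1f,%.1f,%.1f,%d\n",
                    row.temperature, row.humidity, row.pressure, row.plausible ? 1 : 0));
        }
        return csv.toString();
    }
}
```

The resulting file could then be placed in storage such as Amazon S3 as the input for model training.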

Example Use Case: Filter out bad readings

1. Create a training data set based on past data & human evaluation of the data, i.e., manually review the data and mark incorrect values

2. Train an Amazon ML model on this data to predict which combinations are (in)correct

3. Invoke ML model on incoming data to predict correctness

4. Alert staff via Amazon SNS push notification

DEMO TIME!

Lambda Function

// client, credentials, mlModelId, endpoint, snsClient and snsTopic are fields of the
// handler class; getRealtimeEndpoint() and jsonToMap() are helpers not shown on the slides.
public String handleRequest(String input, Context context) {
    // Create the Amazon ML client
    client = new AmazonMachineLearningClient(credentials);
    // Look up and cache the real-time endpoint for the ML model
    getRealtimeEndpoint();

    PredictRequest request = new PredictRequest();
    request.setMLModelId(mlModelId);
    request.setPredictEndpoint(endpoint);
    // Populate record with relevant data
    request.setRecord(jsonToMap(input));

    PredictResult result = client.predict(request);
    String label = result.getPrediction().getPredictedLabel();
    Float prob = result.getPrediction()
            .getPredictedScores().get(label) * 100;

    String outputString = "Device is performing "
            + label + " with a probability of " + prob + " %";

    // Publish to an SNS topic
    PublishRequest publishRequest = new PublishRequest(snsTopic, outputString);
    PublishResult publishResult = snsClient.publish(publishRequest);

    return outputString;
}

Recommendations

Rely on past data / context rather than defining 'rules'

Use Amazon Machine Learning for an easy start

Let real-time predictions drive reaction to patterns in events

Conclusion & Outlook

What Have We Built?

Diagram: AWS IoT, Amazon Kinesis, AWS Lambda, Amazon Machine Learning, Amazon Kinesis Firehose, Amazon Redshift, Amazon Elasticsearch Service

Outlook: Where Do We Go From Here?

- Automated reactions to events: feeding back into the system, i.e., enrich data based on correlated data, predictions and past data, then react on predictions

- Complex Event Processing (CEP)

- Unsupervised learning…?

Related Sessions

MBL203 State of the Union – San Polo 3501B 11:00 AM

MBL203 Everything about AWS IoT – Venetian H 12:15 PM

MBL311 AWS IoT Security - Palazzo A 1:30 PM

MBL312 Rules and Shadow - Palazzo A 2:45 PM

MBL313 Devices SDK and Kits - Palazzo A 4:15 PM

MBL303 Mobile Devices and IoT - Delfino 4005 4:15 PM

MBL203 Devices in Motion - Delfino 4005 Friday 10:15 AM

MBL305 IoT Data and Analytics - Delfino 4005 Friday 11:30

Thank you!

Remember to complete your evaluations!