What's New in Toolkits for IBM Streams v4.1


Upload: lisanl

Post on 09-Apr-2017


© 2015 IBM Corporation

What’s new in Toolkits

IBM Streams 4.1

Ankit Pasricha

Toolkits Team Lead

[email protected]


Important Disclaimer

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR

• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.



Agenda

(New) Spark MLLib Toolkit

(New) Cybersecurity Toolkit

(New) Distributed Process Store (DPS) Toolkit

Messaging Toolkit

Geospatial Toolkit

Text Toolkit

Other updates


Spark MLLib Toolkit

Combines the power of Spark MLLib with the real-time streaming capabilities of Streams

Allows scoring of real-time streaming data using Spark models

GitHub project: http://ibmstreams.github.io/streamsx.sparkMLLib/

Support for a number of MLLib models:
• Classification
   – Linear SVM
   – Naive Bayes
• Clustering
   – KMeans
• Collaborative Filtering
• Regression
   – Isotonic
   – Linear
   – Logistic
• Tree
   – Decision Tree
   – Gradient Boosted Trees
   – Random Forest
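To make "scoring streaming data with a Spark model" concrete, here is a minimal sketch of the nearest-center rule a KMeans model applies when it assigns a point to a cluster. It assumes the cluster centers were already trained offline (for example, with Spark MLlib) and copied into plain arrays; the classes `KMeansScorer` and `KMeansScoringDemo` are hypothetical and are neither the toolkit's operators nor Spark's API.

```java
import java.util.List;

// Scores points against KMeans cluster centers trained elsewhere
// (for example, offline in Spark MLlib). Hypothetical class, not a toolkit API.
final class KMeansScorer {
    private final List<double[]> centers;

    KMeansScorer(List<double[]> centers) {
        this.centers = centers;
    }

    // Returns the index of the closest center by squared Euclidean distance.
    int predict(double[] point) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            double[] center = centers.get(i);
            double distance = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = point[j] - center[j];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = i;
            }
        }
        return best;
    }
}

// Simulates a stream by scoring each tuple as it "arrives".
final class KMeansScoringDemo {
    public static void main(String[] args) {
        // Made-up centers standing in for a trained model.
        KMeansScorer model = new KMeansScorer(List.of(
                new double[] {0.0, 0.0},
                new double[] {10.0, 10.0}));
        double[][] incoming = {{1.0, 0.5}, {9.0, 11.0}, {4.0, 4.0}};
        for (double[] tuple : incoming) {
            System.out.println("cluster " + model.predict(tuple));
        }
    }
}
```

Running `KMeansScoringDemo` assigns the three made-up tuples to clusters 0, 1 and 0. The point of the design is that training happens once, offline, while the per-tuple scoring step is cheap enough to run on every arriving tuple.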


Streams + Spark Demo

[Diagram: historical city data sets (Incidents, Calls for Service (911, etc.), 311, Code Violations, Permits, Buildings) stored in HDFS are used by Apache Spark MLlib to build a model that answers "Is this call for service a false alarm?"; IBM Streams applies the model to real-time calls for service and sends real-time predictions and relevant context to a real-time dashboard.]
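The demo's flow (train offline on historical data, then score each new call as it arrives) can be sketched with a logistic-regression model, because "is this call a false alarm?" is a yes/no question. This is an assumption: the slide does not say which model the demo used. The weights, the 0.5 threshold, and the `ServiceCall` fields below are hypothetical placeholders for what a Spark-trained model and the upstream feature preparation would supply.

```java
// Hypothetical incoming call: an id plus numeric features prepared upstream.
final class ServiceCall {
    final String id;
    final double[] features;

    ServiceCall(String id, double[] features) {
        this.id = id;
        this.features = features;
    }
}

// Binary logistic-regression scorer. Weights and intercept would come from a
// model trained offline on historical data. Hypothetical class.
final class FalseAlarmScorer {
    private final double[] weights;
    private final double intercept;
    private final double threshold;

    FalseAlarmScorer(double[] weights, double intercept, double threshold) {
        this.weights = weights;
        this.intercept = intercept;
        this.threshold = threshold;
    }

    // Estimated probability that the call is a false alarm: sigmoid(w . x + b).
    double probability(double[] x) {
        double margin = intercept;
        for (int i = 0; i < weights.length; i++) {
            margin += weights[i] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-margin));
    }

    // Flags the call only when the probability is strictly above the threshold.
    boolean isFalseAlarm(ServiceCall call) {
        return probability(call.features) > threshold;
    }
}

final class FalseAlarmDemo {
    public static void main(String[] args) {
        // Made-up parameters for illustration only.
        FalseAlarmScorer scorer = new FalseAlarmScorer(new double[] {2.0, -1.0}, 0.0, 0.5);
        ServiceCall[] calls = {
            new ServiceCall("call-1", new double[] {1.0, 0.0}),
            new ServiceCall("call-2", new double[] {0.0, 1.0})
        };
        for (ServiceCall call : calls) {
            // In the demo, a prediction like this plus context would feed the dashboard.
            System.out.printf("%s false alarm? %b (p=%.3f)%n",
                    call.id, scorer.isFalseAlarm(call), scorer.probability(call.features));
        }
    }
}
```

With these made-up parameters, `call-1` is flagged and `call-2` is not.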


Resources

Getting Started Guide: https://developer.ibm.com/streamsdev/docs/getting-started-with-the-spark-mllib-toolkit/

Documentation: http://ibmstreams.github.io/streamsx.sparkMLLib/com.ibm.streamsx.sparkmllib/doc/spldoc/html/

MLLib Guide: https://spark.apache.org/docs/latest/mllib-guide.html

Samples: https://github.com/IBMStreams/streamsx.sparkMLLib/tree/master/samples


Cybersecurity Toolkit

The toolkit can detect active threats occurring within a network in real time.

Contains three machine-learning cybersecurity models:
• DomainProfiling: analyzes DNS response records and reports whether any domains are behaving suspiciously
• HostProfiling: analyzes DNS response records and reports whether individual hosts are behaving suspiciously
• PredictiveBlacklisting: analyzes DNS response records and predicts whether a domain should be added to an internal blacklist
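The slide does not describe how the toolkit's machine-learning models work internally. As a much simpler illustration of profiling hosts from DNS response records, the sketch below flags hosts whose share of NXDOMAIN (non-existent domain) answers is unusually high, a pattern often associated with malware that queries randomly generated domain names. The record type, the minimum query count, and the ratio threshold are all hypothetical, and this heuristic is not the toolkit's HostProfiling model.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified DNS response record.
final class DnsResponse {
    final String clientHost; // host that issued the query
    final String domain;     // queried domain name
    final boolean nxDomain;  // true if the answer was NXDOMAIN (RCODE 3)

    DnsResponse(String clientHost, String domain, boolean nxDomain) {
        this.clientHost = clientHost;
        this.domain = domain;
        this.nxDomain = nxDomain;
    }
}

// Keeps running per-host counts and flags hosts with a high NXDOMAIN ratio.
final class NxDomainHostProfiler {
    private final Map<String, int[]> counts = new HashMap<>(); // host -> {total, nx}
    private final int minQueries;
    private final double maxNxRatio;

    NxDomainHostProfiler(int minQueries, double maxNxRatio) {
        this.minQueries = minQueries;
        this.maxNxRatio = maxNxRatio;
    }

    // Updates the profile of the host that issued this query.
    void observe(DnsResponse r) {
        int[] c = counts.computeIfAbsent(r.clientHost, h -> new int[2]);
        c[0]++;
        if (r.nxDomain) {
            c[1]++;
        }
    }

    boolean isSuspicious(String host) {
        int[] c = counts.get(host);
        if (c == null || c[0] < minQueries) {
            return false; // not enough evidence yet
        }
        return (double) c[1] / c[0] > maxNxRatio;
    }
}

final class HostProfilingDemo {
    public static void main(String[] args) {
        NxDomainHostProfiler profiler = new NxDomainHostProfiler(4, 0.5);
        for (int i = 0; i < 5; i++) {
            // 10.0.0.5 mostly asks for names that do not exist.
            profiler.observe(new DnsResponse("10.0.0.5", "x" + i + ".example", i != 0));
            // 10.0.0.7 mostly asks for names that resolve.
            profiler.observe(new DnsResponse("10.0.0.7", "www.example.com", i == 0));
        }
        System.out.println("10.0.0.5 suspicious? " + profiler.isSuspicious("10.0.0.5"));
        System.out.println("10.0.0.7 suspicious? " + profiler.isSuspicious("10.0.0.7"));
    }
}
```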


Resources

Introduction to Cybersecurity toolkit: https://developer.ibm.com/streamsdev/docs/detect-active-threats-in-real-time-streams-cybersecurity-toolkit/

Getting Started Guide: http://ibmstreams.github.io/streamsx.documentation/docs/4.1/cybersecurity/cybersecurity-getting-started/

Documentation: http://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.1.0/com.ibm.streams.toolkits.doc/spldoc/dita/tk$com.ibm.streams.cybersecurity/tk$com.ibm.streams.cybersecurity.html?lang=en

Starter Apps: https://github.com/IBMStreams/streamsx.cybersecurity.starterApps


Distributed Process Store (DPS) Toolkit

Allows sharing of data across operators, across Streams applications, and between Streams and other applications
– Provides a collection of APIs in Java, C++ and SPL to read from and write to Redis
– Support for Redis 2.8.x and 3.0

Java Example: Creating a distributed store
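As a rough, single-process illustration of the "named store" idea, the sketch below keeps one key-value map per store name and hands the same map to every caller that opens that name. The class `InMemoryStoreRegistry` and its `openStore` method are hypothetical; they are not the DPS Java API, and unlike DPS, which keeps its data in Redis, nothing here is shared across processes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of named key-value stores. Every caller that opens the
// same name gets the same map. Data lives only in this JVM, unlike DPS, which
// keeps it in an external Redis server.
final class InMemoryStoreRegistry {
    private static final Map<String, Map<String, String>> STORES = new ConcurrentHashMap<>();

    private InMemoryStoreRegistry() {
    }

    // Creates the store the first time the name is used, then returns it.
    static Map<String, String> openStore(String name) {
        return STORES.computeIfAbsent(name, n -> new ConcurrentHashMap<>());
    }
}

final class StoreDemo {
    public static void main(String[] args) {
        // One part of an application writes...
        Map<String, String> writer = InMemoryStoreRegistry.openStore("sessions");
        writer.put("user42", "active");
        // ...and another part opens the same store by name and reads.
        Map<String, String> reader = InMemoryStoreRegistry.openStore("sessions");
        System.out.println(reader.get("user42"));
    }
}
```

Looking stores up by name is what lets independent operators or applications agree on shared data without passing object references around.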


Distributed Process Store (DPS) Toolkit

Java Example: Acquiring a distributed lock

Java Example: Writing data
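A lock matters when writers perform a read-modify-write on shared data: without it, two writers can read the same old value and one update is lost. The sketch below shows that acquire, write, release pattern with a `java.util.concurrent.locks.ReentrantLock` looked up by name. The `NamedLocks` and `LockedCounterDemo` classes are hypothetical, and a JVM-local lock only stands in for the distributed lock that DPS provides across processes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical lookup of locks by name. These locks only coordinate threads in
// one JVM; DPS locks coordinate across processes.
final class NamedLocks {
    private static final Map<String, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

    private NamedLocks() {
    }

    static ReentrantLock get(String name) {
        return LOCKS.computeIfAbsent(name, n -> new ReentrantLock());
    }
}

final class LockedCounterDemo {
    private static final Map<String, Long> STORE = new ConcurrentHashMap<>();

    // Read-modify-write under the named lock, so concurrent increments are not lost.
    static void increment(String key) {
        ReentrantLock lock = NamedLocks.get("counter-lock");
        lock.lock();
        try {
            long current = STORE.getOrDefault(key, 0L);
            STORE.put(key, current + 1);
        } finally {
            lock.unlock(); // always release, even if the update throws
        }
    }

    // Starts several writer threads and returns the final counter value.
    static long runWorkers(int threads, int incrementsPerThread) {
        STORE.clear();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    increment("hits");
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return STORE.getOrDefault("hits", 0L);
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 1000)); // prints 4000
    }
}
```

Within a single JVM, `ConcurrentHashMap.merge` could make this particular update atomic without an explicit lock; the lock is used here because it mirrors the general acquire, write, release pattern.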


Resources

Documentation: https://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.1.0/com.ibm.streams.toolkits.doc/spldoc/dita/tk$com.ibm.streamsx.dps/tk$com.ibm.streamsx.dps.html?lang=en

Samples: https://github.com/IBMStreams/streamsx.dps/tree/master/com.ibm.streamsx.dps/samples


Messaging Toolkit updates

Guaranteed processing support
– KafkaConsumer
   – On checkpoint: save the offset within the message log
   – On reset: replay messages from the saved offset
– JMSSource: runs in a transacted session with MQ running in persistent mode
   – On checkpoint: acknowledge read messages so they are removed from the queue
   – On reset: start replaying any unacknowledged messages

Performance improvements in Kafka operators (pre-release)
– Using the new KafkaProducer API
– Developed on GitHub: https://github.com/IBMStreams/streamsx.messaging

RabbitMQ support (pre-release)

Kafka 0.9 and Message Hub support on Bluemix
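The KafkaConsumer checkpoint and reset behaviour can be sketched with an in-memory log: a checkpoint records the current offset, and a reset rewinds to it so that messages read after the checkpoint are delivered again. The `CheckpointingConsumer` class and its method names are hypothetical, not the toolkit's operator code. Its `processed` list deliberately keeps replayed messages so the replay is visible; the sketch models only the consumer side, not the rest of the application's state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical consumer over an in-memory, append-only message log.
final class CheckpointingConsumer {
    private final List<String> log;
    private int offset = 0;             // position of the next message to read
    private int checkpointedOffset = 0; // offset saved by the last checkpoint
    private final List<String> processed = new ArrayList<>();

    CheckpointingConsumer(List<String> log) {
        this.log = log;
    }

    // Reads and processes up to n messages from the current offset.
    void poll(int n) {
        for (int i = 0; i < n && offset < log.size(); i++) {
            processed.add(log.get(offset));
            offset++;
        }
    }

    // On checkpoint: save the offset within the message log.
    void checkpoint() {
        checkpointedOffset = offset;
    }

    // On reset: rewind to the saved offset so later messages are replayed.
    void reset() {
        offset = checkpointedOffset;
    }

    List<String> processed() {
        return processed;
    }
}

final class ReplayDemo {
    public static void main(String[] args) {
        CheckpointingConsumer consumer =
                new CheckpointingConsumer(List.of("m0", "m1", "m2", "m3", "m4"));
        consumer.poll(2);      // m0, m1
        consumer.checkpoint(); // saved offset = 2
        consumer.poll(2);      // m2, m3 (then imagine a failure)
        consumer.reset();      // back to offset 2
        consumer.poll(3);      // m2, m3 replayed, then m4
        System.out.println(consumer.processed()); // [m0, m1, m2, m3, m2, m3, m4]
    }
}
```

The JMSSource approach reaches a similar result differently: rather than remembering an offset, it leaves messages unacknowledged until a checkpoint, so the queue itself redelivers them after a reset.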


Geospatial Toolkit Update

New PointMapMatcher operator

We have a map and a set of imprecise points coming from a GPS or some other source in real time:
– The data may only have a certain inherent precision
– There may be errors due to signal noise
– The map itself may be imprecise or incorrect

We want to clean and smooth this data one point at a time, locking the incoming points to the road network.

“Where is this entity right now?”
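The slides do not describe the algorithm PointMapMatcher uses. As a deliberately naive stand-in, the sketch below snaps one point at a time to the closest road segment by projecting the point onto each segment. It assumes planar x/y coordinates (raw GPS latitude/longitude would first need projecting) and a hypothetical `maxDistance` cut-off; the `Segment`, `Match` and `NearestSegmentMatcher` names are illustrative only.

```java
import java.util.List;

// Hypothetical straight road segment in planar x/y coordinates (e.g. metres).
final class Segment {
    final String roadId;
    final double x1, y1, x2, y2;

    Segment(String roadId, double x1, double y1, double x2, double y2) {
        this.roadId = roadId;
        this.x1 = x1;
        this.y1 = y1;
        this.x2 = x2;
        this.y2 = y2;
    }
}

// Result of snapping one point: road id, snapped position, and distance moved.
final class Match {
    final String roadId;
    final double x, y, distance;

    Match(String roadId, double x, double y, double distance) {
        this.roadId = roadId;
        this.x = x;
        this.y = y;
        this.distance = distance;
    }
}

final class NearestSegmentMatcher {
    private NearestSegmentMatcher() {
    }

    // Snaps (px, py) to the closest point on the closest segment, or returns
    // null if no segment lies within maxDistance.
    static Match match(List<Segment> roads, double px, double py, double maxDistance) {
        Match best = null;
        for (Segment s : roads) {
            double dx = s.x2 - s.x1;
            double dy = s.y2 - s.y1;
            double lengthSquared = dx * dx + dy * dy;
            // Position of the perpendicular foot along the segment, clamped to [0, 1].
            double t = lengthSquared == 0.0
                    ? 0.0
                    : ((px - s.x1) * dx + (py - s.y1) * dy) / lengthSquared;
            t = Math.max(0.0, Math.min(1.0, t));
            double sx = s.x1 + t * dx;
            double sy = s.y1 + t * dy;
            double distance = Math.hypot(px - sx, py - sy);
            if (distance <= maxDistance && (best == null || distance < best.distance)) {
                best = new Match(s.roadId, sx, sy, distance);
            }
        }
        return best;
    }
}

final class MapMatchDemo {
    public static void main(String[] args) {
        List<Segment> roads = List.of(
                new Segment("A", 0, 0, 10, 0),
                new Segment("B", 0, 5, 0, 15));
        Match m = NearestSegmentMatcher.match(roads, 3, 1, 5.0);
        System.out.println(m.roadId + " at (" + m.x + ", " + m.y + ")");
    }
}
```

Snapping each point independently is easy to follow but fragile near intersections; more robust matchers also take an entity's recent positions into account, which this sketch ignores.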


Operator Details


[Diagram: the PointMapMatcher operator takes Entity Locations and Map Geometry Updates as inputs and produces Matches and Errors as outputs.]


Some use cases:
• Routing
• Traffic reports
• Transit scheduling
• Taxi/emergency dispatching

Streams Dev article: https://developer.ibm.com/streamsdev/docs/realtime-map-matching-in-streams-v4-0-1/


Text Toolkit update

Added support for AQL extractors generated by the IBM BigInsights (BI) 4.0+ web tooling

Two-step process:
– Step 1: Create an extractor in the BI web tool


Text Toolkit update

Step 2: Load the extractor in the TextExtract operator for execution

    stream<DataToAnalyze, ReferencesFound> TextExtractOutput =
        TextExtract(InputFromSocialMedia)
    {
        param
            moduleSearchPath : "etc/extractor";
            inputDoc         : "text";
            outputViews      : "ProductSearch";
            outputMode       : "multiPort";
    }

For more information: http://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.1.0/com.ibm.streams.toolkits.doc/spldoc/dita/tk$com.ibm.streams.text/tk$com.ibm.streams.text.html?lang=en


Other updates

Bluemix support
– HDFS Toolkit for Bluemix
– HBase Toolkit for Bluemix
– More information: https://developer.ibm.com/streamsdev/docs/integrating-streams-biginsights-hbase-service-bluemix/

Data Governance support
– HDFS Toolkit
– DB Toolkit
– Inet Toolkit
– Messaging Toolkit
– Webcast replay: https://developer.ibm.com/streamsdev/docs/streams-v-4-1-0-developer-conference-replay/

Support for BigInsights 4.1
– HDFS Toolkit
– HBase Toolkit


Questions?