akuda labs: pulsar

AKUDA LABS PROPRIETARY AND CONFIDENTIAL

Upload: akuda-labs

Post on 14-Apr-2017



Our Superpower

[Chart: Throughput (packets/s), axis from 0 to 100,000, for three series: Bananas, Spark Streaming (latency < 3 s), and Spark Streaming (optimized for latency). Data labels: 100,000, 10,000, and 250.]

•  10x Throughput

•  24,000x Lower Latency

•  400x Throughput when Spark Streaming is optimized for latency

The Benchmark:
Pattern Detection in Unstructured Streaming Text Data

[Diagram labels: Text Stream Generator; Spark Streaming Setup; Bananas Setup; Throughput Regulator (x2)]

The Benchmark:
Platform and Setup

•  Dell 815 Servers

•  48 Text Classification Pipelines

•  10 Gbit Connection

Spark Streaming Configurations:

•  Receiver-Based Model

•  12 Kafka Topic Partitions

•  Block Size: 200 ms

•  Batch Size: 1.5 – 20 s
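As a rough illustration of what the Spark Streaming settings above imply: in the receiver-based model, a receiver cuts incoming data into one block per block interval, and each block becomes roughly one task in the batch. The sketch below simply divides the batch interval by the block interval. It assumes a single receiver, and the function name is hypothetical.

```python
def blocks_per_batch(batch_interval_ms: int, block_interval_ms: int = 200) -> int:
    """Approximate number of blocks (and hence tasks) one receiver
    contributes to a batch: batch interval / block interval."""
    return batch_interval_ms // block_interval_ms


# The two ends of the batch-size range quoted on the slide (1.5 s and 20 s).
for batch_ms in (1_500, 20_000):
    print(f"batch {batch_ms} ms -> ~{blocks_per_batch(batch_ms)} blocks per receiver")
```

Longer batches give more blocks per batch, at the price of at least one batch interval of added latency.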

Why does it matter?

•  Reliability – May add 2 Nines

•  Hardware Cost – Potentially 100x Less Cost In Hardware

•  Energy – Potentially 100x Less Energy

•  Data Center Footprint – Potentially 100x Fewer Racks

•  Manageability – 10 machines versus 1000 machines

•  Network BW – Potentially 100x Less Network BW

•  Total Cost of Ownership – Potentially < 1000x

•  Greater Peace of Mind

Who will pay for Real-Time Solutions?
Real-Time: Expected Latency < 1 ms

•  Online Marketers

   – Process over 100k events per second for thousands of social media websites

   – Expected revenue > $2.1 Trillion

•  IoT Businesses

   – Process thousands of events per second from millions of connected devices

   – Expected revenue > $100 Billion

•  Spam and Fraud Detection

   – Detect multiple complex patterns in millions of transactions and documents per second

   – Expected revenue > $40 Billion

The Akuda Quest

•  To enable truly real-time classification of extremely high-rate data streams

•  To enable subject matter experts, who possess extensive knowledge of the domain the data belongs to and who are often non-programmers, to directly create classifiers

•  To enable the fast development and refinement of data classifiers

The Real-Time Classification Challenge
Latency < 1 ms

[Diagram labels: Source (1,000,000 documents/second, 1,024 bytes/packet); Ultra Fast Classification & Correlation (0.001 seconds max latency); Previous Knowledge (K1 to K10); 1,000,000 Distinct Possible Events/Triggers/Results; Actionable Information]

•  100 events/s

•  10,000 Devices

•  1,000,000 packets/s

•  10,000 Classifiers

•  10 Billion Classification Operations/s
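The figures on this slide multiply out as follows. This is a minimal sketch, assuming that each device emits 100 events/s and that every packet is evaluated by every classifier; the variable names are illustrative only.

```python
events_per_device_per_s = 100   # slide: 100 events/s (assumed to be per device)
devices = 10_000                # slide: 10,000 devices
classifiers = 10_000            # slide: 10,000 classifiers

# Aggregate packet rate arriving at the classification engine.
packets_per_s = events_per_device_per_s * devices

# Assumption: every packet is checked against every classifier.
classification_ops_per_s = packets_per_s * classifiers

print(f"{packets_per_s:,} packets/s")
print(f"{classification_ops_per_s:,} classification operations/s")
```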

Pulsar Analyst Workbench
Quick, Intuitive Classifier Development Sandbox

Quick Model Optimization
Specialized Compiler, Data Analysis Tools

[Diagram labels: RESOLVED FILTERING NETWORK; Optimizing Parallelizing Compiler (Cycle Detection, Reordering, DFA Pruning, Platform Targeting); TARGET PLATFORM TOPOLOGY; Execution Engine]
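The slide names DFA pruning as a compiler pass but does not define it. One common form of pruning removes states that cannot be reached from the start state. The sketch below illustrates only that generic idea, not Akuda's compiler; the dictionary representation and the function name are assumptions made for the example.

```python
from collections import deque


def prune_unreachable(transitions, start):
    """Return a copy of the transition table restricted to states
    reachable from `start` (breadth-first search over transitions)."""
    reachable = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, {}).values():
            if nxt not in reachable:
                reachable.add(nxt)
                queue.append(nxt)
    return {s: dict(t) for s, t in transitions.items() if s in reachable}


# Hypothetical three-state automaton: state -> {input symbol: next state}.
dfa = {
    "q0": {"a": "q1"},
    "q1": {"b": "q0"},
    "q2": {"a": "q0"},  # no path from q0 ever reaches q2
}
pruned = prune_unreachable(dfa, "q0")
print(sorted(pruned))
```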

AKUDA Technology Delivery

•  SaaS turn-key solution, with a model development system that allows complete solutions to be deployed in hours, without any coding.

•  Privately deployable enterprise solution on a Cloud Infrastructure.

•  Software Development Infrastructure for developing highly specific and targeted solutions.

The SaaS Platform: Pulsar
High Level View

[Diagram stages: INBOUND DATA HUB; DATA AUGMENTATION & CORRELATION; CLASSIFICATION; INDEXING; CLUSTER ANALYSIS; OUTBOUND DATA HUB]

Pulsar
System View

[Diagram: Pulsar system architecture. Component labels, with repeats collapsed: Optimizing Parallelizing Compiler for Classification, Analysis and Action; Network; LDA Cluster Generator; LDA Cluster Refinement; Massively Parallel RT Classification Engine (x2); Social Media Data Sources; Universal Store; Social Media Harvester; General Data Integration Hub; Data Source (x2); Akuda Agent; Direct Feed; Universal Searchable Index; Author [G,A,E]; Image Analyzer (LGM); Author Info Analyzer (LGM); General Data Sources; Real-time Stream Aggregator; RT Classification Pipeline (many instances); Author Geolocation Analyzer (LGM); Image Data Sources; Image Harvester; Author Attribute Processor (LGM); Real-time Stream Correlator; Author Attribute Store; Image Universal Searchable Index; Image Store; AKUDA Broadcaster; Author Attribute Detector (LGM) (many instances); LDA Feature Generator (Proximity NGRAMS); MISSION EDITOR; DFA; Tap; DFA Classifier Refinement; Pipeline Deep Inspection Store; Metrics And Alarms; RT Stream Indexer; Delivery Integration Hub; Target Systems; Dashboard Editor; Visualization; RT DASHBOARD [Corona]; PIPELINE STUDIO [Pulsar]; DEEP INSPECTION Query UI; AUTHOR ATTRIBUTE Query UI; UNIVERSAL STREAM Query UI; LDA Classifier Generator]

Pulsar
Inbound Data Hub

[Diagram: the Pulsar system-view diagram is repeated on this slide with identical component labels.]

Pulsar
LGM: Data Augmentation and Correlation

[Diagram: the Pulsar system-view diagram is repeated on this slide with identical component labels.]

Pulsar
Bananas: Data Classification

[Diagram: the Pulsar system-view diagram is repeated on this slide with identical component labels.]

Pulsar
Corona: Cluster Analysis

[Diagram: the Pulsar system-view diagram is repeated on this slide with identical component labels.]

Pulsar
Outbound Data Hub

[Diagram: the Pulsar system-view diagram is repeated on this slide with identical component labels.]

THE AKUDA CORE

MASSIVELY PARALLEL STREAMING CLASSIFICATION INFRASTRUCTURE

Possible Solution 1
NOT THIS - GTS: Scalability & Latency Problems

[Diagram labels: Feed BC; Broadcaster; Rx (x4); Indexer (x4); GTS Indexing System; Index Storage; Query With Frequency 2 q/s (many instances); Analytics / Visualization (many instances)]

Possible Solution 2
NOT THIS - HADOOP: Latency Problems

[Diagram labels: Feed BC; Broadcaster (x2); HADOOP]

Possible Solution 3
Not Quite There: Spark Streaming Pipeline of RDDs

[Diagram labels: Source (1,000,000 documents/second, 1,024 bytes/packet); MicroBatcher; 1,000,000 Sequential Stages; Doc 01 to Doc 16]

•  Latency of minutes, hours??

•  Network Transfers and/or Data Copying Across Host Nodes or Pipeline Stages

Possible Solution 4
Almost There: Data Flow Pipelines, Data Replication

[Diagram labels: Source (1,000,000 documents/second, 1,000 bytes/packet); Broadcaster; Doc 01 replicated to every stage; 1,000,000 Stages Running Simultaneously]

•  Low Latency

•  Broadcasting Doc Replicas becomes an extreme bottleneck

•  Bisection Bandwidth: 1,000,000,000,000,000 bytes/second (~10,000,000 Gbit/s, ~10,000 Tbit/s)

For comparison:

•  PCIe 3.0 lane BW: ~1 GB/s

•  10 Gbps Ethernet: ~1 GB/s

•  InfiniBand (Mellanox 56 Gb/s FDR IB): 6.8 GB/s

•  Cisco Catalyst 2960G-49TC-L Switching Fabric: 40 Mpps; at 1,000 bytes/packet, 40,000 MB/s = 40 GB/s

•  Intel Xeon Processor E7-8890 (15 cores) Max Mem BW: 85 GB/s
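The bisection-bandwidth figure follows from replicating every document to every stage. The sketch below reproduces the arithmetic and compares the result with the link speeds quoted on the slide. It assumes one full copy of each document per stage; the names are illustrative only.

```python
docs_per_s = 1_000_000      # slide: 1,000,000 documents/second
bytes_per_doc = 1_000       # slide: 1,000 bytes/packet
stages = 1_000_000          # slide: 1,000,000 stages running simultaneously

# Assumption: every stage receives its own full copy of every document.
required_bytes_per_s = docs_per_s * bytes_per_doc * stages
required_tbit_per_s = required_bytes_per_s * 8 / 1e12

# Link bandwidths in GB/s, as quoted on the slide.
links_gb_per_s = {
    "PCIe 3.0 lane": 1.0,
    "10 Gbps Ethernet": 1.0,
    "Mellanox 56 Gb/s FDR InfiniBand": 6.8,
    "Cisco Catalyst switching fabric": 40.0,
    "Xeon E7-8890 memory bus": 85.0,
}

print(f"required: {required_bytes_per_s:,} bytes/s (~{required_tbit_per_s:,.0f} Tbit/s)")
for name, gb_per_s in links_gb_per_s.items():
    links_needed = required_bytes_per_s / (gb_per_s * 1e9)
    print(f"{name}: ~{links_needed:,.0f} links' worth of bandwidth")
```

At 8 bits per byte the requirement works out to 8,000 Tbit/s, which the slide rounds up to ~10,000 Tbit/s.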

Possible Solution 5
Cost & Latency Issues: Data Broadcasting Tree

[Diagram labels: Feed BC; Broadcaster; a tree of Rx nodes with level labels 1, 10, 10, 10, 10, 10, 10; Indexing / Analytics / Visualization at the leaf nodes]

•  1 + 10 x 10 x 10 x 10 x 10 x 10 = 1,000,001 Nodes

•  Worst Case Cost = 1,000,001 * $1000/month: ~ $1 Billion / Month

•  Latency goes back to hours or days
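The node count and monthly cost on this slide can be checked with a few lines. The sketch assumes a fan-out of 10 over six levels, as the slide's formula implies, and $1000 per node per month. The slide's count covers the root plus the leaves; the full-tree total that also includes intermediate relay nodes is an added observation, not a figure from the slide.

```python
fan_out = 10            # each node forwards to 10 children
levels = 6              # six levels of fan-out below the root
cost_per_node = 1_000   # slide: $1000/month per node

leaves = fan_out ** levels
slide_nodes = 1 + leaves   # the slide's formula: root plus leaves

# Observation (not on the slide): counting every intermediate relay as well.
full_tree_nodes = sum(fan_out ** k for k in range(levels + 1))

monthly_cost = slide_nodes * cost_per_node
print(f"{slide_nodes:,} nodes -> ${monthly_cost:,}/month")
print(f"{full_tree_nodes:,} nodes if intermediate levels are counted")
```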

Honey, I Shrunk the Trees!
AKUDA Core Topology

[Diagram labels: Feed; Broadcaster; many parallel units, each with Rx, Model Pipelines, Tx, and Indexing / Analytics / Visualization]

•  1,000,000 documents/second

•  1,000 bytes/packet

•  100,000 Short Pipelines * 10 Stages each = 1,000,000 Stages

•  Akuda Queue Technology using on-chip inter-core networks

•  Akuda Buffer Technology using on-chip inter-core networks

•  Akuda Correlator Technology using on-chip inter-core networks

•  0.001 seconds typical latency

The Solution
Utilize Inter-Core Communication Channel

Data Communication Hardware                             Typical Bandwidth            Typical Cost
10 Gbps Ethernet                                        1 GB/s                       $1,000
PCIe 3.0 Lane                                           1 GB/s                       $10,000
InfiniBand, Mellanox 56 Gb/s FDR IB                     6.8 GB/s                     $1,000,000
Cisco Catalyst Switching Fabric                         40 GB/s                      $10,000,000
Inter-core/Inter-processor Fabric Bisection Bandwidth   1000 GB/s (for IA64 Chips)   $500
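One way to read the table is as bandwidth per dollar. The sketch below uses only the bandwidth and cost figures quoted on the slide; the dictionary layout and names are illustrative assumptions.

```python
# (typical bandwidth in GB/s, typical cost in USD), as quoted on the slide.
hardware = {
    "10 Gbps Ethernet": (1.0, 1_000),
    "PCIe 3.0 lane": (1.0, 10_000),
    "Mellanox 56 Gb/s FDR InfiniBand": (6.8, 1_000_000),
    "Cisco Catalyst switching fabric": (40.0, 10_000_000),
    "Inter-core/inter-processor fabric": (1000.0, 500),
}

# Bandwidth bought per dollar for each option.
gb_per_s_per_dollar = {
    name: bandwidth / cost for name, (bandwidth, cost) in hardware.items()
}

best = max(gb_per_s_per_dollar, key=gb_per_s_per_dollar.get)
for name, value in sorted(gb_per_s_per_dollar.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.6f} GB/s per dollar")
print("best value:", best)
```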

Data Broadcasting
Use The Best Broadcasting Network

[Figure labels: 340 GB/s; > 1000 GB/s]

The Solution
AKUDA Core Differentiating Factors

•  Lockfree Queue, Pipeline Control

•  Lockfree Correlator

•  Lockfree Multithreaded Processing

•  Zero-replication Data Broadcasting

•  On-chip-network Communication Control

[Diagram labels: Feed BC; Broadcaster; AkudaCore (x5), each containing multiple Rx / Tx units with Indexing / Analytics / Visualization; numeric labels 1 and 101000 as extracted]

Adaptive Topology
Continuous Optimization of Data Comm & Pipeline Execution

Akuda Core Scalability

[Charts: four panels, each comparing Akuda Lock-free Algorithms with Standard Algorithms.

•  Bisection Bandwidth: BBW (10 to 80) vs. Processors (10 to 80)

•  Processing Latency: Time (10 to 80) vs. Processors (10 to 80)

•  Processing Cost: 1000 $ / Month (100 to 800) vs. MILLION [Stream Rate * Pipelines * Patterns] (200 to 1600)

•  Parallelization Speedup: Speedup (10 to 80) vs. Processors (10 to 80)]

AKUDA Core in Action
Election2016.io: Real-Time Online Polls

“The problem is that when polls are wrong, they tend to be wrong in the same direction. If they miss in New Hampshire, for instance, they all miss on the same mistake.” -- Nate Silver

Akuda Core in Action
Election2016.io Backend

[Diagram labels: Feed; many parallel units, each with Rx, Model Pipelines, Tx, and Indexing / Analytics / Visualization; Correlator; Learned People Attributes]

•  50,000 documents/second (peak)

•  1,000 bytes/document

•  3000 Models (Author Classification + App Classification)

•  100 Patterns/Model

•  15 BILLION Patterns/second

•  150 GigaBytes/second Bisection Bandwidth (Over 1 TERAbit/second)

•  Sub-second Latency

•  Akuda Broadcasting Technology using on-chip inter-core fabric

•  Akuda Buffer Technology using on-chip inter-core fabric

•  Akuda Correlator Technology

•  Akuda Classification Technology

•  Akuda Data Analysis Technology
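The throughput figures on this slide are mutually consistent, as a quick check shows. The sketch assumes that every document is delivered to every model and matched against all of that model's patterns; the variable names are illustrative only.

```python
docs_per_s = 50_000         # slide: 50,000 documents/second (peak)
bytes_per_doc = 1_000       # slide: 1,000 bytes/document
models = 3_000              # slide: 3000 models
patterns_per_model = 100    # slide: 100 patterns/model

# Assumption: every document is evaluated by every model.
patterns_per_s = docs_per_s * models * patterns_per_model
bisection_bytes_per_s = docs_per_s * bytes_per_doc * models
bisection_tbit_per_s = bisection_bytes_per_s * 8 / 1e12

print(f"{patterns_per_s:,} pattern evaluations/s")
print(f"{bisection_bytes_per_s / 1e9:.0f} GB/s (~{bisection_tbit_per_s:.1f} Tbit/s)")
```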

STANDALONE USE OF BANANAS

General Statistical Classification
K-MEANS, LDA, NN

[Diagram labels: Feed; Broadcaster; many parallel nodes, each with Rx, k-means Model SubMatrix, and Tx; Aggregator; output: k-means cluster label for data item]

•  1,000,000 documents/second

•  1,000 bytes/packet

•  10,000 Nodes * 100 k-means Centroid Vectors

•  Akuda Queue Technology using on-chip inter-core networks

•  Akuda Buffer Technology using on-chip inter-core networks

•  Akuda Lockless Matrix Ops

•  Akuda Lockless Correlator

•  0.001 seconds typical latency

IOT Classification POC
K-MEANS

[Diagram labels: Feed; Broadcaster; many parallel Rx / Tx nodes; Aggregator; outputs: Sensor Warnings, Sensor State Classification]

•  1,000,000 sensor-vectors/second

•  1,000 bytes/vector

•  Classification Using DFA, K-MEANS, LDA, or NN Models

•  Akuda Queue Technology using on-chip inter-core networks

•  Akuda Buffer Technology using on-chip inter-core networks

•  Akuda Lockless Matrix Ops

•  Akuda Lockless Correlator

•  0.001 seconds typical latency

IOT Classification POC
K-MEANS

[Diagram labels: INPUT DATA STREAM; DATA RECEIVER; INPUT DATA CHANNEL; LINEAR ALGEBRA ENGINE 1, 2, ..., N, ..., 100; L2 NORM CHANNEL; AGGREGATOR (LOCKLESS HASH); UNSORTED CHANNEL; MIN FINDER; OUTPUT DATA STREAM]

Messages shown in the diagram:

•  Packet ID, Input Packet: D

•  Packet ID, Transformed Packet: D’

•  Packet ID, Minimum Elements Vector

•  Minimum Distance from Classifier: Pn

•  Packet ID, Classified Packet

For:

•  K = 100,000 (number of clusters)

•  N = 100 (number of processors)

•  P = 1000 (cardinality of feature set)

•  D : Input vector to be classified

•  A : Model matrix representing trained values for classification centroids
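The slide describes k-means classification split across N engines followed by a minimum finder. The sketch below shows that general pattern in plain Python: the centroid matrix A is split into shards, each shard reports its closest centroid to the input vector D, and a final step takes the global minimum. It is an illustration under stated assumptions, not Akuda's implementation. The function names are hypothetical, the shards run sequentially here rather than on separate processors, and squared distance stands in for the L2 norm because it yields the same nearest centroid. With the slide's K = 100,000 and N = 100, an even split would give each engine 1,000 centroids.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def partition(centroids, n_engines):
    """Split centroid rows into contiguous shards, keeping global indices."""
    indexed = list(enumerate(centroids))
    shard_size = -(-len(indexed) // n_engines)  # ceiling division
    return [indexed[i:i + shard_size] for i in range(0, len(indexed), shard_size)]


def engine_min(shard, d):
    """One engine: (distance, centroid index) of the closest centroid in its shard."""
    return min((squared_l2(row, d), idx) for idx, row in shard)


def classify(centroids, d, n_engines):
    """Min finder: combine per-engine minima into the global nearest centroid."""
    partial_minima = [engine_min(shard, d) for shard in partition(centroids, n_engines)]
    distance, index = min(partial_minima)
    return index, distance


# Tiny hypothetical model: K = 4 centroids, P = 2 features, N = 2 engines.
A = [[0, 0], [10, 10], [5, 5], [-3, 4]]
D = [4, 6]
print(classify(A, D, n_engines=2))
```

Each engine only ever sees its own shard and the input vector, so the per-packet traffic between engines is one (distance, index) pair per shard.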

AKUDA LABS PATENTS
Pending & Provisional

PATENT LIST (1/3)#1 HIERARCHICAL, PARALLEL MODELS FOR EXTRACTING IN REAL TIME HIGH-VALUE INFORMATION FROM DATA STREAMS AND SYSTEM AND METHOD FOR

CREATION OF SAME

2 HIERARCHICAL, PARALLEL MODELS FOR EXTRACTING IN REAL-TIME HIGH-VALUE INFORMATION FROM DATA STREAMS AND SYSTEM AND METHOD FOR CREATION OF SAME

3 MASSIVELY-PARALLEL SYSTEM ARCHITECTURE AND METHOD FOR REAL-TIME EXTRACTION OF HIGH-VALUE INFORMATION FROM DATA STREAMS

4 OPTIMIZATION FOR REAL-TIME, PARALLEL EXECUTION OF MODELS FOR EXTRACTING HIGH-VALUE INFORMATION FROM DATA STREAMS

5 EXTRACTION OF HIGH VALUE INFORMATION FROM UNSTRUCTURED IMAGES IN MASSIVELY PARALLEL PROCESSING SYSTEM

6 REAL-TIME MASSIVELY PARALLEL PIPELINE PROCESSING SYSTEM

7 ADDITIONAL APPLICATIONS DIRECTED TO SPECIFIC ASPECTS/IMPROVEMENTS OF REAL-TIME MASSIVELY PARALLEL PIPELINE PROCESSING SYSTEM

8 AUTOMATIC TOPIC DISCOVERY IN STREAMS OF SOCIAL MEDIA POSTS

9 TOPIC AND TREND DISCOVERY WITHIN REAL-TIME ONLINE CONTENT STREAMS

10 SYSTEM AND METHOD FOR IMPLEMENTING ENTERPRISE RISK MODELS BASED ON INFORMATION POSTS

11 ADDITIONAL APPLICATIONS DIRECTED TO SPECIFIC MODELS OTHER THAN RISK MODELS

12 LAZY PARSER FOR INFERENCE IN UNSTRUCTURED DATA STREAMS

13 REALTIME DATA STREAM CLUSTER SUMMARIZATION AND LABELING SYSTEM

14 DATA BROADCASTING TECHNOLOGY FOR REAL TIME ANALYTICS FROM UNSTRUCTURED DATA

15 REAL-TIME STREAM CORRELATION WITH PRE-EXISTING KNOWLEDGE (STATE)

16 LOCKLESS KEY-VALUE STORE AND MEMORY CACHING SYSTEM

17 DYNAMIC RESOURCE ALLOCATOR FOR REAL-TIME PARALLEL PIPELINE PROCESSING SYSTEM


PATENT LIST (2/3)

18 REALTIME LOW LATENCY DATA STREAM DFA CLASSIFICATION ENGINE

19 PARALLEL PROCESSING ARCHITECTURE AND DATA BROADCASTING TECHNOLOGY FOR SOCIAL MEDIA AUTHOR CLASSIFICATION AND ANALYSIS STREAM

20 ATTRIBUTE VECTOR COMPRESSION FOR STREAM PROCESSING

21 REALTIME IOT PARALLEL VECTOR CLASSIFICATION

22 REALTIME IMAGE HARVESTING AND STORAGE SYSTEM

23 DATA STREAM HISTORIC REPLAY VERSIONING (SKYLINE)

24 DATA STREAM HISTORIC REPLAY SYSTEM AND STORAGE

25 EXTRACTION OF AUTHOR (PEOPLE) ATTRIBUTES THROUGH COMPLEX DFA MODELS

26 REALTIME IMAGE HARVESTING AND STORAGE SYSTEM

27 NEURAL NETWORK-BASED SYSTEM FOR EXTRACTION OF DEMOGRAPHICS FROM SOCIAL MEDIA IMAGES

28 METHOD FOR SOCIAL MEDIA EVENT DETECTION AND CAUSE ANALYSIS

29 METHOD FOR REAL-TIME TAGGING OF DATA STREAM DOCUMENTS

30 PEOPLE ATTRIBUTE QUERY AND VISUALIZATION TOOL

31 WORD SET VISUAL NORMALIZED WEIGHT DAMPENING

32 PARALLEL PROCESSING ARCHITECTURE AND DATA BROADCASTING TECHNOLOGY FOR REAL TIME ANALYTICS FROM UNSTRUCTURED ELECTION DATA

33 PARALLEL PROCESSING ARCHITECTURE AND DATA BROADCASTING TECHNOLOGY FOR REAL TIME ANALYTICS FROM UNSTRUCTURED RETAIL DATA


PATENT LIST (3/3)

34 SYSTEMS AND METHODS FOR ANALYZING UNSOLICITED PRODUCT/SERVICE CUSTOMER REVIEWS

35 SYSTEM FOR CREDIT/INSURANCE PROCESSING USING UNSTRUCTURED DATA

36 SYSTEM AND METHOD FOR CORRELATING SOCIAL MEDIA DATA AND COMPANY FINANCIAL DATA

37 SYSTEMS AND METHODS FOR IDENTIFYING AN ILLNESS AND COURSE OF TREATMENT FOR A PATIENT

38 SYSTEM AND METHOD FOR IDENTIFYING FACIAL EXPRESSIONS FROM SOCIAL MEDIA IMAGES

39 SYSTEM AND METHOD FOR DETECTING HEALTH MALADIES IN A PATIENT USING UNSTRUCTURED IMAGES

40 SYSTEM AND METHOD FOR DETECTING POLITICAL DESTABILIZATION AT A SPECIFIC GEOGRAPHIC LOCATION BASED ON SOCIAL MEDIA DATA

41 SYSTEM AND METHOD FOR IDENTIFYING CORRELATIONS BETWEEN SOCIAL MEDIA IMAGES USING NEURAL NETWORKS

42 SYSTEM AND METHOD FOR SCALABLE PROCESSING OF DATA PIPELINES USING A LOCKLESS SHARED MEMORY SYSTEM

43 ASYNCHRONOUS WEB PAGE DATA AGGREGATOR

44 APPLICATIONS OF DISTRIBUTED PROCESSING AND DATA BROADCASTING TECHNOLOGY TO REAL TIME NEWS SERVICE

45 DISTRIBUTED PROCESSING AND DATA BROADCASTING TECHNOLOGY FOR REAL TIME THREAT ANALYSIS

46 DISTRIBUTED PROCESSING AND DATA BROADCASTING TECHNOLOGY FOR REAL TIME EMERGENCY RESPONSE

47 DISTRIBUTED PROCESSING AND DATA BROADCASTING TECHNOLOGY FOR CLIMATE ANALYTICS

48 DISTRIBUTED PROCESSING AND DATA BROADCASTING TECHNOLOGY FOR INSURANCE RISK ASSESSMENT

49 DISTRIBUTED PARALLEL ARCHITECTURES FOR REAL TIME PROCESSING OF STREAMS OF STRUCTURED AND UNSTRUCTURED DATA


THE AKUDA SYSTEM: Additional Information


The Solution: Akuda Core Topology with Kafka

•  On-chip-network Comm Control
•  Zero-copy Data Broadcasting
•  Lockfree queue, pipeline control
•  Lockfree correlator
•  Lockfree Multithreaded Processing
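To make the lock-free queue idea more concrete, here is a minimal sketch of a bounded single-producer/single-consumer ring buffer, one common way to pass items between two pipeline stages without a lock. It is not Akuda's implementation, which the slides do not describe. The class name `SpscRingBuffer` and its methods are hypothetical. A production version would typically be written in C or C++ with atomic indices and explicit memory ordering; this Python version only illustrates the index discipline.

```python
from typing import Any, List, Tuple


class SpscRingBuffer:
    """Bounded queue for exactly one producer and one consumer.

    Only the producer writes `_tail` and only the consumer writes `_head`,
    so the two sides never update the same index.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # One slot stays empty so that full and empty states are distinguishable.
        self._slots: List[Any] = [None] * (capacity + 1)
        self._head = 0  # next slot to read (consumer-owned)
        self._tail = 0  # next slot to write (producer-owned)

    def try_push(self, item: Any) -> bool:
        """Producer side: return False instead of blocking when the queue is full."""
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False
        self._slots[self._tail] = item
        self._tail = nxt  # publish only after the slot has been written
        return True

    def try_pop(self) -> Tuple[bool, Any]:
        """Consumer side: return (False, None) instead of blocking when empty."""
        if self._head == self._tail:
            return False, None
        item = self._slots[self._head]
        self._slots[self._head] = None  # drop the reference
        self._head = (self._head + 1) % len(self._slots)
        return True, item


q = SpscRingBuffer(2)
q.try_push("packet-1")
q.try_push("packet-2")
print(q.try_push("packet-3"))  # queue is full at this point
print(q.try_pop())
```

Returning a status flag rather than blocking lets a pipeline stage decide for itself whether to spin, drop, or apply back-pressure.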

[Diagram labels: Feed BC; Kafka; AkudaCore (repeated); Indexing / Analytics with Rx and Tx (repeated); Visualization (repeated); 1; 101000]


Pulsar: Functional View

Unstructured Data Source: Streams

Unstructured Data Source: Batch

Unstructured Data Source: Images

MILLIONS OF DOCUMENTS PER SECOND

LDA CONTROL

AKUDA DEEP INSPECTION

THIRD-PARTY DATA ANALYTICS

HADOOP BASED ANALYTICS

THIRD-PARTY VISUALIZATION

AKUDA DASHBOARD

RT Content Classification (DFA/LDA/VEC)

RT Author Classification (DFA/LDA)

Optimizing Parallelizing Compiler

Normalization

RT Author Image Analysis (NEURAL NETS)

Universal Indexing

P-GRAM GEN

Indexer

STATS / ANALYTICS

Author ATTR

Author GEO

Author DEM

LDA PROC

P-GRAM GEN

LDA PROC

10+ BILLIONS OF CLASSIFICATIONS PER SECOND

MISSION EDITOR


Automatic Cluster Discovery: P-GRAMS, LDA, CONVERGENCE

Mission Deep Inspection Store

Summarizer

p-GRAM Generator

Mission Stream

Concept Extractor

LDA Solver

Convergence Monitor

p-GRAMS

Corpus Summary

Corpus Concept Cloud

Labeled Corpus Clusters

Classification Model Library

LDA Cluster Generation & Labeling

LDA Cluster Refinement

DFA Classifier Refinement

LDA Classifier Generator


Author Attribute Discovery: Neural Networks, Bayesian Models, DFAs

Ethnicity Image Analyzer

Author Info Analyzer (LGM)

Real-time Stream Aggregator

Author Geolocation Analyzer (LGM)

Author Attribute Processor (LGM)

Real-time Stream Correlator

Massively Parallel RT Classification Engine

AKUDA Broadcaster

Author Attribute Detector (LGM), repeated many times in the diagram

Unstructured Data Source A

Unstructured Data Source B

Unstructured Data Source C

Normalization

Age Image Analyzer

Gender Image Analyzer

Labeled Image Generator

Neural Network Trainer

Author Bayesian Classification Model Trainer


Generalized Image Classification: Neural Networks, Bayesian Models, DFAs

Ethnicity Image Analyzer

Age Image Analyzer

Gender Image Analyzer

Labeled Image Generator

Neural Network Trainer

Image Data Sources

Image Harvester

Logo Identification

Face Detector

Glasses Image Analyzer

Weight Image Analyzer

Hair-style Image Analyzer

Shape Identification

Emotion Image Analyzer

Image Label Classifier

Image DB


Pipeline Editor: Automatic LDA Models, User-specified DFAs

RT Content Classification (DFA/LDA/VEC)

Optimizing Parallelizing Compiler

PIPELINE EDITOR: Filtering, Analysis And Action Network

LDA Classifier

Vector String CMP

Vector INT/FP CMP

DFA, Counter Tap, Action Block (each repeated in the network)

Output

Inout

Model Library: Airlines, Auto, Auto Insurance, Cable, Beverages, Fast Food, Finance, Housing, Legal, Pharma/Health

Most Used Detectors: Tech, Advertisement, Inquiry, Customer Service, Irate Customers, Thankful Customers, Consumers

STATE MANAGEMENT

P-GRAM GEN

Indexer

LDA PROC
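The DFA, Counter Tap, and Action Block elements of the Pipeline Editor view can be pictured as a small filter chain. The sketch below is illustrative only. The slides do not define these blocks, so it assumes that a DFA accepts or rejects a document, a counter tap counts what passes through it, and an action block runs a callback. The classes `DFA`, `CounterTap`, and `ActionBlock` and the function `run_pipeline` are hypothetical names. The automaton works on lowercase whitespace-separated tokens, and on a missing transition it retries from the start state. That shortcut is adequate for the two-token phrase used here, but it is not a general failure function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass
class DFA:
    """Token-level automaton; accepts a document once an accepting state is reached."""
    transitions: Dict[Tuple[int, str], int]
    accepting: Set[int]
    start: int = 0

    def matches(self, text: str) -> bool:
        state = self.start
        for token in text.lower().split():
            # Missing transition: retry the token from the start state (simplification).
            fallback = self.transitions.get((self.start, token), self.start)
            state = self.transitions.get((state, token), fallback)
            if state in self.accepting:
                return True
        return False


@dataclass
class CounterTap:
    """Counts documents flowing past this point and passes them on unchanged."""
    count: int = 0

    def __call__(self, doc: str) -> str:
        self.count += 1
        return doc


@dataclass
class ActionBlock:
    """Runs a user-supplied action on each document that reaches it."""
    action: Callable[[str], None]

    def __call__(self, doc: str) -> str:
        self.action(doc)
        return doc


def run_pipeline(docs: Iterable[str], dfa: DFA,
                 tap: CounterTap, action: ActionBlock) -> None:
    """Filter with the DFA, then count and act on every match."""
    for doc in docs:
        if dfa.matches(doc):
            action(tap(doc))


# Detector for the phrase "flight delayed".
delayed = DFA(transitions={(0, "flight"): 1, (1, "delayed"): 2}, accepting={2})
alerts: List[str] = []
tap = CounterTap()
run_pipeline(["My flight delayed again", "great flight"], delayed, tap,
             ActionBlock(alerts.append))
print(tap.count, alerts)
```

Keeping each block a plain callable means further taps or actions can be chained without changing the DFA itself.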