akuda labs: pulsar
TRANSCRIPT
AKUDA LABS PROPRIETARY AND CONFIDENTIAL
Our Superpower
[Chart: Bananas vs. Spark Streaming (latency < 3 s) vs. Spark Streaming (optimized for latency); x-axis: Throughput (packets/s), 0 to 100,000]
• 10x Throughput
• 24,000x Lower Latency
• 400x Throughput when Spark Streaming is optimized for latency
The Benchmark: Pattern Detection in Unstructured Streaming Text Data
[Diagram labels: Text Stream Generator; Throughput Regulator (x2); Spark Streaming Setup; Bananas Setup]
The Benchmark: Platform and Setup
• Dell 815 Servers
• 48 Text Classification Pipelines
• 10 Gbit Connection
Spark Streaming Configurations:
• Receiver-Based Model
• 12 Kafka Topic Partitions
• Block Size: 200 ms
• Batch Size: 1.5 – 20 s
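In Spark Streaming's receiver-based model, each receiver cuts its input into blocks once per block interval, and each block becomes roughly one task of the batch. The block and batch sizes listed above therefore set the per-receiver parallelism. A minimal sketch of that arithmetic follows; the function name is hypothetical and is not part of Spark.

```python
def blocks_per_batch(batch_interval_ms: int, block_interval_ms: int) -> int:
    """Approximate number of blocks (hence tasks) one receiver yields per batch."""
    return batch_interval_ms // block_interval_ms


BLOCK_MS = 200  # block size from the slide

# Batch size range from the slide: 1.5 s to 20 s.
for batch_s in (1.5, 20):
    n = blocks_per_batch(int(batch_s * 1000), BLOCK_MS)
    print(f"batch {batch_s} s -> about {n} blocks per receiver")
```

Shorter batches lower latency but leave fewer blocks, and so fewer parallel tasks, per receiver.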
Why does it matter?
• Reliability – May add 2 Nines
• Hardware Cost – Potentially 100x Less Cost in Hardware
• Energy – Potentially 100x Less Energy
• Data Center Footprint – Potentially 100x Fewer Racks
• Manageability – 10 machines versus 1000 machines
• Network BW – Potentially 100x Less Network BW
• Total Cost of Ownership – Potentially < 1000x
• Greater Peace of Mind
Who will pay for Real-Time Solutions?
Real-Time: Expected Latency < 1ms
• Online Marketers
– Process over 100k events per second for thousands of social media websites
– Expected revenue > $2.1 Trillion
• IoT Businesses
– Process thousands of events per second from millions of connected devices
– Expected revenue > $100 Billion
• Spam and Fraud Detection
– Detect multiple complex patterns in millions of transactions and documents per second
– Expected revenue > $40 Billion
The Akuda Quest
• To enable truly real-time classification of extremely high-rate data streams
• To enable subject matter experts, who possess extensive knowledge of the domain the data belongs to and who are often non-programmers, to directly create classifiers
• To enable the fast development and refinement of data classifiers
The Real-Time Classification Challenge
Latency < 1ms
[Diagram labels: Source; Ultra Fast Classification & Correlation; Previous Knowledge (K1 to K10); Actionable Information]
• Source: 1,000,000 documents/second, 1,024 bytes/packet
• Max latency: 0.001 seconds
• 1,000,000 Distinct Possible Events/Trigger/Results
• 100 events/s, 10,000 Devices: 1,000,000 packets/s
• 10,000 Classifiers: 10 Billion Classification Operations/s
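The figures on this slide multiply out as follows. This is a quick check, assuming the 100 events/s figure is per device and that every packet is evaluated by every classifier; the variable names are illustrative.

```python
EVENTS_PER_DEVICE_PER_S = 100   # assumed to be a per-device rate
DEVICES = 10_000
CLASSIFIERS = 10_000
PACKET_BYTES = 1_024

# Aggregate stream rate across all devices.
packets_per_s = EVENTS_PER_DEVICE_PER_S * DEVICES

# Every packet is checked by every classifier.
classifications_per_s = packets_per_s * CLASSIFIERS

# Raw ingest bandwidth before any replication.
ingest_bytes_per_s = packets_per_s * PACKET_BYTES

print(f"{packets_per_s:,} packets/s")
print(f"{classifications_per_s:,} classification operations/s")
print(f"{ingest_bytes_per_s / 1e9:.3f} GB/s ingest")
```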
Pulsar Analyst Workbench
Quick, Intuitive Classifier Development Sandbox
Quick Model Optimization
Specialized Compiler, Data Analysis Tools
[Diagram labels: Resolved Filtering Network; Optimizing Parallelizing Compiler (Cycle Detection, Reordering, DFA Pruning, Platform Targeting); Target Platform Topology; Execution Engine]
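The compiler diagram lists cycle detection among its passes. The slides do not say how Akuda implements it. As a generic illustration only, the sketch below detects a cycle in a directed filter network using Kahn's topological sort; the graph encoding and the filter names are hypothetical.

```python
from collections import deque


def has_cycle(edges: dict[str, list[str]]) -> bool:
    """Return True if the directed graph contains a cycle (Kahn's algorithm)."""
    nodes = set(edges) | {dst for dsts in edges.values() for dst in dsts}
    indegree = {n: 0 for n in nodes}
    for dsts in edges.values():
        for dst in dsts:
            indegree[dst] += 1

    # Repeatedly remove nodes that have no remaining incoming edges.
    ready = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for dst in edges.get(node, []):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)

    # Any node never removed lies on, or downstream of, a cycle.
    return visited != len(nodes)


acyclic = {"source": ["filter_a", "filter_b"], "filter_a": ["sink"], "filter_b": ["sink"]}
cyclic = {"filter_a": ["filter_b"], "filter_b": ["filter_a"]}
print(has_cycle(acyclic), has_cycle(cyclic))
```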
AKUDA Technology Delivery
• SaaS turn-key solution, with a model development system that allows for deployment of complete solutions in hours, without any coding requirements.
• Privately deployable enterprise solution on a Cloud Infrastructure.
• Software Development Infrastructure for developing highly specific and targeted solutions.
The SaaS Platform: Pulsar
High Level View
[Diagram stages: Inbound Data Hub; Data Augmentation & Correlation; Classification; Indexing; Cluster Analysis; Outbound Data Hub]
Pulsar: System View
[Diagram: Pulsar system components. Labels, with repeats collapsed:]
• Data sources: Social Media Data Sources; General Data Sources; Image Data Sources; Data Source Akuda Agent; Data Source Direct Feed
• Social Media Harvester; Image Harvester; General Data Integration Hub
• Real-time Stream Aggregator; Real-time Stream Correlator
• Author [G,A,E] Image Analyzer (LGM); Author Info Analyzer (LGM); Author Geolocation Analyzer (LGM); Author Attribute Processor (LGM); Author Attribute Detector (LGM), repeated many times
• AKUDA Broadcaster; Massively Parallel RT Classification Engine (x2); RT Classification Pipeline, repeated many times
• Optimizing Parallelizing Compiler for Classification, Analysis and Action Network
• Mission Editor; DFA; Tap; DFA Classifier Refinement
• LDA Feature Generator (Proximity NGRAMS); LDA Cluster Generator; LDA Cluster Refinement; LDA Classifier Generator
• Universal Store; Universal Searchable Index; Image Universal Searchable Index; Image Store; Author Attribute Store; Pipeline Deep Inspection Store; RT Stream Indexer
• Metrics and Alarms; Delivery Integration Hub; Target Systems
• RT Dashboard [Corona]; Pipeline Studio [Pulsar]; Dashboard Editor; Visualization
• Deep Inspection Query UI; Author Attribute Query UI; Universal Stream Query UI
Pulsar: Inbound Data Hub
[Diagram: repeats the component diagram from the Pulsar System View slide.]
Pulsar: LGM, Data Augmentation and Correlation
[Diagram: repeats the component diagram from the Pulsar System View slide.]
Pulsar: Bananas, Data Classification
[Diagram: repeats the component diagram from the Pulsar System View slide.]
Pulsar: Corona, Cluster Analysis
[Diagram: repeats the component diagram from the Pulsar System View slide.]
Pulsar: Outbound Data Hub
[Diagram: repeats the component diagram from the Pulsar System View slide.]
THE AKUDA CORE
MASSIVELY PARALLEL STREAMING CLASSIFICATION INFRASTRUCTURE
Possible Solution 1
NOT THIS - GTS: Scalability & Latency Problems
[Diagram labels: Feed BC; Broadcaster; Rx (x4); Indexer (x4); GTS Indexing System; Index Storage; Analytics Visualization (x15); many queries, each labeled "Query With Frequency 2 q/s"]
Possible Solution 2
NOT THIS - HADOOP: Latency Problems
[Diagram labels: Feed BC; Broadcaster (x2); HADOOP]
Possible Solution 3
Not Quite There: Spark Streaming Pipeline of RDDs
[Diagram labels: Source; MicroBatcher; Doc 01 to Doc 16]
• Source: 1,000,000 documents/second, 1,024 bytes/packet
• 1,000,000 Sequential Stages
• Latency of minutes, hours??
• Network Transfers and/or Data Copying Across Host Nodes or Pipeline Stages
Possible Solution 4
Almost There: Data Flow Pipelines, Data Replication
[Diagram labels: Source; Broadcaster; many replicas of Doc 01]
• Source: 1,000,000 documents/second, 1,000 bytes/packet
• 1,000,000 Stages Running Simultaneously
• Low Latency
• Bisection Bandwidth: 1,000,000,000,000,000 bytes/second (~10,000,000 GBits/second, ~10,000 TBits/second)
• Broadcasting Doc Replicas becomes an extreme bottleneck
Available bandwidth, for comparison:
• PCIe 3.0 lane BW: ~1 GByte/second
• 10 Gbps Ethernet: ~1 GB/second
• Infiniband (Mellanox 56 Gb/s FDR IB): 6.8 GB/s
• Cisco Catalyst 2960G-49TC-L Switching Fabric: 40 mpps; at 1000 bytes/packet, 40,000 MBytes/second = 40 GBytes/second
• Intel Xeon Processor E7-8890 (15 cores) Max Mem BW: 85 GBytes/second
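To see why replication is the bottleneck, the required bisection bandwidth can be set against each interconnect listed on this slide. This is a rough check, assuming every document is copied once to every stage; the exact product is 8,000,000 Gbit/s, which the slide rounds to roughly 10,000,000.

```python
DOCS_PER_S = 1_000_000
BYTES_PER_DOC = 1_000
STAGES = 1_000_000

# Each document is replicated to every stage.
required_bytes_per_s = DOCS_PER_S * BYTES_PER_DOC * STAGES
required_gbit_per_s = required_bytes_per_s * 8 / 1e9

# Bandwidths quoted on the slide, in GB/s.
links_gbyte_per_s = {
    "PCIe 3.0 lane": 1.0,
    "10 Gbps Ethernet": 1.0,
    "Infiniband FDR": 6.8,
    "Cisco Catalyst switching fabric": 40.0,
    "Xeon E7-8890 memory": 85.0,
}

print(f"required: {required_bytes_per_s:,} bytes/s ({required_gbit_per_s:,.0f} Gbit/s)")
for name, gbyte_per_s in links_gbyte_per_s.items():
    shortfall = required_bytes_per_s / (gbyte_per_s * 1e9)
    print(f"{name}: about {shortfall:,.0f}x too slow")
```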
Possible Solution 5
Cost & Latency Issues: Data Broadcasting Tree
[Diagram labels: Feed BC; Broadcaster; a tree of Rx nodes; Indexing / Analytics / Visualization nodes; fan-out labels 1 and 10]
• 1 + 10 x 10 x 10 x 10 x 10 x 10 = 1,000,001 Nodes
• Worst Case Cost = 1,000,001 * $1000/month: ~ $1 Billion / Month
• Latency goes back to hours or days
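The node count and cost on this slide can be reproduced directly. The sketch below assumes a fan-out of 10 over 6 levels, as in the slide's product. The slide counts only the root plus the leaves; counting the intermediate relay nodes as well, which is an addition not made on the slide, gives a slightly higher total.

```python
FANOUT = 10
LEVELS = 6
COST_PER_NODE_USD_MONTH = 1_000

# Slide's count: one root plus 10^6 leaves.
leaves = FANOUT ** LEVELS
slide_nodes = 1 + leaves
slide_cost = slide_nodes * COST_PER_NODE_USD_MONTH

# Full tree, including the intermediate relay nodes at every level.
full_tree_nodes = sum(FANOUT ** level for level in range(LEVELS + 1))

print(f"slide count: {slide_nodes:,} nodes, ${slide_cost:,}/month")
print(f"full tree:   {full_tree_nodes:,} nodes")
```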
Honey, I Shrunk the Trees!
AKUDA Core Topology
[Diagram labels: Feed; Broadcaster; many Rx / Tx nodes, each with Indexing / Analytics and Visualization; Model Pipelines]
• 1,000,000 documents/second, 1,000 bytes/packet
• 100,000 Short Pipelines * 10 Stages each = 1,000,000 Stages
• Akuda Queue Technology using on-chip inter-core networks
• Akuda Buffer Technology using on-chip inter-core networks
• Akuda Correlator Technology using on-chip inter-core networks
• 0.001 seconds typical latency
The Solution
Utilize Inter-Core Communication Channel
Data Communication Hardware                              Typical Bandwidth            Typical Cost
10 Gbps Ethernet                                         1 GB/s                       $1,000
PCIe 3.0 Lane                                            1 GB/s                       $10,000
Infiniband, Mellanox 56Gb/s FDR IB                       6.8 GB/s                     $1,000,000
Cisco Catalyst Switching Fabric                          40 GB/s                      $10,000,000
Inter-core/Inter-processor Fabric Bisection Bandwidth    1000 GB/s (for IA64 Chips)   $500
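One way to read the table above is as bandwidth per dollar. The sketch below ranks the rows by that ratio, using only the slide's own "typical" figures; the tuple layout and variable names are illustrative.

```python
# (hardware, typical bandwidth in GB/s, typical cost in USD), as quoted on the slide.
fabrics = [
    ("10 Gbps Ethernet", 1.0, 1_000),
    ("PCIe 3.0 lane", 1.0, 10_000),
    ("Infiniband FDR", 6.8, 1_000_000),
    ("Cisco Catalyst switching fabric", 40.0, 10_000_000),
    ("Inter-core fabric", 1000.0, 500),
]


def gb_per_s_per_dollar(row: tuple[str, float, int]) -> float:
    """Bandwidth bought per dollar for one table row."""
    _, bandwidth, cost = row
    return bandwidth / cost


ranked = sorted(fabrics, key=gb_per_s_per_dollar, reverse=True)
for row in ranked:
    print(f"{row[0]}: {gb_per_s_per_dollar(row):.6f} GB/s per dollar")

advantage = gb_per_s_per_dollar(ranked[0]) / gb_per_s_per_dollar(ranked[1])
print(f"best vs. runner-up: about {advantage:,.0f}x")
```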
Data Broadcasting
Use The Best Broadcasting Network
[Figure labels: 340 GB/s; > 1000 GB/s]
The Solution
AKUDA Core Differentiating Factors
• Lock-free Queue, Pipeline Control
• Lock-free Correlator
• Lock-free Multithreaded Processing
• Zero-replication Data Broadcasting
• On-chip-network Communication Control
[Diagram labels: Feed BC; Broadcaster; Akuda Core (x5), each with Rx / Tx nodes for Indexing / Analytics and Visualization; numeric labels 1, 10, 1000]
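The slides do not describe how zero-replication broadcasting is implemented. One common way to get the effect is a shared ring buffer with one writer and a separate read cursor per consumer, so every pipeline reads the same stored document instead of receiving its own copy. The sketch below is a single-threaded illustration of that idea, not Akuda's design; a real lock-free version needs atomic operations and memory-ordering guarantees that plain Python does not express, and the class and method names are hypothetical.

```python
class BroadcastRing:
    """Fixed-size ring: one writer, many readers, each with its own cursor."""

    def __init__(self, capacity: int, readers: int) -> None:
        self._slots = [None] * capacity
        self._capacity = capacity
        self._head = 0                  # next sequence number to write
        self._cursors = [0] * readers   # next sequence number per reader

    def publish(self, doc) -> bool:
        """Store one reference to doc; refuse if the slowest reader would be overrun."""
        if self._head - min(self._cursors) >= self._capacity:
            return False
        self._slots[self._head % self._capacity] = doc
        self._head += 1
        return True

    def poll(self, reader: int):
        """Return the next unread doc for this reader, or None if it has caught up."""
        seq = self._cursors[reader]
        if seq == self._head:
            return None
        doc = self._slots[seq % self._capacity]
        self._cursors[reader] = seq + 1
        return doc


ring = BroadcastRing(capacity=4, readers=3)
doc = bytearray(b"example document")
ring.publish(doc)
seen = [ring.poll(reader) for reader in range(3)]
print(all(item is doc for item in seen))  # every reader got the same object
```

Because readers only advance their own cursor, adding a consumer costs one integer, not one more copy of the stream.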
Adaptive Topology
Continuous Optimization of Data Comm & Pipeline Execution
Akuda Core Scalability
[Four charts, each comparing Akuda Lock-free Algorithms with Standard Algorithms]
• Bisection Bandwidth: BBW vs. Processors (10 to 80)
• Processing Latency: Time vs. Processors (10 to 80)
• Processing Cost: 1000 $ / Month vs. MILLION [Stream Rate * Pipelines * Patterns] (200 to 1600)
• Parallelization Speedup: Speedup vs. Processors (10 to 80)
AKUDA Core in Action
Election2016.io: Real-Time Online Polls
“The problem is that when polls are wrong, they tend to be wrong in the same direction. If they miss in New Hampshire, for instance, they all miss on the same mistake.” -- Nate Silver
Akuda Core in Action
Election2016.io Backend
[Diagram labels: Feed; many Rx / Tx nodes, each with Indexing / Analytics and Visualization; Model Pipelines; Correlator; Learned People Attributes]
• 50,000 documents/second (peak), 1,000 bytes/document
• 3000 Models (Author Classification + App Classification)
• 100 Patterns/Model
• 15 BILLION Patterns/second
• 150 GigaBytes Bisection Bandwidth (Over 1 TERAbit/second)
• Sub-second Latency
• Akuda Broadcasting Technology using on-chip inter-core fabric
• Akuda Buffer Technology using on-chip inter-core fabric
• Akuda Correlator Technology
• Akuda Classification Technology
• Akuda Data Analysis Technology
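The headline numbers for this backend follow from the per-document figures. The check below assumes each document is delivered to, and matched by, every model; that reading is not stated on the slide, but it reproduces both the 15 billion patterns/second and the 150 GB figure.

```python
DOCS_PER_S_PEAK = 50_000
BYTES_PER_DOC = 1_000
MODELS = 3_000
PATTERNS_PER_MODEL = 100

# Every document is matched against every pattern of every model.
patterns_per_s = DOCS_PER_S_PEAK * MODELS * PATTERNS_PER_MODEL

# Every document is delivered to every model.
broadcast_bytes_per_s = DOCS_PER_S_PEAK * BYTES_PER_DOC * MODELS
broadcast_tbit_per_s = broadcast_bytes_per_s * 8 / 1e12

print(f"{patterns_per_s:,} patterns/s")
print(f"{broadcast_bytes_per_s / 1e9:.0f} GB/s, {broadcast_tbit_per_s:.1f} Tbit/s")
```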
General Statistical Classification
K-MEANS, LDA, NN
[Diagram labels: Feed; Broadcaster; many Rx / Tx nodes; k-means Model SubMatrix; Aggregator; output: k-means cluster label for data item]
• 1,000,000 documents/second, 1,000 bytes/packet
• 10,000 Nodes * 100 k-means Centroid Vectors
• Akuda Queue Technology using on-chip inter-core networks
• Akuda Buffer Technology using on-chip inter-core networks
• Akuda Lockless Matrix Ops
• Akuda Lockless Correlator
• 0.001 seconds typical latency
IOT Classification POC
K-MEANS
[Diagram labels: Feed; Broadcaster; many Rx / Tx nodes; Aggregator; outputs: Sensor State Classification, Sensor Warnings]
• 1,000,000 sensor-vectors/second, 1,000 bytes/vector
• Classification Using DFA, K-MEANS, LDA, or NN Models
• Akuda Queue Technology using on-chip inter-core networks
• Akuda Buffer Technology using on-chip inter-core networks
• Akuda Lockless Matrix Ops
• Akuda Lockless Correlator
• 0.001 seconds typical latency
IOT Classification POC
K-MEANS
[Diagram labels: INPUT DATA STREAM; DATA RECEIVER; INPUT DATA CHANNEL; LINEAR ALGEBRA ENGINE - 1, 2, ..., N, ..., 100; L2 NORM CHANNEL; AGGREGATOR; LOCKLESS HASH; UNSORTED CHANNEL; MIN FINDER; OUTPUT DATA STREAM]
Packets shown in the diagram:
• Packet ID, Input Packet: D
• Packet ID, Transformed Packet: D’
• Packet ID, Minimum Elements Vector, Minimum Distance from Classifier: Pn
• Packet ID, Classified Packet
Parameters:
• K = 100,000 (number of clusters)
• N = 100 (number of processors)
• P = 1000 (cardinality of feature set)
• D: Input vector to be classified
• A: Model matrix representing trained values for classification centroids
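The diagram suggests splitting the model matrix A across N engines, having each engine find its nearest centroid to D, and letting a min finder pick the global winner. The sketch below illustrates that partition-then-reduce idea in plain Python at a much smaller scale than the slide's K, N, and P; it runs the shards sequentially, and the function names are hypothetical rather than Akuda's API.

```python
import random


def nearest_in_shard(vector, shard, offset):
    """Return (squared L2 distance, global centroid index) for the closest centroid in one shard."""
    best = None
    for local_idx, centroid in enumerate(shard):
        dist = sum((v - c) ** 2 for v, c in zip(vector, centroid))
        if best is None or dist < best[0]:
            best = (dist, offset + local_idx)
    return best


def classify(vector, centroids, engines):
    """Split centroids across engines, take each local minimum, then the global minimum."""
    per_engine = (len(centroids) + engines - 1) // engines
    local_minima = [
        nearest_in_shard(vector, centroids[start:start + per_engine], start)
        for start in range(0, len(centroids), per_engine)
    ]
    return min(local_minima)[1]


random.seed(0)
K, N, P = 1_000, 10, 8  # scaled down from K = 100,000, N = 100, P = 1000
A = [[random.random() for _ in range(P)] for _ in range(K)]
D = [random.random() for _ in range(P)]

label = classify(D, A, N)
brute = min(range(K), key=lambda k: sum((d - a) ** 2 for d, a in zip(D, A[k])))
print(label == brute)  # sharded result matches a single full scan
```

With the slide's parameters each engine would hold K / N = 1,000 centroids, so the per-packet work per engine stays fixed as engines are added in proportion to clusters.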
PATENT LIST (1/3)
1 HIERARCHICAL, PARALLEL MODELS FOR EXTRACTING IN REAL TIME HIGH-VALUE INFORMATION FROM DATA STREAMS AND SYSTEM AND METHOD FOR CREATION OF SAME
2 HIERARCHICAL, PARALLEL MODELS FOR EXTRACTING IN REAL-TIME HIGH-VALUE INFORMATION FROM DATA STREAMS AND SYSTEM AND METHOD FOR CREATION OF SAME
3 MASSIVELY-PARALLEL SYSTEM ARCHITECTURE AND METHOD FOR REAL-TIME EXTRACTION OF HIGH-VALUE INFORMATION FROM DATA STREAMS
4 OPTIMIZATION FOR REAL-TIME, PARALLEL EXECUTION OF MODELS FOR EXTRACTING HIGH-VALUE INFORMATION FROM DATA STREAMS
5 EXTRACTION OF HIGH VALUE INFORMATION FROM UNSTRUCTURED IMAGES IN MASSIVELY PARALLEL PROCESSING SYSTEM
6 REAL-TIME MASSIVELY PARALLEL PIPELINE PROCESSING SYSTEM
7 ADDITIONAL APPLICATIONS DIRECTED TO SPECIFIC ASPECTS/IMPROVEMENTS OF REAL-TIME MASSIVELY PARALLEL PIPELINE PROCESSING SYSTEM
8 AUTOMATIC TOPIC DISCOVERY IN STREAMS OF SOCIAL MEDIA POSTS
9 TOPIC AND TREND DISCOVERY WITHIN REAL-TIME ONLINE CONTENT STREAMS
10 SYSTEM AND METHOD FOR IMPLEMENTING ENTERPRISE RISK MODELS BASED ON INFORMATION POSTS
11 ADDITIONAL APPLICATIONS DIRECTED TO SPECIFIC MODELS OTHER THAN RISK MODELS
12 LAZY PARSER FOR INFERENCE IN UNSTRUCTURED DATA STREAMS
13 REALTIME DATA STREAM CLUSTER SUMMARIZATION AND LABELING SYSTEM
14 DATA BROADCASTING TECHNOLOGY FOR REAL TIME ANALYTICS FROM UNSTRUCTURED DATA
15 REAL-TIME STREAM CORRELATION WITH PRE-EXISTING KNOWLEDGE (STATE)
16 LOCKLESS KEY-VALUE STORE AND MEMORY CACHING SYSTEM
17 DYNAMIC RESOURCE ALLOCATOR FOR REAL-TIME PARALLEL PIPELINE PROCESSING SYSTEM
PATENT LIST (2/3)
18 REALTIME LOW LATENCY DATA STREAM DFA CLASSIFICATION ENGINE
19 PARALLEL PROCESSING ARCHITECTURE AND DATA BROADCASTING TECHNOLOGY FOR SOCIAL MEDIA AUTHOR CLASSIFICATION AND ANALYSIS STREAM
20 ATTRIBUTE VECTOR COMPRESSION FOR STREAM PROCESSING
21 REALTIME IOT PARALLEL VECTOR CLASSIFICATION
22 REALTIME IMAGE HARVESTING AND STORAGE SYSTEM
23 DATA STREAM HISTORIC REPLAY VERSIONING (SKYLINE)
24 DATA STREAM HISTORIC REPLAY SYSTEM AND STORAGE
25 EXTRACTION OF AUTHOR(PEOPLE) ATTRIBUTES THROUGH COMPLEX DFA MODELS
26 REALTIME IMAGE HARVESTING AND STORAGE SYSTEM
27 NEURAL NETWORK-BASED SYSTEM FOR EXTRACTION OF DEMOGRAPHICS FROM SOCIAL MEDIA IMAGES
28 METHOD FOR SOCIAL MEDIA EVENT DETECTION AND CAUSE ANALYSIS
29 METHOD FOR REAL-TIME TAGGING OF DATA STREAM DOCUMENTS
30 PEOPLE ATTRIBUTE QUERY AND VISUALIZATION TOOL
31 WORD SET VISUAL NORMALIZED WEIGHT DAMPENING
32 PARALLEL PROCESSING ARCHITECTURE AND DATA BROADCASTING TECHNOLOGY FOR REAL TIME ANALYTICS FROM UNSTRUCTURED ELECTION DATA
33 PARALLEL PROCESSING ARCHITECTURE AND DATA BROADCASTING TECHNOLOGY FOR REAL TIME ANALYTICS FROM UNSTRUCTURED RETAIL DATA
PATENT LIST (3/3)
34 SYSTEMS AND METHODS FOR ANALYZING UNSOLICITED PRODUCT/SERVICE CUSTOMER REVIEWS
35 SYSTEM FOR CREDIT/INSURANCE PROCESSING USING UNSTRUCTURED DATA
36 SYSTEM AND METHOD FOR CORRELATING SOCIAL MEDIA DATA AND COMPANY FINANCIAL DATA
37 SYSTEMS AND METHODS FOR IDENTIFYING AN ILLNESS AND COURSE OF TREATMENT FOR A PATIENT
38 SYSTEM AND METHOD FOR IDENTIFYING FACIAL EXPRESSIONS FROM SOCIAL MEDIA IMAGES
39 SYSTEM AND METHOD FOR DETECTING HEALTH MALADIES IN A PATIENT USING UNSTRUCTURED IMAGES
40 SYSTEM AND METHOD FOR DETECTING POLITICAL DESTABILIZATION AT A SPECIFIC GEOGRAPHIC LOCATION BASED ON SOCIAL MEDIA DATA
41 SYSTEM AND METHOD FOR IDENTIFYING CORRELATIONS BETWEEN SOCIAL MEDIA IMAGES USING NEURAL NETWORKS
42 SYSTEM AND METHOD FOR SCALABLE PROCESSING OF DATA PIPELINES USING A LOCKLESS SHARED MEMORY SYSTEM
43 ASYNCHRONOUS WEB PAGE DATA AGGREGATOR
44 APPLICATIONS OF DISTRIBUTED PROCESSING AND DATA BROADCASTING TECHNOLOGY TO REAL TIME NEWS SERVICE
45 DISTRIBUTED PROCESSING AND DATA BROADCASTING TECHNOLOGY FOR REAL TIME THREAT ANALYSIS
46 DISTRIBUTED PROCESSING AND DATA BROADCASTING TECHNOLOGY FOR REAL TIME EMERGENCY RESPONSE
47 DISTRIBUTED PROCESSING AND DATA BROADCASTING TECHNOLOGY FOR CLIMATE ANALYTICS
48 DISTRIBUTED PROCESSING AND DATA BROADCASTING TECHNOLOGY FOR INSURANCE RISK ASSESSMENT
49 DISTRIBUTED PARALLEL ARCHITECTURES FOR REAL TIME PROCESSING OF STREAMS OF STRUCTURED AND UNSTRUCTURED DATA
The Solution: Akuda Core Topology with Kafka
-- On-chip-network Comm Control
-- Zero-copy Data Broadcasting
-- Lockfree queue, pipeline control
-- Lockfree correlator
-- Lockfree Multithreaded Processing
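The slide names zero-copy data broadcasting but does not show how it is done. As a rough illustration only, the sketch below hands several consumers read-only views of one packet buffer instead of per-consumer copies. The function name `fan_out` and the sample packet are hypothetical, and Python's `memoryview` stands in for whatever shared-memory mechanism the actual system uses.

```python
from typing import List


def fan_out(packet: bytearray, n_consumers: int) -> List[memoryview]:
    """Return one read-only view per consumer; all views share packet's memory.

    Hypothetical helper for illustration, not an Akuda API.
    """
    shared = memoryview(packet).toreadonly()
    # Slicing a memoryview yields another view of the same buffer, not a copy.
    return [shared[:] for _ in range(n_consumers)]


# Usage: one in-place write to the producer's buffer is visible through
# every consumer view, which shows that no per-consumer copy was made.
packet = bytearray(b"hello stream")
views = fan_out(packet, 3)
packet[0] = ord("J")
print([bytes(v[:5]) for v in views])
```

Consumers in this sketch cannot write through their views, so the producer stays the single writer. That is one simple way to avoid locks around the shared buffer, under the assumption that the producer does not overwrite a packet while consumers are still reading it.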
[Diagram labels: Feed BC; Kafka; AkudaCore (shown 5 times); Indexing / Analytics with Rx and Tx, and Visualization (pair shown 17 times); 1; 101000]
Pulsar: Functional View
Unstructured Data Source: Streams
Unstructured Data Source: Batch
Unstructured Data Source: Images
MILLIONS OF DOCUMENTS PER SECOND
LDA CONTROL
AKUDA DEEP INSPECTION
THIRD-PARTY DATA ANALYTICS
HADOOP BASED ANALYTICS
THIRD-PARTY VISUALIZATION
AKUDA DASHBOARD
RT Content Classification (DFA/LDA/VEC)
RT Author Classification (DFA/LDA)
Optimizing Parallelizing Compiler
Normalization
RT Author Image Analysis (NEURAL NETS)
Universal Indexing
P-GRAM GEN
Indexer
STATS / ANALYTICS
Author ATTR
Author GEO
Author DEM
LDA PROC
P-GRAM GEN
LDA PROC
10+ BILLIONS OF CLASSIFICATIONS PER SECOND
MISSION EDITOR
Automatic Cluster Discovery: P-GRAMS, LDA, CONVERGENCE
Mission Deep Inspection Store
Summarizer
p-GRAM Generator
Mission Stream
Concept Extractor
LDA Solver
Convergence Monitor
p-GRAMS
Corpus Summary
Corpus Concept Cloud
Labeled Corpus Clusters
Classification Model Library
LDA Cluster Generation & Labeling
LDA Cluster Refinement
DFA Classifier Refinement
LDA Classifier Generator
Author Attribute Discovery: Neural Networks, Bayesian Models, DFAs
Ethnicity Image Analyzer
Author Info Analyzer (LGM)
Real-time Stream Aggregator
Author Geolocation Analyzer (LGM)
Author Attribute Processor (LGM)
Real-time Stream Correlator
Massively Parallel RT Classification Engine
AKUDA Broadcaster
Author Attribute Detector (shown 13 times in the diagram; one instance labeled LGM)
Unstructured Data Source A
Unstructured Data Source B
Unstructured Data Source C
Normalization
Age Image Analyzer
Gender Image Analyzer
Labeled Image Generator
Neural Network Trainer
Author Bayesian Classification Model Trainer
Generalized Image Classification: Neural Networks, Bayesian Models, DFAs
Ethnicity Image Analyzer
Age Image Analyzer
Gender Image Analyzer
Labeled Image Generator
Neural Network Trainer
Image Data Sources
Image Harvester
Logo Identification
Face Detector
Glasses Image Analyzer
Weight Image Analyzer
Hair-style Image Analyzer
Shape Identification
Emotion Image Analyzer
Image Label Classifier
Image DB
Pipeline Editor: Automatic LDA Models, User-specified DFAs
RT Content Classification (DFA/LDA/VEC)
Optimizing Parallelizing Compiler
PIPELINE EDITOR
Filtering, Analysis And Action Network
LDA Classifier
Vector String CMP
Vector INT/FP CMP
DFA
Counter Tap
Action Block
DFA
Counter Tap
Counter Tap
DFA
Action Block
Output
Inout
LDA Classifier
Vector String CMP
Vector INT/FP CMP
DFA
Action Block
Counter Tap
Model Library: Airlines, Auto, Auto Insurance, Cable, Beverages, Fast Food, Finance, Housing, Legal, Pharma/Health, Tech
Most Used Detectors: Advertisement, Inquiry, Customer Service, Irate Customers, Thankful Customers
Consumers
STATE MANAGEMENT
P-GRAM GEN
Indexer
LDA PROC
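The Pipeline Editor slide names DFA, Counter Tap and Action Block as building blocks but does not show how they connect. The sketch below is one plausible wiring, offered as an assumption rather than the product's design: a keyword DFA filters documents, a counter tap counts what passes, and an action block reacts to each match. All names (`build_keyword_dfa`, `dfa_accepts`, `CounterTap`, `run_pipeline`) and the sample documents are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

Table = Dict[Tuple[int, str], int]


def build_keyword_dfa(keyword: str) -> Table:
    """Build a transition table that reaches state len(keyword) once keyword is seen."""
    table: Table = {}
    for state in range(len(keyword)):
        for ch in set(keyword):
            # Next state = length of the longest keyword prefix that is
            # a suffix of the text matched so far plus the new character.
            seen = keyword[:state] + ch
            k = min(len(keyword), len(seen))
            while k > 0 and not seen.endswith(keyword[:k]):
                k -= 1
            table[(state, ch)] = k
    return table


def dfa_accepts(table: Table, accept_state: int, text: str) -> bool:
    """Run the DFA over text; characters outside the keyword reset to state 0."""
    state = 0
    for ch in text:
        state = table.get((state, ch), 0)
        if state == accept_state:
            return True
    return False


@dataclass
class CounterTap:
    """Counts the documents that flow past this point in the pipeline."""
    count: int = 0

    def __call__(self, doc: str) -> str:
        self.count += 1
        return doc


def run_pipeline(docs: Iterable[str], keyword: str,
                 tap: CounterTap, action: Callable[[str], None]) -> None:
    table = build_keyword_dfa(keyword)
    for doc in docs:
        if dfa_accepts(table, len(keyword), doc.lower()):
            action(tap(doc))  # count the match, then trigger the action block


# Usage with made-up documents: collect every post that mentions "delayed".
alerts = []
tap = CounterTap()
run_pipeline(
    ["My flight is delayed again", "Great service today", "Delayed for three hours"],
    "delayed", tap, alerts.append,
)
print(tap.count, alerts)
```

A table-driven DFA reads each character once and does a single lookup per character, which is the property that makes this style of classifier attractive for high-rate streams.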