Functional Comparison and Performance Evaluation
王华峰 (Huafeng Wang)
毛玮 (Wei Mao)
张天伦 (Tianlun Zhang)
2016/10/27 Intel Asia-Pacific R&D Center | Intel Software and Services Group, Big Data Technology Department
Overview
• Streaming Core
• MISC
• Performance Benchmark
• Choose your weapon!

*Other names and brands may be claimed as the property of others.
Execution Model + Fault Tolerance Mechanism
• Micro-Batch (Apache Spark Streaming*, Apache Storm Trident*): Source → Operator → Sink, where incoming records are grouped into small batches and each batch is processed as a unit.
• Continuous Streaming (Apache Flink*, Apache Storm*, Apache Gearpump*, Twitter Heron*): Source → Operator → Sink, where long-running operators process each record as soon as it arrives.
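The difference between the two execution models can be sketched in a few lines of Python. This is a toy model under simplifying assumptions (a count-based batch size instead of a time-based batch interval, and in-memory lists as Source and Sink); `run_micro_batch` and `run_continuous` are hypothetical helpers, not APIs of any framework listed above.

```python
from typing import Callable, Iterable, List


def run_micro_batch(source: Iterable[int], operator: Callable[[int], int],
                    batch_size: int) -> List[List[int]]:
    """Group records into fixed-size batches and process each batch as one unit."""
    sink: List[List[int]] = []
    batch: List[int] = []
    for record in source:
        batch.append(record)
        if len(batch) == batch_size:
            sink.append([operator(r) for r in batch])  # whole batch reaches the sink at once
            batch = []
    if batch:  # flush the last, partially filled batch
        sink.append([operator(r) for r in batch])
    return sink


def run_continuous(source: Iterable[int], operator: Callable[[int], int]) -> List[int]:
    """Process every record as soon as it arrives and emit it immediately."""
    sink: List[int] = []
    for record in source:
        sink.append(operator(record))
    return sink


print(run_micro_batch(range(5), lambda x: x * 10, 2))
print(run_continuous(range(3), lambda x: x + 1))
```

In the micro-batch model a record becomes visible at the Sink only when its whole batch has been processed, which adds latency but lets the engine amortize scheduling and checkpointing over the batch.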
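This is the critical part, as it affects many features.
• Micro-Batch, checkpoint per batch (Apache Spark Streaming*, Apache Storm Trident*): a coordinator (the driver in Spark) tracks job status, and for each batch the batch id, source offset and state are checkpointed to durable storage such as HDFS.
• Continuous Streaming, checkpoint "per batch" (Apache Flink*, Apache Gearpump*): records flow continuously, but source offsets and operator state are checkpointed periodically, as if per batch, through a coordinator (the JobManager in Flink) to durable storage such as HDFS.
• Continuous Streaming, ack per record (Apache Storm*, Twitter Heron*): an Acker tracks each record from Source to Sink; a record that is not acknowledged is replayed.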
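The per-record ack mechanism is commonly implemented with an XOR trick: for each source record the acker keeps the XOR of the ids of all tuples derived from it, and the value returns to zero once every one of them has been acknowledged. The Python sketch below shows only that bookkeeping; `XorAcker` is a hypothetical class, and a real deployment (for example Storm's acker tasks) also distributes this state across tasks and handles timeouts.

```python
import random
from typing import Dict


class XorAcker:
    """Tracks tuple trees: the running XOR is 0 once every tuple has been acked."""

    def __init__(self) -> None:
        self.pending: Dict[int, int] = {}  # root record id -> XOR of tuple ids

    def emit(self, root_id: int, tuple_id: int) -> None:
        """A new tuple derived from `root_id` was emitted."""
        self.pending[root_id] = self.pending.get(root_id, 0) ^ tuple_id

    def ack(self, root_id: int, tuple_id: int) -> bool:
        """A tuple was processed; returns True when the whole tree is complete."""
        self.pending[root_id] ^= tuple_id
        if self.pending[root_id] == 0:
            del self.pending[root_id]
            return True
        return False


# Example: the spout emits one sentence, the split bolt emits two words from it.
rng = random.Random(42)
sentence, word1, word2 = (rng.getrandbits(64) for _ in range(3))
acker = XorAcker()
acker.emit(1, sentence)
acker.emit(1, word1)
acker.emit(1, word2)
print(acker.ack(1, sentence), acker.ack(1, word1), acker.ack(1, word2))
```

Random 64-bit ids make an accidental early zero practically impossible, which is why the acker needs only one integer per source record instead of the full tuple tree.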
Trade-offs of the three mechanisms:

| Mechanism | Frameworks | Latency | Throughput | Overhead |
|---|---|---|---|---|
| Micro-Batch, checkpoint per batch | Apache Spark Streaming*, Apache Storm Trident* | High | High | Low |
| Continuous Streaming, checkpoint "per batch" | Apache Flink*, Apache Gearpump* | Low | High | Low |
| Continuous Streaming, ack per record | Apache Storm*, Twitter Heron* | Low | Low | High |
Delivery Guarantee
• At least once (Apache Storm*, Twitter Heron*)
  o Ackers know whether a record was processed successfully; if it failed, the record is replayed.
  o There is no state consistency guarantee.
• Exactly once (Apache Spark Streaming*, Apache Flink*, Apache Storm Trident*, Apache Gearpump*)
  o State is persisted in durable storage.
  o Each checkpoint is linked with the state storage per batch.
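Why the two guarantees differ can be shown with a toy word count in Python. Both functions are hypothetical simulations, not framework code: `at_least_once` replays unacknowledged records on top of state that was already updated, while `exactly_once` restores the source offset and the state from the same per-batch checkpoint before reprocessing.

```python
from collections import Counter
from typing import List, Tuple


def at_least_once(records: List[str], fail_after: int) -> Counter:
    """Crash after updating state for `records[fail_after]` but before acking it."""
    state: Counter = Counter()
    acked = 0  # number of records acknowledged so far
    for i, word in enumerate(records):
        state[word] += 1
        if i == fail_after:
            break  # crash: this record's update is kept, its ack is lost
        acked = i + 1
    # Recovery: the source replays every unacknowledged record; state is not rolled back.
    for word in records[acked:]:
        state[word] += 1
    return state


def exactly_once(records: List[str], batch_size: int, fail_in_batch: int) -> Counter:
    """Checkpoint (offset, state) together per batch; restore both after a crash."""
    checkpoint: Tuple[int, Counter] = (0, Counter())
    offset, state = checkpoint[0], checkpoint[1].copy()
    crashed = False
    while offset < len(records):
        batch = records[offset:offset + batch_size]
        for word in batch:
            state[word] += 1
        if not crashed and offset // batch_size == fail_in_batch:
            crashed = True  # crash before this batch's checkpoint completes
            offset, state = checkpoint[0], checkpoint[1].copy()
            continue
        offset += len(batch)
        checkpoint = (offset, state.copy())  # persist offset and state atomically
    return state


print(at_least_once(["a", "b", "c"], fail_after=1))
print(exactly_once(["a", "b", "c", "d"], batch_size=2, fail_in_batch=1))
```

With a failure after the second record, the at-least-once run counts `"b"` twice, whereas the exactly-once run counts every word once, because the offset and the state are rolled back together.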
Native State Operator
• Apache Flink* Java API: ValueState, ListState, ReducingState
• Apache Flink* Scala API: mapWithState
• Apache Gearpump*: persistState
• Apache Spark Streaming* 1.5: updateStateByKey
• Apache Spark Streaming* 1.6: mapWithState
• Apache Storm Trident*: persistentAggregate (backed by a State)
• Apache Storm*: KeyValueState
• Twitter Heron*: no native state operator (state is maintained by the user)
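The semantics of Spark's `updateStateByKey` can be imitated in plain Python to show what a native state operator does between batches. This is a sketch under assumptions: `update_state_by_key` and `running_count` are hypothetical names, the state lives in a local dict rather than in checkpointed storage, and, as with `updateStateByKey`, the update function is called for every key that has either state or new values.

```python
from typing import Callable, Dict, List, Optional, Tuple

UpdateFn = Callable[[List[int], Optional[int]], Optional[int]]


def update_state_by_key(state: Dict[str, int], batch: List[Tuple[str, int]],
                        update: UpdateFn) -> Dict[str, int]:
    """Apply `update(new_values, old_state)` to every key with state or new values."""
    new_values: Dict[str, List[int]] = {}
    for key, value in batch:
        new_values.setdefault(key, []).append(value)
    next_state: Dict[str, int] = {}
    for key in set(state) | set(new_values):
        result = update(new_values.get(key, []), state.get(key))
        if result is not None:  # returning None removes the key from the state
            next_state[key] = result
    return next_state


def running_count(values: List[int], current: Optional[int]) -> Optional[int]:
    """Running word count: previous total plus this batch's occurrences."""
    return (current or 0) + sum(values)


# Two batches of ("word", 1) pairs, as produced by a word count.
state = update_state_by_key({}, [("foo", 1), ("foo", 1), ("bar", 1)], running_count)
state = update_state_by_key(state, [("foo", 1)], running_count)
print(state)
```

Spark 1.6's `mapWithState` instead invokes its function only for keys that receive new data in the batch, which lets it scale better when most keys are idle.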
API
Compositional (Apache Storm*, Apache Gearpump*, Twitter Heron*)
• Highly customizable operators built from basic building blocks
• Manual topology definition and optimization

```java
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
// RandomSentenceSpout, SplitSentence and WordCount are user-defined components (storm-starter style)
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 1);
builder.setBolt("split", new SplitSentence(), 3).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 2).fieldsGrouping("split", new Fields("word"));
```

Spout: "foo, foo, bar" → Bolt (split): "foo", "foo", "bar" → Bolt (count): {"foo": 2, "bar": 1}
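In the Storm topology, `fieldsGrouping("split", new Fields("word"))` routes every tuple with the same `word` to the same `count` task. The Python sketch below imitates that manual wiring for the "foo, foo, bar" example; `fields_grouping` and `run_topology` are hypothetical helpers, and CRC32 is an assumed stand-in for the hash Storm actually uses.

```python
import zlib
from collections import Counter
from typing import List


def fields_grouping(value: str, num_tasks: int) -> int:
    """Choose the downstream task index from a stable hash of the grouping field."""
    return zlib.crc32(value.encode("utf-8")) % num_tasks


def run_topology(sentences: List[str], count_parallelism: int = 2) -> List[Counter]:
    """Spout -> split bolt -> `count_parallelism` count bolts, grouped by 'word'."""
    count_tasks = [Counter() for _ in range(count_parallelism)]
    for sentence in sentences:                 # each spout tuple
        for word in sentence.split(","):       # split bolt output
            word = word.strip()
            task = fields_grouping(word, count_parallelism)
            count_tasks[task][word] += 1       # only this task ever sees `word`
    return count_tasks


tasks = run_topology(["foo, foo, bar"])
print(sum(tasks, Counter()))  # merged view across the count tasks
```

Because each word always lands on one task, every count task can keep a purely local counter without coordinating with its peers.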
Declarative (Apache Spark Streaming*, Apache Flink*, Apache Storm Trident*, Apache Gearpump*)
• Higher-order functions as operators (filter, mapWithState, …)
• Logical plan optimization

```java
// params: ParameterTool with job arguments; Tokenizer: user-defined FlatMapFunction
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> text = env.readTextFile(params.get("input"));
DataStream<Tuple2<String, Integer>> counts = text.flatMap(new Tokenizer()).keyBy(0).sum(1);
```

"foo, foo, bar" → "foo", "foo", "bar" → ("foo", 1), ("foo", 1), ("bar", 1) → {"foo": 2, "bar": 1}
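One thing a declarative API enables is logical plan optimization: because the program only declares a chain of higher-order functions, the engine may fuse adjacent steps into one pass instead of materializing each intermediate stream (Flink calls this operator chaining). The Python sketch below is a toy version of that idea; `fuse` and the `("map", f)` / `("filter", p)` plan format are hypothetical and not part of any framework's API.

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]  # ("map", f) or ("filter", predicate)


def fuse(plan: List[Step]) -> Callable[[List[Any]], List[Any]]:
    """Compile a declared chain of map/filter steps into a single pass over the data."""
    for kind, _ in plan:
        if kind not in ("map", "filter"):
            raise ValueError(f"unsupported step: {kind}")

    def run(records: List[Any]) -> List[Any]:
        out: List[Any] = []
        for record in records:
            keep = True
            for kind, fn in plan:  # all steps applied per record, no intermediate lists
                if kind == "map":
                    record = fn(record)
                elif not fn(record):  # a "filter" step rejected the record
                    keep = False
                    break
            if keep:
                out.append(record)
        return out

    return run


plan: List[Step] = [("map", str.strip), ("filter", lambda w: w != "bar"), ("map", lambda w: (w, 1))]
print(fuse(plan)(["foo", " foo", " bar"]))
```

A compositional API leaves this kind of rewrite to the user, who wires and tunes each operator by hand.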
Statistical (Apache Spark Streaming*, Apache Storm*, Twitter Heron*, ˚Structured Streaming*, ˚Apache Storm*)
• Data scientist friendly
• Dynamic typing

Python

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
ssc = StreamingContext(SparkContext(appName="WordCount"), 1)  # 1-second batch interval
lines = ssc.textFileStream(params.get("input"))  # params: dict-like job arguments
words = lines.flatMap(lambda line: line.split(","))
pairs = words.map(lambda word: (word, 1))
counts = pairs.reduceByKey(lambda x, y: x + y)
counts.saveAsTextFiles(params.get("output"))
ssc.start()
ssc.awaitTermination()
```

R (SparkR)

```r
lines <- textFile(sc, "input")
words <- flatMap(lines, function(line) { strsplit(line, " ")[[1]] })
wordCount <- lapply(words, function(word) { list(word, 1L) })
counts <- reduceByKey(wordCount, "+", 2L)
```
SQL (Apache Spark Streaming*, Apache Flink*, Structured Streaming, Apache Storm Trident*)

Pure style (Apache Storm* SQL), run with `bin/storm sql XXXX.sql <topology-name>`:

```sql
CREATE EXTERNAL TABLE ORDERS (ID INT PRIMARY KEY, UNIT_PRICE INT, QUANTITY INT) LOCATION 'kafka://localhost:2181/brokers?topic=orders' TBLPROPERTIES '{...}}'
-- LARGE_ORDERS must also be declared with CREATE EXTERNAL TABLE
INSERT INTO LARGE_ORDERS SELECT ID, UNIT_PRICE * QUANTITY AS TOTAL FROM ORDERS WHERE UNIT_PRICE * QUANTITY > 50
```

Fusion style (Apache Spark Streaming*):

```scala
// Assumes inputDStream: DStream[Order] (case class with ID, UNIT_PRICE, QUANTITY) and sqlContext: SQLContext
val largeOrders = inputDStream.transform((rdd: RDD[Order], time: Time) => {
  import sqlContext.implicits._
  rdd.toDF().registerTempTable("ORDERS")
  val sql = "SELECT ID, UNIT_PRICE * QUANTITY AS TOTAL FROM ORDERS WHERE UNIT_PRICE * QUANTITY > 50"
  val largeOrderDF = sqlContext.sql(sql)
  largeOrderDF.rdd // RDD[Row]
})
```
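Summary

| Framework | Compositional | Declarative | Python/R | SQL |
|---|---|---|---|---|
| Apache Spark Streaming* | X | √ | √ | √ |
| Apache Storm* | √ | X | √ | Does NOT support aggregation, windowing and joining |
| Apache Storm Trident* | X | √ | X | Does NOT support aggregation, windowing and joining |
| Apache Gearpump* | √ | √ | X | X |
| Apache Flink* | X | √ | X | Supports SELECT, FROM, WHERE, UNION |
| Twitter Heron* | √ | X | √˚ | X |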
Runtime Model
• Multiple Tasks of Multiple Applications on a Single Process (Apache Flink*): one JVM process runs tasks from application A and application B side by side, each task on its own thread.
• Single Task on a Single Process (Twitter Heron*): each JVM process runs exactly one task and connects with the local Stream Manager (SM).
• Multiple Tasks of a Single Application on a Single Process (Apache Spark Streaming*, Apache Storm*, Apache Storm Trident*, Apache Gearpump*): each JVM process runs only tasks that belong to one application, either
  o one task per thread, or
  o multiple tasks on a single thread.
● Window Support
● Out-of-order Processing
● Memory Management
● Resource Management
● Web UI
● Community Maturity
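Window Support
• Sliding Window: a fixed-length window that advances by a fixed slide interval
• Count Window: a window defined by a fixed number of records
• Session Window: records separated by a gap smaller than the session gap fall into the same session; a longer gap starts a new session

| Framework | Sliding Window | Count Window | Session Window |
|---|---|---|---|
| Apache Spark Streaming* | √ | X | X˚ |
| Apache Flink* | √ | √ | X |
| Apache Storm* | √ | √ | X |
| Apache Storm Trident* | √˚ | X | X |
| Apache Gearpump* | √ | √ | √ |
| Twitter Heron* | X | X | X |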
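The session-window rule (records whose gap is smaller than the session gap belong to the same session) can be written down directly. This Python sketch assumes timestamps in seconds and a hypothetical `session_windows` helper that sees all events at once; streaming engines apply the same rule incrementally and merge sessions as events arrive.

```python
from typing import List


def session_windows(timestamps: List[float], gap: float) -> List[List[float]]:
    """Split event timestamps into sessions separated by at least `gap`."""
    sessions: List[List[float]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)  # gap smaller than the session gap: same session
        else:
            sessions.append([ts])    # gap reached: start a new session
    return sessions


print(session_windows([1, 2, 10, 3, 20, 21], gap=5))
```

Unlike sliding and count windows, session boundaries depend on the data itself, which is why session windows need extra support (window merging) from the engine.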
Out-of-order Processing

| Framework | Processing Time | Event Time | Watermark |
|---|---|---|---|
| Apache Spark Streaming* | √ | √˚ | X˚ |
| Apache Storm* | √ | √ | √ |
| Apache Storm Trident* | √ | X | X |
| Apache Flink* | √ | √ | √ |
| Apache Gearpump* | √ | √ | √ |
| Twitter Heron* | √ | X | X |
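To illustrate how event time and watermarks handle out-of-order data, here is a toy Python sketch of tumbling event-time windows. The watermark rule (highest event time seen so far minus a fixed `max_delay`) is one common heuristic assumed here for simplicity; each framework lets users plug in their own watermark logic, and `event_time_windows` is a hypothetical name.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Event = Tuple[int, str]  # (event time in seconds, payload)


def event_time_windows(events: List[Event], size: int,
                       max_delay: int) -> Tuple[List[Tuple[int, int]], List[Event]]:
    """Count events per tumbling event-time window [start, start + size).

    Events are given in arrival order. The watermark is the highest event time
    seen so far minus `max_delay`; a window fires once the watermark reaches its
    end, and events for windows that already fired are reported as late."""
    counts: DefaultDict[int, int] = defaultdict(int)
    fired: List[Tuple[int, int]] = []  # (window start, event count)
    late: List[Event] = []
    watermark = float("-inf")
    for ts, payload in events:
        start = ts - ts % size
        if start + size <= watermark:  # its window has already been emitted
            late.append((ts, payload))
            continue
        counts[start] += 1
        watermark = max(watermark, ts - max_delay)
        for s in sorted(counts):  # sorted() copies the keys, so popping is safe
            if s + size <= watermark:
                fired.append((s, counts.pop(s)))
    return fired, late


events = [(1, "a"), (4, "b"), (12, "c"), (3, "d"), (17, "e"), (2, "f"), (25, "g")]
fired, late = event_time_windows(events, size=10, max_delay=5)
print(fired, late)
```

Here `(3, "d")` arrives after `(12, "c")` but is still counted, because the watermark has not yet passed the end of window `[0, 10)`; `(2, "f")` arrives after that window has fired and is reported as late. The last window, `[20, 30)`, is never emitted because no later event advances the watermark past 30.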
Memory Management

| Framework | JVM-Managed | Self-Managed On-Heap | Self-Managed Off-Heap |
|---|---|---|---|
| Apache Spark Streaming* | √ | √˚ | √˚ |
| Apache Flink* | √ | √ | √ |
| Apache Storm* | √ | X | X |
| Apache Gearpump* | √ | X | X |
| Twitter Heron* | √ | X | X |
Resource Management

| Framework | Standalone | YARN | Mesos |
|---|---|---|---|
| Apache Spark Streaming* | √ | √ | √ |
| Apache Storm* | √ | √˚ | √˚ |
| Apache Storm Trident* | √ | √˚ | √˚ |
| Apache Flink* | √ | √ | X |
| Apache Gearpump* | √ | √ | X |
| Twitter Heron* | √ | √ | √ |
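Web UI

| Framework | Submit Jobs | Cancel Jobs | Inspect Jobs | Show Statistics | Show Input Rate | Check Exceptions | Inspect Config | Alert |
|---|---|---|---|---|---|---|---|---|
| Apache Spark Streaming* | X | √ | √ | √ | √ | √ | √ | X |
| Apache Storm* | X | √ | √ | √ | √˚ | √ | √ | X |
| Apache Flink* | √ | √ | √ | √ | √˚ | √ | √ | X |
| Apache Gearpump* | √ | √ | √ | √ | X | √ | √ | X |
| Twitter Heron* | X | X | √ | √ | √˚ | √ | √ | X |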
[Figure: Past 3 Months Summary on JIRA, issues Created vs. Resolved for Spark, Storm, Gearpump, Flink and Heron]
[Figure: Past 1 Month Summary on GitHub, Commits vs. Committers for Spark, Storm, Gearpump, Flink and Heron]
Community Maturity

| Framework | Initiation Time | Apache Top-Level Project | Contributors |
|---|---|---|---|
| Apache Spark Streaming* | 2013 | 2014 | 926 |
| Apache Storm* | 2011 | 2014 | 219 |
| Apache Gearpump* | 2014 | Incubator | 21 |
| Apache Flink* | 2010 | 2015 | 208 |
| Twitter Heron* | 2014 | N/A | 44 |
Source website: https://github.com/apache/spark/pulse/monthly
Source website: https://issues.apache.org/jira/secure/Dashboard.jspa
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.
HiBench 6.0
Test Philosophy
• "Lazy Benchmarking"
• Simple test cases from which practical use cases can be inferred
Cluster Setup

Apache Kafka* Cluster (x3), connected to the test cluster over 10 Gbps
• CPU: 2 x Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
• Mem: 128 GB
• Disk: 8 x HDD (1TB)
• Network: 10 Gbps

Test Cluster (x7)
• CPU: 2 x Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
• Cores: 20 / 24 (allocated / total)
• Mem: 80 / 128 GB (allocated / total)
• Disk: 8 x HDD (1TB)
• Network: 10 Gbps

| Name | Version |
|---|---|
| Java | 1.8 |
| Scala | 2.11.7 |
| Apache Hadoop* | 2.6.2 |
| Apache Zookeeper* | 3.4.8 |
| Apache Kafka* | 0.8.2.2 |
| Apache Spark* | 1.6.1 |
| Apache Storm* | 1.0.1 |
| Apache Flink* | 1.0.3 |
| Apache Gearpump* | 0.8.1 |

• Twitter Heron* requires a specific operating system (Ubuntu/CentOS/Mac OS), so it is not included in the benchmark.
• Structured Streaming does not yet support a Kafka source (as of Spark 2.0), so it is not included either.
Architecture
• Data Generator: writes the input records, each stamped with its In Time, to Topic A on the Kafka cluster (3 Kafka Brokers).
• Test Cluster (Standalone): a Client submits the job to the Master, which runs it on 7 Slaves (20 cores, 80 GB memory each); the job consumes Topic A and writes its output to Topic B.
• Metrics Reader: reads Topic B, computes each record's latency as Out Time − In Time, and writes the result to the File System.
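The latency metric in this architecture (Out Time − In Time, reported as P99 in the results) reduces to a small calculation. The sketch below assumes each result record yields an (In Time, Out Time) pair in seconds and uses the nearest-rank definition of the 99th percentile; HiBench's metrics reader may compute the percentile differently, and `latencies` / `percentile` are hypothetical helpers.

```python
import math
from typing import List, Tuple


def latencies(records: List[Tuple[float, float]]) -> List[float]:
    """Per-record latency = Out Time - In Time, for (in_time, out_time) pairs."""
    return [out_time - in_time for in_time, out_time in records]


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of samples <= it."""
    if not values or not 0 < p <= 100:
        raise ValueError("need at least one value and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


pairs = [(float(i), float(i) + 0.01 * (i % 100 + 1)) for i in range(1000)]
print(percentile(latencies(pairs), 99))
```

P99 rather than the mean is reported because a few slow batches or replays dominate user-visible latency while barely moving the average.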
Framework Configuration

| Framework | Framework-Related Configuration |
|---|---|
| Apache Spark Streaming* | 7 Executors, 140 Parallelism |
| Apache Flink* | 7 TaskManagers, 140 Parallelism |
| Apache Storm* | 28 Workers, 140 KafkaSpouts |
| Apache Gearpump* | 28 Executors, 140 KafkaSources |
Raw Input Data
• Kafka topic partitions: 140
• Size per message (configurable): 200 bytes
• Raw input message example:
  "0,227.209.164.46,nbizrgdziebsaecsecujfjcqtvnpcnxxwiopmddorcxnlijdizgoi,1991-06-10,0.115967035,Mozilla/5.0 (iPhone; U; CPU like Mac OS X)AppleWebKit/420.1 (KHTML like Gecko) Version/3.0 Mobile/4A93Safari/419.3,YEM,YEM-AR,snowdrops,1"
• Strongly typed: each message is parsed into class UserVisit (ip, sessionId, browser)
• Data is fed continuously at a specific rate for 5 minutes
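Turning the raw comma-separated message into the strongly typed `UserVisit` record might look like the Python sketch below. The column positions used for `ip`, `sessionId` and `browser` (1, 2 and 5) are assumptions inferred from the example message, not taken from HiBench's source, and `parse_user_visit` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserVisit:
    ip: str
    session_id: str
    browser: str


# Assumed column positions, inferred from the example message only.
IP_FIELD, SESSION_FIELD, BROWSER_FIELD = 1, 2, 5

SAMPLE = ("0,227.209.164.46,nbizrgdziebsaecsecujfjcqtvnpcnxxwiopmddorcxnlijdizgoi,"
          "1991-06-10,0.115967035,Mozilla/5.0 (iPhone; U; CPU like Mac OS X)"
          "AppleWebKit/420.1 (KHTML like Gecko) Version/3.0 Mobile/4A93Safari/419.3,"
          "YEM,YEM-AR,snowdrops,1")


def parse_user_visit(line: str) -> UserVisit:
    """Turn one comma-separated raw message into a strongly typed record."""
    fields = line.strip().strip('"').split(",")
    if len(fields) <= BROWSER_FIELD:
        raise ValueError(f"too few fields: {len(fields)}")
    return UserVisit(fields[IP_FIELD], fields[SESSION_FIELD], fields[BROWSER_FIELD])


print(parse_user_visit(SAMPLE))
```

Parsing once at the source lets the downstream operators work with named, typed fields instead of re-splitting strings in every stage.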
Data Input Rate

| Throughput | Messages/Second | Kafka Producers |
|---|---|---|
| 40 KB/s | 0.2K | 1 |
| 400 KB/s | 2K | 1 |
| 4 MB/s | 20K | 1 |
| 40 MB/s | 200K | 1 |
| 80 MB/s | 400K | 1 |
| 400 MB/s | 2M | 10 |
| 600 MB/s | 3M | 15 |
| 800 MB/s | 4M | 20 |
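The rows of the input-rate table follow from the 200-byte message size: messages per second = throughput / 200 bytes, with decimal units (1 KB = 1,000 bytes). The short Python check below recomputes the message rate for each row and the resulting rate per Kafka producer; the helper name `messages_per_second` is hypothetical.

```python
from typing import List, Tuple

MESSAGE_SIZE_BYTES = 200  # "Size per message" from the raw input data

# (throughput in bytes/s, Kafka producer count), decimal units: 1 KB = 1000 bytes
INPUT_RATES: List[Tuple[int, int]] = [
    (40_000, 1), (400_000, 1), (4_000_000, 1), (40_000_000, 1),
    (80_000_000, 1), (400_000_000, 10), (600_000_000, 15), (800_000_000, 20),
]


def messages_per_second(bytes_per_second: int, message_size: int = MESSAGE_SIZE_BYTES) -> float:
    """Messages per second needed to reach a byte-rate target."""
    return bytes_per_second / message_size


for rate, producers in INPUT_RATES:
    total = messages_per_second(rate)
    print(f"{rate / 1e6:>7.2f} MB/s -> {total:>11,.0f} msg/s, "
          f"{total / producers:>9,.0f} msg/s per producer")
```

The multi-producer rows (400, 600 and 800 MB/s) all work out to 200K messages/s per producer, while the 80 MB/s row drives 400K messages/s from a single producer.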
Test Case: Identity
The application reads input data from Kafka and immediately writes it back to Kafka; no complex business logic is involved.
Result
[Figure: P99 latency (s) vs. input rate (MB/s, 0 to 800) for Apache Spark*, Apache Flink*, Apache Storm* without Ack and Apache Storm* with Ack]
*Other names and brands may be claimed as the property of others. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
Test Case: Repartition
This test case essentially reflects the efficiency of data shuffling over the network.
Result
[Figure: P99 latency (s) vs. input rate (MB/s, 0 to 800) for Apache Spark*, Apache Flink*, Apache Storm* without Ack, Apache Gearpump* and Apache Storm* with Ack]
[Figure: Throughput (MB/s) vs. input rate (MB/s, 0 to 800) for the same frameworks]
Observation
• Spark Streaming needs to schedule tasks with additional context for every batch. With a tiny batch interval, this overhead can become dramatically worse than in the other frameworks.
• In our tests, the minimum batch interval for Spark was about 80 ms (140 tasks per batch); with a shorter interval, the task scheduling delay keeps increasing.
• Repartition is heavy for every framework, but it is usually unavoidable.
• Gearpump's latency remains quite low even at an 800 MB/s input rate.
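The observation about Spark's minimum batch interval follows from a simple queueing argument: if one batch takes longer to schedule and process than the batch interval, every following batch starts later than the previous one. The Python sketch below models that with a fixed, hypothetical per-batch processing time and batch jobs running one at a time (Spark Streaming's default); it is an illustration, not a model fitted to the measurements above.

```python
from typing import List


def scheduling_delays(batch_interval_ms: float, processing_ms: float,
                      num_batches: int) -> List[float]:
    """Delay before each micro-batch can start, for a fixed per-batch processing time.

    Batches are created every `batch_interval_ms` and run strictly one after another."""
    delays: List[float] = []
    busy_until = 0.0
    for i in range(num_batches):
        created = i * batch_interval_ms
        start = max(created, busy_until)  # wait for the previous batch to finish
        delays.append(start - created)
        busy_until = start + processing_ms
    return delays


for interval in (100.0, 60.0):
    print(interval, scheduling_delays(interval, processing_ms=80.0, num_batches=5))
```

As long as the interval covers the processing time, the delay stays at zero; once it does not, the delay grows by the difference on every batch and never recovers.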
Test Case: Stateful WordCount
Native state operators are supported by all the frameworks we evaluated.
This test case measures stateful operator performance plus the cost of checkpointing (or acking).
Result
[Figure: P99 latency (s) vs. input rate (MB/s, 0 to 800) for Apache Spark*, Apache Flink*, Apache Flink* without checkpointing (CP), Apache Storm* and Apache Gearpump*]
[Figure: Throughput (MB/s) vs. input rate (MB/s, 0 to 800) for Apache Spark*, Apache Flink*, Apache Storm* and Apache Gearpump*]
Observation
• Exactly-once semantics usually require state management and checkpointing, and stronger guarantees come at a higher cost.
• There is no obvious performance difference in Flink when fault tolerance is switched on or off.
• Checkpoint mechanisms and checkpoint storage play a critical role here.
Test Case: Window Based Aggregation
This test case maintains a 10-second sliding window.
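A 10-second sliding window aggregation can be sketched as follows. The slide interval is not stated for this test case, so the example uses a hypothetical 5-second slide; `sliding_window_counts` is a hypothetical helper that recomputes each window from scratch, whereas streaming engines typically update window results incrementally.

```python
from collections import Counter
from typing import Dict, List, Tuple


def sliding_window_counts(events: List[Tuple[int, str]], size: int, slide: int,
                          end_time: int) -> Dict[int, Counter]:
    """Per-key counts for each window [end - size, end), with `end` every `slide` seconds."""
    results: Dict[int, Counter] = {}
    for k in range(1, end_time // slide + 1):
        end = k * slide
        results[end] = Counter(key for ts, key in events if end - size <= ts < end)
    return results


events = [(0, "a"), (4, "b"), (9, "a"), (12, "a")]
for end, counts in sliding_window_counts(events, size=10, slide=5, end_time=15).items():
    print(f"window [{end - 10}, {end}): {dict(counts)}")
```

Because windows overlap, each event contributes to size / slide windows (two here), which is one reason sliding-window aggregation is heavier than a plain stateful count.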
Result
[Figure: P99 latency (s) vs. input rate (MB/s, 0 to 800) for Apache Spark*, Apache Flink* and Apache Storm*]
[Figure: Throughput (MB/s) vs. input rate (MB/s, 0 to 800) for Apache Spark*, Apache Flink* and Apache Storm*]
Do your own benchmark
HiBench: a cross-platform micro-benchmark suite for big data
(https://github.com/intel-hadoop/HiBench)
Open source since 2012
Improved streaming benchmark support will be included in the next release (HiBench 6.0).
Legal Disclaimer
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Configurations:
Hardware:
Apache Kafka* Cluster - CPU: 2 x Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, Mem: 128 GB, Disk: 8 x HDD (1TB), Network: 10 Gbps.
Test Cluster - CPU: 2 x Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz, Core: 20 / 24, Mem: 80 / 128 GB, Disk: 8 x HDD (1TB), Network: 10 Gbps.
Software:
The software framework configuration is shown on page 29 (Framework Configuration). The test results on pages 34, 37, 41 and 45 used the above configurations.
*Other names and brands may be claimed as the property of others.
Copyright ©2016 Intel Corporation.