![Page 1: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/1.jpg)
Sankt Augustin, 24-25.08.2013
Introduction to Twitter Storm
uweseiler
![Page 2: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/2.jpg)
About me
• Big Data Nerd
• Travelpirate
• Photography Enthusiast
• Hadoop Trainer
• MongoDB Author
![Page 3: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/3.jpg)
About us
is a bunch of…
• Big Data Nerds
• Agile Ninjas
• Continuous Delivery Gurus
• Enterprise Java Specialists
• Performance Geeks
Join us!
![Page 4: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/4.jpg)
Agenda
• Why Twitter Storm?
• What is Twitter Storm?
• What to do with Twitter Storm?
![Page 5: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/5.jpg)
The 3 V’s of Big Data
• Variety
• Volume
• Velocity
![Page 6: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/6.jpg)
Velocity
![Page 7: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/7.jpg)
Why Twitter Storm?
![Page 8: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/8.jpg)
Batch vs. Real-Time processing
• Batch processing
– Gathering of data and processing as a group at one time.
• Real-time processing
– Processing of data that takes place as the information is being entered.
![Page 9: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/9.jpg)
Lambda architecture
![Page 10: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/10.jpg)
Bridging the gap…
• A batch workflow is too slow
• Views are out of date
[Timeline figure: older data is absorbed into batch views; just a few hours of data, up to now, is not absorbed.]
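
One way to read the timeline above is that a query has to combine the out-of-date batch view with a small view over the data that has not been absorbed yet. The following is a minimal plain-Java sketch of that idea, assuming Java 8 or later. The class `MergedViewSketch` and all of its method names are hypothetical and not part of Storm or Hadoop.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names and structure are illustrative only.
class MergedViewSketch {
    // Counts precomputed by the slow batch workflow; they lag behind by a few hours.
    private final Map<String, Long> batchView = new HashMap<>();
    // Counts for data that has arrived since the last batch run.
    private final Map<String, Long> realtimeView = new HashMap<>();

    void loadBatchResult(String key, long count) {
        batchView.put(key, count);
    }

    void onNewEvent(String key) {
        realtimeView.merge(key, 1L, Long::sum);
    }

    // A query adds both views, so the answer also covers the most recent data.
    long query(String key) {
        return batchView.getOrDefault(key, 0L) + realtimeView.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        MergedViewSketch views = new MergedViewSketch();
        views.loadBatchResult("page-a", 100L);
        views.onNewEvent("page-a");
        views.onNewEvent("page-a");
        System.out.println(views.query("page-a")); // prints 102
    }
}
```

In this sketch the real-time view is the part a stream processor would maintain. Resetting it after each batch run is left out for brevity.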
![Page 11: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/11.jpg)
Storm vs. Hadoop
Storm:
• Real-time processing
• Topologies run forever
• No SPOF
• Stateless nodes
Hadoop:
• Batch processing
• Jobs run to completion
• NameNode is SPOF
• Stateful nodes
Both:
• Scalable
• Guarantees no data loss
• Open Source
![Page 12: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/12.jpg)
Stream Processing
Stream processing is a technical paradigm for processing large volumes of unbounded sequences of tuples in real time.
[Diagram: Source → Stream Processing]
• Algorithmic trading
• Sensor data monitoring
• Continuous analytics
![Page 13: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/13.jpg)
Example: Stream of tweets
https://github.com/colinsurprenant/tweitgeist
![Page 14: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/14.jpg)
Agenda
• Why Twitter Storm?
• What is Twitter Storm?
• What to do with Twitter Storm?
![Page 15: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/15.jpg)
Welcome, Twitter Storm!
• Created by Nathan Marz @ BackType
– Analyze tweets, links, users on Twitter
• Open sourced on 19th September, 2011
– Eclipse Public License 1.0
– Storm v0.5.2
• Latest Updates
– Current stable release v0.8.2 released on 11th January, 2013
– Major core improvements planned for v0.9.0
– Storm will be an Apache Project [soon..]
![Page 16: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/16.jpg)
Storm under the hood
• Java & Clojure
• Apache Thrift
– Cross language bridge, RPC, Framework to build services
• ZeroMQ
– Asynchronous message transport layer
• Kryo
– Serialization framework
• Jetty
– Embedded web server
![Page 17: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/17.jpg)
Conceptual view
• Spout: Source of streams
• Bolt: Consumer of streams, processing of tuples, possibly emits new tuples
• Tuple: List of name-value pairs
• Stream: Unbounded sequence of tuples
• Topology: Network of Spouts & Bolts as the nodes and streams as the edges
![Page 18: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/18.jpg)
Physical view
• Nimbus: Master daemon process, responsible for distributing code, assigning tasks and monitoring failures
• ZooKeeper: Storing operational cluster state
• Worker Node: Hosts a Supervisor and Worker processes
• Supervisor: Worker daemon process listening for work assigned to its node
• Worker (Worker Process): Java process executing a subset of a topology
• Executor: Java thread spawned by a worker, runs one or more tasks of the same component
• Task: Component (Spout/Bolt) instance, performs the actual data processing
![Page 19: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/19.jpg)
A simple example: WordCount
shakespeare.txt → FileReaderSpout → (line) → WordSplitBolt → (word) → WordCountBolt → Sorted list
Sorted list:
of: 18126
to: 18763
i: 19540
and: 26099
the: 27730
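
Before looking at the Storm classes, the data flow of this example can be sketched in plain Java without any Storm dependency. This is only an illustration under stated assumptions (Java 8 or later). `WordCountFlowSketch` and its methods are hypothetical stand-ins for the spout and the two bolts, and everything runs in a single thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-threaded stand-in for the WordCount topology.
class WordCountFlowSketch {
    // Stands in for WordSplitBolt: one line in, lower-cased non-empty words out.
    static List<String> split(String line) {
        List<String> words = new ArrayList<>();
        for (String word : line.split(" ")) {
            word = word.trim();
            if (!word.isEmpty()) {
                words.add(word.toLowerCase());
            }
        }
        return words;
    }

    // Stands in for WordCountBolt: keeps a running count per word.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counters = new HashMap<>();
        for (String line : lines) {            // each line is one "tuple" from the spout
            for (String word : split(line)) {  // each word is one "tuple" from the split step
                counters.merge(word, 1, Integer::sum);
            }
        }
        return counters;
    }

    // Produces the sorted list: ascending by count, ties broken by word.
    static List<Map.Entry<String, Integer>> sorted(Map<String, Integer> counters) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counters.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(a.getValue(), b.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("To be or not to be", "that is the question");
        for (Map.Entry<String, Integer> entry : sorted(count(lines))) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

What Storm adds on top of this flow is distribution: each step can run as many parallel tasks on many machines, with tuples routed between them.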
![Page 20: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/20.jpg)
FileReaderSpout I

    package de.codecentric.storm.wordcount.spouts;

    import java.io.BufferedReader;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.util.Map;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    public class FileReaderSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private FileReader fileReader;
        private boolean completed = false;

        public void ack(Object msgId) {
            System.out.println("OK:" + msgId);
        }

        public void fail(Object msgId) {
            System.out.println("FAIL:" + msgId);
        }
![Page 21: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/21.jpg)
FileReaderSpout II

        /**
         * Declare the output field "line"
         */
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("line"));
        }

        /**
         * Open the file and keep the collector object
         */
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            try {
                this.fileReader = new FileReader(conf.get("wordsFile").toString());
            } catch (FileNotFoundException e) {
                throw new RuntimeException("Error reading file ["
                        + conf.get("wordsFile") + "]");
            }
            this.collector = collector;
        }

        public void close() {
        }
![Page 22: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/22.jpg)
FileReaderSpout III

        /**
         * The only thing this method does is emit each line of the file
         */
        public void nextTuple() {
            /**
             * nextTuple() is called forever, so once the file has been read
             * we wait and then return
             */
            if (completed) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return;
            }
            String str;
            // Open the reader
            BufferedReader reader = new BufferedReader(fileReader);
            try {
                // Read all lines
                while ((str = reader.readLine()) != null) {
                    // Emit each line as a value, using the line itself as message ID
                    this.collector.emit(new Values(str), str);
                }
            } catch (Exception e) {
                throw new RuntimeException("Error reading tuple", e);
            } finally {
                completed = true;
            }
        }
    }
![Page 23: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/23.jpg)
WordSplitBolt I

    package de.codecentric.storm.wordcount.bolts;

    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    public class WordSplitBolt extends BaseBasicBolt {
        public void cleanup() {}

        /**
         * The bolt will only emit the field "word"
         */
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
![Page 24: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/24.jpg)
WordSplitBolt II

        /**
         * The bolt receives a line from the words file
         * and splits it into lower-cased words
         */
        public void execute(Tuple input, BasicOutputCollector collector) {
            String sentence = input.getString(0);
            String[] words = sentence.split(" ");
            for (String word : words) {
                word = word.trim();
                if (!word.isEmpty()) {
                    word = word.toLowerCase();
                    collector.emit(new Values(word));
                }
            }
        }
    }
![Page 25: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/25.jpg)
WordCountBolt I

    package de.codecentric.storm.wordcount.bolts;

    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.SortedSet;
    import java.util.TreeSet;

    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Tuple;

    public class WordCountBolt extends BaseBasicBolt {
        private static final long serialVersionUID = 1L;

        Integer id;
        String name;
        Map<String, Integer> counters;
![Page 26: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/26.jpg)
WordCountBolt II

        /**
         * On create
         */
        @Override
        public void prepare(Map stormConf, TopologyContext context) {
            this.counters = new HashMap<String, Integer>();
            this.name = context.getThisComponentId();
            this.id = context.getThisTaskId();
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
        }

        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String str = input.getString(0);
            /**
             * If the word is not in the map yet, add it with a count of 1;
             * otherwise increment its count
             */
            if (!counters.containsKey(str)) {
                counters.put(str, 1);
            } else {
                Integer c = counters.get(str) + 1;
                counters.put(str, c);
            }
        }
![Page 27: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/27.jpg)
WordCountBolt III

        /**
         * When the bolt is cleaned up (i.e. when the cluster is shut down),
         * print the word counters
         */
        @Override
        public void cleanup() {
            // Sort map
            SortedSet<Map.Entry<String, Integer>> sortedCounts = entriesSortedByValues(counters);
            System.out.println("-- Word Counter [" + name + "-" + id + "] --");
            for (Map.Entry<String, Integer> entry : sortedCounts) {
                System.out.println(entry.getKey() + ": " + entry.getValue());
            }
        }
        …
    }
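
The slides do not show the `entriesSortedByValues` helper used in `cleanup()`. The following is one possible shape for such a helper, given purely as an assumption and not as the code from the talk. `SortByValueSketch` is a hypothetical wrapper class so that the snippet compiles on its own.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: one possible implementation, not the talk's original code.
class SortByValueSketch {
    static <K extends Comparable<K>, V extends Comparable<V>>
            SortedSet<Map.Entry<K, V>> entriesSortedByValues(Map<K, V> map) {
        SortedSet<Map.Entry<K, V>> sorted = new TreeSet<Map.Entry<K, V>>(
                new Comparator<Map.Entry<K, V>>() {
                    @Override
                    public int compare(Map.Entry<K, V> a, Map.Entry<K, V> b) {
                        int byValue = a.getValue().compareTo(b.getValue());
                        // Fall back to the key so entries with equal values are not
                        // treated as duplicates and silently dropped by the TreeSet.
                        return byValue != 0 ? byValue : a.getKey().compareTo(b.getKey());
                    }
                });
        sorted.addAll(map.entrySet());
        return sorted;
    }
}
```

The tie-break on the key matters here: a comparator that only looks at the value would make a `TreeSet` keep just one word per distinct count.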
![Page 28: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/28.jpg)
WordCountTopology

    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;
    import backtype.storm.utils.Utils;
    import de.codecentric.storm.wordcount.bolts.WordCountBolt;
    import de.codecentric.storm.wordcount.bolts.WordSplitBolt;
    import de.codecentric.storm.wordcount.spouts.FileReaderSpout;

    public class WordCountTopology {
        public static void main(String[] args) throws InterruptedException {
            // Topology definition
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("word-reader", new FileReaderSpout());
            builder.setBolt("word-normalizer", new WordSplitBolt())
                    .shuffleGrouping("word-reader");
            builder.setBolt("word-counter", new WordCountBolt(), 1)
                    .fieldsGrouping("word-normalizer", new Fields("word"));

            // Configuration
            Config conf = new Config();
            conf.put("wordsFile", args[0]);
            conf.setDebug(false);
            conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);

            // Run topology in a local, in-process cluster
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("word-count-topology", conf, builder.createTopology());

            // You don't do this on a regular topology
            Utils.sleep(10000);
            cluster.killTopology("word-count-topology");
            cluster.shutdown();
        }
    }
![Page 29: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/29.jpg)
Stream Grouping
• Each Spout or Bolt might be running n instances in parallel
• Groupings decide which task of the subscribing bolt a tuple is sent to
• Possible groupings:

| Grouping | Feature |
| --- | --- |
| Shuffle | Random grouping |
| Fields | Grouped by value such that equal values result in the same task |
| All | Replicates to all tasks |
| Global | Makes all tuples go to one task |
| None | Currently behaves like Shuffle; intended to eventually make the Bolt run in the same thread as the Bolt / Spout it subscribes to |
| Direct | Producer (task that emits) controls which Consumer will receive |
| Local or shuffle | If the target bolt has one or more tasks in the same worker process, tuples will be shuffled to just those in-process tasks; otherwise it acts like Shuffle |
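
How a grouping picks a target task can be illustrated with a few lines of plain Java. This is a sketch under assumptions, not Storm's implementation: it assumes a hash-modulo scheme for the fields grouping and task index 0 for the global grouping, and `GroupingSketch` with all of its methods is hypothetical. It needs Java 8 or later for `Math.floorMod`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of task selection; indexes run from 0 to numTasks - 1.
class GroupingSketch {
    // Shuffle grouping: pick any task of the subscribing bolt at random.
    static int shuffleTask(Random random, int numTasks) {
        return random.nextInt(numTasks);
    }

    // Fields grouping (assumed hash-modulo scheme): equal field values map to the same task.
    static int fieldsTask(List<Object> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    // Global grouping: every tuple goes to one task (assumed here to be index 0).
    static int globalTask() {
        return 0;
    }

    // All grouping: the tuple is replicated to every task.
    static int[] allTasks(int numTasks) {
        int[] tasks = new int[numTasks];
        for (int i = 0; i < numTasks; i++) {
            tasks[i] = i;
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<Object> word = Arrays.<Object>asList("the");
        // The same word always lands on the same counter task.
        System.out.println(fieldsTask(word, 4) == fieldsTask(word, 4)); // prints true
    }
}
```

This is why the WordCount topology uses a fields grouping on "word" in front of the counter: all occurrences of a word must reach the same task, otherwise the counts would be split across tasks.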
![Page 30: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/30.jpg)
Key features of Twitter Storm
Storm is
• Fast & scalable
• Fault-tolerant
• Guaranteeing message processing
• Easy to setup & operate
• Free & Open Source
![Page 32: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/32.jpg)
Extremely performant
![Page 33: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/33.jpg)
Parallelism
• Number of worker nodes = 2
• Number of worker slots per node = 4
• Number of topology workers = 4

| Component | Parallelism hint | Number of tasks |
| --- | --- | --- |
| FileReaderSpout | 2 | Not specified = same as parallelism hint (2) |
| WordSplitBolt | 4 | 8 |
| WordCountBolt | 6 | Not specified = 6 |

• Number of component instances = 2 + 8 + 6 = 16
• Number of executor threads = 2 + 4 + 6 = 12
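
The arithmetic on this slide can be reproduced with a short plain-Java sketch. `ParallelismSketch` and its nested `Component` class are hypothetical names used only for illustration. The one rule taken from the slide is that an unspecified number of tasks defaults to the parallelism hint.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the slide's arithmetic; not a Storm API.
class ParallelismSketch {
    static final class Component {
        final String name;
        final int parallelismHint; // number of executor threads
        final int numTasks;        // number of component instances

        // numTasks == null models "not specified": it then defaults to the parallelism hint.
        Component(String name, int parallelismHint, Integer numTasks) {
            this.name = name;
            this.parallelismHint = parallelismHint;
            this.numTasks = (numTasks == null) ? parallelismHint : numTasks;
        }
    }

    static int totalTasks(List<Component> components) {
        int sum = 0;
        for (Component c : components) {
            sum += c.numTasks;
        }
        return sum;
    }

    static int totalExecutors(List<Component> components) {
        int sum = 0;
        for (Component c : components) {
            sum += c.parallelismHint;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Component> topology = Arrays.asList(
                new Component("FileReaderSpout", 2, null),
                new Component("WordSplitBolt", 4, 8),
                new Component("WordCountBolt", 6, null));
        System.out.println(totalTasks(topology));     // prints 16
        System.out.println(totalExecutors(topology)); // prints 12
    }
}
```

With 8 tasks on 4 executors, each WordSplitBolt executor thread runs 2 tasks, while the other two components run one task per executor.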
![Page 34: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/34.jpg)
Message passing
[Diagram of a worker process: a receive thread (from other workers), executors with receiver queues and internal transfer queues, and a transfer queue feeding the transfer thread (to other workers).]
• Interprocess communication is mediated by ZeroMQ; outside transfer is done with Kryo serialization
• Local communication is mediated by LMAX Disruptor; inside transfer is done with no serialization
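
The difference between the two transfer paths can be made concrete with a small plain-Java sketch. It uses stand-ins throughout, stated here as assumptions: a `java.util.concurrent` queue takes the place of the LMAX Disruptor, Java serialization takes the place of Kryo, and no network is involved. `TransferSketch` and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch with stand-ins for Disruptor (queue) and Kryo (Java serialization).
class TransferSketch {
    // In-process hand-off: the queue carries the object reference itself, nothing is copied.
    static ArrayList<Object> sendLocal(ArrayList<Object> tuple) {
        BlockingQueue<ArrayList<Object>> receiverQueue = new ArrayBlockingQueue<>(16);
        receiverQueue.offer(tuple);
        return receiverQueue.poll();
    }

    // Inter-process hand-off: the tuple is turned into bytes and rebuilt on the other side.
    static ArrayList<Object> sendRemote(ArrayList<Object> tuple) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
                out.writeObject(tuple);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(wire.toByteArray()))) {
                @SuppressWarnings("unchecked")
                ArrayList<Object> copy = (ArrayList<Object>) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        ArrayList<Object> tuple = new ArrayList<Object>(Arrays.<Object>asList("word", 1));
        System.out.println(sendLocal(tuple) == tuple);       // prints true: same object
        System.out.println(sendRemote(tuple) == tuple);      // prints false: a rebuilt copy
        System.out.println(sendRemote(tuple).equals(tuple)); // prints true: same content
    }
}
```

The local path avoids both the copy and the serialization cost, which is one reason why keeping communicating tasks in the same worker process can pay off.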
![Page 35: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/35.jpg)
Key features of Twitter Storm
Storm is
• Fast & scalable
• Fault-tolerant
• Guaranteeing message processing
• Easy to setup & operate
• Free & Open Source
![Page 36: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/36.jpg)
Fault tolerance
Cluster works normally:
• Nimbus and ZooKeeper: monitoring cluster state
• Supervisor and ZooKeeper: synchronizing assignment, sending heartbeat
• Supervisor and Worker: reading worker heartbeat from local filesystem
• Worker and ZooKeeper: sending executor heartbeat
![Page 37: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/37.jpg)
Fault tolerance
Nimbus goes down:
• Processing will still continue, but topology lifecycle operations and the reassignment facility are lost
![Page 38: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/38.jpg)
Fault tolerance
Worker node goes down:
• Nimbus will reassign the tasks to other machines and the processing will continue
![Page 39: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/39.jpg)
Fault tolerance
Supervisor goes down:
• Processing will still continue, but assignment is never synchronized
![Page 40: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/40.jpg)
Fault tolerance
Worker process goes down:
• Supervisor will restart the worker process and the processing will continue
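
The restart decision rests on heartbeats: a worker that stops reporting is considered dead. A minimal sketch of such a check in plain Java follows. The class `HeartbeatSketch`, its methods, and the 30-second timeout are assumptions made for illustration. They do not reproduce Storm's actual supervisor code or its configuration values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical heartbeat bookkeeping; timestamps are passed in to keep it deterministic.
class HeartbeatSketch {
    private final long timeoutMs;
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

    HeartbeatSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Called whenever a worker reports that it is still alive.
    void heartbeat(String workerId, long nowMs) {
        lastHeartbeatMs.put(workerId, nowMs);
    }

    // Workers whose last heartbeat is older than the timeout are candidates for a restart.
    List<String> workersToRestart(long nowMs) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastHeartbeatMs.entrySet()) {
            if (nowMs - entry.getValue() > timeoutMs) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        HeartbeatSketch monitor = new HeartbeatSketch(30_000L); // assumed timeout
        monitor.heartbeat("worker-1", 0L);
        monitor.heartbeat("worker-2", 25_000L);
        System.out.println(monitor.workersToRestart(40_000L)); // prints [worker-1]
    }
}
```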
![Page 41: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/41.jpg)
Key features of Twitter Storm
Storm is
• Fast & scalable
• Fault-tolerant
• Guaranteeing message processing
• Easy to setup & operate
• Free & Open Source
![Page 42: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/42.jpg)
Reliability API

    public class FileReaderSpout extends BaseRichSpout {
        public void nextTuple() {
            …;
            UUID msgId = getMsgID();
            // Emitting with a message ID makes Storm track the tuple tree
            collector.emit(new Values(line), msgId);
        }

        public void ack(Object msgId) {
            // Do something with acked message id
        }

        public void fail(Object msgId) {
            // Do something with failed message id
        }
    }

    // Explicit anchoring and acking need the OutputCollector of a BaseRichBolt;
    // a BaseBasicBolt does both automatically.
    public class WordSplitBolt extends BaseRichBolt {
        private OutputCollector collector; // set in prepare()

        public void execute(Tuple input) {
            for (String s : input.getString(0).split("\\s")) {
                // Anchor each outgoing tuple to the incoming tuple
                collector.emit(input, new Values(s));
            }
            // Acknowledge the incoming tuple
            collector.ack(input);
        }
    }
[Diagram: tuple tree. The spout emits the tuple “This is a line” with a message ID; the bolt anchors the outgoing word tuples (“This”, …) to the incoming tuple and sends an ack.]
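
What "do something with the message id" could mean on the spout side is sketched below in plain Java: remember every emitted line under its message ID, forget it on ack, and queue it again on fail. This is one common pattern, given as an assumption and not as the talk's implementation. `ReplaySketch` and its methods are hypothetical and do not use the Storm API.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.UUID;

// Hypothetical spout-side bookkeeping for replaying failed tuples.
class ReplaySketch {
    private final Queue<String> toEmit = new ArrayDeque<>();
    private final Map<UUID, String> pending = new HashMap<>();

    ReplaySketch(Iterable<String> lines) {
        for (String line : lines) {
            toEmit.add(line);
        }
    }

    // Mirrors nextTuple(): hand out the next line together with a fresh message ID.
    UUID emitNext() {
        String line = toEmit.poll();
        if (line == null) {
            return null; // nothing to emit right now
        }
        UUID msgId = UUID.randomUUID();
        pending.put(msgId, line);
        return msgId;
    }

    // Mirrors ack(): the tuple tree completed, so the line can be forgotten.
    void ack(UUID msgId) {
        pending.remove(msgId);
    }

    // Mirrors fail(): put the line back so that it is emitted again.
    void fail(UUID msgId) {
        String line = pending.remove(msgId);
        if (line != null) {
            toEmit.add(line);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    int queuedCount() {
        return toEmit.size();
    }

    public static void main(String[] args) {
        ReplaySketch spout = new ReplaySketch(Arrays.asList("This is a line"));
        UUID id = spout.emitNext();
        spout.fail(id);
        System.out.println(spout.queuedCount()); // prints 1: the line will be replayed
    }
}
```

A replayed line can be processed more than once downstream, so this pattern gives at-least-once rather than exactly-once behaviour.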
![Page 43: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/43.jpg)
ACKing Framework
[Diagram: FileReaderSpout, WordSplitBolt and WordCountBolt exchange tuples A, B and C; the spout sends "ACKer init" and the bolts send "ACKer ack" or "ACKer fail" to the implicit ACKer bolt.]
• Emitted tuple A, XOR tuple A id with ack val
• Emitted tuple B, XOR tuple B id with ack val
• Emitted tuple C, XOR tuple C id with ack val
• Acked tuple A, XOR tuple A id with ack val
• Acked tuple B, XOR tuple B id with ack val
• Acked tuple C, XOR tuple C id with ack val
The implicit ACKer bolt keeps one entry per spout tuple: Spout Tuple ID, Spout Task ID, ACK val (64 bit)
When the ACK val has become 0, the implicit ACKer bolt knows the tuple tree has been completed
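
The XOR bookkeeping works because XOR-ing the same value twice cancels out, so the ack val returns to 0 exactly when every emitted tuple ID has also been acked. A minimal plain-Java sketch of this idea follows. `AckerSketch` is a hypothetical class that tracks a single spout tuple and is not Storm's acker implementation.

```java
import java.util.Random;

// Hypothetical sketch of the ack val for one spout tuple.
class AckerSketch {
    private long ackVal = 0L; // 64-bit ack value

    // Called once when a tuple is emitted and once more when it is acked.
    void xor(long tupleId) {
        ackVal ^= tupleId;
    }

    // The tuple tree is complete when every ID has been XOR-ed in exactly twice.
    boolean treeCompleted() {
        return ackVal == 0L;
    }

    public static void main(String[] args) {
        Random random = new Random();
        long a = random.nextLong();
        long b = random.nextLong();
        long c = random.nextLong();

        AckerSketch acker = new AckerSketch();
        acker.xor(a); // emitted tuple A
        acker.xor(b); // emitted tuple B
        acker.xor(c); // emitted tuple C
        acker.xor(a); // acked tuple A
        acker.xor(b); // acked tuple B
        acker.xor(c); // acked tuple C
        System.out.println(acker.treeCompleted()); // prints true
    }
}
```

The appeal of this design is constant memory per spout tuple: a single 64-bit value is enough no matter how large the tuple tree grows. With random 64-bit IDs, the ack val reaching 0 before the tree is really complete is extremely unlikely.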
![Page 44: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/44.jpg)
Key features of Twitter Storm
Storm is
• Fast & scalable
• Fault-tolerant
• Guaranteeing message processing
• Easy to setup & operate
• Free & Open Source
![Page 45: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/45.jpg)
Cluster Setup
• Set up a ZooKeeper cluster
• Install dependencies on Nimbus and worker machines
– ZeroMQ 2.1.7 and JZMQ
– Java 6 and Python 2.6.6
– unzip
• Download and extract a Storm release to Nimbus and worker machines
• Fill in the mandatory configuration in storm.yaml
• Launch the daemons under supervision using the storm scripts
• Start a topology:
– storm jar <path_topology_jar> <main_class> <arg1>…<argN>
![Page 46: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/46.jpg)
Cluster Summary
![Page 47: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/47.jpg)
Topology Summary
![Page 48: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/48.jpg)
Component Summary
![Page 49: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/49.jpg)
Key features of Twitter Storm
Storm is
• Fast & scalable
• Fault-tolerant
• Guaranteeing message processing
• Easy to setup & operate
• Free & Open Source
![Page 50: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/50.jpg)
Basic resources
• Storm is available at
– http://storm-project.net/
– https://github.com/nathanmarz/storm
under Eclipse Public License 1.0
• Get help on
– http://groups.google.com/group/storm-user
– #storm-user freenode room
• Follow @stormprocessor and @nathanmarz
![Page 51: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/51.jpg)
Many contributions
• Community repository for modules to use Storm at
– https://github.com/nathanmarz/storm-contrib
– including integration with Redis, Kafka, MongoDB, HBase, JMS, Amazon SQS, …
• Good articles for understanding Storm internals
– http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
– http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/
• Good slides for understanding real-life examples
– http://www.slideshare.net/DanLynn1/storm-as-deep-into-realtime-data-processing-as-you-can-get-in-30-minutes
– http://www.slideshare.net/KrishnaGade2/storm-at-twitter
![Page 52: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/52.jpg)
Coming next…
• Current release: 0.8.2
• Work in progress (newest): 0.9.0-wip21
– SLF4J and Logback
– Pluggable tuple serialization and Blowfish encryption
– Pluggable interprocess messaging and Netty implementation
– Some bug fixes
– And more
• Storm on YARN
![Page 53: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/53.jpg)
Agenda
• Why Twitter Storm?
• What is Twitter Storm?
• What to do with Twitter Storm?
![Page 54: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/54.jpg)
One example: Webshop
• Webtracking component
• No defined page impression
• Identifying page impressions using Varnish logs of the click stream data
• Page consists of different fragments
– Body
– Article description
– Recommendation box, …
• Session data also of interest
![Page 55: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/55.jpg)
One example: Webshop
• Custom solution using J2EE and MongoDB
• Export into Comscore DAx and Enterprise DWH
• Solution is currently working but not scalable
• What about performance?
![Page 56: Introduction to Twitter Storm](https://reader033.vdocuments.pub/reader033/viewer/2022060108/554dd405b4c905cc0e8b49d5/html5/thumbnails/56.jpg)
Topology Architecture