Complex Event Processing with Esper
Talk I gave at Codebits 2011 on 11/11/11 about Complex Event Processing using Esper.
@antonioalegria
CEP
Complex Event Processing?
“Complex Event is an event that could only happen if lots of other events happened”
“CEP is a set of tools and techniques for analyzing and controlling the complex series of interrelated events that drive modern distributed information systems”
David Luckham, 2002
Example
• Church bell ringing
• Appearance of a man in a tuxedo
• Appearance of a woman in a white gown
• Rice flying through the air
Wedding has happened!
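The wedding inference can be sketched in plain Python. This is not Esper code. The event names, the one-hour window and the `detect_wedding` function are assumptions made for illustration. The complex event is raised only once all four simple events have been seen inside the window.

```python
from collections import deque

# Hypothetical simple-event names for the wedding example
WEDDING_SIGNS = {"bell", "tuxedo", "white_gown", "rice"}

def detect_wedding(events, window=3600):
    """Return the timestamps at which a 'wedding' complex event is inferred.

    events: iterable of (timestamp_seconds, name) pairs in time order.
    A wedding is inferred when every name in WEDDING_SIGNS has occurred
    within the last `window` seconds.
    """
    recent = deque()  # (ts, name) pairs still inside the window
    detections = []
    for ts, name in events:
        recent.append((ts, name))
        # Drop simple events that have fallen out of the time window
        while recent and recent[0][0] <= ts - window:
            recent.popleft()
        seen = {n for _, n in recent}
        if WEDDING_SIGNS <= seen:
            detections.append(ts)
            recent.clear()  # consume the evidence so one wedding fires once
    return detections

stream = [(0, "bell"), (60, "tuxedo"), (120, "white_gown"), (300, "rice")]
print(detect_wedding(stream))  # [300]
```

None of the four simple events means much on its own. The inference comes from their co-occurrence within a bounded time span.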
CEP Use Cases
• Are our business processes running on time and correctly?
• Can we detect an opportunity for arbitrage in our trading department?
• Are we servicing our call center customers’ requests in a timely fashion?
• Was there a breach in our network?
It’s not a technology (just like SOA!)
It’s a Buzzword
It’s an Architectural Pattern
What do you need for CEP?
Event driven
(soft) Real-time
or rather, Right-time
Across all layers of organization
Event Aggregation
Event Relationships
• Causality
• Membership
• Timing
Event Patterns
for Event Processing
Domain Specific Language
What you need for CEP
• Event Driven
• Right-time
• Across all layers
• Aggregation, Correlation & Traceability
• Patterns
• DSL
Common CEP Operations
• Windowing
• Transformation
• Aggregation/Grouping
• Merging/Union
• Filtering
• Sorting
• Correlation
• Pattern Detection
Esper makes it easier to build a CEP app
Not meant to replace Databases
But some parallels can be made
DB                   | Esper
Stores data          | Stores queries
On-demand queries    | Continuous queries
Time is a data type  | Time is a dimension
SQL                  | EPL
Tables               | Event Streams
Rows                 | Events
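The "stores queries / continuous queries" contrast can be made concrete with a minimal Python sketch. This is not Esper's API. `ContinuousQueryEngine`, `register` and `send` are hypothetical names. The engine keeps the registered queries and pushes each arriving event through them, instead of storing events and querying them later.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]

class ContinuousQueryEngine:
    """Toy engine: queries are stored, events flow through them once."""

    def __init__(self) -> None:
        # Each registered query is a (filter predicate, listener) pair
        self._queries: List[Tuple[Callable[[Event], bool], Callable[[Event], None]]] = []

    def register(self, predicate: Callable[[Event], bool],
                 listener: Callable[[Event], None]) -> None:
        self._queries.append((predicate, listener))

    def send(self, event: Event) -> None:
        # The event is evaluated against every stored query on arrival;
        # the engine itself keeps no copy of it afterwards
        for predicate, listener in self._queries:
            if predicate(event):
                listener(event)

matches: List[Event] = []
engine = ContinuousQueryEngine()
# Roughly analogous to: select * from Tweet(user = 'codebits')
engine.register(lambda e: e.get("user") == "codebits", matches.append)
engine.send({"user": "codebits", "text": "hello"})
engine.send({"user": "someone", "text": "hi"})
print(len(matches))  # 1
```

A database would instead insert both tweets into a table and answer the query only when asked.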
Esper Processing Model
EPL: Event Processing Language
Event Definition (1/2)
create schema Event (
  id string,  // Event unique identifier
  ts long     // Timestamp (milliseconds)
);
create schema Tweet (
  user string,       // username (e.g. 'codebits')
  text string,       // actual tweet
  retweet_of string  // references a Tweet.id
) inherits Event;
Event Definition (2/2)
create schema Hashtag (
  tweet_id string,  // references a Tweet.id
  user string,
  value string
) inherits Event;
// Create Url and Mention event types as a copy of Hashtag
create schema Url() copyfrom Hashtag;
create schema Mention() copyfrom Hashtag;
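For readers coming from a general-purpose language, the schema hierarchy maps roughly onto Python dataclasses. This is an analogy only, not how Esper represents event types. Here `inherits` is modelled as subclassing. `copyfrom` is modelled as copying every property (including inherited ones) into a new, independent type.

```python
from dataclasses import dataclass, fields, make_dataclass

@dataclass
class Event:
    id: str  # event unique identifier
    ts: int  # timestamp (milliseconds)

@dataclass
class Tweet(Event):  # like "inherits Event": every Tweet is an Event
    user: str        # username (e.g. 'codebits')
    text: str        # actual tweet
    retweet_of: str  # references a Tweet.id

@dataclass
class Hashtag(Event):
    tweet_id: str  # references a Tweet.id
    user: str
    value: str

# Like "copyfrom Hashtag": same properties, including the inherited id and ts.
# In this sketch the copies are standalone types, not subclasses of Hashtag.
Url = make_dataclass("Url", [(f.name, f.type) for f in fields(Hashtag)])
Mention = make_dataclass("Mention", [(f.name, f.type) for f in fields(Hashtag)])

t = Tweet(id="1", ts=0, user="codebits", text="hello", retweet_of="")
print(isinstance(t, Event))           # True
print([f.name for f in fields(Url)])  # ['id', 'ts', 'tweet_id', 'user', 'value']
```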
Looks like SQL...
// All events
select * from Event;
// Only tweets
select user, text as status
from Tweet;
Filtering
// Tweets from @codebits
select * from Tweet(user = 'codebits');
// Another way to do it
select * from Tweet where user = 'codebits';
// All occurrences of #codebits not posted by @codebits
select user, value as hashtag, current_timestamp() as ts
from Hashtag(value = 'codebits' and user != 'codebits');
Stream Creation and Redirection
insert into CodebitsTweets
select * from Tweet(user = 'codebits');
select * from CodebitsTweets;
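What `insert into` buys you is that one statement's output becomes a named stream, which other statements can consume like any other event type. The following is a toy model of that idea in plain Python. `StreamBus` and the handler names are hypothetical and do not reflect how Esper routes events internally.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, object]

class StreamBus:
    """Toy model of named streams: statements subscribe to a stream name."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, stream: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[stream].append(handler)

    def emit(self, stream: str, event: Event) -> None:
        for handler in self._subscribers[stream]:
            handler(event)

bus = StreamBus()

# Like: insert into CodebitsTweets select * from Tweet(user = 'codebits')
def redirect_codebits(event: Event) -> None:
    if event.get("user") == "codebits":
        bus.emit("CodebitsTweets", event)  # output becomes a new stream

bus.subscribe("Tweet", redirect_codebits)

# Like: select * from CodebitsTweets
received: List[Event] = []
bus.subscribe("CodebitsTweets", received.append)

bus.emit("Tweet", {"user": "codebits", "text": "Codebits 2011!"})
bus.emit("Tweet", {"user": "other", "text": "hi"})
print(len(received))  # 1
```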
Aggregation
insert into UrlsPerSecond
select count(*) as count from Url.win:time_batch(1 sec);
// Every second (driven by the rule above), calculate for the last minute:
// - average Urls tweeted per second
// - total Urls tweeted
select avg(count), sum(count)
from UrlsPerSecond.win:length(60);
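The two statements form a small pipeline: per-second counts feed a sliding window of the last 60 counts. The sketch below is a rough plain-Python model of that pipeline. It assumes event timestamps in seconds and batches aligned to whole seconds from t=0. Esper's own batch alignment and its handling of empty batches may differ. The function names are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

def urls_per_second(url_times: Iterable[float], end: int) -> List[int]:
    """Count Url events per whole second in [0, end), similar in spirit to
    count(*) over Url.win:time_batch(1 sec). Batches are aligned to t=0 here."""
    counts = [0] * end
    for t in url_times:
        if 0 <= t < end:
            counts[int(t)] += 1
    return counts

def rolling_avg_sum(counts: Iterable[int], size: int = 60) -> List[Tuple[float, int]]:
    """After each per-second count arrives, return (avg, sum) over the last
    `size` counts, similar in spirit to UrlsPerSecond.win:length(60)."""
    window: Deque[int] = deque(maxlen=size)  # oldest count falls out automatically
    results = []
    for c in counts:
        window.append(c)
        total = sum(window)
        results.append((total / len(window), total))
    return results

counts = urls_per_second([0.1, 0.5, 1.2, 2.9, 2.95], end=3)
print(counts)                           # [2, 1, 2]
print(rolling_avg_sum(counts, size=2))  # [(2.0, 2), (1.5, 3), (1.5, 3)]
```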
Grouping
select value as hashtag, count(*)
from Hashtag(value != null).win:time(30 seconds)
group by value;
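A grouped count over a sliding time window means every arriving hashtag increments its group. Every expired hashtag decrements its group. The plain-Python sketch below illustrates this with a hypothetical class name. For simplicity it returns a snapshot of all group counts after each event, and boundary handling at exactly 30 seconds is a choice of the sketch.

```python
from collections import Counter, deque
from typing import Deque, Dict, Optional, Tuple

class SlidingGroupCount:
    """Per-key counts over a sliding time window, loosely modelling
    select value, count(*) from Hashtag(value != null).win:time(30 seconds) group by value."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.events: Deque[Tuple[float, str]] = deque()  # (ts, key) in arrival order
        self.counts: Counter = Counter()

    def _expire(self, now: float) -> None:
        # Remove events that are no longer within the last `window` seconds
        while self.events and self.events[0][0] <= now - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]  # group disappears when its count hits 0

    def add(self, ts: float, key: Optional[str]) -> Dict[str, int]:
        """Process one hashtag at time `ts`; return a snapshot of all group counts."""
        self._expire(ts)
        if key is not None:  # mirrors the value != null filter
            self.events.append((ts, key))
            self.counts[key] += 1
        return dict(self.counts)

g = SlidingGroupCount(30)
g.add(0, "codebits")
g.add(10, "esper")
g.add(20, "codebits")
print(g.add(35, "esper"))  # {'codebits': 1, 'esper': 2}
```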
Simple Event Views
select * from Tweet.win:time(5 min);
select * from Tweet.win:time_batch(1 hour);
select * from Tweet.win:length(10);
select * from Tweet.win:length_batch(10);
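The difference between the sliding views (`win:length`, `win:time`) and the batch views (`win:length_batch`, `win:time_batch`) is when events leave. The plain-Python sketch below covers the two length-based variants, with hypothetical function names. The time-based views follow the same sliding-versus-tumbling idea, using a time span instead of an event count.

```python
from typing import Iterable, List

def length_window(events: Iterable[int], size: int) -> List[List[int]]:
    """Sliding, like win:length(size): after each event, the last `size` events."""
    window: List[int] = []
    snapshots = []
    for e in events:
        window.append(e)
        if len(window) > size:
            window.pop(0)  # the oldest event leaves as each new one arrives
        snapshots.append(list(window))
    return snapshots

def length_batch_window(events: Iterable[int], size: int) -> List[List[int]]:
    """Tumbling, like win:length_batch(size): released only when `size` events have collected."""
    batch: List[int] = []
    batches = []
    for e in events:
        batch.append(e)
        if len(batch) == size:
            batches.append(batch)
            batch = []  # start a fresh, non-overlapping batch
    return batches

print(length_window([1, 2, 3, 4], 2))           # [[1], [1, 2], [2, 3], [3, 4]]
print(length_batch_window([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4]]
```

Note that the fifth event in the batch example is still waiting for a partner and has not been released yet.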
Other Standard Event Views
// Don’t use the system clock; use an event stream property
select * from Tweet.win:ext_timed(ts, 5 min);
// Last 10 tweets per user
select * from Tweet.std:groupwin(user).win:length(10);
// Top 5 Hashtags
select * from HashtagsPerMinute.ext:sort(5, count desc);
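Grouped and sorted views can be approximated in plain Python for intuition. In the sketch below, the grouped view keeps one independent length window per user. The sorted view retains only the n rows with the largest sort key. Function names are hypothetical, and the sketch models what each view holds, not how Esper maintains it.

```python
import heapq
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple, TypeVar

T = TypeVar("T")

def last_n_per_key(events: List[Tuple[str, str]], n: int) -> Dict[str, List[str]]:
    """Loosely models Tweet.std:groupwin(user).win:length(n):
    one independent length window per user. events are (user, text) pairs."""
    windows: Dict[str, Deque[str]] = defaultdict(lambda: deque(maxlen=n))
    for user, text in events:
        windows[user].append(text)  # evicts only from this user's window
    return {user: list(w) for user, w in windows.items()}

def top_n(rows: List[T], n: int, key: Callable[[T], float]) -> List[T]:
    """Loosely models what the sorted view on the slide retains:
    the n rows with the largest key, largest first."""
    return heapq.nlargest(n, rows, key=key)

tweets = [("alice", "t1"), ("bob", "t2"), ("alice", "t3"), ("alice", "t4")]
print(last_n_per_key(tweets, 2))  # {'alice': ['t3', 't4'], 'bob': ['t2']}

rows = [{"tag": "x", "count": 3}, {"tag": "y", "count": 7}, {"tag": "z", "count": 5}]
print([r["tag"] for r in top_n(rows, 2, key=lambda r: r["count"])])  # ['y', 'z']
```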
You can create your own custom Views
Correlation (1/2)
// Associate hashtags used to describe a URL
insert into UrlTags
select u.value as url, h.value as hashtag
from Url.std:lastevent() as u, Hashtag.std:lastevent() as h
where u.tweet_id = h.tweet_id;
insert into UrlTagsCount
select url, hashtag, count(*) as count
from UrlTags.win:time(1 hour)
group by url, hashtag;
Correlation (2/2)
// Every minute, output Top 3 hashtags per URL
select * from UrlTagsCount.ext:sort(3, count desc)
output snapshot at(*/1, *, *, *, *);
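The `UrlTags` join uses a last-event view on both sides. Each side remembers only its most recent event, and an arrival on either side is joined against the other side's latest event. The following plain-Python sketch models that behaviour with a hypothetical class. A consequence visible in the sketch is that if two hashtags arrive before their URL, only the later one gets paired.

```python
from typing import Dict, List, Optional, Tuple

class LastEventJoin:
    """Loosely models: from Url.std:lastevent() as u, Hashtag.std:lastevent() as h
    where u.tweet_id = h.tweet_id"""

    def __init__(self) -> None:
        self.last_url: Optional[Dict[str, str]] = None
        self.last_hashtag: Optional[Dict[str, str]] = None

    def on_url(self, url: Dict[str, str]) -> List[Tuple[str, str]]:
        self.last_url = url  # replaces the previous Url
        return self._match()

    def on_hashtag(self, hashtag: Dict[str, str]) -> List[Tuple[str, str]]:
        self.last_hashtag = hashtag  # replaces the previous Hashtag
        return self._match()

    def _match(self) -> List[Tuple[str, str]]:
        u, h = self.last_url, self.last_hashtag
        if u is not None and h is not None and u["tweet_id"] == h["tweet_id"]:
            return [(u["value"], h["value"])]  # one (url, hashtag) row
        return []

j = LastEventJoin()
print(j.on_url({"tweet_id": "1", "value": "http://example.com"}))  # []
print(j.on_hashtag({"tweet_id": "1", "value": "codebits"}))  # [('http://example.com', 'codebits')]
```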
Event Patterns
// Measure how long it takes users to respond to a tweet
insert into ResponseDelay
select t.id as tweet_id, t.user as author, m.user as responder,
       t.ts as start_ts, m.ts as stop_ts, m.ts - t.ts as duration
from pattern [every (t=Tweet -> m=Mention(value = t.user))];
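The pattern `every (t=Tweet -> m=Mention(value = t.user))` can be read as a small state machine. It waits for a Tweet, then waits for a Mention of that tweet's author, emits a row, and starts over. Tweets that arrive while it is waiting are not tracked. The plain-Python sketch below illustrates this reading. The event dicts and the function name are assumptions, and the responder is taken to be the author of the mentioning tweet.

```python
from typing import Dict, List, Optional

def response_delays(events: List[Dict]) -> List[Dict]:
    """State machine for: every (t=Tweet -> m=Mention(value = t.user)).
    events: dicts with 'type' ('Tweet' or 'Mention'), 'ts' and schema fields."""
    pending: Optional[Dict] = None  # the Tweet we are waiting on, if any
    results = []
    for e in events:
        if pending is None:
            if e["type"] == "Tweet":
                pending = e  # matched t=Tweet; now wait for a mention
        elif e["type"] == "Mention" and e["value"] == pending["user"]:
            results.append({
                "tweet_id": pending["id"],
                "author": pending["user"],
                "responder": e["user"],  # who wrote the mentioning tweet
                "duration": e["ts"] - pending["ts"],
            })
            pending = None  # "every" restarts: look for the next Tweet
        # Other events (including new Tweets) are ignored while waiting
    return results

events = [
    {"type": "Tweet", "id": "1", "user": "codebits", "ts": 0},
    {"type": "Tweet", "id": "2", "user": "codebits", "ts": 5},  # ignored while waiting
    {"type": "Mention", "user": "alice", "value": "codebits", "ts": 30},
    {"type": "Tweet", "id": "3", "user": "bob", "ts": 40},
    {"type": "Mention", "user": "carol", "value": "bob", "ts": 100},
]
print([r["duration"] for r in response_delays(events)])  # [30, 60]
```

If every tweet should be tracked independently, Esper's `every t=Tweet -> m=Mention(value = t.user)` form (without parentheses) starts a new sub-pattern for each Tweet. The sketch deliberately models the parenthesized form from the slide.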
Detecting Missing Events
// No Tweet from @codebits in 1 hour
select *
from pattern [every Tweet(user = 'codebits') -> (timer:interval(1 hour) and not Tweet(user = 'codebits'))];
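Missing-event detection works by starting a timer at each @codebits tweet. The `and not` branch cancels that timer if another @codebits tweet arrives first. The plain-Python sketch below computes when such alerts would fire. It assumes sorted tweet timestamps in seconds and an explicit "now" for the current time, and the function name is hypothetical. Like the statement, it only reacts to silence after at least one tweet has been seen.

```python
from typing import List, Optional

def silence_alerts(tweet_times: List[float], now: float, gap: float = 3600.0) -> List[float]:
    """Times at which the 'no tweet for `gap` seconds' pattern would fire,
    given sorted timestamps of @codebits tweets and the current time."""
    alerts = []
    for i, t in enumerate(tweet_times):
        deadline = t + gap  # timer:interval(1 hour) started by this tweet
        next_tweet: Optional[float] = tweet_times[i + 1] if i + 1 < len(tweet_times) else None
        # 'and not Tweet(...)': a later tweet before the deadline cancels this timer
        # (a tweet exactly at the deadline also cancels it; a choice of this sketch)
        if (next_tweet is None or next_tweet > deadline) and deadline <= now:
            alerts.append(deadline)
    return alerts

print(silence_alerts([0, 1800, 9000], now=20000))  # [5400.0, 12600.0]
```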
Other features
• Subqueries
• Inner, outer joins
• Named windows
• 1st class integration with databases (JDBC)
• Regex-like Event Pattern matching (match-recognize)
Esper is awesome!
well, duh!
It’s not a silver bullet
Memory Usage
Resilience & Persistence
Weak Pattern matching
Drill-down not trivial
It’s NOT distributed!
Not full-stack
For more: @antonioalegria
Q&A