Page 1

Real-Time Text Analytics for Event Detection in the Financial World
Gaining value from Big Data

Volker Stümpflen

April 2015

Winner

Page 2

Information Delay - A Big Data Problem
Markets are driven by news (and sentiments)

• The S&P 500 alone lost $136.5 billion within six minutes
• Cost of a second: about $380 million ($136.5 billion / 360 s)

Hacked AP tweet

Page 3

Use Case Baader Bank AG

• A leading German investment bank, market maker, sales trader
• Relevant information was missed in real time

• 780,000 financial instruments: from stocks to derivatives (increasing)
• N traders and analysts: decreasing time for increasing information (N is constant and small)
• 500,000 news items per day, ~4 bn sentences per year: from news agencies to social media channels (strongly increasing)

Page 4

Market Moving Event Types
From news corpus Big Data analytics

• Reuters 2008 news corpus
• S&P 500 companies
• Price change >= 1% in less than 1 minute

Event                               Rel. Freq.
CDS Price Move                      1
Analyst Forecast                    1
Business Climate Change             1
CEO Search                          1
Company Forecast                    1
Customer Problems                   1
Debt Financing                      1
Equity Financing                    1
Fraud Investigation                 1
Government Decision (no bailout)    1
Incorporation Change                1
Legal Settlement                    1
M&A                                 1
Restructuring                       1
Supply Chain                        1
Trading Halt                        1
Asset Liquidation                   2
Stocks Fall (Peers)                 2
Dividend Change                     3
Broker Rating                       9
Quarterly Results                   10
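The selection criterion on this slide (a price change of at least 1% in less than one minute) can be sketched as a scan over timestamped prices. This is only an illustration: the function name `find_price_moves`, the tick format and the sample prices are assumptions, not details from the talk.

```python
from typing import List, Tuple

# Assumed tick format: (timestamp in seconds, price).
Tick = Tuple[float, float]


def find_price_moves(
    ticks: List[Tick], threshold: float = 0.01, window_s: float = 60.0
) -> List[Tuple[float, float, float]]:
    """Return (start_time, end_time, relative_change) for every start tick
    that is followed, in less than `window_s` seconds, by a tick whose price
    differs by at least `threshold` (relative to the start price)."""
    moves = []
    for i, (t0, p0) in enumerate(ticks):
        for t1, p1 in ticks[i + 1:]:
            if t1 - t0 >= window_s:
                break  # outside the "less than 1 minute" window
            change = (p1 - p0) / p0
            if abs(change) >= threshold:
                moves.append((t0, t1, change))
                break  # report only the first qualifying move per start tick
    return moves


# Made-up sample data, sorted by time.
sample = [(0, 100.0), (20, 100.2), (45, 101.5), (200, 101.6)]
for start, end, change in find_price_moves(sample):
    print(f"{start}s -> {end}s: {change:+.2%}")
```

Each reported window could then be matched against news published shortly before it, which is how a table of event types like the one above might be assembled.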

Page 5

Three Information Classes

• Common events
  - E.g. “U.K. Economy Grew 0.6% in Fourth Quarter, Revised From 0.5%”
  - Structured and accessible with simple RegEx
  - Immediately in the market
• “Grey Swans”
  - E.g. “The Swiss National Bank scrapped the cap on the Franc.”
  - Expected market movers, but each individual event is less frequent
  - Together form approx. 50% of all events
• “Black Swans”
  - E.g. “Zombie virus epidemic”
  - Unexpected market movers, typically catastrophes
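To illustrate why common events are described as accessible with simple RegEx, here is a minimal sketch that parses the growth headline quoted on this slide. The pattern, the group names and the alternative verb "Shrank" are assumptions made for the example; a production system would need many such patterns.

```python
import re

# Hypothetical pattern for one family of structured macro headlines.
HEADLINE = re.compile(
    r"^(?P<region>.+?) Economy (?P<direction>Grew|Shrank) "
    r"(?P<value>\d+(?:\.\d+)?)% in (?P<period>\w+) Quarter"
    r"(?:, Revised From (?P<previous>\d+(?:\.\d+)?)%)?$"
)


def parse_growth_headline(headline: str):
    """Return the structured fields of a growth headline, or None if the
    headline does not follow the expected template."""
    match = HEADLINE.match(headline)
    if match is None:
        return None
    previous = match["previous"]
    return {
        "region": match["region"],
        "direction": match["direction"],
        "value_pct": float(match["value"]),
        "period": match["period"] + " Quarter",
        "previous_pct": float(previous) if previous is not None else None,
    }


print(parse_growth_headline(
    "U.K. Economy Grew 0.6% in Fourth Quarter, Revised From 0.5%"
))
```

A free-form sentence such as the Swiss National Bank example does not match any such template, which is why the other two classes need the deeper analysis described on the following slides.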

Page 6

The Way We Look at Events
Information is always connected

• The economy consists of
  - Entities like companies, people, products, locations, catastrophes, ...
• Events occur when entities do something with each other
• The simplest event is
  - Entity A – is doing something with – Entity B
• Complex events are
  - Superpositions of simple events
  - And/or indirect effects of simple events (guilt by association)

Page 7

A Recent Example

Cap on franc scrapped

FXCM had serious losses

Page 8

Networks – The Most Natural Way to Look At It

Page 9

Networks With Predicate Argument Structures

[Figure: network built from predicate-argument structures]
Swiss National Bank – scrapped – the cap on the franc
Customers – owe – FXCM, 225 million francs
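One way to picture how predicate-argument structures (PAS) turn into a network is to treat each structure as labelled edges from the agent to its other arguments. The sketch below is an assumption about a possible data model, not the system's actual one: the `PAS` class, the `build_network` helper, and the assignment of "FXCM" as beneficiary and "225 million francs" as patient are all illustrative choices.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PAS:
    """A single predicate-argument structure (hypothetical data model)."""
    predicate: str
    agent: str
    patient: str
    beneficiary: Optional[str] = None


def build_network(structures: List[PAS]) -> Dict[str, List[Tuple[str, str, str]]]:
    """Map each agent to a list of (predicate, target, role) edges."""
    network = defaultdict(list)
    for pas in structures:
        network[pas.agent].append((pas.predicate, pas.patient, "patient"))
        if pas.beneficiary is not None:
            network[pas.agent].append((pas.predicate, pas.beneficiary, "beneficiary"))
    return dict(network)


structures = [
    PAS("scrapped", "Swiss National Bank", "the cap on the franc"),
    PAS("owe", "Customers", "225 million francs", beneficiary="FXCM"),
]
for agent, edges in build_network(structures).items():
    for predicate, target, role in edges:
        print(f"{agent} -[{predicate}]-> {target} ({role})")
```

Because entities are shared between structures, edges from different sentences attach to the same nodes, and simple events combine into the connected picture the slide describes.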

Page 10

Wanted

• Software that ...
  - extracts PAS (predicate-argument structures) out of raw unstructured text
  - is fast (PAS after a few ms per sentence)
  - has high precision for agent, patient, beneficiary
  - is easy to extend for domain-specific language, e.g. *EHEALTH SEES YEAR ADJ. EPS 34C-41, EST. 38C
  - is inherently multi-lingual
• Nothing available did all of that – so we built our own!

Page 11

Preprocessing

Input: The SNB scrapped the cap on the franc. Markets are stunned.

Sentence splitting:
  Sentence 1: The SNB scrapped the cap on the franc
  Sentence 2: Markets are stunned

Tokenization and part-of-speech tagging:
  The/DT SNB/NNP scrapped/VBD the/DT cap/NN on/IN the/DT franc/NN
  Markets/NNS are/VBP stunned/VBN

Legend on the slide: noun, verb
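The three preprocessing steps can be sketched with the standard library alone. This is a deliberately naive illustration: the regular expressions and the lookup-table tagger `TOY_LEXICON` are assumptions that only cover the slide's example, whereas a real pipeline would use trained models.

```python
import re
from typing import List, Tuple

# Toy lexicon (hypothetical): Penn Treebank tags for the example words only.
TOY_LEXICON = {
    "the": "DT", "snb": "NNP", "scrapped": "VBD", "cap": "NN", "on": "IN",
    "franc": "NN", "markets": "NNS", "are": "VBP", "stunned": "VBN", ".": ".",
}


def split_sentences(text: str) -> List[str]:
    """Split after '.', '!' or '?' followed by whitespace.
    Naive: abbreviations such as 'U.K.' would be split incorrectly."""
    return [part for part in re.split(r"(?<=[.!?])\s+", text.strip()) if part]


def tokenize(sentence: str) -> List[str]:
    """Separate words from punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def pos_tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Look each token up in the toy lexicon; unknown words get 'UNK'."""
    return [(token, TOY_LEXICON.get(token.lower(), "UNK")) for token in tokens]


text = "The SNB scrapped the cap on the franc. Markets are stunned."
for sentence in split_sentences(text):
    print(" ".join(f"{word}/{tag}" for word, tag in pos_tag(tokenize(sentence))))
```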

Page 12

Known Concept Identification

• Includes named entity recognition (NER)
• Uses machine learning techniques, Context Free Grammars (CFGs) and pattern matching
• Easy to extend

Example: Customers owe FXCM approx. 225 million francs.

• owe: verb “to owe” (owe/owes/owing/owed/owed); takes arg0, arg1, arg2
• FXCM: Forex Capital Markets Ltd (NYSE: FXCM)
• 225 million francs: Currency Value (Value: 225 000 000; Currency: Swiss Franc)
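The pattern-matching part of known concept identification can be illustrated with small lookup tables and one regular expression. Everything named here (`COMPANIES`, `VERB_FORMS`, `CURRENCY_VALUE`, `identify_concepts`) is hypothetical, and mapping "francs" to the Swiss Franc is an assumption taken from the slide's example rather than a general rule. The machine learning and CFG components mentioned above are not covered.

```python
import re

# Hypothetical gazetteers, filled only with the slide's example.
COMPANIES = {"FXCM": {"name": "Forex Capital Markets Ltd", "listing": "NYSE: FXCM"}}
VERB_FORMS = {form: "owe" for form in ("owe", "owes", "owing", "owed")}
SCALES = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}
CURRENCIES = {"franc": "Swiss Franc", "francs": "Swiss Franc"}  # assumption

CURRENCY_VALUE = re.compile(
    r"(?P<amount>\d+(?:\.\d+)?) (?P<scale>thousand|million|billion) (?P<unit>francs?)"
)


def identify_concepts(sentence: str):
    """Return (kind, surface text, details) for every known concept found."""
    concepts = []
    for token in re.findall(r"\w+", sentence):
        if token in COMPANIES:
            concepts.append(("company", token, COMPANIES[token]))
        elif token.lower() in VERB_FORMS:
            concepts.append(("verb", token, {"lemma": VERB_FORMS[token.lower()]}))
    for match in CURRENCY_VALUE.finditer(sentence):
        value = round(float(match["amount"]) * SCALES[match["scale"]])
        details = {"value": value, "currency": CURRENCIES[match["unit"]]}
        concepts.append(("currency_value", match.group(0), details))
    return concepts


for concept in identify_concepts("Customers owe FXCM approx. 225 million francs ."):
    print(concept)
```

Extending such a component amounts to adding entries or patterns, which is one plausible reading of the "easy to extend" claim.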

Page 13

Chunk Parsing

• A chunk is (working definition) a sequence of consecutive tokens grouped by some notion of syntactic or semantic function or dependency.

Example: The/DT decision/NN of/IN the/DT SNB/NNP greatly/RB surprised/VBD the/DT markets/NNS

Chunking starts with 1-token chunks (one per token) and applies a CFG to merge them; for this sentence the result is three noun chunks and one verb chunk.
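The chunking procedure (start with 1-token chunks, then apply grammar rules) can be approximated by repeatedly rewriting adjacent chunk labels with flat productions. The rule set `RULES` and the function `chunk` are assumptions chosen to reproduce the slide's example; this is a simplified bottom-up rewrite, not a full CFG parser and not the original grammar.

```python
from typing import List, Tuple

# Hypothetical productions: (left-hand side, right-hand side labels).
RULES = [
    ("NOUN_CHUNK", ("DT", "NN")),
    ("NOUN_CHUNK", ("DT", "NNP")),
    ("NOUN_CHUNK", ("DT", "NNS")),
    ("VERB_CHUNK", ("RB", "VBD")),
]

Chunk = Tuple[str, List[str]]  # (label, words)


def chunk(tagged: List[Tuple[str, str]]) -> List[Chunk]:
    """Merge adjacent chunks until no production applies any more."""
    # Start: every token is its own chunk, labelled with its POS tag.
    chunks: List[Chunk] = [(tag, [word]) for word, tag in tagged]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            n = len(rhs)
            for i in range(len(chunks) - n + 1):
                if tuple(label for label, _ in chunks[i:i + n]) == rhs:
                    words = [w for _, ws in chunks[i:i + n] for w in ws]
                    chunks[i:i + n] = [(lhs, words)]
                    changed = True
                    break
            if changed:
                break  # restart the scan after every rewrite
    return chunks


tagged = [
    ("The", "DT"), ("decision", "NN"), ("of", "IN"), ("the", "DT"), ("SNB", "NNP"),
    ("greatly", "RB"), ("surprised", "VBD"), ("the", "DT"), ("markets", "NNS"),
]
for label, words in chunk(tagged):
    print(f"{label}: {' '.join(words)}")
```

With these rules the preposition "of" stays a 1-token chunk, and the sentence yields three noun chunks and one verb chunk, as on the slide.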

Page 14

Mixed Semantic and Syntactical Analysis

• Semantic concepts that are recognized during chunking are attached to special “Known Concept” chunks
• Subsequent CFGs can recursively check for “is a company” etc.

Example: Forex/NNP Capital/NNP Markets/NNP Ltd/NNP

Chunking merges the four 1-token chunks into one NOUN CHUNK, which is marked as a KNOWN CONCEPT: FXCM (Forex Capital Markets Ltd, NYSE: FXCM)

Page 15

Detecting Predicate-Argument Structures

Example: The/DT decision/NN of/IN the/DT SNB/NNP greatly/RB surprised/VBD the/DT markets/NNS

Chunking starts with 1-token chunks, applies a CFG and attaches known concepts; the result is three noun chunks and one verb chunk.

Detected PAS: “The decision of the SNB” (agent) surprised “the markets” (patient)
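As a rough illustration of the final step, the sketch below derives agent, predicate and patient from an already chunked sentence. It assumes a single active-voice clause and simply treats everything before the verb chunk as agent and everything after it as patient; the function `detect_pas` and this heuristic are assumptions, far simpler than what a real system would need.

```python
from typing import Dict, List, Optional, Tuple

Chunk = Tuple[str, List[str]]  # (label, words)


def detect_pas(chunks: List[Chunk]) -> Optional[Dict[str, str]]:
    """Return predicate, agent and patient for a single active-voice clause,
    or None if the sentence contains no verb chunk."""
    verb_index = next(
        (i for i, (label, _) in enumerate(chunks) if label == "VERB_CHUNK"), None
    )
    if verb_index is None:
        return None

    def phrase(part: List[Chunk]) -> str:
        return " ".join(word for _, words in part for word in words)

    return {
        # The last word of the verb chunk is taken as the predicate head.
        "predicate": chunks[verb_index][1][-1],
        "agent": phrase(chunks[:verb_index]),
        "patient": phrase(chunks[verb_index + 1:]),
    }


# Chunked form of the slide's example sentence.
example = [
    ("NOUN_CHUNK", ["The", "decision"]),
    ("IN", ["of"]),
    ("NOUN_CHUNK", ["the", "SNB"]),
    ("VERB_CHUNK", ["greatly", "surprised"]),
    ("NOUN_CHUNK", ["the", "markets"]),
]
print(detect_pas(example))
```

Passive constructions, multiple clauses and beneficiaries would each need additional rules.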

Page 16

Implemented in Scala/Akka
A horizontally scalable real-time solution

[Architecture diagram. Labels: Reuters and Bloomberg feeds, AMQP (x2), AMQPFetcher (x2, “fetch”), SRV 1 to SRV 4, ClusterNode (x4), ElasticSearch DB (x4), Services (x2), Frontend (x2, “failover”), Internet, Baader]

Page 17

Applications

Page 18

Real-Time Event Detection

Page 19

Mood Propagation Networks
Systemic Mood

• Inferring sentiment channels
• Calculating positive and negative sentiment flow
• Similar to metabolic networks in biology

[Figure (arbitrary example): network with the nodes Apple, Samsung, Microsoft, Sony, Google, Motorola, XYZ, Foxconn, China and Rare Earths; one edge is labelled “legal action”]

Example: “Apple sues Samsung in Australia”
  Apple: acting company
  sues: negative relation
  Samsung: receiving company
  in Australia: location of relation
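The slides do not say how sentiment flow is calculated. Purely as an illustration of the idea, the sketch below spreads a sentiment score from one company to connected entities with a breadth-first traversal and a fixed damping factor per hop. The function `propagate_sentiment`, the damping value, the hop limit and the "Supplier" nodes are all assumptions and do not describe the system's actual method.

```python
from collections import deque
from typing import Dict, List


def propagate_sentiment(
    edges: Dict[str, List[str]],
    source: str,
    initial: float,
    damping: float = 0.5,
    max_hops: int = 2,
) -> Dict[str, float]:
    """Spread `initial` sentiment from `source` along directed edges,
    multiplying by `damping` at every hop (breadth-first, each node once)."""
    scores = {source: initial}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not propagate beyond the hop limit
        for neighbour in edges.get(node, []):
            if neighbour not in scores:
                scores[neighbour] = scores[node] * damping
                queue.append((neighbour, hops + 1))
    return scores


# Hypothetical network: the receiving company of a negative relation
# ("Apple sues Samsung") and made-up dependent entities.
edges = {
    "Samsung": ["Supplier A"],
    "Supplier A": ["Supplier B"],
    "Supplier B": ["Supplier C"],
}
print(propagate_sentiment(edges, "Samsung", -1.0))
```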

Page 20

Systemic Mood – The Fukushima Example
“Activation” of renewable energy companies

Before 3/11/2011: No interest in renewable energies

After 3/11/2011

Page 21

Conclusion

• Real-time news analytics system
• PAS pipeline based on
  - machine learning techniques, Context Free Grammars (CFGs) and pattern matching
• Native transformation into an associative network
  - Utilized e.g. to infer sentiment propagation
• Benefits for Baader Bank AG
  - Comprehensive news analytics
  - Smart detection of market-moving events
  - Completely eliminated the losses due to missed news
  - Substantially increased trading profit

Page 22

Clueda AG

Contact

Dr. Volker Stümpflen

T +49 89 4161402 10

M +49 0176 57 288282

Clueda AG

Elsenheimerstraße 59

D-80687 Munich

www.clueda.com