Kevin O'Dell - Fraud and Event Detection Using the Enterprise Data Hub
TRANSCRIPT
Fraud Detection Using Cloudera EDH
Kevin O’Dell/Field Engineer
3 © 2014 Cloudera, Inc. All rights reserved. CONFIDENTIAL—DO NOT DISTRIBUTE
Agenda
• Overview of Problem
• Offline Fraud
• Online Fraud
• Discussion
Problem Statement
• About $15B annually – Card only
• 1.3B Credit, Debit and Pre-paid cards – 5 per adult
• Omni-channel – Debit, Credit, Online, PoS, Deposits
• Increasing Regulatory / Compliance Requirements
• How do we integrate multiple systems and sources?
13 © 2014 Cloudera and/or its affiliates. All rights reserved.
Fraud Systems
Key Requirements
• No data loss is acceptable
• Stream processing must complete ASAP, <500ms
• Support approximately 400M transactions per day in aggregate
• Highest Volume Flow:
  • Current – 1.8k transactions/s
  • Projected – 10k transactions/s
• Each flow has at least three steps
  • Adapter, Persistence, Hadoop Persistence
  • Most complex with approximately seven steps
• Avoid massive code refactoring
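As a rough sanity check on the volumes above, the daily aggregate can be converted into an average per-second rate and compared with the per-flow figures. The sketch below is plain arithmetic on the slide's numbers; the variable and function names are illustrative, and it assumes load is spread evenly over 24 hours, which real card traffic is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_transactions = 400_000_000   # aggregate per day, from the slide
current_flow_tps = 1_800           # highest-volume flow today, from the slide
projected_flow_tps = 10_000        # highest-volume flow projected, from the slide


def average_tps(per_day: int) -> float:
    """Average events per second if load were spread evenly over a day."""
    return per_day / SECONDS_PER_DAY


def daily_volume(tps: int) -> int:
    """Events per day if a flow sustained `tps` for 24 hours."""
    return tps * SECONDS_PER_DAY


avg = average_tps(daily_transactions)
print(f"average aggregate rate: {avg:,.0f} tx/s")
print(f"current flow sustained for a day: {daily_volume(current_flow_tps):,} tx")
print(f"projected flow sustained for a day: {daily_volume(projected_flow_tps):,} tx")
```

Under the even-load assumption, the 400M/day aggregate averages out to a few thousand transactions per second, so the projected 10k transactions/s figure describes headroom for a single flow well above today's average.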
Fraud System Categories
• Online
  • Ingest
  • Enrichment (Profiles, feature selection, etc.)
  • Early warning / detection (model serving / model application)
  • Persistence
• Offline (Human activities)
  • Model building / discovery
  • Case management
  • Forensics
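The four online stages listed above (ingest, enrichment, detection, persistence) can be pictured as a chain of small functions. This is a minimal sketch only: the `Event` type, the in-memory `PROFILES` and `STORE` stand-ins, the `amount_ratio` feature, and the threshold of 10 are all hypothetical and do not come from the deck.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    card_id: str
    amount: float
    features: Dict[str, float] = field(default_factory=dict)
    alerts: List[str] = field(default_factory=list)


# Hypothetical stand-ins for a profile store and a persistence layer.
PROFILES: Dict[str, Dict[str, float]] = {"card-1": {"avg_amount": 40.0}}
STORE: List[Event] = []


def ingest(raw: dict) -> Event:
    """Parse a raw record into a typed event."""
    return Event(card_id=raw["card_id"], amount=float(raw["amount"]))


def enrich(event: Event) -> Event:
    """Attach a profile-derived feature; unknown cards get a neutral ratio."""
    avg = PROFILES.get(event.card_id, {}).get("avg_amount")
    event.features["amount_ratio"] = event.amount / avg if avg else 1.0
    return event


def detect(event: Event) -> Event:
    """Apply a toy detection rule (threshold is an assumption)."""
    if event.features["amount_ratio"] > 10:
        event.alerts.append("amount_spike")
    return event


def persist(event: Event) -> Event:
    """Record the event, alerts included."""
    STORE.append(event)
    return event


def run_online(raw: dict) -> Event:
    event = ingest(raw)
    for stage in (enrich, detect, persist):
        event = stage(event)
    return event


result = run_online({"card_id": "card-1", "amount": "500"})
print(result.alerts)
```

Keeping each stage a separate function mirrors the slide's split and makes it easy to swap the toy detection step for a served model.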
Some Numbers
• Approximately 400M events (not necessarily transactions) per day
• Analysts get data within 5 minutes (Approx 90M/day)
• Over 100 Source Systems
• Offline system rolls files every 5 minutes
• Online system processes transaction authorization flows in < 500ms
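To make the 5-minute file roll concrete, the sketch below assigns each event to the start of its 5-minute window and groups events by window. It illustrates the bucketing idea only; it is not the mechanism the deck's offline system uses, and the function names and sample events are made up.

```python
from collections import defaultdict
from datetime import datetime, timezone

ROLL_SECONDS = 5 * 60  # roll interval from the slide


def roll_window_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 5-minute window (UTC)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % ROLL_SECONDS, tz=timezone.utc)


def group_by_window(events):
    """Group (timestamp, payload) pairs by the window they fall into."""
    windows = defaultdict(list)
    for ts, payload in events:
        windows[roll_window_start(ts)].append(payload)
    return dict(windows)


events = [
    (datetime(2014, 6, 1, 12, 1, 30, tzinfo=timezone.utc), "a"),
    (datetime(2014, 6, 1, 12, 4, 59, tzinfo=timezone.utc), "b"),
    (datetime(2014, 6, 1, 12, 5, 0, tzinfo=timezone.utc), "c"),
]

grouped = group_by_window(events)
for start, batch in sorted(grouped.items()):
    print(start.isoformat(), batch)
```

Events "a" and "b" land in the 12:00 window and "c" opens the 12:05 window, so an analyst waiting on rolled files sees an event at most one interval after it arrives.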
Architecture
Fraud Architecture (diagram; labels only)
• Clients: Incoming Events, Outgoing Events
• Kafka Cluster: Topic A, Topic B, Topic C
• Cloudera EDH: Automated & Manual Analytical Adjustments and Pattern Detection
• Storage: HDFS, Solr, HBase
• Processing: Impala, MapReduce, Spark, 3rd Party
• Event Processing
• Interactivity: HBase, Search
• Serving Layer, Speed Layer, Batch Layer, Rules Engine, Model Building
• EDH: Model Building, Automated Alerting, Profile Persistence Layer, Forensics, Pattern Detection, Discovery Analytics
• Event Processing: Alerting, Enrichment, Business Rules
• Spark Streaming
• Case Management and Alerting
Online Systems
• Can be incorporated into the authorization pipeline
• Rules Engine incorporation
• Application of models
• Must deliver results sub-second
• Must scale to spikes in transaction volume
• Historically outside of Hadoop
• Zero data loss tolerance and tight SLA requirements
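A rules engine sitting in the authorization path can be sketched as a list of named predicates evaluated against each transaction, with the elapsed time checked against the sub-second budget. The rule names, transaction fields, and thresholds below are hypothetical; only the 500 ms budget comes from the deck.

```python
import time
from typing import Callable, Dict, List, Tuple

LATENCY_BUDGET_S = 0.5  # the deck's < 500 ms requirement

Rule = Tuple[str, Callable[[Dict], bool]]

# Hypothetical business rules; a real deployment would load these from config.
RULES: List[Rule] = [
    ("large_amount", lambda tx: tx["amount"] > 5_000),
    (
        "foreign_card_not_present",
        lambda tx: tx["country"] != tx["home_country"] and not tx["card_present"],
    ),
]


def evaluate(tx: Dict, rules: List[Rule] = RULES) -> Dict:
    """Run every rule and report which fired and whether we met the budget."""
    started = time.perf_counter()
    fired = [name for name, predicate in rules if predicate(tx)]
    elapsed = time.perf_counter() - started
    return {
        "fired": fired,
        "elapsed_s": elapsed,
        "within_budget": elapsed < LATENCY_BUDGET_S,
    }


decision = evaluate(
    {"amount": 120.0, "country": "FR", "home_country": "US", "card_present": False}
)
print(decision["fired"], decision["within_budget"])
```

Measuring elapsed time inside the evaluator is one simple way to surface SLA misses per transaction; it does not by itself enforce the budget.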
Online System Advantages
• Enriching the record in real time allows us to apply any number of algorithms
  • Travel Scoring
  • Anomaly Detection Models (Clustering)
  • Commercial ML model application
• All with sub-second latency
• Integration into EDH allows easy deployment, monitoring and integration with offline/batch activities
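The deck does not define "Travel Scoring". One common reading is a check on whether two consecutive uses of the same card are physically reachable in the time between them. The sketch below follows that assumed reading: it computes the great-circle distance with the haversine formula and divides the implied speed by an assumed ceiling of 900 km/h. The `Swipe` type, the ceiling, and the coordinates are all illustrative.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 900.0  # assumption: roughly airliner cruising speed


@dataclass
class Swipe:
    lat: float
    lon: float
    epoch_s: float


def haversine_km(a: Swipe, b: Swipe) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def travel_score(prev: Swipe, curr: Swipe) -> float:
    """Implied speed divided by the plausible maximum; above 1.0 is suspicious."""
    distance = haversine_km(prev, curr)
    hours = (curr.epoch_s - prev.epoch_s) / 3600
    if hours <= 0:
        return math.inf if distance > 0 else 0.0
    return distance / hours / MAX_PLAUSIBLE_KMH


new_york = Swipe(40.71, -74.01, epoch_s=0)
london = Swipe(51.51, -0.13, epoch_s=3600)  # one hour later
score = travel_score(new_york, london)
print(f"travel score: {score:.2f}")
```

A score above 1.0 means the card would have had to move faster than the assumed ceiling, which is the kind of enriched feature a downstream rule or model could consume.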
Multi-cluster Fraud Architecture (diagram; labels only)
• Clients: Incoming Events, Outgoing Events
• Security / Transport, Service Activator
• Kafka Cluster: Topic A, Topic B, Topic C
• Operational Cluster (6 months): Automated & Manual Analytical Adjustments and Pattern Detection
• HULC (13 months): Storage (HDFS), Processing (Impala, MapReduce, Spark)
• Real-time Cluster
• Storage: HDFS, Solr, HBase
• Processing: Impala, MapReduce, Spark, 3rd Party
• Event Processing
• Interactivity: HBase, Search
• Serving Layer, Speed Layer, Batch Layer, Rules System, Model Building
• Operational Cluster: Model Building, Automated Alerting, Profile Persistence Layer
• Discovery Cluster: Batch model updates, Discovery Analytics, Pattern Detection
• Real Time Cluster: Event Processing, Alerting, Enrichment, Business Rules
• Business Users
• HDFS Replication
Thank you.