Kevin O'Dell - Fraud and event detection using the Enterprise Data Hub

Fraud Detection Using Cloudera EDH Fraud Detection Kevin O’Dell/Field Engineer

Upload: huguk

Post on 15-Jul-2015




Page 1: Kevin O'Dell - Fraud and event detection using the Enterprise Data Hub

Fraud Detection Using Cloudera EDH Fraud Detection

Kevin O’Dell/Field Engineer

Page 2

3 © 2014 Cloudera, Inc. All rights reserved. CONFIDENTIAL—DO NOT DISTRIBUTE

Agenda
• Overview of Problem
• Offline Fraud
• Online Fraud
• Discussion

Page 3

Problem Statement
• About $15B annually – Card only
• 1.3B Credit, Debit and Pre-paid cards – 5 per adult
• Omni-channel – Debit, Credit, Online, PoS, Deposits
• Increasing Regulatory / Compliance Requirements
• How do we integrate multiple systems and sources?

Page 4

Fraud Systems

Page 5

Key Requirements
• No data loss is acceptable
• Stream processing must complete ASAP, <500ms
• Support approximately 400M transactions per day in aggregate
• Highest Volume Flow:
  • Current – 1.8k transactions/s
  • Projected – 10k transactions/s
• Each flow has at least three steps
  • Adapter, Persistence, Hadoop Persistence
  • Most complex with approximately seven steps
• Avoid massive code refactoring
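The volume figures above can be put on a common footing with a little arithmetic. The sketch below is illustrative only: it assumes the daily volume is spread evenly over 24 hours, and the slide does not say whether the per-second figures for the highest-volume flow are peak or sustained rates.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(events_per_day: float) -> float:
    """Average events per second if the daily volume were spread evenly."""
    return events_per_day / SECONDS_PER_DAY


def daily_volume(rate_per_second: float) -> float:
    """Events per day if a flow ran at a constant per-second rate."""
    return rate_per_second * SECONDS_PER_DAY


# Figures taken from the slide.
aggregate_avg = average_rate_per_second(400_000_000)  # roughly 4,630 per second
current_flow_daily = daily_volume(1_800)               # 155,520,000 per day
projected_flow_daily = daily_volume(10_000)            # 864,000,000 per day
growth_factor = 10_000 / 1_800                         # roughly 5.6x

print(f"aggregate average rate: {aggregate_avg:,.0f} per second")
print(f"current flow at a constant 1.8k/s: {current_flow_daily:,.0f} per day")
print(f"projected flow at a constant 10k/s: {projected_flow_daily:,.0f} per day")
print(f"projected / current: {growth_factor:.1f}x")
```

A constant 10k transactions/s would exceed the 400M/day aggregate, which suggests the projected figure is a peak rate that the system must absorb rather than a daily average.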

Page 6

Fraud System Categories
• Online
  • Ingest
  • Enrichment (Profiles, feature selection, etc.)
  • Early warning / detection (model serving / model application)
  • Persistence
• Offline (Human activities)
  • Model building / discovery
  • Case management
  • Forensics
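The four online stages can be pictured as a chain of small functions. This is a minimal sketch, not the deck's implementation: the `Event` fields, the in-memory `PROFILES` and `PERSISTED` stores, the `amount_ratio` feature and the alert threshold are all hypothetical stand-ins chosen for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Event:
    card_id: str
    amount: float
    features: Dict[str, float] = field(default_factory=dict)
    alerts: List[str] = field(default_factory=list)


# Hypothetical in-memory stand-ins for the profile store and persistence layer.
PROFILES: Dict[str, Dict[str, float]] = {"card-1": {"avg_amount": 40.0}}
PERSISTED: List[Event] = []


def ingest(raw: Dict[str, Any]) -> Event:
    """Ingest: turn a raw message into a typed event."""
    return Event(card_id=str(raw["card_id"]), amount=float(raw["amount"]))


def enrich(event: Event) -> Event:
    """Enrichment: attach a feature derived from the card's profile."""
    profile = PROFILES.get(event.card_id)
    avg = profile["avg_amount"] if profile else 0.0
    # Cards without a usable profile get a neutral ratio of 1.0.
    event.features["amount_ratio"] = event.amount / avg if avg > 0 else 1.0
    return event


def detect(event: Event, threshold: float = 5.0) -> Event:
    """Early warning: flag amounts far above the card's usual spend."""
    if event.features["amount_ratio"] > threshold:
        event.alerts.append("amount_ratio_high")
    return event


def persist(event: Event) -> Event:
    """Persistence: store the enriched, scored event."""
    PERSISTED.append(event)
    return event


def process(raw: Dict[str, Any]) -> Tuple[Event, float]:
    """Run all four stages and report the elapsed time in milliseconds."""
    started = time.perf_counter()
    event = persist(detect(enrich(ingest(raw))))
    return event, (time.perf_counter() - started) * 1000.0


event, elapsed_ms = process({"card_id": "card-1", "amount": 400.0})
print(event.alerts, f"{elapsed_ms:.3f} ms")
```

Measuring elapsed time per event is what lets a pipeline like this be checked against a latency target such as the sub-500 ms requirement.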

Page 7

Some Numbers
• Approximately 400M events (not necessarily transactions) per day
• Analysts get data within 5 minutes (Approx 90M/day)
• Over 100 Source Systems
• Offline system rolls files every 5 minutes
• Online system processes transaction authorization flows in < 500ms
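A five-minute roll can be sketched as bucketing each event timestamp into a fixed window. The file naming scheme below is hypothetical, and the per-roll average assumes the 90M/day figure applies to the offline feed and is spread evenly across the day, neither of which the slide states.

```python
from datetime import datetime, timezone

ROLL_SECONDS = 5 * 60  # files roll every 5 minutes


def roll_window_start(ts: datetime) -> datetime:
    """Return the start of the 5-minute window containing ts.

    ts should be timezone-aware; a naive datetime is read as local time.
    """
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % ROLL_SECONDS, tz=timezone.utc)


def rolled_file_name(ts: datetime) -> str:
    """Hypothetical naming scheme: one file per 5-minute window."""
    return "events-" + roll_window_start(ts).strftime("%Y%m%dT%H%M") + ".log"


rolls_per_day = (24 * 60 * 60) // ROLL_SECONDS     # 288 windows per day
avg_events_per_roll = 90_000_000 / rolls_per_day   # 312,500 events per file

sample = datetime(2015, 7, 15, 10, 7, 42, tzinfo=timezone.utc)
print(rolled_file_name(sample))
print(rolls_per_day, avg_events_per_roll)
```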

Page 8

Architecture

Page 9

Fraud Architecture (diagram; labels recovered from the slide)
• Clients: Incoming Events, Outgoing Events
• Kafka Cluster: Topic A, Topic B, Topic C
• Event Processing: Alerting, Enrichment, Business Rules
• EDH: Model Building, Automated Alerting, Profile Persistence Layer, Forensics, Pattern Detection, Discovery Analytics
• Storage: HDFS, Solr, HBase
• Processing: Impala, MapReduce, Spark, 3rd Party
• Interactivity: HBase, Search
• Layers: Serving Layer, Speed Layer, Batch Layer
• Other components: Spark Streaming, Rules Engine, Model Building, Case Management and Alerting
• Automated & Manual Analytical Adjustments and Pattern Detection
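One property the diagram relies on is that a speed layer and a batch layer can each read the same topic independently. The toy class below imitates that with an append-only list and per-consumer offsets. It is an in-memory stand-in written for illustration, not the Kafka client API, and the consumer and topic names are hypothetical.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Tuple


class InMemoryLog:
    """Toy stand-in for a topic log: append-only topics, per-consumer offsets."""

    def __init__(self) -> None:
        self._topics: DefaultDict[str, List[Dict[str, Any]]] = defaultdict(list)
        self._offsets: DefaultDict[Tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, event: Dict[str, Any]) -> None:
        """Append an event to the end of a topic."""
        self._topics[topic].append(event)

    def poll(self, consumer: str, topic: str) -> List[Dict[str, Any]]:
        """Return events this consumer has not yet seen and advance its offset."""
        start = self._offsets[(consumer, topic)]
        events = self._topics[topic][start:]
        self._offsets[(consumer, topic)] = start + len(events)
        return events


log = InMemoryLog()
log.publish("topic-a", {"card_id": "card-1", "amount": 25.0})
log.publish("topic-a", {"card_id": "card-2", "amount": 900.0})

# Each layer keeps its own position, so both see every event.
speed_events = log.poll("speed-layer", "topic-a")
batch_events = log.poll("batch-layer", "topic-a")
print(len(speed_events), len(batch_events))
```

Because events stay in the log after one consumer reads them, a slow batch consumer does not hold up the latency-sensitive path.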

Page 10

Online Systems
• Can be incorporated into the authorization pipeline
• Rules Engine incorporation
• Application of models
• Must deliver results sub-second
• Must scale to spikes in transaction volume
• Historically outside of Hadoop
• 0 data loss tolerance and tight SLA requirements
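A rules engine sitting in the authorization path can be sketched as a list of named predicates evaluated under a time budget. Everything here is an assumption made for illustration: the rule names, the thresholds, the `model_score` field, the "review"/"approve" outcomes, and the choice to stop evaluating once the budget is spent.

```python
import time
from typing import Callable, Dict, List, Tuple

Transaction = Dict[str, float]
Rule = Tuple[str, Callable[[Transaction], bool]]

# Hypothetical rules; a real deployment would load these from a rules system.
RULES: List[Rule] = [
    ("large_amount", lambda t: t["amount"] > 5_000),
    ("high_model_score", lambda t: t["model_score"] > 0.9),
]


def authorize(
    txn: Transaction,
    rules: List[Rule] = RULES,
    budget_ms: float = 500.0,
) -> Tuple[str, List[str]]:
    """Evaluate rules in order and return (decision, names of rules that fired)."""
    started = time.perf_counter()
    fired: List[str] = []
    for name, predicate in rules:
        if (time.perf_counter() - started) * 1000.0 > budget_ms:
            break  # budget spent: decide on the rules evaluated so far
        if predicate(txn):
            fired.append(name)
    decision = "review" if fired else "approve"
    return decision, fired


print(authorize({"amount": 9_000.0, "model_score": 0.2}))
print(authorize({"amount": 10.0, "model_score": 0.1}))
```

Stopping at the budget trades completeness for a bounded response time. Whether that trade is acceptable depends on the SLA and on which rules are considered mandatory.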

Page 11

Online System Advantages
• Enriching the record in Real Time allows us to apply any number of algorithms
  • Travel Scoring
  • Anomaly Detection Models (Clustering)
  • Commercial ML model application
• All with sub-second latency
• Integration into EDH allows easy deployment, monitoring and integration with offline/batch activities
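The slide does not define travel scoring. One common reading is an "impossible travel" check: compare the locations and times of two consecutive transactions on the same card and score how implausible the implied speed is. The sketch below takes that reading as an assumption; the 900 km/h ceiling and the scoring formula are illustrative choices, not values from the deck.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0

# (latitude in degrees, longitude in degrees, time in epoch seconds)
Sighting = Tuple[float, float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(prev: Sighting, curr: Sighting) -> float:
    """Speed needed to get from the previous sighting to the current one."""
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = (curr[2] - prev[2]) / 3600.0
    if hours <= 0:
        return math.inf if distance > 0 else 0.0
    return distance / hours


def travel_score(prev: Sighting, curr: Sighting, max_plausible_kmh: float = 900.0) -> float:
    """Score in [0, 1]: 0 for plausible travel, approaching 1 as speed grows."""
    speed = implied_speed_kmh(prev, curr)
    if speed <= max_plausible_kmh:
        return 0.0
    return 1.0 - max_plausible_kmh / speed


london = (51.5074, -0.1278, 0.0)
new_york_one_hour_later = (40.7128, -74.0060, 3600.0)
print(f"{travel_score(london, new_york_one_hour_later):.2f}")
```

The previous sighting would come from the card's profile, which is why enrichment has to happen before a score like this can be computed.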

Page 12

Multi-cluster Fraud Architecture (diagram; labels recovered from the slide)
• Clients: Incoming Events, Outgoing Events
• Security / Transport, Service Activator (shown on both the incoming and outgoing paths)
• Real Time Cluster: Event Processing, Alerting, Enrichment, Business Rules
• Kafka Cluster: Topic A, Topic B, Topic C
• Operational Cluster (6 months): Model Building, Automated Alerting, Profile Persistence Layer
• Discovery Cluster: Batch model updates, Discovery Analytics, Pattern Detection
• HULC (13 months): Storage (HDFS), Processing (Impala, MapReduce, Spark)
• Storage: HDFS, Solr, HBase
• Processing: Impala, MapReduce, Spark, 3rd Party
• Interactivity: HBase, Search
• Layers: Serving Layer, Speed Layer, Batch Layer
• Other components: Rules System, Model Building, HDFS Replication, Business Users
• Automated & Manual Analytical Adjustments and Pattern Detection

Page 13

Thank you.