self learning networks -...

53

Upload: hoangthien

Post on 09-Apr-2018

215 views

Category:

Documents


1 download

TRANSCRIPT

Self Learning NetworksJeff Apcar, Distinguished Services Engineer

BRKRST-2122

(Acknowledgements to JP Vasseur, PHD, Cisco Fellow and the SLN Team)

• What is a Self Learning Network?

• Security Trends In Advanced Malware

• Anomaly Detection

• SLN Architecture

• Graph Based Visibility

• The SLN User Interface

• Conclusion

Agenda

What is a Self Learning Network?

• SLN is fundamentally a hyper-distributed analytics platform

• Putting together analytics and networking

• The network learns without a human in the loop

• We have developed many sophisticated network functions

• But no real learning incorporated

• True technology disruption & unique competitive landscape ...

What Is Self Learning About?

It started with IoT Network Use CaseHarsh environments: instability of links, limited bandwidth, constrained nodes, stochastic networks (random probability distribution)

Still need for some determinism, tight SLA and hyper-scale networks

And the network needs to be adaptive: every single network is different !

From this SLN was incubatedDeployed IoT network, 800 nodes, April 2013

Some Deployment Scenarios

Predict network behavior and traffic patterns based on multivariable and time-based modeling

Automatically select and optimize network path in real-time, adapt QoS, based on Business SLAs

IWAN Path

Optimization

Detect of multi-layer subtle DoS attacks andAnomaly Detection

Auto learn new threats

Massively Distributed, Global, real-time protection

SLNInternet Behavioural

Analytics for Security

Predictive models for large scale networks, enable:

• High performance

• High Resiliency

• Detection of disruptive subtle DDoS attacks

IoT/IoE

• Fundamentally distributed, building models for visibility and detection at edge

• Mix of Machine Learning (ML) and Threat Intelligence

• Enrichment of context (using Identify Services Engine (ISE) integration)

• Ability to adapt to user feed-back (Reinforcement Learning)

• Advanced control handling networking complexity

SLN Architecture Principles For Security

• Multi-layered defense architectures no longer sufficient to prevent breaches caused by advanced malware ...

• No longer a question of “if” or “when” but “where” ...

• Many of the well-known assumptions are no longer true

• eg. Attacks come from the outside, deterministic, well understood

• Attacks are more and more “subtle” (Hard to detect ...)

• Signature-based architectures vulnerable to mutating attacks (polymorphic)

• Dramatic increase of the number of 0-day attacks

Why Predictive Analytics?

• Why learning ?

• The network is truly adaptive thanks to advanced analytics

• Why a paradigm shift ?

• Move from Trial-and-Error model to a proactive approach using models built using advanced analytics

• The hard part is not just the “analytics” but the underlying architecture for self-learning and the “how to”

What Is a Self Learning Network (SLN)?

Security Trends in Advanced Malware

ReconAttack

Delivery

Redirect

Exploit

Kit

Dropper

File

Comm &

Control

(C2)

Internal

Recon

Data

Exfiltration

Kill Chain Model For Information Security*

*Adapted by Lockheed Martin to model intrusions on a computer network

Research of targets, vulnerabilities, port scanning… Phishing, Spear Phishing, Watering Hole

Safe looking sites containing exploits

After “clicking”, software scans for

0-Day and known vulnerabilities

Malware gets access to target

Uses sophisticated evasive techniques

Determines environment (human/sandbox)

Connection between computer and attacker

Receive commands, download programs

Many C2 techniques can be used, fast flux etc…

Attempt to spread across other

systems and retrieve new credentials

Overt and convert channels to transfer

Data. Complex mechanisms used like

steganography, hiding inside protocols1 2

3

4

56

7

8

• Botnet groups of computers that run malware unknown to owners

• Ranges from thousands to millions of compromised hosts

• Exfiltration is unauthorised information transfer out of target network

• C&C (C2) servers become increasingly evasive

• Fast Flux Service Networks (FFSN), single or double Flux

• DGA-based malware (Domain Generation Algorithms)

• DNS Tunneling, P2P protocols, Anonymised services (Tor)

• Steganography, potentially combined with Cryptography

• Social media updates or email messages

• Mixed protocols , Timing Channels

Botnets & Exfiltration Techniques

C&C Server(s)

Distributed Denial-of-Service Attacks

C&C

Direct Attack

Originates from compromised

hosts or directly from attacker

Send spoofed request to a

server, response to target

Small spoofed request,

produces large reply

Reflection Attack Amplification Attack

Target

Botnet

C&C

Target

Botnet

Attacks can be a anyone or a combination of the above

C&C

Target

NTP

Server

MONLIST?

Attacker Attacker Attacker

Reflectors

• Infected host queries DNS for C&C server FQDN

• Hardcoded or from Domain Generation Algorithm (DGA)

• Authoritative C&C DNS controlled by Botnet master

• DNS reply has very short TTL (a few minutes)

• Botnet members used as C&C relays

• Cycles very quickly through C&C relay hosts due to TTL

• Rapidly-changing, optimised set of C&C endpoints

• Greatly reduces possibility of C&C server takedown

• Still possible to take down the C&C DNS server(s)

Example: Fast Flux DNS - Single Flux

Query: “botnet.tld”

“ns.botnet.tld”

Root DNS

Query: “cc.botnet.tld”

Fixed

C&C DNS

“10.1.1.1”

Query: “cc.botnet.tld”

“10.2.2.2”

TTL Timeout…

C&C Relay

(10.1.1.1)C&C Relay

(10.2.2.2)

Actual

C&CBotnet

Master

Example: Fast Flux DNS – Double Flux

Query: “56W-jes.tld”

“10.3.3.3”

Root DNS

Query: “cc.56W-jes.tld”

Actual C&C

DNS

“10.1.1.1”

“10.2.2.2”

TTL Timeout…

C&C Relay

(10.1.1.1)C&C Relay

(10.2.2.2)

Actual

C&CBotnet

Master

DNS Relay

(10.3.3.3)

DNS Relay

(10.4.4.4)

• Double flux adds changing DNS for C&C domain

• DNS Botnet Master updates low-TTL NS entries

• Through permissive DNS registrar

• Some botnet members are DNS relays, others are C&C relays

• Massively complicates botnet takedown

• DGA introduces further complexity

Generated

Domain

Query: “cc.56W-jes.tld”

Domains

C&C

Relay

DNS

Relay

Malware Botnets Using DGAs

Anomaly Detection

Anomalies are patterns/data points that do not conform to expected behaviour

In algorithmic terms, they significant statistical deviations

Model what is normal behaviour

Normal behaviour changes over time and anomalies evolve (adaptive models required)

Recognising malicious behaviourthat adapts to look “normal”

Differentiate noise from anomalies

SLN allows detection of previously unknown malware complimentary to firewalls & IDS

SLN Anomaly Detection

Challenges

Collective AnomaliesA collection of an instance that is

abnormal in regards to an entire set

Categories Of Anomalies

Point AnomalyA specific instance is abnormal compared to other instances

Contextual AnomalyA data instance looks normal individually, but abnormal when taken in a specific context

SLN Architecture

Visualisation

SLN Centralised

Agent (SCA)

Da

tace

nte

r a

nd

clo

ud

Priva

te/P

ub

lic

Ne

two

rk

Ne

two

rk E

dg

e

(LA

N, W

PA

N)

Controller

infrastructureSLN Architecture

SCA

Public/Private

Internet

DLADLA

DLA Distributed

Learning Agent

• Granular data collection with knowledge extraction from

various sources (DPI, Routing…) to build a model

• Analytics and learning

• Edge Control Architecture (ECA): autonomous embedded

control, fast close loop, advanced networking control (police,

shaper, recoloring, redirect, ....)

• Application hosted devices (SDN)

• Orchestration and interaction with

remote learning agents (DLA)

• Advanced Visualisation

• Centralised policy

• Interaction with other security

components (ISE, Threat Intelligence)

• DLA has been designed for low footprint both in terms of memory and CPU

• Feature computation, ID & classification are performed locally

• Lightweight techniques emplyed with no significant impact on the edge device

The DLA Can Have Many Data Sources

DLA

NBAR

Netflow

DPI

Data &

Control

Planes

Local

State

Adaptive Firewall w/ AMP

• Component to SLN

• Enhanced context, ML+Threat Intelligence

• Edge Control

SCA Context Enrichment

SCAFire

Power

DLA

Identity Services Engine

ISE

Advanced Malware ProtectionThreat

Grid

Advanced Malware ProtectionEdge

AMP

DNS/IP BlacklistsTalos

Feed

Feed for SLN

Edge Control

Reprogramming the network fabric (install new

rules…) + close loop feed-back.

Username, domain, location, time

Edge Control: shape, police, drop,

redirect, ... Reroute, VLAN, ...

Edge Learning: models normal

traffic, graph-based anomalies.

Trigger for traffic mitigation

Controller

infrastructure

On-Premise Edge Control

SCA

Public/Private

Internet

DLADLA

DLA

Control Policy

Smart Traffic flagging

According to {Severity, Confidence,

Anomaly_Score}

Traffic segregation & selection

Network-centric control (shaping, policing,

divert/redirect)

Honeypot

(Forensic Analysis)

DSCP ReWrite

CBWFQDSCP ReWrite

CBWFQ

Shaping

DLAInternals

SLN Graph-based Visibility

• Graph structure usage in SLN:• Infer information on the roles of hosts

• Detect unusual change in graph structure

• Capturing and modeling the dynamic of

the flows in the network

What Is Graph-Based Visibility?Vertices (Nodes, hosts, sites, disks, router, laptop…)

Edge (Comms links, flows…)

SLN Usage Examples

Vertices Clusters Group of devices

Physical: Country, City, Building, DC, Floor,

Logical: DC Function, Organisational Group

Edges Traffic Flows Applications, protocols, bandwidth, seasonality

• Lack of visibility of flows between remote sites (clusters) is an issue

• Difficult to detect Malware/data exfiltration involving lateral movement

Visualising Anomaly Detection Process

Cluster

A

Cluster

B

Application Flows

Type Frequency Behaviour

SLN Process

Netflow,

DPI, Local

state

Constuct

statistical

model of

normal traffic

Deviations to

the model

Establish likely

root cause of

deviation

Extract

measurable

characteristics

Red Graph: Anomaly

Green Graph: Shell

Blue Graph: VoIP

Grey Graph: FTP/SCP

Visualising Likely/Unlikely Flows

Sydney

Chicago

Raleigh

Data

Centre

San

Diego Dallas

Data

Centre

New

York

Beijing

Belgium

• Static Versus Dynamic cluster computation

• ML algorithms are used to computed inter-

cluster relationship

• Colored graphs

• Simple property of likelihood

DLA

Green Graph: Shell

Blue Graph: VoIP

Grey Graph: FTP/SCP

Visualising Seasonality

Sydney

Chicago

Raleigh

Data

Centre

San

Diego Dallas

Data

Centre

New

York

Beijing

Belgium

AM PM AM PM AM PM AM PM AM PM AM PM AM PM

22 23 24 25 26 27 28

AnomalyAnomaly

• ML computes models of seasonality per application

DLA

Grey Graph: FTP/SCP

Visualising Behavioural Analytics

Sydney

Chicago

Raleigh

Data

Centre

San

Diego Dallas

Data

Centre

New

York

A vector of properties (models of behavioural analytics)

DLA

Host

Application

What Are Features and Vectors?Features Measureable characteristics of network state

Vectors Highly dimensional spaces made up of features

Sample of Features for

Dest & Source

Number of flows

Unique IP addresses

Unique ports

Entropy of ports

Entropy of DNS names

Proportion of HTTP ports

Proportion of DNS ports

Number of bytes

Number of DNS requests

Number of DNS replies

For hosts or

applications

Feature ConstructorFeature Vector

(Many dimensions)

Application Categoriesnetwork

office

other

p2p

print

proxy

remote

shell

social

tls

tunneling

voip

windows

www

auth

cloud-apps

cloud-storage

dns-tunneling

darknet

dns

database

endpoint

file-xfer

game

icmp

im

malware

media

ssh

exec

login

shell

Secure-telnet

telnet

gridftp

winmx

sftp

uucp-path

bftp

ftp-data

nfs

ftp

ftp-agent

sift-uft

tftp

gopher

Host Anomalies Using Feature Vectors

Feature Value

HTTP Flows MANY

Neighbours MANY

Feature Value

HTTP Bytes FEW

Neighbours FEWFeature Value

HTTP Flows MANY

Neighbours FEW

Hosts whose behaviour do not match known models

(cannot be explained) may point to an anomaly

Use Case Scenarios

Sydney

Chicago

Raleigh

Data

Centre

San

Diego Dallas

Data

Centre

Beijing

Detection of unlikely flow (e.g., ftp) between two specific clusters

Identification of the username, location, device via ISE

Identification of known threats (AS, IP@, Domains) !!

Edge Mitigation

Scenario 1 – Unlikely flows

Detection of a change in DNS usage (e.g., volume, content of

DNS requests) not compliant with the learned model

Or fake VoIP, stealing bit headers, ...

Detection of seasonality changes ...

Identification of known threats (AS, IP@, Domains) !!

Edge Mitigation

Scenario 2 – Change in DNS usage

DLA

Reinforcement learning

SLN Targeted Outcomes For The User

Normal behaviours Anomalous Behaviours

Who talks to whom? Detection of new applications where never used

Active applications between clusters Detection of abnormal behaviours (data

exfiltration)

Applications displaying seasonal behaviours Adaption: Is abnormal event of interest?

Additional application characterisation Upon detecting anomalies; Explain why? What

has changed?

Contextual data; usernames, domains… Ability to perform advanced control

(Shape, route, redirect…)

SLN adapting to user expectations

The SLN User Interface

Severity Filter

Current Anomalies Window

DLA San Jose, Brussels, Zurich, Rome

Several anomalous applications with high severity

Unexpected Edge (Flow)

Affected Internal Cluster

Suspicious External Cluster in Hungary

3 flows, DNS, NFS and SUNRPC

Spiritual health website, but no HTTP flow?

Unexpected Edge (Flow)

Cluster A (Vertice)

Cluster B (Vertice)

Anomalous Flow (Edge)

Unexpected Edge (Flow)

# DNS transactions does not match predictions

Seasonality

Port scan between branch and corporate host

Two instances of targeted port scan

(Portmapper)

(Printing)

(HTTPS)

(HTTP)

(PPTP)

(DCE/RPC)

(DNS)

(FTP)

(Kerberos)

(RDP)

(Telnet)

(RFB)

(SSH)

1

2

Many characteristics of a scan can trigger detection:

• Variety of ports and/or destinations (network sweep)

• Short flows (e.g. SYN, SYN/ACK, RST)

• Incomplete L7 handshake (no application PDUs exchanged)

• Unusual applications/protocols spoken by scanning host

• Unusually high rate of new flows

SLN does not look for network scans

specifically, but detects them because they are

different from a host’s usual behavior

Port Scan

Rapid TCP DNS Requests

Same host performing TCP DNS request in

parallel with normal UDP request

Normal UDP DNS requests from host

Behaviour similar to DGA based malware

evading detection via DNS over TCP

DPI detects entropy of names etc…

Repeated SSH Login Atempts

Large number of new flows anomaly

Many SSH attempts (Port 22), short flows (not getting past login)

Could indicate brute force attack

Anomalous Connection on port 2201 detected as SSH

SLN leverages NBAR to recognise protocols on non-

standard ports

Unusual # of SNMP queries

Unusual number of packets per flow

Normal SNMP, short in duration

1364 SNMP packets indicates change of

behaviour on host (snmp walk?)

Summary

• SLN is a disruptive approach for malware detection using behavioral analytics, relying on dynamic learning, fully auto-adaptive

• Network data is analysed locally by SLN using advanced and lightweight analytics

• The router can perform local mitigation

• Lightweight and distributed architecture that is scalable

• Visualisation is key with simple understandable UI

Summary

Q & A

Complete Your Online Session Evaluation

Don’t forget: Cisco Live sessions will be available for viewing on-demand after the event at CiscoLiveAPAC.com

• Give us your feedback and receive a Cisco 2016 T-Shirt by completing the Overall Event Survey and 5 Session Evaluations.– Directly from your mobile device on the Cisco

Live Mobile App

– By visiting the Cisco Live Mobile Site (URL)

– Visit any Cisco Live Internet Station located throughout the venue

– T-Shirts can be collected in the World of Solutions on Friday from 12:00pm - 2:00pm

Thank you