Revealing the Characteristics of Cyber Analysts’ Reasoning Processes: A Trace Analysis Approach
Annual Review, ARO MURI on Computer-aided Human-centric Cyber SA
October 29, 2013
Pennsylvania State University: John Yen, Chen Zhong, Peng Liu
Army Research Laboratory: Robert Erbacher, Steve Hutchinson, Renee Etoty, Hasan Cam, William Glodek
Objectives:
• Understand the analytical reasoning process of cyber analysts
• Capture the analytical reasoning traces of cyber analysts through a non-invasive tool
• Develop a model of the analytical reasoning process that can capture rich traces and enable automated trace analysis
• Conduct experiments involving cyber analysts
Scientific/Technical Approach
• Developed the Observation-Hypothesis-Action (OHA) model of the analytical reasoning process
• Developed and implemented the Analytical Reasoning Support Tool for Cyber Analysis (ARSCA)
• Designed experiments that capture realistic challenges in cyber SA using VAST 2012
• Collaborated with an ARL study on visualization of cyber SA led by Dr. Erbacher
• Conducted multiple pilot studies (at Penn State and the Army Research Lab) to polish ARSCA
Accomplishments
• Conducted experiments, in collaboration with the Army Research Lab, involving subjects from Penn State and ARL
• An initial case study on trace analysis provided new insights into the reasoning process of analysts
• An initial correlation analysis suggests a relationship between characteristics of traces and performance/expertise
Opportunities
• Improve the performance of analysts through OHA-based training
• Investigate the differences between the strategies of experts and novices
• Investigate using aggregated analyst experiences to support the analytical reasoning process
Computer-Aided Human Centric Cyber Situation Awareness
J. Yen, C. Zhong, P. Liu, R. Erbacher, S. Hutchinson, R. Etoty, H. Cam, W. Glodek
[Diagram: project framework. Labels recoverable from the figure:]
• Real World; Test-bed; Computer network; System Analysts
• Software sensors, probes: Hyper Sentry, Cruiser
• Enterprise Model, Activity Logs, IDS reports, Vulnerabilities
• Data Conditioning; Association & Correlation
• Information Aggregation & Fusion: Transaction Graph methods, Damage assessment
• Automated Reasoning Tools: R-CAST, Plan-based narratives, Graphical models, Uncertainty analysis
• Cognitive Models & Decision Aids: Instance Based Learning Models, Simulation, Measures of SA & Shared SA
• Multi-Sensory Human Computer Interaction
Year 4 Accomplishments at a Glance
Publications:
1. Zhong, C., Kirubakaran, D.S., Yen, J., Liu, P., Hutchinson, S., & Cam, H., “How to Use Experience in Cyber Analysis: An Analytical Reasoning Support System”, in Proceedings of IEEE Conference on Intelligence and Security Informatics (ISI), 2013.
2. Chen, P.C., Liu, P., Yen, J., & Mullen, T., “Experience-based cyber situation recognition using relaxable logic patterns”, in IEEE International Multi-Disciplinary Conference on Cognitive Methods in Situation Awareness and Decision Support (CogSIMA), pp. 243-250, 2012.
3. Chen Zhong, VAST 2013 Workshop Presenter
4. Working papers for CogSIMA 2014
Tools:
• ARSCA
Technology transfer:
• J. Yen as summer faculty fellow at ARL
• Deep collaborations with ARL researchers:
– Brought the ARSCA toolkit to the Adelphi site
– 12 ARL security analysts participated
– Weekly teleconferences
– Joint work on a series of papers
• Invention Disclosure to PSU
Awards:
• Best Paper Award, CogSIMA 2012
• Chen Zhong: Grace Hopper Celebration of Women in Computing Scholarship
• Chen Zhong: Honorable Mention, VAST Challenge 2013, Mini-Challenge 3 (Visual Analytics for Cyber SA)
Students:
• Chen Zhong, PhD
Cyber SA Depends on Human Analysts
[Diagram labels: Network; Attacks; Data Sources (feeds); Depicted Situation; Ground Truth (estimates); Compare; Job Performance]
“Hi Bob, how did you nail it?”
Answer A: “you know this is my job”
Answer B: “this tool is awesome”
Answer C: “I talked to Jacob”
Answer D: “I employ good reasoning” [our research focus]
High level research questions
Q1. How do analysts reason?
Q2. Does good reasoning matter?
Q3. If it matters, how to enable analysts to do more good reasoning, and less bad reasoning? [training?]
Q4. How to automate, and to what extent?
– Understanding the analyst’s reasoning processes is essential to bridge the gaps between humans and tools.
– The analyst’s reasoning processes provide insights on how to automate…
Prerequisites
P1. Need to get the reasoning processes of analysts
P2. Need to characterize these reasoning processes
P3. Need to correlate the characteristics with job performance
• In AI, this is related to “knowledge acquisition/elicitation”
• In Cognitive Science, this is denoted “theories on how we reason”, e.g., the mental model theory, the procedural memory concept in ACT-R
Existing Knowledge Acquisition Approaches
• CTA: cognitive task analysis
• Simulation: ACT-R needs “procedural memory”
• Knowledge engineering in expert systems
• Case-based learning
The Knowledge Acquisition Bottleneck (Feigenbaum)
Our Approach
Insight 1: Diverse reasoning processes may share common structures and critical elements
– We propose: the OHA model
Insight 2: These critical elements and the relationships among them can later be used to recover the reasoning processes
Insight 3: A software tool can track the traces of analysts’ reasoning processes
– We built the ARSCA (Analytical Reasoning Support Tool for Cyber Analysis) toolkit
Three Merits
1. The analyst does not need to remember what he/she did; the reasoning process can be restored automatically from the traces.
2. The traces can be correlated directly with job performance.
3. The traces provide abundant detail; thoughts are expressed in natural language.
Challenges
C1. Validation challenge: Are the restored reasoning processes really the original ones?
C2. How to trace in a non-intrusive, non-distracting manner?
C3. Tradeoff challenge:
– Automation in trace analysis: how structured should the trace be?
– How much information can be collected?
Task Design
Data from VAST 2012 Challenge
– Data sources
• Corporate network configuration
• Firewall logs: 26,000,000 entries
• IDS alerts: 35,000 entries
– Ground truth
• An attack over two days (40 hours)
Task Design (2)
Task  Time period                 Raw data size
1     4/5 20:18-20:30 (12 min)    IDS: 214; Firewall: 123,133
2     4/5 22:15-22:26 (11 min)    IDS: 239; Firewall: 115,524
3     4/6 0:00-0:10 (10 min)      IDS: 296; Firewall: 112,766
4     4/6 18:01-18:15 (14 min)    IDS: 252; Firewall: 85,463
Tracing Tool Architecture
[Diagram labels: DBMS Engine (Queries, Answers) over IDS alerts, Firewall logs, and Others; three data Views; a tree of thoughts; Mouse and Keyboard input; invisible tracking of keystrokes, data filtering conditions, and observations; output as XML traces]
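To make the "invisible tracking to XML traces" idea concrete, here is a minimal Python sketch of a recorder that appends one XML element per tracked operation. The slides do not show ARSCA's actual XML schema, so the element names (`trace`, `operation`), the attributes, the `TraceRecorder` class, and the filter condition are all assumptions made for illustration; only the operation names and the two quoted analyst notes come from the slides.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


class TraceRecorder:
    """Hypothetical recorder: keeps operations in memory, serializes to XML."""

    def __init__(self, analyst_id):
        # Root element for one analyst's trace (schema is an assumption).
        self.root = ET.Element("trace", analyst=analyst_id)

    def record(self, op_type, detail, when=None):
        # One <operation> element per tracked event, stamped with a UTC time.
        when = when or datetime.now(timezone.utc)
        op = ET.SubElement(self.root, "operation",
                           type=op_type, time=when.isoformat())
        op.text = detail
        return op

    def to_xml(self):
        return ET.tostring(self.root, encoding="unicode")


recorder = TraceRecorder("pilot1")
# The filter condition below is made up; 203.0.113.5 is a documentation address.
recorder.record("AO_Filter", "SourceIP = 203.0.113.5")
recorder.record("AO_Finding", "All the destination ports are different.")
recorder.record("H_New", "This outside ip may do a port scan.")
xml_text = recorder.to_xml()
print(xml_text)
```

Because each operation is appended as it happens, the recorder never has to interrupt the analyst; the structured `type` attribute is what would later allow automated counting and comparison of traces.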
How to work with the Tool?
• Demo1: working with the tool.
• Demo2: traces are captured in XML files.
Let’s Look into the Traces
• One trace (“pilot 1”)
– A quick replay of the analytical reasoning process
– A quick look at the trace
– Look into the Hypotheses
– Look into the Actions and Observations (which form the “context” of the hypotheses)
• Compare 10 Traces
• Initial Correlation
Tour: First Step
A Quick Replay of the Analytical Reasoning Process
Video
Tour: Step 2
A Quick Look at the Trace
• Duration: 36 min
• E-Tree: 30 nodes, width 8, depth 3
• Trace operations: 92
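The width and depth figures above are shape measures of the E-Tree. The slides do not define them, so the sketch below assumes the usual definitions: depth is the number of edges on the longest root-to-leaf path, and width is the largest number of nodes on any single level. The function name and the small example tree are hypothetical and are not taken from the pilot1 trace.

```python
def tree_depth_and_width(children, root):
    """Return (depth, width) for a tree given as {node: [child, ...]}.

    Assumed definitions: depth = edges on the longest root-to-leaf path,
    width = maximum number of nodes on one level.
    """
    depth, width = 0, 1
    level = [root]
    while level:
        width = max(width, len(level))
        # Collect every child of the current level to form the next level.
        next_level = [c for node in level for c in children.get(node, [])]
        if next_level:
            depth += 1
        level = next_level
    return depth, width


# Hypothetical toy tree, for illustration only.
toy_tree = {
    "root": ["a", "b", "c"],
    "a": ["a1", "a2"],
    "c": ["c1"],
    "c1": ["c2"],
}
depth, width = tree_depth_and_width(toy_tree, "root")
print(depth, width)
```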
Tour: Step 3
[Diagram: an example E-Tree and the corresponding H-Tree]
E-Tree:
– EU (Experience Unit): Action 1 “Looking into IDS alerts”; Observation 1 “Continuous occurrence of alerts showing an outside ip connecting to various inner ips”
– Hypothesis 1 (H1): “The outside ip is the malicious C&C server”
– EU (Experience Unit): Action 2 “Looking into Firewall Log, check network flow from this suspicious ip”; Observation 2 “All the destination ports are different.”
– Hypothesis 2 (H2): “This outside ip may do a port scan.”
H-Tree: H1, H2, …
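The example on this slide can be written down as a small data structure. The Python sketch below is one reading of the diagram, not ARSCA's implementation: it assumes that an experience unit (an action plus its observation) leads to hypotheses, that a hypothesis can lead to follow-up experience units, and that the H-Tree is the E-Tree with the experience units removed. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hypothesis:
    text: str
    truth: str = "Unknown"
    # Experience units the analyst carried out to test this hypothesis.
    follow_ups: List["ExperienceUnit"] = field(default_factory=list)


@dataclass
class ExperienceUnit:
    action: str
    observation: str
    hypotheses: List[Hypothesis] = field(default_factory=list)


def count_nodes(eu):
    """Return (number of EUs, number of hypotheses) in the E-Tree under eu."""
    eus, hyps = 1, 0
    for h in eu.hypotheses:
        hyps += 1
        for nxt in h.follow_ups:
            sub_eus, sub_hyps = count_nodes(nxt)
            eus += sub_eus
            hyps += sub_hyps
    return eus, hyps


def h_tree(eu):
    """Project the E-Tree onto hypotheses only: [(text, [children...]), ...]."""
    return [(h.text, [sub for nxt in h.follow_ups for sub in h_tree(nxt)])
            for h in eu.hypotheses]


# The example from the slide.
eu2 = ExperienceUnit(
    action="Looking into Firewall Log, check network flow from this suspicious ip",
    observation="All the destination ports are different.",
    hypotheses=[Hypothesis("This outside ip may do a port scan.")],
)
h1 = Hypothesis("The outside ip is the malicious C&C server", follow_ups=[eu2])
eu1 = ExperienceUnit(
    action="Looking into IDS alerts",
    observation="Continuous occurrence of alerts showing an outside ip "
                "connecting to various inner ips",
    hypotheses=[h1],
)
print(count_nodes(eu1))
print(h_tree(eu1))
```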
Operations on Hypotheses
[Diagram: H-Tree with nodes H1, H2, …]
H_New: Create a hypothesis
H_Add_Sibling: Add a sibling/alternative hypothesis
H_Jump: Change the current focus from one hypothesis to another
H_Edit_Content: Edit the content of a hypothesis
H_Edit_Truth: Edit the truth value of a hypothesis
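The five hypothesis operations can be sketched as methods on a small tree class that also logs each call, which is roughly what a trace of hypothesis operations would contain. This is a hedged Python illustration, not ARSCA code: the `HTree` class, the generated ids, the rule that a new hypothesis becomes the current focus, and the placeholder text of the alternative hypothesis are all assumptions.

```python
import itertools


class HTree:
    """Hypothetical H-Tree that logs every operation applied to it."""

    def __init__(self):
        self.nodes = {}       # id -> {"text", "truth", "parent"}
        self.current = None   # id of the hypothesis currently in focus
        self.log = []         # operation names, in order
        self._ids = itertools.count(1)

    def _add(self, text, parent, op):
        node_id = f"H{next(self._ids)}"
        self.nodes[node_id] = {"text": text, "truth": "Unknown", "parent": parent}
        self.current = node_id  # assumption: focus moves to the new node
        self.log.append(op)
        return node_id

    def h_new(self, text):
        # New hypothesis under the one currently in focus (or at the root).
        return self._add(text, self.current, "H_New")

    def h_add_sibling(self, text):
        # Alternative hypothesis sharing the parent of the current one.
        return self._add(text, self.nodes[self.current]["parent"], "H_Add_Sibling")

    def h_jump(self, node_id):
        self.current = node_id
        self.log.append("H_Jump")

    def h_edit_content(self, text):
        self.nodes[self.current]["text"] = text
        self.log.append("H_Edit_Content")

    def h_edit_truth(self, truth):
        self.nodes[self.current]["truth"] = truth
        self.log.append("H_Edit_Truth")


tree = HTree()
first = tree.h_new("The outside ip is the malicious C&C server")
second = tree.h_new("This outside ip may do a port scan.")
tree.h_jump(first)
tree.h_edit_truth("False")
third = tree.h_add_sibling("(placeholder alternative hypothesis)")
print(tree.log)
```

Keeping the log as a flat list of operation names makes later steps, such as counting operations per type, a one-line aggregation.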
Look into the Hypotheses of Pilot1 (Cont’d)
• # of hypotheses: 21
• Operations on the hypotheses (next slide)
H_New: Create a hypothesis
H_Add_Sibling: Add a sibling/alternative hypothesis
H_Jump: Change the current focus from one hypothesis to another
H_Edit_Content: Edit the content of a hypothesis
H_Edit_Truth: Change the truth value (true or false) of a hypothesis
Look into the Hypotheses of Pilot1
[Bar chart: Trace Operations on Hypotheses. Categories: H_New, H_Add_Sibling, H_Jump, H_Edit_Content, H_Edit_Truth; vertical axis from 0 to 16.]
Tour: Step 4
The Context of a Hypothesis
[Diagram, shown on two consecutive slides: the example E-Tree with one context highlighted per slide, Context1 and then Context2]
– Action 1: Looking into IDS alerts
– Observation 1: Continuous occurrence of alerts showing an outside ip connecting to various inner ips
– Hypothesis 1: The outside ip is the malicious C&C server
– Action 2: Looking into Firewall Log, check network flow from this suspicious ip
– Observation 2: All the destination ports are different.
– Hypothesis 2: This outside ip may do a port scan.
Actions and Observations in E-Tree (Cont’d)
[Diagram: the same example E-Tree (Action 1, Observation 1, Hypothesis 1; Action 2, Observation 2, Hypothesis 2)]
Actions and Observations
Action                       Observation
Checking IDS Alerts          Finding … in IDS Alerts
Checking Network Topology    Finding … in Network Topology
Checking Firewall logs       Finding … in Firewall logs
…                            …
Operations on Actions and Observations
AO_Lookup_Port_Term: Look up an explanation of a port or a term
AO_Link: Link a set of items in an observation together for a reason (e.g. same port)
AO_Filter: Filter the data by creating a filtering condition
AO_QuickFind: Quick-find a term in the data
AO_Selecting: Select some data entries (i.e. create action items)
AO_Finding: Find something in the selected data entries (i.e. create observation items)
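Once operations carry structured names like these, counting them per type is straightforward. The Python sketch below assumes a trace has already been reduced to an ordered list of operation names; the `count_operations` helper and the sample list are hypothetical and do not reproduce the pilot1 counts.

```python
from collections import Counter


def count_operations(operations):
    """Return (counts per operation, counts per family).

    The family is the prefix before the first underscore: "AO" for
    action/observation operations, "H" for hypothesis operations.
    """
    per_operation = Counter(operations)
    per_family = Counter(op.split("_", 1)[0] for op in operations)
    return per_operation, per_family


# Hypothetical operation sequence, for illustration only.
sample_ops = [
    "AO_Filter", "AO_Selecting", "AO_Finding", "H_New",
    "AO_QuickFind", "AO_Selecting", "AO_Finding", "H_Add_Sibling",
]
per_operation, per_family = count_operations(sample_ops)
print(per_operation.most_common())
print(dict(per_family))
```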
Look into Pilot1’s E-Tree: # of Nodes
EU_Num  H_Num  Total_Num
9       21     30
[Chart: Number of Nodes in Pilot1's E-Tree (EU_Num, H_Num)]
Look into Pilot1’s Trace: # of Operations
[Bar chart: # of Operations in Pilot1's Trace. Categories: AO_Selecting, AO_Finding, AO_QuickFind, AO_Filter, AO_Link, AO_Lookup_Port_Term; vertical axis from 0 to 25.]
Look into the E-Tree of Pilot1 (Cont’d)
[Bar chart: E-Tree Width and Depth for traces pilot1, pilot2, pilot4, 101, 128, 174, 193, 239, 246, 285; vertical axis from 0 to 10.]
Summary: A Slower Replay
Video
Two Cases of Jumping Back
• Go back to a previous node
– Case 1:
JUMP_FROM_TO (H39431008 H46131157)
ADD_SIBLING (H46131157 H66431551)
– Case 2:
JUMP_FROM_TO (H89931527 H58331044)
CHANGE_TRUTH_VALUE (H58331044 False,Unknown)
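Both cases follow the same pattern: a jump to an earlier hypothesis, immediately followed by an operation on that hypothesis. A minimal Python sketch of detecting this pattern is shown below. It assumes the operations are available as text in the form printed on the slide (the real traces are XML, whose layout is not shown), and the helper names `parse_ops` and `jump_back_cases` are hypothetical.

```python
import re

# Matches "NAME (arg arg ...)" as printed on the slide.
OP_PATTERN = re.compile(r"(\w+)\s*\(([^)]*)\)")


def parse_ops(text):
    """Parse operations into (name, [args]) tuples, splitting on spaces/commas."""
    return [(name, re.split(r"[\s,]+", args.strip()))
            for name, args in OP_PATTERN.findall(text)]


def jump_back_cases(ops):
    """Return (target hypothesis, next operation) for each jump whose
    following operation acts on the jump target."""
    cases = []
    for (name, args), (next_name, next_args) in zip(ops, ops[1:]):
        if name == "JUMP_FROM_TO" and next_args and next_args[0] == args[1]:
            cases.append((args[1], next_name))
    return cases


trace_text = """
JUMP_FROM_TO (H39431008 H46131157)
ADD_SIBLING (H46131157 H66431551 )
JUMP_FROM_TO (H89931527 H58331044)
CHANGE_TRUTH_VALUE (H58331044 False,Unknown)
"""
cases = jump_back_cases(parse_ops(trace_text))
print(cases)
```

On the two cases from the slide this reports that the analyst returned to H46131157 to add an alternative hypothesis, and returned to H58331044 to revise its truth value.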
Tour: Step 5
Trace Comparison: # of Nodes in E-Tree
[Chart annotation: More alternative hypotheses]
Trace Comparison: E-Trees
[Bar chart: Total_Num, Width, and Depth for traces pilot1, pilot2, pilot4, 101, 128, 174, 193, 239, 246, 285; vertical axis from 0 to 35.]
Trace Comparison: Trace Operations
Tour: Step 6
Initial Correlation
• Correlated performance with E-Tree features:
[Chart: Expertise, Performance, E-Tree Features. Series: Expertise (Pre-Questionnaire), Performance Score, Total_Num, Width, Depth; traces pilot1, pilot2, pilot4, 101, 128, 174, 193, 239, 246, 285; vertical axis from 0 to 35.]
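The slides do not say which correlation measure was used. As one plausible choice, the Python sketch below computes a Pearson correlation coefficient between a performance score and one E-Tree feature. The `pearson` helper is hypothetical, and the toy numbers are deliberately artificial (perfectly linear) so they cannot be mistaken for study data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Artificial toy values, NOT measurements from the experiment.
toy_width = [1, 2, 3, 4, 5]
toy_score = [2, 4, 6, 8, 10]
r = pearson(toy_width, toy_score)
print(round(r, 3))
```

With only ten traces, any coefficient computed this way would be a rough indication at best, which matches the slide's framing of the result as an initial correlation.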
FY 2014 Plan
• Continue to conduct, in collaboration with ARL researchers, the Analytical Reasoning Experiment (VAST 2012)
• Analyze the traces of analytical reasoning
– Is the first thought important for an analyst’s performance?
– How will the key observation influence the analytical reasoning process?
– What are the differences between the strategies used by experts and novices?
• Design and conduct, in collaboration with ARL researchers, a collaborative analytical reasoning experiment
– Enables digging into flow data
– Two-analyst teams
– Leverages VAST 2013
• Enhance the context-guided experience-based analytical reasoning support
– Aggregating multiple experiences of analysts
– Support context-guided experience-based simulation
Q & A
Thank you.