Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems
DESCRIPTION
Paper: ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems
Authors: Kenichi Kobayashi, Akihiko Matsuo, Katsuro Inoue, Yasuhiro Hayase, Manabu Kamimura and Toshiaki Yoshino
Session: Research Track 2: Impact Analysis

TRANSCRIPT
ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems
Kenichi Kobayashi, Fujitsu Laboratories
Akihiko Matsuo, Fujitsu Laboratories
Manabu Kamimura, Fujitsu Laboratories
Toshiaki Yoshino, Fujitsu
Yasuhiro Hayase, University of Tsukuba
Katsuro Inoue, Osaka University
ICSM2011 @ Williamsburg, 2011-09-27
Overview
1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary
Copyright 2011 FUJITSU LABORATORIES LIMITED
Background (practitioners' point of view)
Fault prediction in maintenance is a difficult task, and predictive performance is not sufficient with product metrics alone. (Product metrics are metrics extracted from the software product, such as source code.)
Therefore, process metrics, such as code churn and logical coupling, have been combined with product metrics. (Process metrics are metrics extracted from the software process, such as change histories.)
However, in enterprise maintenance settings, documents, change histories, bug reports, and specialists' knowledge are often lost, out of date, or unusable.
Goals
Problem: Process metrics cannot always be obtained.
Motivation: To achieve high predictive performance with only product metrics extractable from source code.
Goals: To define a new product metric, and to show the effectiveness of the metric.
Basic Idea
Software dependency is one of the surviving factors of faults even after release.
Change Impact Analysis: a technique to identify the affected areas when some part of the software is changed.
  Weakness: high computational cost.
We assumed Change Impact Analysis enables us to extract implicit dependencies.
We need not identify the affected areas themselves; we only need their scale.

(Figure: a change triggers fixes that propagate along an implicit dependency; a missed fix remains as a fault.)

ImpactScale (abbrev. IS)

Hypothesis
A metric that quantifies the scale of change impact can improve the performance of fault prediction.
Overview
1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary
Overview of ImpactScale Definition
Propagation Graph: dependencies are extracted from the target software, and a Propagation Graph is built.
Propagation Model: probabilistic propagation and relation-sensitive propagation.
ImpactScale is the sum of all Quantities of Change Impact.

(Figure: a change at a Code Node propagates through Code Nodes and Data Nodes; the Quantity of Change Impact from C to A is illustrated.)
Propagation Graph
① Build a dependency graph extracted from the target software.
② Add reverse edges to build the Propagation Graph.
Change impact analysis for ImpactScale is performed on the Propagation Graph.

《Dependency Graph》 → 《Propagation Graph》
Code Node: module, class, function, source code
Data Node: DB table, global variable
Dependency Edge: with relation type (CALL, READ, WRITE)
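As a rough sketch of steps ① and ②, under the assumption that the dependency graph is given as (source, target, relation) triples (an encoding of ours, not the paper's tooling), the Propagation Graph can be built like this:

```python
from collections import defaultdict

# Minimal sketch: for every dependency edge, keep the forward edge and
# add a reverse edge, so change impact analysis can traverse both ways.
def build_propagation_graph(dep_edges):
    """dep_edges: iterable of (src, dst, relation) triples.
    Returns adjacency: node -> list of (neighbor, relation, direction)."""
    graph = defaultdict(list)
    for src, dst, rel in dep_edges:
        graph[src].append((dst, rel, "forward"))   # original dependency edge
        graph[dst].append((src, rel, "reverse"))   # added reverse edge
    return graph

# Toy example: module A CALLs module B; B READs DB table T.
g = build_propagation_graph([("A", "B", "CALL"), ("B", "T", "READ")])
```

With the reverse edges added, a change at T can reach B (and then A), which a plain dependency graph would not allow.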
Probabilistic Propagation
We assume that change impact propagates probabilistically from one node to another, as in some Ripple Effect studies [Haney72] [Tsantalis05] [Sharafat07].
In this presentation, the propagation probability is always 0.5.

(Figure: a change at the source node propagates with probability ×0.5 along each edge; the quantity of change impact from the source node is attenuated by the propagation probability at every hop.)
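A minimal sketch of probabilistic propagation with p = 0.5, assuming a plain adjacency-list graph, a depth bound, and per-path cycle skipping (all our simplifications; the relation-sensitive cut rules of the next slides are omitted):

```python
# Change impact starts at 1.0 at the changed node and is multiplied by
# p = 0.5 on every traversed edge; ImpactScale is the sum of the
# quantities of change impact arriving at all reached nodes.
def impact_scale(graph, source, p=0.5, max_depth=10):
    total = 0.0

    def walk(node, quantity, depth, visited):
        nonlocal total
        if depth >= max_depth:
            return
        for nxt in graph.get(node, []):
            if nxt in visited:
                continue  # skip cycles on this path
            total += quantity * p                       # impact arriving at nxt
            walk(nxt, quantity * p, depth + 1, visited | {nxt})

    walk(source, 1.0, 0, {source})
    return total

# Chain C -> A -> B: 0.5 arrives at A and 0.25 at B, so ImpactScale(C) = 0.75.
```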
Relation-sensitive Propagation
To avoid overestimation, we use context information to eliminate unlikely propagation. We use an edge's relation type as the minimal context information, for the sake of computational time.
Cut Rules determine whether propagation from one node to its next node is cut, referring to the relation types of the previous and next edges.
We call such controlled propagation relation-sensitive propagation.
Its computational complexity is practically low.

(Figure: at the current node, the Cut Rule refers to the previous relation type and the next relation type to decide whether propagation continues to the next node.)
Example of Cut Rules
Cut Rule 1: During finding callees, don't find callers.
Cut Rule 2: During finding callers, don't find callees.
Cut Rule 3: Don't find beyond READ edges.

(Figure: example propagations from "C" and from "F" showing where each cut rule stops the search.)
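The three rules can be sketched as a predicate over the previous and next edges' relation types. The relation names here ("CALL" for a forward call edge toward callees, "CALLED_BY" for the added reverse edge toward callers) are our own encoding, not the paper's, and our reading of Rule 3 (stop once a READ edge has been traversed) is an assumption:

```python
# Returning True cuts the propagation from the current node to the next.
def is_cut(prev_rel, next_rel):
    if prev_rel == "CALL" and next_rel == "CALLED_BY":
        return True   # Rule 1: during finding callees, don't find callers
    if prev_rel == "CALLED_BY" and next_rel == "CALL":
        return True   # Rule 2: during finding callers, don't find callees
    if prev_rel == "READ":
        return True   # Rule 3: don't find beyond READ edges
    return False
```

Because the decision looks only at the two adjacent edges, the check is constant-time per step, which is why the computational cost stays practically low.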
Overview
1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary
Data Sets for Evaluations
Two enterprise accounting systems in different companies.

Data Set  #Modules  Total LOC  #Faults  #Faulty Modules  Faulty Module Rate  Term Faults Collected
DS1       5.8k      1.6M       269      215              3.7%                40 months
DS2       7.6k      3.7M       250      208              2.7%                40 months

Common properties
Language: COBOL
Age: over 20 years

Collected metrics
7 existing metrics: LOC, WMC, MaxVG, Sections, Calls, Fan-in, Fan-out
ImpactScale
Real Example of Calculating ImpactScale
(Figure, DS1, 5.8k modules: each square-shaped group of modules is a sub-system; a sequence of frames shows the change impact of one module spreading across sub-systems.)
Measurement Results
Distribution of ImpactScale
Calculation time: DS1: about 10 sec.; DS2: about 30 sec.

(Figure: histogram of the number of modules per ImpactScale bin, in bins of 50 from ~50 to ~950. The distribution is long-tailed but practically short, with a spike near the high end: a system-wide dispatcher, or a symptom of a bad smell.)

Data Set  Mean IS  Max IS
DS1       86.0     2989.6
DS2       156.5    3338.2
ImpactScale and Faults
The first 20% of modules contain 48.8% of the faults.
IS correlates highly with faults.

(Figure: modules and database tables ordered by ImpactScale from high to low, split into deciles.)
Overview
1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary
Overview of Evaluations
Evaluation procedure: 100-times random sub-sampling validation.
Evaluations:
Fault Prediction
  RQ1: Does adding ImpactScale to existing product metrics improve predictive performance?
    • Predicting Faulty or Not Faulty
    • Effort-aware Fault Prediction
  RQ2: Comparison between ImpactScale and Network Measures
  RQ3: Validating the ImpactScale Definition
Predicting Faulty or Not Faulty
Faults are predicted using logistic regression. MET = model without ImpactScale; MET+IS = model with ImpactScale.

Performance Measure  DS1 MET  DS1 MET+IS  Improvement by IS
Precision            0.148    0.168       +0.020
Recall               0.315    0.392       +0.077
F1                   0.200    0.234       +0.034

Performance Measure  DS2 MET  DS2 MET+IS  Improvement by IS
Precision            0.139    0.162       +0.023
Recall               0.253    0.334       +0.081
F1                   0.177    0.216       +0.039

All improvements are significant by Wilcoxon's signed-rank test.
That adding IS improves all performance measures supports answering RQ1 with YES.

Practitioners' point of view: these Precision/Recall/F1 evaluations are not very useful in practice, because in maintenance, modules estimated as highly fault-prone tend to be large. In the case of DS2, the top 10% of high fault-estimated modules account for 24% of the LOC, which is not effort-effective.
Effort-aware Fault Prediction Model
Problem: in maintenance, modules estimated as faulty tend to be large, and a large module needs a large effort to review or test.
Practitioners' opinion: "Budget and schedule are very demanding. We want to find more faults with less effort." Therefore, effort-effectiveness is our main concern.
We use an "effort-aware model" [Arisholm06] [Menzies10] [Mende10]. It prioritizes modules in order of relative risk to maximize effort-effectiveness:

    relative risk(x) = #errors(x) / Effort(x)

Poisson regression is used to learn the relative risk.
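As a toy illustration of this prioritization, ranking modules by predicted faults per line of code might look like the sketch below. The module names and fault estimates are invented, and using LOC as the effort measure is an assumption here; in the paper the fault estimates come from Poisson regression:

```python
# Effort-aware prioritization: highest relative risk (predicted faults
# per unit of effort) first, so more faults are found per LOC inspected.
def prioritize_by_relative_risk(modules):
    """modules: list of (name, predicted_faults, loc)."""
    return sorted(modules, key=lambda m: m[1] / m[2], reverse=True)

ranked = prioritize_by_relative_risk(
    [("A", 2.0, 1000), ("B", 1.0, 200), ("C", 3.0, 3000)]
)
# B (risk 0.005) outranks A (0.002) and C (0.001) despite fewer predicted faults.
```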
Results of Effort-aware Evaluation
AUC is the area under the curve of the effort-based cumulative lift chart; it shows overall predictive performance (higher is better).
ddr10 is the detected defect rate in the first 10% of effort; it shows predictive performance under limited effort (higher is better).
Practitioners' point of view: in maintenance, budget, schedule and effort are always limited; therefore, ddr10 is more important.
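Given modules already ranked by predicted relative risk, ddr10 can be sketched as below. The (LOC, faults) input format and the whole-module cutoff at the 10% budget are our assumptions:

```python
# ddr10: walk modules in predicted-risk order, accumulate inspection
# effort (LOC) and actual faults, and report the fraction of all faults
# found within the first 10% of total effort.
def ddr10(ranked_modules):
    """ranked_modules: list of (loc, actual_faults), highest risk first."""
    total_loc = sum(loc for loc, _ in ranked_modules)
    total_faults = sum(f for _, f in ranked_modules)
    budget = 0.10 * total_loc
    spent, found = 0, 0
    for loc, faults in ranked_modules:
        if spent + loc > budget:
            break  # next module would exceed the 10% effort budget
        spent += loc
        found += faults
    return found / total_faults
```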
(Figure: effort-based cumulative lift charts of DS1 and DS2; in both, MET+IS lies above MET and below Optimal.)

Performance Measure  DS1-MET  DS1-MET+IS  Improvement by IS
AUC                  0.635    0.680       +0.045
ddr10                0.186    0.296       ×1.60

Performance Measure  DS2-MET  DS2-MET+IS  Improvement by IS
AUC                  0.669    0.714       +0.045
ddr10                0.225    0.343       ×1.53

All improvements are significant by Wilcoxon's signed-rank test.
RQ1: Does adding ImpactScale to existing product metrics improve predictive performance? YES.
Comparison with Network Measures
Network measures: [Zimmermann et al., ICSE08] recently applied Social Network Analysis (SNA) to a software dependency graph representing relationships between the binary modules of software systems.
Over 50 network measures were used, for example: in/out degrees, network diameter, closeness, eigenvector centrality (a.k.a. PageRank), etc.
They and some replication studies [Tosun09] [Nguyen10] reported that these measures work well in some cases.

RQ2: "Does adding ImpactScale to existing product metrics and network measures improve predictive performance?"
ImpactScale vs. Network Measures
Hierarchical model comparison based on the effort-aware model. Models are learned using Principal Component Poisson Regression.

(Figure: performance of the model with existing metrics, then +ImpactScale, +network measures, and +network measures +ImpactScale; adding ImpactScale improves performance in each case.)

All improvements and deteriorations are significant by Wilcoxon's signed-rank test (*: P<0.05, **: P<0.01, unmarked: P<0.001).

RQ2: "Does adding ImpactScale to existing product metrics and network measures improve predictive performance?" YES.
Validating ImpactScale
Test method: compare models using ImpactScale variants with a limited maximum path-finding distance.

(Figure: ddr10 plotted against the limit of the maximum path-finding distance, from 1 to 10, for DS1 and DS2. The "Limit = 1" variant is almost fan-in + fan-out.)

RQ3: Is considering distant nodes meaningful? Answer: YES.
Overview
1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary
Summary of Evaluations
RQ1: Does adding ImpactScale to existing product metrics improve predictive performance? YES
RQ2: "Does adding ImpactScale to existing product metrics and network measures improve predictive performance?" YES
RQ3: Is considering distant nodes meaningful? YES
Hypothesis: A metric that quantifies the scale of change impact can improve the performance of fault prediction. TRUE
Threats to Validity
Language: ImpactScale has no language-specific feature, but the evaluations were done only on COBOL systems, and COBOL differs in many ways from other languages.
Application domain: the evaluated systems are only in the accounting business domain.
Call graph analysis: the impact of dynamic dispatching (e.g. polymorphism and reflection) is not assessed.
Conclusion
We defined a new product metric quantifying change impact, called ImpactScale.
  Probabilistic propagation
  Relation-sensitive propagation
  Practical computational time even for large-scale software systems
We evaluated its predictive performance in enterprise systems.
  Adding ImpactScale improves the performance: over 1.5 times more defects detected in the first 10% of effort (LOC).
Additional finding: considering distant nodes in the dependency graph is meaningful for fault prediction.
Future Work
Extending supported languages: Java, C, C++
Expanding use cases: rapid risk assessment, watching violations of modularity, measuring software decay
Thank you!
Kenichi Kobayashi
Fujitsu Labs