impact analysis - impactscale: quantifying change impact to predict faults in large software systems

ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems Kenichi Kobayashi Fujitsu Laboratories Akihiko Matsuo Fujitsu Laboratories Manabu Kamimura Fujitsu Laboratories Toshiaki Yoshino Fujitsu Yasuhiro Hayase University of Tsukuba Katsuro Inoue Osaka University

Upload: icsm-2011

Post on 13-Jan-2015

DESCRIPTION

Paper: ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Authors: Kenichi Kobayashi, Akihiko Matsuo, Katsuro Inoue, Yasuhiro Hayase, Manabu Kamimura and Toshiaki Yoshino

Session: Research Track 2: Impact Analysis

TRANSCRIPT

Page 1: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Kenichi Kobayashi, Fujitsu Laboratories

Akihiko Matsuo, Fujitsu Laboratories

Manabu Kamimura, Fujitsu Laboratories

Toshiaki Yoshino, Fujitsu

Yasuhiro Hayase, University of Tsukuba

Katsuro Inoue, Osaka University

Page 2: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

ICSM2011 @ Williamsburg, 2011-09-27

Overview

1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary

Copyright 2011 FUJITSU LABORATORIES LIMITED

Page 3: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Background

• Fault prediction in maintenance is a difficult task, and product metrics alone do not give sufficient predictive performance. Product metrics are metrics extracted from the software product itself, such as source code.

• Therefore, process metrics, such as code churn and logical coupling, have been combined with product metrics. Process metrics are metrics extracted from the software process, such as change histories.

Practitioners’ Point of View

• However, in enterprise maintenance settings, documents, change histories, bug reports, and specialists’ knowledge are often lost, out of date, or unusable.

Page 4: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Goals

Problem
• Process metrics cannot always be obtained.

Motivation
• To achieve high predictive performance using only product metrics extractable from source code.

Goals
• To define a new product metric.
• To show the effectiveness of the metric.

Page 5: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Basic Idea

Software dependency is one of the factors that allow faults to survive even after release.

[Figure: dependency diagram with three places labeled "fix", one edge labeled "implicit dependency", and one place labeled "missed fix".]

Change Impact Analysis: a technique to determine the affected areas when some part of the software is changed.
• We assumed that Change Impact Analysis enables us to extract implicit dependencies.
• Weakness: high computational cost.

ImpactScale (abbrev. IS)
• There is no need to determine the affected areas themselves. Only their scale needs to be determined.

Hypothesis
• A metric that quantifies the scale of change impact can improve the performance of fault prediction.

Page 6: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Overview

1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary

Page 7: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Overview of ImpactScale Definition

Dependency
• Dependencies are extracted from the target software, and a Propagation Graph is built.

Propagation Model
• Probabilistic propagation
• Relation-sensitive propagation

ImpactScale is the sum of all Quantities of Change Impact.

[Figure: Propagation Graph with Code Nodes and Data Nodes. A change is made at one node, and the Quantity of Change Impact from C to A is marked.]

Page 8: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Propagation Graph

① Build a dependency graph extracted from the target software.

《Dependency Graph》
• Code Node: module, class, function, source code
• Data Node: DB table, global variable
• Dependency Edge with relation type: CALL, READ, WRITE

Page 9: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Propagation Graph

① Build a dependency graph extracted from the target software.

② Add reverse edges to build the Propagation Graph.

[Figure panels: 《Dependency Graph》 and 《Propagation Graph》]
• Code Node: module, class, function, source code
• Data Node: DB table, global variable
• Dependency Edge with relation type: CALL, READ, WRITE

Change impact analysis for ImpactScale is performed on the Propagation Graph.
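
Steps ① and ② can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `build_propagation_graph`, the `(source, relation, target)` tuple layout, and the `_REV` suffix used to label reverse edges are all assumptions, since the slides do not say how reverse edges are labeled.

```python
from collections import defaultdict

# Relation types named on the slide.
RELATION_TYPES = ("CALL", "READ", "WRITE")


def build_propagation_graph(dependencies):
    """Return an adjacency map: node -> list of (relation, neighbor).

    `dependencies` is an iterable of (source, relation, target) tuples.
    Every dependency edge is kept, and a reverse edge labeled
    "<relation>_REV" (a hypothetical label) is added for it.
    """
    graph = defaultdict(list)
    for source, relation, target in dependencies:
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        graph[source].append((relation, target))            # step 1: dependency edge
        graph[target].append((relation + "_REV", source))   # step 2: reverse edge
    return dict(graph)


# Hypothetical example: module A calls module B, which reads DB table T.
graph = build_propagation_graph([("A", "CALL", "B"), ("B", "READ", "T")])
print(graph["B"])  # [('CALL_REV', 'A'), ('READ', 'T')]
```

Keeping the relation type on every edge, including reverse ones, is what later allows propagation to be filtered by edge type.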

Page 10: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Probabilistic Propagation

We assume that change impact propagates probabilistically from one node to another, as in some Ripple Effect studies [Hanny72] [Tsantalis05] [Sharafat07].

In this presentation, the propagation probability is always 0.5.

[Figure: starting from the changed source node, the quantity of change impact is multiplied by the propagation probability at each edge: ×0.5 after one edge, ×0.5×0.5 after two edges.]
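
Read along a single path, the figure amounts to the worked relation below. The symbols $p$ (propagation probability), $d$ (number of edges from the changed node), and $q(d)$ (quantity of change impact at that distance) are notation introduced here for illustration and do not appear on the slides.

```latex
q(d) = p^{d}, \qquad p = 0.5:\quad
q(1) = 0.5,\quad
q(2) = 0.5 \times 0.5 = 0.25,\quad
q(3) = 0.5 \times 0.5 \times 0.5 = 0.125
```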

Page 11: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Relation-sensitive Propagation

To avoid overestimation, we use context information to eliminate unlikely propagation. To keep computation time low, we use an edge’s relation type as the minimal context information.

Cut Rules determine whether propagation from one node to its next node is cut, by referring to the relation types of the previous edge and the next edge.

We call such controlled propagation relation-sensitive propagation.

Computational complexity is practically low.

[Figure: a Cut Rule refers to the previous relation type (the edge into the current node) and the next relation type (the edge from the current node to the next node).]

Page 12: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Example of Cut Rules

• Cut Rule 1: While finding callees, don’t find callers.
• Cut Rule 2: While finding callers, don’t find callees.
• Cut Rule 3: Don’t search beyond READ edges.

[Figure: two propagation examples, one starting from a change at "C" and one starting from a change at "F".]
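
The three example Cut Rules, combined with a propagation probability of 0.5, can be sketched as a breadth-first search over (node, arriving relation) states. Several points below are assumptions rather than details from the slides: the `_REV` labels for reverse edges, the reading of Cut Rule 3 as "stop after traversing a READ edge", counting each node once at its shortest admissible distance, and excluding the changed node from the sum.

```python
from collections import deque

PROPAGATION_PROBABILITY = 0.5  # value used in the presentation


def propagation_graph(dependencies):
    """node -> [(relation, neighbor)], with "<relation>_REV" reverse edges (assumed labels)."""
    graph = {}
    for source, relation, target in dependencies:
        graph.setdefault(source, []).append((relation, target))
        graph.setdefault(target, []).append((relation + "_REV", source))
    return graph


def is_cut(previous_relation, next_relation):
    """Apply the three example Cut Rules to a (previous edge, next edge) pair."""
    if previous_relation == "CALL" and next_relation == "CALL_REV":
        return True   # Cut Rule 1: while finding callees, don't find callers
    if previous_relation == "CALL_REV" and next_relation == "CALL":
        return True   # Cut Rule 2: while finding callers, don't find callees
    if previous_relation == "READ":
        return True   # Cut Rule 3 (assumed reading): stop after a READ edge
    return False


def impact_scale(graph, changed, p=PROPAGATION_PROBABILITY):
    """Sum p**distance over nodes reachable from `changed` under the Cut Rules.

    Assumption: each node counts once, at its shortest admissible distance,
    and the changed node itself is excluded.
    """
    distance = {}                    # node -> shortest admissible distance
    visited = {(changed, None)}      # search states: (node, relation used to arrive)
    queue = deque([(changed, None, 0)])
    while queue:
        node, previous_relation, dist = queue.popleft()
        for relation, neighbor in graph.get(node, []):
            if previous_relation is not None and is_cut(previous_relation, relation):
                continue
            state = (neighbor, relation)
            if state in visited:
                continue
            visited.add(state)
            if neighbor != changed and neighbor not in distance:
                distance[neighbor] = dist + 1
            queue.append((neighbor, relation, dist + 1))
    return sum(p ** d for d in distance.values())


# Hypothetical system: A and D call B, B calls C, B reads table T, E writes table T.
graph = propagation_graph([
    ("A", "CALL", "B"), ("D", "CALL", "B"), ("B", "CALL", "C"),
    ("B", "READ", "T"), ("E", "WRITE", "T"),
])
print(impact_scale(graph, "B"))  # 2.0: A, C, D, and T are each one edge away
print(impact_scale(graph, "E"))  # 1.125: T at 1, B at 2, and A, C, D at 3
```

The search state includes the relation used to arrive at a node because a Cut Rule depends on both the previous and the next edge; the same node may be a dead end when reached one way and a pass-through when reached another.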

Page 13: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Overview

1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary

Page 14: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Data Sets for Evaluations

Two enterprise accounting systems in different companies

Data Set   #Modules   Total LOC   #Faults   #Faulty Modules   Faulty Module Rate   Fault-Collected Term
DS1        5.8k       1.6M        269       215               3.7%                 40 months
DS2        7.6k       3.7M        250       208               2.7%                 40 months

Common Properties
• Language: COBOL
• Age: over 20 years

Collected Metrics
• 7 existing metrics: LOC, WMC, MaxVG, Sections, Calls, Fan-in, Fan-out
• ImpactScale

Page 15: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Real Example of Calculating ImpactScale

[Figure: module map of DS1 (5.8k modules). Each square-shaped group of modules is a sub-system.]

Page 22: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Measurement Results

Distribution of ImpactScale

[Figure: histogram of Number of Modules (0 to 4000) against ImpactScale (bins ~50 to ~950). The distribution is long-tailed. A spike indicates a system-wide dispatcher or a symptom of a bad smell.]

Data Set   Mean IS   Max IS
DS1        86.0      2989.6
DS2        156.5     3338.2

Calculation Time (practically short)
• DS1: about 10 sec.
• DS2: about 30 sec.

Page 23: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

ImpactScale and Faults

• The first 20% of modules contain 48.8% of the faults.
• IS correlates highly with faults.

[Figure: modules and database tables shaded by ImpactScale decile (10-quantile), from High to Low.]

Page 24: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Overview

1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary

Page 25: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Overview of Evaluations

Evaluation Procedure
• 100 times random sub-sampling validation

Evaluations
• Fault Prediction (RQ1: Does adding ImpactScale to existing product metrics improve predictive performance?)
  • Predicting Faulty or Not Faulty
  • Effort-aware Fault Prediction
• Comparison between ImpactScale and Network Measures (RQ2)
• Validating ImpactScale Definition (RQ3)
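
Random sub-sampling validation repeats a random train/test split and collects one score per repetition. The sketch below shows the general procedure only: the 2/3 training fraction, the fixed seed, and the `fit` and `evaluate` callables are hypothetical, as the slides give only the number of repetitions (100).

```python
import random


def random_subsampling(samples, fit, evaluate, runs=100, train_fraction=2 / 3, seed=0):
    """Repeat a random train/test split `runs` times and return one score per run.

    `fit(train)` returns a model; `evaluate(model, test)` returns a score.
    Both callables, the split ratio, and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_train = round(len(samples) * train_fraction)
    scores = []
    for _ in range(runs):
        shuffled = list(samples)
        rng.shuffle(shuffled)                       # new random split each run
        train, test = shuffled[:n_train], shuffled[n_train:]
        scores.append(evaluate(fit(train), test))
    return scores


# Toy usage: the "model" is the mean of the training values,
# and the score is the mean absolute error on the held-out values.
data = list(range(30))
scores = random_subsampling(
    data,
    fit=lambda train: sum(train) / len(train),
    evaluate=lambda model, test: sum(abs(x - model) for x in test) / len(test),
)
print(len(scores))  # 100
```

Collecting a score per run for each model is also what makes a paired comparison between two models possible, such as a signed-rank test over the 100 paired scores.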

Page 26: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Predicting Faulty or Not Faulty

Faults are predicted using logistic regression. MET = model without ImpactScale; MET+IS = model with ImpactScale.

Performance Measure   DS1 MET   DS1 MET+IS   Improvement by IS
Precision             0.148     0.168        +0.020
Recall                0.315     0.392        +0.077
F1                    0.200     0.234        +0.034

Performance Measure   DS2 MET   DS2 MET+IS   Improvement by IS
Precision             0.139     0.162        +0.023
Recall                0.253     0.334        +0.081
F1                    0.177     0.216        +0.039

All improvements are significant in Wilcoxon’s signed-rank test.

Adding IS improves all performance measures, which supports answering RQ1 with YES.

Practitioners’ Point of View

Practically, these Precision/Recall/F1 evaluations are not very useful, because in maintenance the modules estimated as most fault-prone tend to be large. In DS2, the top 10% of modules by estimated fault-proneness account for 24% of the LOC, so inspecting them is not effort-effective.
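
For reference, the three measures in the tables can be computed from confusion counts as in the sketch below. The function name and the example counts are hypothetical; only the standard definitions of precision, recall, and F1 are used.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Standard precision, recall, and F1 (harmonic mean of the two)."""
    predicted_faulty = true_positives + false_positives
    actually_faulty = true_positives + false_negatives
    precision = true_positives / predicted_faulty if predicted_faulty else 0.0
    recall = true_positives / actually_faulty if actually_faulty else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical counts: 100 modules flagged as faulty, 20 of them correctly,
# and 30 faulty modules missed.
precision, recall, f1 = precision_recall_f1(20, 80, 30)
print(precision, recall)  # 0.2 0.4
```

Applying the F1 formula to the DS1 MET values 0.148 and 0.315 gives about 0.201, close to the reported 0.200. A small gap of this kind would be expected if F1 was averaged per run rather than computed from the averaged precision and recall, which is an assumption here.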

Page 27: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Effort-aware Fault Prediction Model

Problem
• In maintenance, modules estimated as faulty tend to be large.
• A large module needs large effort to be reviewed or tested.

Practitioners’ Opinion
• “Budget and schedule are very demanding. We want to find more faults with less effort.”
• Therefore, effort-effectiveness is our main concern.

We use an “effort-aware model” [Arisholm06] [Menzies10] [Mende10].
• It prioritizes modules in order of relative risk to maximize effort-effectiveness.
• Poisson regression is used to learn relative risk.

Relative risk of a module $x$:

$$\frac{\#\mathrm{errors}(x)}{\mathrm{Effort}(x)}$$
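
The prioritization step can be sketched as a sort on predicted faults per unit of effort. This assumes effort is measured in LOC and that a predicted fault count per module is already available from some fitted model; the talk uses Poisson regression for that step, which is not implemented here. The field names and example modules are hypothetical.

```python
def rank_by_relative_risk(modules):
    """Order modules by predicted faults per LOC, highest first.

    Each module is a dict with hypothetical keys "name",
    "predicted_faults" (output of some fitted model), and "loc".
    """
    return sorted(
        modules,
        key=lambda m: m["predicted_faults"] / m["loc"],
        reverse=True,
    )


modules = [
    {"name": "big",   "predicted_faults": 3.0, "loc": 6000},  # 0.0005 faults per LOC
    {"name": "small", "predicted_faults": 0.8, "loc": 400},   # 0.0020 faults per LOC
    {"name": "mid",   "predicted_faults": 1.5, "loc": 1500},  # 0.0010 faults per LOC
]
print([m["name"] for m in rank_by_relative_risk(modules)])  # ['small', 'mid', 'big']
```

The module with the most predicted faults ("big") is ranked last, because reviewing it costs far more effort per expected fault than the others.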

Page 28: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Results of Effort-aware Evaluation

• AUC is the area under the curve of the lift chart. It shows overall predictive performance; a high AUC means high performance.
• ddr10 is the “detected defect rate in the first 10% of effort”. It shows predictive performance under limited effort; a high ddr10 means high performance.

Practitioners’ Point of View

In maintenance, budget, schedule, and effort are always limited; therefore, ddr10 is more important.

[Figure: 《Effort-based Cumulative Lift Chart of DS1》, Faults detected against Effort (LOC inspected), with curves DS1-MET+IS, DS1-MET, and Optimal. The marked ddr10 values are 0.296 (DS1-MET+IS) and 0.186 (DS1-MET).]

Performance Measure   DS1-MET   DS1-MET+IS   Improvement by IS
AUC                   0.635     0.680        +0.045
ddr10                 0.186     0.296        ×1.60

All improvements are significant in Wilcoxon’s signed-rank test.
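
Both measures can be read off an effort-based cumulative lift curve, as in the sketch below. Several choices here are assumptions, not details from the slides: effort is the fraction of total LOC inspected, the curve is interpolated linearly within a module, and the AUC is the plain trapezoidal area without normalization against the optimal curve.

```python
def lift_curve(modules):
    """Cumulative (effort fraction, fault fraction) points, inspecting by risk.

    `modules` is a list of (loc, faults, risk) tuples; higher risk is inspected first.
    """
    ranked = sorted(modules, key=lambda m: m[2], reverse=True)
    total_loc = sum(loc for loc, _, _ in ranked)
    total_faults = sum(faults for _, faults, _ in ranked)
    points = [(0.0, 0.0)]
    loc_seen = faults_seen = 0
    for loc, faults, _ in ranked:
        loc_seen += loc
        faults_seen += faults
        points.append((loc_seen / total_loc, faults_seen / total_faults))
    return points


def detected_at(points, effort):
    """Fault fraction detected at a given effort fraction (linear interpolation)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= effort <= x1:
            if x1 == x0:
                return y1
            return y0 + (y1 - y0) * (effort - x0) / (x1 - x0)
    return 1.0


def area_under_curve(points):
    """Trapezoidal area under the lift curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical modules: (LOC, actual faults, predicted relative risk).
modules = [(100, 2, 0.020), (300, 1, 0.004), (600, 1, 0.001)]
points = lift_curve(modules)
ddr10 = detected_at(points, 0.10)
auc = area_under_curve(points)
print(ddr10)  # 0.5: the first 10% of LOC contains 2 of the 4 faults
```

In this toy data the highest-risk module is exactly 10% of the code, so ddr10 is the share of faults it contains.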

Page 29: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Results of Effort-aware Evaluation

[Figure: 《Effort-based Cumulative Lift Chart of DS1》 and 《Effort-based Cumulative Lift Chart of DS2》, each plotting Faults detected against Effort (LOC inspected) with curves MET+IS, MET, and Optimal. The marked ddr10 values are 0.296 vs. 0.186 for DS1 and 0.343 vs. 0.225 for DS2.]

Performance Measure   DS1-MET   DS1-MET+IS   Improvement by IS
AUC                   0.635     0.680        +0.045
ddr10                 0.186     0.296        ×1.60

Performance Measure   DS2-MET   DS2-MET+IS   Improvement by IS
AUC                   0.669     0.714        +0.045
ddr10                 0.225     0.343        ×1.53

All improvements are significant in Wilcoxon’s signed-rank test.

RQ1: Does adding ImpactScale to existing product metrics improve predictive performance? YES.

Page 30: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Comparison with Network Measures

Network Measures
• Recently, [Zimmermann et al. ICSE08] applied Social Network Analysis (SNA) to a software dependency graph representing relationships between binary modules of software systems.
• Over 50 network measures were used, for example:
  • in/out Degrees
  • Network Diameter
  • Closeness
  • Eigenvector Centrality (a.k.a. PageRank), etc.
• They and some replication studies [Tosun09] [Nguyen10] reported that these measures work well in some cases.

RQ2: “Does adding ImpactScale to existing product metrics and network measures improve predictive performance?”

Page 32: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

ImpactScale vs. Network Measures

Hierarchical Model Comparison based on Effort-aware Model

• Models are learned using Principal Component Poisson Regression.
• All improvements and deteriorations are significant in Wilcoxon’s signed-rank test (*: P<0.05, **: P<0.01, unmarked: P<0.001).

[Figure: hierarchy of four models: the model with existing metrics; that model +ImpactScale; that model +network measures; and that model +network measures +ImpactScale. Adding ImpactScale improves performance.]

RQ2: “Does adding ImpactScale to existing product metrics and network measures improve predictive performance?” YES.

Page 33: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Validating ImpactScale

RQ3: Is considering distant nodes meaningful?

《Test method》 Compare models that use ImpactScale variants in which the maximum distance of path-finding is limited.

[Figure: ddr10 (y-axis) against Limit of Maximum Distance of Path-finding (x-axis, 1 to 10), one panel for DS1 and one for DS2.]

• The “Limit=1” variant is almost equivalent to fan-in + fan-out.

Answer: YES.
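
The remark about the “Limit=1” variant can be made concrete under one assumption that the slides do not state explicitly: each node reachable within the distance limit contributes $p^{d}$, where $d$ is its distance from the changed node $s$. The symbols $\mathrm{IS}_{L}$, $N_{d}(s)$, and $L$ below are notation introduced for this sketch only.

```latex
\mathrm{IS}_{L}(s) = \sum_{d=1}^{L} p^{d}\,\lvert N_{d}(s)\rvert
\quad\Longrightarrow\quad
\mathrm{IS}_{1}(s) = p\,\lvert N_{1}(s)\rvert = 0.5\,\lvert N_{1}(s)\rvert
```

Here $N_{d}(s)$ is the set of nodes first reached at distance $d$. With $L = 1$ only the direct neighbors of $s$ count, so the metric is proportional to the number of directly connected nodes, which is roughly what fan-in plus fan-out measures.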

Page 34: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Overview

1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary

Page 35: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Summary of Evaluations

• RQ1: Does adding ImpactScale to existing product metrics improve predictive performance? YES
• RQ2: “Does adding ImpactScale to existing product metrics and network measures improve predictive performance?” YES
• RQ3: Is considering distant nodes meaningful? YES

Hypothesis: A metric that quantifies the scale of change impact can improve the performance of fault prediction. TRUE

Page 36: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Threats to Validity

Language
• ImpactScale has no language-specific features, but the evaluations were done only on COBOL systems. COBOL differs from other languages in many ways.

Application Domain
• The evaluated systems are only in the accounting business domain.

Call Graph Analysis
• The impact of dynamic dispatching (e.g. polymorphism and reflection) is not assessed.

Page 37: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Conclusion

We defined a new product metric quantifying change impact, called ImpactScale.
• Probabilistic propagation
• Relation-sensitive propagation
• Practical computation time even for large-scale software systems

We evaluated its predictive performance in enterprise systems.
• Adding ImpactScale improves the performance: over 1.5 times in the first 10% of effort (LOC).
• Additional finding: considering distant nodes in the dependency graph is meaningful for fault prediction.

Page 38: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Future Work

Extending supported languages
• Java, C, C++

Expanding use cases
• Rapid risk assessment
• Watching violations of modularity
• Measuring software decay

Page 39: Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Thank you!

Kenichi Kobayashi, Fujitsu Labs
