Software Cost Estimation (strictly speaking, effort!). Gangneung National University (강릉대학교), Department of Computer Engineering, Kwon Ki-Tae (권기태)


Upload: jeremy-houston

Post on 03-Jan-2016



Slide 2: Agenda

1. Background

2. “Current” techniques

3. Machine learning techniques

4. Assessing prediction systems

5. Future avenues

Slide 3: 1. Background

Scope:

software projects

early estimates

effort ≠ cost

estimate ≠ expected answer

Slide 4: What the Papers Say...

From Computing, 26 November 1998:

Defence system never worked

MoD project loses £34m

The Ministry of Defence has been forced to write off £34.6 million on an IT project it commissioned in 1988 and abandoned eight years later, writes Joanne Wallen. The Trawlerman system, designed ...

Slide 5: The Problem

Software developers need to predict, e.g.

effort, duration, number of features

defects and reliability

But ...

little systematic data

noise and change

complex interactions between variables

poorly understood phenomena

Slide 6: So What is an Estimate?

An estimate is a prediction based upon probabilistic assessment.

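[Figure: probability p plotted against effort, starting at 0. Two points are marked on the curve: the most likely effort, and the effort with equal probability of under- and over-estimate.]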
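The two points marked on this slide (the most likely value, and the value with equal probability of under- and over-estimate) differ whenever the effort distribution is skewed. The following is a minimal sketch under an assumption the slides do not make: that effort follows a log-normal distribution. The function name and the parameter values are hypothetical.

```python
import math


def lognormal_effort_summary(mu: float, sigma: float) -> dict[str, float]:
    """Closed-form summary points of a log-normal effort distribution.

    The log-normal shape is an illustrative assumption, not part of the slides.
    """
    return {
        # Mode: the peak of the density, i.e. the "most likely" effort.
        "most_likely": math.exp(mu - sigma ** 2),
        # Median: equal probability of under- and over-estimate.
        "median": math.exp(mu),
        # Mean: the long-run average effort.
        "mean": math.exp(mu + sigma ** 2 / 2),
    }


# Hypothetical project: median effort of 100 person-months, moderate skew.
summary = lognormal_effort_summary(mu=math.log(100), sigma=0.5)
for name, value in summary.items():
    print(f"{name:12s} {value:7.1f} PM")
```

With a right-skewed distribution the most likely effort sits below the median, so quoting the most likely value as "the estimate" gives a better than even chance of an overrun.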

Slide 7: Some Causes of Poor Estimation

We don’t cope with political problems that hamper the process.

We don’t develop estimating expertise.

We don’t systematically use past experience.

Tom DeMarco, Controlling Software Projects: Management, Measurement and Estimation. Yourdon Press: NY, 1982.

Slide 8: 2. “Current” Techniques

Essentially a software cost estimation system is an input vector mapped to an output.

expert judgement

COCOMO

function points

DIY models

Barry Boehm, “Software Engineering Economics,” IEEE Transactions on Software Engineering, vol. 10, pp. 4-21, 1984.

Slide 9: 2.1 Expert Judgement

Most widely used estimation technique

No consistently “best” prediction system

Lack of historical data

Need to “own” the estimate

Experts plus … ?

Slide 10: Expert Judgement Drawbacks

BUT

Lack of objectivity

Lack of repeatability

Lack of recall / awareness

Lack of experts!

Preferable to use more than one expert.

Slide 11: What Do We Know About Experts?

Most commonly practised technique.

Dutch survey revealed 62% of estimators used intuition supplemented by remembered analogies.

UK survey - time to estimate ranged from 5 minutes to 4 weeks.

US survey found that the only factor with a significant positive relationship with accuracy was responsibility.

Slide 12: Information Used

Design requirements

Resources available

Base product/source code (enhancement projects)

Software tools available

Previous history of product

...

Slide 13: Information Needed

Rules of thumb

Available resources

Data on past projects

Feedback on past estimates

...

Slide 14: Delphi Techniques?

Methods for structuring group communication processes to solve complex problems.

Characterised by:

iteration

anonymity

Devised by Rand Corporation (1948). Refined by Boehm (1981).

Slide 15: Stages for Delphi Approach

1. Experts receive spec + estimation form

2. Discussion of product + estimation issues

3. Experts produce individual estimate

4. Estimates tabulated and returned to experts

5. Only expert's personal estimate identified

6. Experts meet to discuss results

7. Estimates are revised

8. Cycle continues until an acceptable degree of convergence is obtained
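Stages 4, 5 and 8 can be sketched in a few lines: tabulate the estimates anonymously, show each expert only their own value, and stop once the spread is acceptable. This is a rough illustration only. The function names are hypothetical, and the convergence rule (range no wider than 20% of the median) is an assumption, since the slides do not define "an acceptable degree of convergence".

```python
from statistics import median


def feedback_form(estimates: dict[str, float], expert: str) -> dict:
    """Build the anonymous feedback returned to one expert (stages 4 and 5)."""
    values = sorted(estimates.values())
    return {
        "all_estimates": values,              # no names attached
        "your_estimate": estimates[expert],   # only the expert's own value is identified
        "median_estimate": median(values),
    }


def has_converged(estimates: dict[str, float], tolerance: float = 0.2) -> bool:
    """Assumed stopping rule for stage 8: range within `tolerance` of the median."""
    values = list(estimates.values())
    return (max(values) - min(values)) <= tolerance * median(values)


# Hypothetical estimates from three experts over two rounds.
round_1 = {"A": 10, "B": 30, "C": 50}
round_2 = {"A": 28, "B": 30, "C": 33}

print(feedback_form(round_1, "A"))
print(has_converged(round_1), has_converged(round_2))
```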

Slide 16: Wideband Delphi Form

Project: X134 Date: 9/17/03

Estimator: Hyolee

Estimation round: 1

0 10 20 30 40 50

x x* x x! x x x

Key: x = estimate; x* = your estimate; x! = median estimate

Slide 17: Observing Delphi Groups

Four groups of MSc students

Developing a C++ prototype for some simple scenarios

Requested to estimate size of prototype (number of delimiters)

Initial estimates followed by 2 group discussions

Recorded group discussions plus scribes

Slide 18: Delphi Size Estimation Results

Absolute errors:

Estimation  Mean  Median  Min  Max
Initial     371   160.5   23   2249
Round 1     219   40      23   749
Round 2     271   40      3    949

Slide 19: Converging Group

[Figure: size estimates (axis 0 to 450) at Initial, Round 1 and Round 2 for Series1, Series2 and Series3, with the true size marked.]

Slide 20: A Dominant Individual

[Figure: size estimates (axis 0 to 3000) at Initial, Round 1 and Round 2 for Series1, Series2 and Series3, with the true size marked.]

Slide 21: 2.2 COCOMO

Best known example of an algorithmic cost model. Series of three models: basic, intermediate and detailed.

Models assume relationships between:

size (KDSI) and effort

effort and elapsed time

$MM = a \cdot KDSI^{b}$

$TDEV = c \cdot MM^{d}$

http://sunset.usc.edu/COCOMOII/cocomo.html

Barry Boehm, “Software Engineering Economics,” IEEE Transactions on Software Engineering, vol. 10, pp. 4-21, 1984.

Slide 22: COCOMO contd.

Model coefficients are dependent upon the type of project:

organic: small teams, familiar application

semi-detached

embedded: complex organisation, software and/or hardware interactions
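As an illustration of how the project type selects the coefficients, here is a small sketch of the basic model. The coefficient values are the published basic COCOMO figures from Boehm (1981); they do not appear on the slides, and the function and table names are hypothetical.

```python
# Basic COCOMO coefficients (a, b, c, d) per development mode.
# Values are taken from Boehm's 1981 basic model, not from the slides.
BASIC_COCOMO = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(kdsi: float, mode: str) -> tuple[float, float]:
    """Return (effort in man-months, elapsed time in months) for a given size."""
    a, b, c, d = BASIC_COCOMO[mode]
    mm = a * kdsi ** b      # MM = a * KDSI^b
    tdev = c * mm ** d      # TDEV = c * MM^d
    return mm, tdev


for mode in BASIC_COCOMO:
    mm, tdev = basic_cocomo(32, mode)
    print(f"{mode:14s} effort = {mm:6.1f} MM, schedule = {tdev:4.1f} months")
```

The same 32 KDSI system costs noticeably more effort when classified as embedded rather than organic, which is why misclassifying the development type matters.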

Slide 23: COCOMO Cost Drivers

• product attributes
• computer attributes
• personnel attributes
• project attributes

Drivers hard to empirically validate.

Many are inappropriate for the 1990s, e.g. database size.

Drivers not independent, e.g. MODP and TOOL.

Slide 24: COCOMO Assessment

Very influential, non-proprietary model.

Drivers help the manager understand the impact of different factors upon project costs.

Hard to port to different development environments without extensive re-calibration.

Vulnerable to mis-classification of development type

Hard to estimate KDSI at the start of a project.

Slide 25: 2.3 What are Function Points?

A synthetic (indirect) measure derived from a software requirements specification of the attribute functionality.

This conforms closely to our notion of specification size.

Uses:

effort prediction

productivity

Slide 26: Function Points (a brief history)

Albrecht developed FPs in the mid-1970s at IBM.

Measure of system functionality as opposed to size.

Weighted count of function types derived from specification:

interfaces

inquiries

inputs / outputs

files

A. Albrecht and J. Gaffney, “Software function, source lines of code, and development effort prediction: a software science validation,” IEEE Transactions on Software Engineering, vol. 9, pp. 639-648, 1983.

C. Symons, “Function Point Analysis: Difficulties and Improvements,” IEEE Transactions on Software Engineering, vol. 14, pp. 2-11, 1988.

Slide 27: Function Point Rules

Weighted count of different types of functions:

external input types (4) e.g. file names

external output types (5) e.g. reports, msgs.

inquiries (4) i.e. interactive inputs needing a response

external files (7) i.e. files shared with other software systems

internal files (10) i.e. invisible outside system

The unadjusted count (UFC) is the weighted sum of the count of each type of function.

Slide 28: Function Types

Type Simple Average Complex

External input 3 4 6

External output 4 5 7

Logical int. file 7 10 15

Ext. interface 5 7 10

Ext. inquiry 3 4 6
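The weights in this table turn a set of function counts into the unadjusted function count (UFC). The sketch below uses the table's weights; the dictionary keys, the function name and the example counts are hypothetical.

```python
# Weights copied from the Function Types table (simple, average, complex).
WEIGHTS = {
    "external_input": {"simple": 3, "average": 4, "complex": 6},
    "external_output": {"simple": 4, "average": 5, "complex": 7},
    "logical_internal_file": {"simple": 7, "average": 10, "complex": 15},
    "external_interface": {"simple": 5, "average": 7, "complex": 10},
    "external_inquiry": {"simple": 3, "average": 4, "complex": 6},
}


def unadjusted_function_count(counts: dict[tuple[str, str], int]) -> int:
    """Weighted sum of function counts, keyed by (function type, complexity)."""
    return sum(WEIGHTS[ftype][level] * n for (ftype, level), n in counts.items())


# Hypothetical specification: 10 average inputs, 5 complex outputs, 4 simple files.
example_counts = {
    ("external_input", "average"): 10,
    ("external_output", "complex"): 5,
    ("logical_internal_file", "simple"): 4,
}
print(unadjusted_function_count(example_counts))  # 10*4 + 5*7 + 4*7 = 103
```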

Slide 29: Adjusted FPs

14 factors contribute to the technical complexity factor (TCF), e.g. performance, on-line update, complex interface.

Each factor is rated 0 (n.a.) - 5 (essential).

TCF = 0.65 + (sum of factors)/100

Thus TCF may range from 0.65 to 1.35, and

FP = UFC*TCF
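A short sketch of the adjustment step, following the TCF formula on this slide. The function names and the example ratings are hypothetical; the input checks are an added safeguard rather than part of the method as stated.

```python
def technical_complexity_factor(ratings: list[int]) -> float:
    """TCF = 0.65 + (sum of the 14 factor ratings) / 100."""
    if len(ratings) != 14:
        raise ValueError("expected exactly 14 factor ratings")
    if any(not 0 <= r <= 5 for r in ratings):
        raise ValueError("each rating must lie between 0 and 5")
    return 0.65 + sum(ratings) / 100


def adjusted_function_points(ufc: float, ratings: list[int]) -> float:
    """FP = UFC * TCF."""
    return ufc * technical_complexity_factor(ratings)


# Hypothetical system: UFC of 100 with every factor rated 3.
print(adjusted_function_points(100, [3] * 14))
```

Because the ratings can move the result by at most 35% either way, the unadjusted count dominates the final FP value.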

Slide 30: Technical Complexity Factors

Data communications
Distributed functions
Performance
Heavily used configuration
Transaction rate
Online data entry
End user efficiency
Online update
Complex processing
Reusability
Installation ease
Operational ease
Multiple sites
Facilities change

Slide 31: Function Points and LOC

Language          LOC per FP
Assembler         320
C                 150 (128)
COBOL             106 (105)
Modula-2          71 (80)
4GL               40 (20)
Query languages   16 (13)
Spreadsheet       6

Behrens (1983), IEEE TSE 9(6).

C. Jones, “Applied Software Measurement,” McGraw-Hill (1991).

Slide 32: FP Based Predictions

Simplest form is:

effort = FC + p * FP

Need to determine local productivity, p and fixed costs, FC.

[Figure: Effort v FPs at XYZ Bank. EFFORT (axis 10000 to 40000) plotted against FP (axis 500 to 2000).]
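One plausible way to determine the local productivity p and the fixed costs FC is an ordinary least-squares fit over completed projects. The sketch below assumes that approach; the slides do not say how the values were obtained, and the data points are invented for illustration (they are not the XYZ Bank data).

```python
def fit_effort_model(fps: list[float], efforts: list[float]) -> tuple[float, float]:
    """Least-squares fit of effort = FC + p * FP; returns (FC, p)."""
    n = len(fps)
    mean_fp = sum(fps) / n
    mean_effort = sum(efforts) / n
    sxx = sum((x - mean_fp) ** 2 for x in fps)
    sxy = sum((x - mean_fp) * (y - mean_effort) for x, y in zip(fps, efforts))
    p = sxy / sxx                   # slope: effort per function point
    fc = mean_effort - p * mean_fp  # intercept: fixed costs
    return fc, p


# Made-up history lying exactly on effort = 2000 + 18 * FP.
history_fp = [500, 1000, 1500, 2000]
history_effort = [11000, 20000, 29000, 38000]

fc, p = fit_effort_model(history_fp, history_effort)
print(f"FC = {fc:.0f}, p = {p:.1f}")
print(f"predicted effort for 1200 FP: {fc + p * 1200:.0f}")
```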

Slide 33: All environments are not equal

Productivity figures in FPs per 1000 hours:

IBM 29.6

Finnish 99.5

Canada 58.9

Mermaid 37.0

US 28.5

training, personnel, management, techniques, tools, applications, etc.

Slide 34: Function Point Users

Widely used (e.g. government, financial organisations) with some success:

monitor team productivity

cost estimation

Most effective where homogeneous environment

Variants include Mk II Points and Feature Points

Slide 35: Function Point Weaknesses

Subjective counting (Low and Jeffery report 30% variation between different analysts).

Hard to automate.

Hard to apply to maintenance work.

Not based upon organisational needs, e.g. is it productive to produce functions irrelevant to the user?

Oriented to traditional DP type applications.

Hard to calibrate.

Frequently leads to inaccurate prediction systems.

Slide 36: Function Point Strengths

The necessary data can be available early on in a project.

Language independent.

Layout independent (unlike LOC)

More accurate than estimated LOC?

What is the alternative?

Slide 37: 2.4 DIY models

[Figure: Predicting effort using number of files. ACT (axis 250 to 1000) plotted against FILES (axis 75 to 225).]

Slide 38: A Non-linear Model

To introduce an economies or diseconomies of scale exponent:

$effort = p \cdot S^{e}$

where 0 < e.

An empirical study of 60 projects at IBM Federal Systems Division during the mid 1970s concluded that effort could be modelled as:

$effort\ (PM) = 5.2 \cdot KLOC^{0.91}$

Slide 39: Productivity and Size

Effort (PM) Size (KLOC) KLOC/PM

42.27 10 0.24

79.42 20 0.25

182.84 50 0.27

343.56 100 0.29

2792.57 1000 0.36

Productivity and Project Size using the Walston and Felix Model
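The figures in this table follow directly from the IBM Federal Systems Division model (effort = 5.2 * KLOC^0.91). The sketch below recomputes them; the function name is hypothetical.

```python
def walston_felix_effort(kloc: float) -> float:
    """Effort in person-months from the model effort = 5.2 * KLOC^0.91."""
    return 5.2 * kloc ** 0.91


print("Effort (PM)  Size (KLOC)  KLOC/PM")
for kloc in (10, 20, 50, 100, 1000):
    effort = walston_felix_effort(kloc)
    print(f"{effort:11.2f}  {kloc:11d}  {kloc / effort:7.2f}")
```

Because the exponent is below 1, effort grows more slowly than size, so the model implies economies of scale: productivity in KLOC/PM rises as projects get larger.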

Slide 40: Productivity v Size

[Figure: productivity (axis 0 to 0.4) plotted against KLOC (axis 0 to 1200).]

Slide 41: Bespoke is Better!

Model Researcher MMRE

Basic COCOMO Kemerer 601%

FP Kemerer 103%

SLIM Kemerer 772%

ESTIMACS Kemerer 85%

COCOMO Miyazaki & Mori 166%

Intermediate COCOMO Kitchenham 255%

Slide 42: So Where Are We?

• A major research topic.

• Poor results “off the shelf”.

• Accuracy improves with calibration but still mixed.

• Needs accurate, (largely) quantitative inputs.

Slide 43: 3. Machine Learning Techniques

A new area but demonstrating promise.

System “learns” how to estimate from a training set.

Doesn’t assume a continuous functional relationship.

In theory more robust against outliers, more flexible types of relationship.

Du Zhang and Jeffrey Tsai, “Machine Learning and Software Engineering,” Software Quality Journal, vol. 11, pp. 87-119, 2003.


Slide 44: Different ML Techniques

Case based reasoning (CBR) or analogical reasoning

Neural nets

Neuro-fuzzy systems

Rule induction

Meta-heuristics e.g. GAs, simulated annealing

Slide 45: Case Based Reasoning

[Figure: the CBR cycle, with steps RETRIEVE, REUSE, REVISE and RETAIN. Labelled elements: problem, new case, retrieved case, previous cases, solved case, suggested solution, tested / repaired case, confirmed solution, general knowledge.]

Slide 46: Using CBR

Characterise a project e.g.

no. of interrupts

size of interface

development method

Find similar completed projects

Use completed projects as a basis for estimate (with adaptation)

Slide 47: Problems

Finding the analogy, especially in a large organisation.

Determining how good the analogy is

Need for domain knowledge and expertise for case adaptation.

Need for systematically structured data to represent each case.

Slide 48: ANGEL

http://dec.bmth.ac.uk/ESERG/ANGEL/

ANaloGy Estimation tooL (ANGEL)

Slide 49: ANGEL Features

Shell

n features (continuous or categorical)

Brute force search for optimal subset of features: O((2**n) - 1)

Measures Euclidean distance (standardised dimensions)

Uses k nearest cases.

Simple adaptation strategy (weighted mean).

With k=1 becomes a NN technique
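The core of estimation by analogy can be sketched as a k-nearest-neighbour lookup over standardised features. This is not ANGEL's implementation: it handles continuous features only, uses an unweighted mean of the k nearest efforts instead of a weighted one, and omits the feature-subset search. All names and data are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardise(rows: list[list[float]]) -> list[list[float]]:
    """Rescale each feature column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    spreads = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, spreads)] for row in rows]


def estimate_by_analogy(cases: list[list[float]], efforts: list[float],
                        target: list[float], k: int = 2) -> float:
    """Mean effort of the k completed projects closest to the target."""
    *scaled_cases, scaled_target = standardise(cases + [target])
    distances = [math.dist(case, scaled_target) for case in scaled_cases]
    nearest = sorted(range(len(cases)), key=distances.__getitem__)[:k]
    return mean(efforts[i] for i in nearest)


# Hypothetical completed projects: [number of files, number of screens].
past_features = [[10, 2], [12, 3], [50, 10], [55, 12]]
past_efforts = [100, 120, 500, 560]

print(estimate_by_analogy(past_features, past_efforts, target=[11, 2], k=2))
```

Standardising first stops a feature measured in large units from dominating the distance. With k=1 the estimate is simply the effort of the single closest project.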

Slide 50: CBR Results

A study of 275 projects from 9 datasets suggests that CBR outperforms more traditional statistical methods e.g. stepwise regression.

M. Shepperd and C. Schofield, IEEE Trans. on Softw. Eng., 23(11), pp. 736-743.

Slide 51: Sensitivity Analysis

[Figure: % MMRE (axis 0 to 200) plotted against No. of Projects (axis 3 to 31) for series T1, T2 and T3.]

Slide 52: Independent Replication

Niessink and van Vliet (1997)

Stensrud and Myrtveit (1998, 99)

Jeffery and Walkerden (1999)

no search for best subset of features

Briand and El Emam (1998)

approx. 30 features so exhaustive search for best subset not possible

homogeneity + well defined relationships favour regression techniques

Slide 53: Artificial Neural Nets

[Figure: a multi-layer feed-forward ANN. Input layer: FP, # files, # screens, team size. Hidden layers. Output layer: effort.]

Slide 54: ANN Results

Study                 Learning Algorithm    n        Results
Venkatachalam         BP                    63       “Promising”
Wittig & Finnie       BP                    81, 136  MMRE = 29%
Jorgenson             BP                    109      MMRE = 100%
Serluca               BP                    28       MMRE = 76%
Karunanithi et al.    Cascade-Correlation   N/A      “More accurate than algorithmic models”
Samson et al          BP                    63       MMRE = 428%
Srinivasan & Fisher   BP                    78       MMRE = 70%
Hughes                BP                    33       MMRE = 55%

BP = back propagation learning algorithm

Slide 55: ANN Lessons

need large training sets

deal with heterogeneous datasets

opaque (poor explanatory power)

sensitive to choices of topology and learning algorithm

problems of over adaptation (neuro-fuzzy approaches?)

Slide 56: Rule Induction

IF module_size > 100 THEN
    high_development_effort
ELSE
    IF developer_experience < 2 THEN
        low_development_effort
    ELSE
        moderate_development_effort

C. Mair, G. Kadoda, M. Lefley, K. Phalp, C. Schofield, M. Shepperd, and S. Webster, “An investigation of machine learning based prediction systems,” J. of Systems Software, vol. 53, pp. 23-29, 2000.

Slide 57: Machine Learning Summary

Need training sets

ANNs require significant sized sets n≈50

Configuring the system can be a hard search problem

Don’t need to specify the form of the relationship in advance

Can produce more accurate results than other methods

Adapts as new cases acquired

Slide 58: 4. Assessing Estimation Systems

accuracy

tolerant of measurement error

explanatory power

ease of use

availability of inputs

...

Slide 59: Assessing Model Performance

Absolute error

Percentage error and mean percentage error

Magnitude of relative error and mean magnitude of relative error (MMRE)

PRED(n)

Sum of the squares of the residuals (SSR)

...

Slide 60: Absolute Error

$|E_{pred} - E_{act}|$

But it fails to take into account the size of the project. A 6 PM error is serious if predicted is only 3 PMs, yet a 6 PM error for a 3,000 PM project is a triumph.

Slide 61: Percentage Error

$\frac{E_{pred} - E_{act}}{E_{act}}$

or for more than one estimate the mean percentage error:

$\frac{1}{n} \sum_{i=1}^{i=n} \left( \frac{E_{pred} - E_{act}}{E_{act}} \right)_i$

where n is the number of estimates.

• Reveals any systematic bias to a predictive model, e.g. if the model always over-estimates then the percentage error will be positive.

• A weakness is that it will mask compensating errors.
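A small sketch of the mean percentage error, showing both properties noted on this slide: a consistent over-estimator gives a positive value, while errors of opposite sign cancel out. Function names and data are hypothetical, and the result is expressed as a fraction rather than multiplied by 100.

```python
def percentage_error(predicted: float, actual: float) -> float:
    """Signed relative error: positive means an over-estimate."""
    return (predicted - actual) / actual


def mean_percentage_error(predicted: list[float], actual: list[float]) -> float:
    """Average of the signed relative errors over n estimates."""
    errors = [percentage_error(p, a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


actual_effort = [100, 100]

# A model that always over-estimates shows a positive bias.
print(mean_percentage_error([120, 130], actual_effort))  # 0.25

# One 50% over-estimate and one 50% under-estimate cancel exactly.
print(mean_percentage_error([150, 50], actual_effort))   # 0.0
```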

Slide 62: MMRE

MMRE is defined as:

$\frac{1}{n} \sum_{i=1}^{i=n} \left| \frac{E_{pred} - E_{act}}{E_{act}} \right|_i$

Masks any systematic bias but highlights overall accuracy.

Penalises regression derived models based on least squares algorithms.
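Taking the magnitude of each relative error stops opposite-signed errors from cancelling, at the cost of hiding the direction of any bias. A minimal sketch, with hypothetical names and data:

```python
def mmre(predicted: list[float], actual: list[float]) -> float:
    """Mean magnitude of relative error over n estimates."""
    magnitudes = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return sum(magnitudes) / len(magnitudes)


actuals = [100, 100]

# Compensating errors no longer cancel: both are 50% off.
print(mmre([150, 50], actuals))   # 0.5

# Systematic over- and under-estimation are indistinguishable.
print(mmre([150, 150], actuals))  # 0.5
print(mmre([50, 50], actuals))    # 0.5
```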

Slide 63: PRED(n)

Conte et al. suggest ≤ 25% as an indicator of an acceptable prediction model.

PRED(25) measures the % of predictions that lie within 25% of actual values.

PRED(25) ≥ 75% is a typical target (seldom achieved!)
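PRED(25) can be computed by counting the predictions whose relative error is at most 25%. The sketch below is illustrative; the function name, the `level` parameter and the data are hypothetical.

```python
def pred(predicted: list[float], actual: list[float], level: float = 0.25) -> float:
    """Fraction of predictions within `level` (relative) of the actual value."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) / a <= level)
    return hits / len(predicted)


# Relative errors are 10%, 20%, 100% and 5%, so three of four are within 25%.
predictions = [110, 80, 200, 95]
actual_values = [100, 100, 100, 100]

score = pred(predictions, actual_values)
print(f"PRED(25) = {score:.0%}")
print("meets the 75% target" if score >= 0.75 else "misses the 75% target")
```

Unlike MMRE, one wildly wrong prediction (the 100% error here) counts no more than a prediction that is 26% off.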

Slide 64: Sum of the Squared Residuals

If you are risk averse it penalises large deviations more than small ones

$SSR = \sum (E_{pred} - E_{act})^2$

Can also compute mean square error.
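A brief sketch of SSR and the mean square error, showing how one large deviation dominates the total. Names and data are hypothetical.

```python
def sum_squared_residuals(predicted: list[float], actual: list[float]) -> float:
    """SSR: sum over projects of (predicted - actual) squared."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


def mean_square_error(predicted: list[float], actual: list[float]) -> float:
    """SSR divided by the number of estimates."""
    return sum_squared_residuals(predicted, actual) / len(predicted)


# Residuals are -2, 0 and +10 person-months.
estimates = [10, 20, 40]
outcomes = [12, 20, 30]

print(sum_squared_residuals(estimates, outcomes))  # 4 + 0 + 100 = 104
print(mean_square_error(estimates, outcomes))
```

The single 10 PM miss contributes 100 of the 104 total, which is why this indicator suits a risk-averse objective.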

Slide 65: A Comparison Case Study

Statistic       LSR   Robust  Median
R-squared       0.28  0.25    0.26
MMRE            0.78  0.62    0.62
Pred (25)       45%   35%     35%
Balanced MMRE   0.84  0.78    0.77

Slide 66: So What’s Going On?

central tendency (mean, median)

spread (variance, kurtosis + skewness)

The ith residual is $y_i - \hat{y}_i$

M. J. Shepperd, M. H. Cartwright, and G. F. Kadoda, “On building prediction systems for software engineers,” Empirical Software Engineering, vol. 5, pp. 175-182, 2000.

Slide 67: Estimation Objectives

Objective          Indicator               Type
Risk averse        sum of squares          spread
Error minimising   median absolute error   spread
Portfolio          total error             centre

Slide 68: 5. Summary

Accuracy is a non-trivial concept

No ‘best’ technique

Algorithmic models need to be calibrated

Simple linear models can be surprisingly effective

ANNs need large, not necessarily homogeneous training sets

Evidence to suggest that CBR is often the most accurate and most robust technique

Slide 69: Some Estimation Guidelines

Collect data

Use more than one estimating technique.

Minimise the number of cost drivers / coefficients in a model to facilitate calibration:

smaller, more homogeneous data sets

look for simple solutions first

Exploit any local structure or standardisation.

Remember an estimate is a probabilistic statement (bounds?).

Provide feedback for estimators.

Slide 70: Future Avenues

Great need for useful prediction systems

Consider the nature of the prediction problem

Combining prediction systems

Collaboration with experts

Managing with little or no systematic data

Slide 71: Experts plus … ?

Experiment by Myrtveit and Stensrud using project managers at Andersen Consulting

Asked subjects to make predictions

Found expert+tool significantly better than either expert or tool alone.

What type of estimation systems are easiest to collaborate with?

I. Myrtveit and E. Stensrud, “A controlled experiment to assess the benefits of estimating with analogy and regression models,” IEEE Trans on Softw. Eng, 25, pp. 510-525, 1999.