
Software Cost Estimation

Strictly speaking effort!

권기태 (Kwon Ki-tae), Department of Computer Engineering, Gangneung National University


Agenda

1. Background

2. “Current” techniques

3. Machine learning techniques

4. Assessing prediction systems

5. Future avenues


1. Background

Scope:

software projects

early estimates

effort ≠ cost

estimate ≠ expected answer


What the Papers Say...

From Computing, 26 November 1998:

Defence system never worked

MoD project loses £34m The Ministry of Defence has been forced to write off £34.6 million on an IT project it commissioned in 1988 and abandoned eight years later, writes Joanne Wallen. The Trawlerman system, designed ...


The Problem

Software developers need to predict, e.g.:

effort, duration, number of features

defects and reliability

But ...

little systematic data

noise and change

complex interactions between variables

poorly understood phenomena


So What is an Estimate?

An estimate is a prediction based upon probabilistic assessment.

[Figure: probability p against effort (from 0), marking the most likely effort and the point of equal probability of under- and over-estimate]
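The figure separates the most likely effort (the mode) from the effort with equal probability of under- or over-estimate (the median). As a minimal sketch, assuming effort follows a lognormal distribution (a common right-skewed choice that the slide does not specify), the two points and the mean can be computed in closed form:

```python
import math

def lognormal_points(mu: float, sigma: float) -> dict:
    """Closed-form mode, median and mean of a lognormal effort distribution.

    Assumption (illustrative only): ln(effort) ~ Normal(mu, sigma**2).
    """
    return {
        "most_likely": math.exp(mu - sigma ** 2),   # mode: peak of the density
        "median": math.exp(mu),                     # P(under) == P(over) == 0.5
        "mean": math.exp(mu + sigma ** 2 / 2),      # expected value
    }

if __name__ == "__main__":
    # Hypothetical parameters: median effort of 100 person-months, sigma = 0.5
    points = lognormal_points(mu=math.log(100), sigma=0.5)
    for name, value in points.items():
        print(f"{name:>11}: {value:6.1f} PM")
```

For a right-skewed distribution like this, the most likely value lies below the median, so quoting the mode as "the estimate" gives a better-than-even chance of overrun.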


Some Causes of Poor Estimation

We don’t cope with political problems that hamper the process.

We don’t develop estimating expertise.

We don’t systematically use past experience.

Tom DeMarco, Controlling Software Projects: Management, Measurement and Estimation. Yourdon Press: NY, 1982.


2. “Current” Techniques

Essentially a software cost estimation system is an input vector mapped to an output.

expert judgement

COCOMO

function points

DIY models

Barry Boehm, “Software Engineering Economics,” IEEE Transactions on Software Engineering, vol. 10, pp. 4-21, 1984.


2.1 Expert Judgement

Most widely used estimation technique

No consistently “best” prediction system

Lack of historical data

Need to “own” the estimate

Experts plus … ?


Expert Judgement Drawbacks

BUT Lack of objectivity

Lack of repeatability

Lack of recall /awareness

Lack of experts!

Preferable to use more than one expert.


What Do We Know About Experts?

Most commonly practised technique.

Dutch survey revealed 62% of estimators used intuition supplemented by remembered analogies.

UK survey - time to estimate ranged from 5 minutes to 4 weeks.

US survey found that the only factor with a significant positive relationship with accuracy was responsibility.


Information Used

Design requirements

Resources available

Base product/source code (enhancement projects)

Software tools available

Previous history of product

...


Information Needed

Rules of thumb

Available resources

Data on past projects

Feedback on past estimates

...


Delphi Techniques?

Methods for structuring group communication processes to solve complex problems.

Characterised by:

iteration

anonymity

Devised by Rand Corporation (1948). Refined by Boehm (1981).


Stages for Delphi Approach

1. Experts receive spec + estimation form

2. Discussion of product + estimation issues

3. Experts produce individual estimate

4. Estimates tabulated and returned to experts

5. Only each expert's own estimate is identified (the others remain anonymous)

6. Experts meet to discuss results

7. Estimates are revised

8. Cycle continues until an acceptable degree of convergence is obtained
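The tabulate, discuss and revise cycle above can be simulated as a loop. This is a hypothetical sketch: the revision rule (each expert moves a fixed fraction of the way towards the tabulated median) and the convergence test (spread relative to the median below a threshold) are assumptions for illustration, not part of the Delphi method as described; the discussion stages are not modelled.

```python
from statistics import median

def delphi_rounds(initial, pull=0.5, tolerance=0.1, max_rounds=10):
    """Simulate Delphi rounds until the estimates converge.

    Hypothetical model: each round every expert sees the median and moves
    `pull` of the way towards it. Convergence is declared when
    (max - min) / median <= tolerance.
    """
    estimates = list(initial)
    history = [estimates]
    for _ in range(max_rounds):
        mid = median(estimates)                        # stage 4: tabulate and return
        if (max(estimates) - min(estimates)) / mid <= tolerance:
            break                                      # stage 8: acceptable convergence
        estimates = [e + pull * (mid - e) for e in estimates]  # stage 7: revise
        history.append(estimates)
    return history

if __name__ == "__main__":
    rounds = delphi_rounds([40, 25, 30, 50, 35])  # hypothetical person-month estimates
    for i, r in enumerate(rounds):
        print(i, [round(e, 1) for e in r], "median", median(r))
```

Because every estimate moves towards the median by the same fraction, the spread shrinks geometrically; real groups, as the observations that follow show, need not behave this way.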


Wideband Delphi Form

Project: X134 Date: 9/17/03

Estimator: Hyolee

Estimation round: 1

0 10 20 30 40 50

x x* x x! x x x

Key: x = estimate; x* = your estimate; x! = median estimate


Observing Delphi Groups

Four groups of MSc students

Developing a C++ prototype for some simple scenarios

Requested to estimate size of prototype (number of delimiters)

Initial estimates followed by 2 group discussions

Recorded group discussions plus scribes


Delphi Size Estimation Results

Estimation Mean Median Min Max

Initial 371 160.5 23 2249

Round 1 219 40 23 749

Round 2 271 40 3 949

Absolute errors


Converging Group

[Figure: size estimates of a converging group (three series) at Initial, Round 1 and Round 2, on a 0 to 450 scale, with the true size marked]


A Dominant Individual

[Figure: size estimates of a group with a dominant individual (three series) at Initial, Round 1 and Round 2, on a 0 to 3000 scale, with the true size marked]


2.2 COCOMO

Best known example of an algorithmic cost model. Series of three models: basic, intermediate and detailed.

Models assume relationships between size (KDSI) and effort, and between effort and elapsed time:

$MM = a \cdot \mathrm{KDSI}^{b}$

$TDEV = c \cdot MM^{d}$

http://sunset.usc.edu/COCOMOII/cocomo.html

Barry Boehm, “Software Engineering Economics,” IEEE Transactions on Software Engineering, vol. 10, pp. 4-21, 1984.


COCOMO contd.

Model coefficients are dependent upon the type of project:

organic: small teams, familiar application

semi-detached

embedded: complex organisation, software and/or hardware interactions
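A minimal sketch of the basic COCOMO equations, using the coefficients Boehm published for the three development modes of basic COCOMO (a, b for effort; c, d for schedule). The intermediate and detailed models, which also apply cost-driver multipliers, are not shown, and the 32 KDSI project size is hypothetical.

```python
# Basic COCOMO coefficients (Boehm, 1981) per development mode:
# MM = a * KDSI**b (person-months), TDEV = c * MM**d (months)
BASIC_COCOMO = {
    "organic":       (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded":      (3.6, 1.20, 2.5, 0.32),
}

def basic_cocomo(kdsi: float, mode: str) -> tuple[float, float]:
    """Return (effort in person-months, development time in months)."""
    a, b, c, d = BASIC_COCOMO[mode]
    mm = a * kdsi ** b
    tdev = c * mm ** d
    return mm, tdev

if __name__ == "__main__":
    for mode in BASIC_COCOMO:
        mm, tdev = basic_cocomo(32, mode)  # hypothetical 32 KDSI project
        print(f"{mode:>13}: {mm:7.1f} PM over {tdev:5.1f} months")
```

Running the same size under each mode shows how strongly the result depends on choosing the mode, which is the mis-classification risk noted in the assessment below. These published values are uncalibrated.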


COCOMO Cost Drivers

• product attributes

• computer attributes

• personnel attributes

• project attributes

Drivers are hard to validate empirically.

Many are inappropriate for the 1990s, e.g. database size.

Drivers are not independent, e.g. MODP and TOOL.


COCOMO Assessment

Very influential, non-proprietary model.

Drivers help the manager understand the impact of different factors upon project costs.

Hard to port to different development environments without extensive re-calibration.

Vulnerable to mis-classification of development type

Hard to estimate KDSI at the start of a project.


2.3 What are Function Points?

A synthetic (indirect) measure, derived from a software requirements specification, of the attribute functionality. This conforms closely to our notion of specification size.

Uses:

effort prediction

productivity


Function Points (a brief history)

Albrecht developed FPs in the mid-1970s at IBM.

Measure of system functionality as opposed to size.

Weighted count of function types derived from specification:

interfaces

inquiries

inputs / outputs

files

A. Albrecht and J. Gaffney, “Software function, source lines of code, and development effort prediction: a software science validation,” IEEE Transactions on Software Engineering, vol. 9, pp. 639-648, 1983.

C. Symons, “Function Point Analysis: Difficulties and Improvements,” IEEE Transactions on Software Engineering, vol. 14, pp. 2-11, 1988.


Function Point Rules

Weighted count of different types of functions:

external input types (4), e.g. file names

external output types (5), e.g. reports, msgs.

inquiries (4), i.e. interactive inputs needing a response

external files (7), i.e. files shared with other software systems

internal files (10), i.e. invisible outside the system

The unadjusted count (UFC) is the weighted sum of the count of each type of function.


Function Types

Type Simple Average Complex

External input 3 4 6

External output 4 5 7

Logical int. file 7 10 15

Ext. interface 5 7 10

Ext. inquiry 3 4 6
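The unadjusted count can be computed directly from this weight table. A minimal sketch: the weights are copied from the table above, while the dictionary keys and the function counts in the example are hypothetical.

```python
# Weights from the function-type table: (simple, average, complex)
WEIGHTS = {
    "external_input":   (3, 4, 6),
    "external_output":  (4, 5, 7),
    "logical_int_file": (7, 10, 15),
    "ext_interface":    (5, 7, 10),
    "ext_inquiry":      (3, 4, 6),
}
LEVELS = {"simple": 0, "average": 1, "complex": 2}

def unadjusted_fp(counts: dict) -> int:
    """UFC = sum over (type, complexity) of count * weight."""
    total = 0
    for (ftype, level), n in counts.items():
        total += n * WEIGHTS[ftype][LEVELS[level]]
    return total

if __name__ == "__main__":
    # Hypothetical counts for a small system
    counts = {
        ("external_input", "average"): 12,
        ("external_output", "simple"): 8,
        ("logical_int_file", "average"): 4,
        ("ext_interface", "complex"): 1,
        ("ext_inquiry", "average"): 5,
    }
    print("UFC =", unadjusted_fp(counts))
```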


Adjusted FPs

14 factors contribute to the technical complexity factor (TCF), e.g. performance, on-line update, complex interface.

Each factor is rated from 0 (not applicable) to 5 (essential).

TCF = 0.65 + (sum of factors)/100

Thus TCF may range from 0.65 to 1.35, and

FP = UFC*TCF
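A sketch of the adjustment step. The 14 ratings in the example are hypothetical; the function checks that there are 14 of them and that each lies on the 0 to 5 scale, which is what keeps TCF between 0.65 and 1.35.

```python
def technical_complexity_factor(ratings: list[int]) -> float:
    """TCF = 0.65 + (sum of the 14 factor ratings) / 100."""
    if len(ratings) != 14:
        raise ValueError("expected 14 factor ratings")
    if any(not 0 <= r <= 5 for r in ratings):
        raise ValueError("each rating must be between 0 and 5")
    return 0.65 + sum(ratings) / 100

def adjusted_fp(ufc: float, ratings: list[int]) -> float:
    """FP = UFC * TCF."""
    return ufc * technical_complexity_factor(ratings)

if __name__ == "__main__":
    ratings = [3, 2, 4, 1, 3, 5, 4, 3, 2, 0, 1, 3, 0, 2]  # hypothetical ratings
    print(f"TCF = {technical_complexity_factor(ratings):.2f}")
    print(f"FP  = {adjusted_fp(150, ratings):.1f}")
```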


Technical Complexity Factors

Data communications, Distributed functions, Performance, Heavily used configuration, Transaction rate, Online data entry, End user efficiency

Online update, Complex processing, Reusability, Installation ease, Operational ease, Multiple sites, Facilities change


Function Points and LOC

Language LOC per FP

Assembler 320

C 150 (128)

COBOL 106 (105)

Modula-2 71 (80)

4GL 40 (20)

Query languages 16 (13)

Spreadsheet 6

Behrens (1983), IEEE TSE 9(6). C. Jones, “Applied Software Measurement,” McGraw-Hill (1991).


FP Based Predictions

Simplest form is:

effort = FC + p * FP

Need to determine local productivity, p and fixed costs, FC.

[Figure: Effort v FPs at XYZ Bank, a scatter plot of EFFORT (up to about 40000) against FP (up to about 2000)]
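Determining FC and p locally amounts to fitting a straight line to past projects. A minimal ordinary-least-squares sketch in plain Python; the function name and the project data are hypothetical, not the XYZ Bank data plotted above.

```python
def fit_fc_p(fps: list[float], efforts: list[float]) -> tuple[float, float]:
    """Least-squares fit of effort = FC + p * FP; returns (FC, p)."""
    n = len(fps)
    if n < 2:
        raise ValueError("need at least two past projects")
    mean_x = sum(fps) / n
    mean_y = sum(efforts) / n
    sxx = sum((x - mean_x) ** 2 for x in fps)
    if sxx == 0:
        raise ValueError("FP values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fps, efforts))
    p = sxy / sxx              # productivity: effort per function point
    fc = mean_y - p * mean_x   # fixed costs: the intercept
    return fc, p

if __name__ == "__main__":
    # Hypothetical past projects: function points and effort in hours
    fps = [250, 600, 900, 1400, 1900]
    efforts = [5200, 9800, 15100, 22500, 31000]
    fc, p = fit_fc_p(fps, efforts)
    print(f"FC = {fc:.0f} hours, p = {p:.2f} hours/FP")
    print(f"Prediction for 1200 FP: {fc + p * 1200:.0f} hours")
```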


All environments are not equal

Productivity figures in FPs per 1000 hours:

IBM 29.6

Finnish 99.5

Canada 58.9

Mermaid 37.0

US 28.5

Differences reflect training, personnel, management, techniques, tools, applications, etc.


Function Point Users

Widely used (e.g. by government and financial organisations) with some success for:

monitoring team productivity

cost estimation

Most effective in a homogeneous environment

Variants include Mk II Points and Feature Points


Function Point Weaknesses

Subjective counting (Low and Jeffery report 30% variation between different analysts).

Hard to automate.

Hard to apply to maintenance work.

Not based upon organisational needs, e.g. is it productive to produce functions irrelevant to the user?

Oriented to traditional DP type applications.

Hard to calibrate.

Frequently leads to inaccurate prediction systems.


Function Point Strengths

The necessary data can be available early on in a project.

Language independent.

Layout independent (unlike LOC)

More accurate than estimated LOC?

What is the alternative?


2.4 DIY models

[Figure: Predicting effort using number of files, a scatter plot of ACT (up to about 1000) against FILES (up to about 225)]


A Non-linear Model

To introduce an economies- or diseconomies-of-scale exponent:

$effort = p \cdot S^{e}$

where $e > 0$. An exponent $e < 1$ gives economies of scale; $e > 1$ gives diseconomies of scale.

An empirical study of 60 projects at IBM Federal Systems Division during the mid-1970s concluded that effort could be modelled as:

$effort\ (PM) = 5.2 \cdot KLOC^{0.91}$


Productivity and Size

Effort (PM) Size (KLOC) KLOC/PM

42.27 10 0.24

79.42 20 0.25

182.84 50 0.27

343.56 100 0.29

2792.57 1000 0.36

Productivity and Project Size using the Walston and Felix Model
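The table can be reproduced from the Walston and Felix equation. A short sketch that recomputes effort and productivity for the sizes listed:

```python
def walston_felix_effort(kloc: float) -> float:
    """Effort in person-months: 5.2 * KLOC ** 0.91 (Walston and Felix)."""
    return 5.2 * kloc ** 0.91

if __name__ == "__main__":
    print(f"{'Effort (PM)':>12} {'Size (KLOC)':>12} {'KLOC/PM':>8}")
    for kloc in (10, 20, 50, 100, 1000):
        effort = walston_felix_effort(kloc)
        print(f"{effort:12.2f} {kloc:12d} {kloc / effort:8.2f}")
```

Because the exponent 0.91 is below 1, KLOC/PM rises with size, so the model implies economies of scale.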


Productivity v Size

[Figure: productivity (0 to 0.4) against KLOC (0 to 1200)]


Bespoke is Better!

Model Researcher MMRE

Basic COCOMO Kemerer 601%

FP Kemerer 103%

SLIM Kemerer 772%

ESTIMACS Kemerer 85%

COCOMO Miyazaki & Mori 166%

Intermediate COCOMO Kitchenham 255%


So Where Are We?

• A major research topic.

• Poor results “off the shelf”.

• Accuracy improves with calibration but still mixed.

• Needs accurate, (largely) quantitative inputs.


3. Machine Learning Techniques

A new area but demonstrating promise.

System “learns” how to estimate from a training set.

Doesn’t assume a continuous functional relationship.

In theory more robust against outliers, more flexible types of relationship.

Du Zhang and Jeffrey Tsai, “Machine Learning and Software Engineering,” Software Quality Journal, vol. 11, pp. 87-119, 2003.


Different ML Techniques

Case based reasoning (CBR) or analogical reasoning

Neural nets

Neuro-fuzzy systems

Rule induction

Meta-heuristics e.g. GAs, simulated annealing


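Case Based Reasoning

[Figure: the CBR cycle. A problem becomes a new case, which is matched against previous cases (RETRIEVE) to give a retrieved case; REUSE produces a solved case (suggested solution); REVISE produces a tested / repaired case (confirmed solution); RETAIN adds it to the previous cases. General knowledge supports the cycle.]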


Using CBR

Characterise a project e.g.

no. of interrupts

size of interface

development method

Find similar completed projects

Use completed projects as a basis for estimate (with adaptation)


Problems

Finding the analogy, especially in a large organisation.

Determining how good the analogy is

Need for domain knowledge and expertise for case adaptation.

Need for systematically structured data to represent each case.


ANGEL

http://dec.bmth.ac.uk/ESERG/ANGEL/

ANaloGy Estimation tooL (ANGEL)


ANGEL Features

Shell

n features (continuous or categorical)

Brute-force search for the optimal subset of features ($2^n - 1$ candidate subsets)

Measures Euclidean distance (standardised dimensions)

Uses k nearest cases.

Simple adaptation strategy (weighted mean). With k = 1 it becomes a nearest-neighbour (NN) technique.
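A minimal sketch of analogy-based estimation in the spirit of the features listed above: continuous features only, z-score standardisation, Euclidean distance, k nearest analogues, and a brute-force search over all non-empty feature subsets scored by leave-one-out MMRE. This is not ANGEL's code. The function names and the scoring choice are assumptions, and the adaptation is a plain mean because the weighting ANGEL uses is not given here. The project data are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

def standardise(rows):
    """Z-score each feature column; constant columns become 0."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def estimate(target, cases, efforts, features, k=1):
    """Mean effort of the k cases nearest to `target` on the chosen features."""
    def dist(case):
        return math.sqrt(sum((case[f] - target[f]) ** 2 for f in features))
    nearest = sorted(range(len(cases)), key=lambda i: dist(cases[i]))[:k]
    return mean(efforts[i] for i in nearest)

def loo_mmre(rows, efforts, features, k=1):
    """Leave-one-out MMRE: predict each project from all the others."""
    errors = []
    for i in range(len(rows)):
        others = rows[:i] + rows[i + 1:]
        other_efforts = efforts[:i] + efforts[i + 1:]
        pred = estimate(rows[i], others, other_efforts, features, k)
        errors.append(abs(pred - efforts[i]) / efforts[i])
    return mean(errors)

def best_subset(rows, efforts, k=1):
    """Brute-force search over all 2**n - 1 non-empty feature subsets."""
    n = len(rows[0])
    candidates = (s for r in range(1, n + 1) for s in combinations(range(n), r))
    return min(candidates, key=lambda s: loo_mmre(rows, efforts, s, k))

if __name__ == "__main__":
    # Hypothetical projects: [no. of interrupts, size of interface, team size]
    raw = [[10, 40, 3], [25, 90, 5], [12, 45, 4], [30, 120, 6], [8, 30, 2]]
    efforts = [120, 400, 150, 520, 90]
    rows = standardise(raw)
    subset = best_subset(rows, efforts, k=1)
    print("best feature subset:", subset)
    print("LOO MMRE:", round(loo_mmre(rows, efforts, subset, k=1), 3))
```

The subset search evaluates every combination, so its cost doubles with each added feature; this is why exhaustive search becomes impractical with around 30 features.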


CBR Results

A study of 275 projects from 9 datasets suggests that CBR outperforms more traditional statistical methods, e.g. stepwise regression.

M. Shepperd and C. Schofield, IEEE Trans. on Softw. Eng. 23(11), pp. 736-743.


Sensitivity Analysis

[Figure: %MMRE (0 to 200) against No. of Projects (3 to 31) for three series, T1, T2 and T3]


Independent Replication

Niessink and van Vliet (1997)

Stensrud and Myrtveit (1998, 99)

Jeffery and Walkerden (1999)

no search for best subset of features

Briand and El Emam (1998)

approx. 30 features so exhaustive search for best subset not possible

homogeneity + well defined relationships favour regression techniques


Artificial Neural Nets

[Figure: a multi-layer feed forward ANN, with an input layer (FP, # files, # screens, team size), hidden layers and an output layer (effort)]
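To make the diagram concrete, a minimal sketch of a feed-forward net trained by back propagation (the BP learning algorithm in the results that follow). Assumptions: a single tanh hidden layer, a linear output unit, plain stochastic gradient descent, and hypothetical, pre-scaled project data; the class name is illustrative, and the cited studies used their own topologies and settings.

```python
import math
import random

class TinyMLP:
    """Inputs -> one tanh hidden layer -> one linear output (scaled effort)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(self.w1, self.b1)]
        out = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, out

    def train_step(self, x, y, lr):
        """One back-propagation update on the squared error 0.5 * (out - y)**2."""
        hidden, out = self.forward(x)
        err = out - y                               # dLoss/dOut
        for j, h in enumerate(hidden):
            # Error at hidden unit j (tanh' = 1 - h**2), computed before w2[j] changes
            delta = err * self.w2[j] * (1 - h * h)
            self.w2[j] -= lr * err * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta * xi
            self.b1[j] -= lr * delta
        self.b2 -= lr * err

    def mse(self, xs, ys):
        return sum((self.forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

if __name__ == "__main__":
    # Hypothetical projects, inputs pre-scaled to roughly [-1, 1]:
    # [function points, no. of files, no. of screens, team size] -> scaled effort
    xs = [[-0.8, -0.6, -0.9, -0.5], [-0.2, -0.1, 0.0, -0.5],
          [0.3, 0.4, 0.2, 0.5], [0.9, 0.8, 1.0, 1.0], [0.0, 0.2, -0.3, 0.0]]
    ys = [-0.9, -0.2, 0.4, 1.0, 0.0]
    net = TinyMLP(n_inputs=4, n_hidden=3, seed=1)
    print("MSE before:", round(net.mse(xs, ys), 4))
    for epoch in range(500):
        for x, y in zip(xs, ys):
            net.train_step(x, y, lr=0.05)
    print("MSE after: ", round(net.mse(xs, ys), 4))
```

Even this toy shows the lessons listed below: the fitted weights explain little, and the result depends on the chosen topology, learning rate and number of epochs.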


ANN Results

Study | Learning algorithm | n | Results
Venkatachalam | BP | 63 | “Promising”
Wittig & Finnie | BP | 81, 136 | MMRE = 29%
Jorgenson | BP | 109 | MMRE = 100%
Serluca | BP | 28 | MMRE = 76%
Karunanithi et al. | Cascade-Correlation | N/A | “More accurate than algorithmic models”
Samson et al. | BP | 63 | MMRE = 428%
Srinivasan & Fisher | BP | 78 | MMRE = 70%
Hughes | BP | 33 | MMRE = 55%

BP = back propagation learning algorithm


ANN Lessons

need large training sets

deal with heterogeneous datasets

opaque (poor explanatory power)

sensitive to choices of topology and learning algorithm

problems of over adaptation (neuro-fuzzy approaches?)


Rule Induction

IF module_size > 100 THEN
    high_development_effort
ELSE
    IF developer_experience < 2 THEN
        low_development_effort
    ELSE
        moderate_development_effort

C. Mair, G. Kadoda, M. Lefley, K. Phalp, C. Schofield, M. Shepperd, and S. Webster, “An investigation of machine learning based prediction systems,” J. of Systems and Software, vol. 53, pp. 23-29, 2000.
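The induced rule transcribes directly into code. A rule inducer would choose thresholds such as 100 and 2 from training data; the second function below sketches the simplest version of that step, picking the single threshold that best separates two classes. It is a hypothetical decision-stump learner for illustration, not the algorithm used in the cited study, and the units of module_size and developer_experience are not stated on the slide.

```python
def predict_effort_class(module_size: float, developer_experience: float) -> str:
    """The induced rule from the slide, transcribed as if/elif/else."""
    if module_size > 100:
        return "high_development_effort"
    elif developer_experience < 2:
        return "low_development_effort"
    else:
        return "moderate_development_effort"

def best_threshold(values, labels, positive):
    """Pick t so that the rule 'value > t => positive' misclassifies fewest cases."""
    candidates = sorted(set(values))
    def errors(t):
        return sum((v > t) != (lab == positive) for v, lab in zip(values, labels))
    return min(candidates, key=errors)

if __name__ == "__main__":
    # Hypothetical training data: module sizes labelled with observed effort class
    sizes = [50, 80, 120, 150]
    classes = ["low", "low", "high", "high"]
    print("learned threshold:", best_threshold(sizes, classes, positive="high"))
    print(predict_effort_class(module_size=150, developer_experience=5))
```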


Machine Learning Summary

Need training sets

ANNs require sizeable training sets (n ≈ 50)

Configuring the system can be a hard search problem

Don’t need to specify the form of the relationship in advance

Can produce more accurate results than other methods

Adapts as new cases acquired


4. Assessing Estimation Systems

accuracy

tolerant of measurement error

explanatory power

ease of use

availability of inputs

...


Assessing Model Performance

Absolute error

Percentage error and mean percentage error

Magnitude of relative error and mean magnitude of relative error (MMRE)

PRED(n)

Sum of the squares of the residuals (SSR)

...


Absolute Error

$\left| E_{pred} - E_{act} \right|$

But it fails to take into account the size of the project. A 6 PM error is serious if the prediction is only 3 PM, yet a 6 PM error for a 3,000 PM project is a triumph.


Percentage Error

$\frac{E_{pred} - E_{act}}{E_{act}}$

or, for more than one estimate, the mean percentage error:

$\frac{1}{n} \sum_{i=1}^{n} \frac{E_{pred,i} - E_{act,i}}{E_{act,i}}$

where n is the number of estimates.

• Reveals any systematic bias in a predictive model, e.g. if the model always over-estimates then the percentage error will be positive.

• A weakness is that it will mask compensating errors.
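A short sketch of both formulas with the same sign convention (positive means over-estimate), scaled by 100 to give percentages. The two hypothetical predictions illustrate the weakness just noted: a 50% over-estimate and a 50% under-estimate cancel to a mean percentage error of zero.

```python
def percentage_error(e_pred: float, e_act: float) -> float:
    """(E_pred - E_act) / E_act, as a percentage; positive means over-estimate."""
    return (e_pred - e_act) / e_act * 100

def mean_percentage_error(preds, acts) -> float:
    """Average of the signed percentage errors over n estimates."""
    errors = [percentage_error(p, a) for p, a in zip(preds, acts)]
    return sum(errors) / len(errors)

if __name__ == "__main__":
    preds, acts = [150, 50], [100, 100]   # hypothetical: +50% and -50%
    print(mean_percentage_error(preds, acts))
```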


MMRE

MMRE is defined as:

$MMRE = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| E_{pred,i} - E_{act,i} \right|}{E_{act,i}}$

Masks any systematic bias but highlights overall accuracy.

Penalises regression-derived models based on least squares algorithms.
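A sketch of MMRE as defined above, returned as a fraction (multiply by 100 for the percentages quoted in earlier tables). Because it averages magnitudes, a hypothetical +50% error and -50% error no longer cancel: the MMRE is 0.5.

```python
def mmre(preds, acts) -> float:
    """Mean magnitude of relative error: mean of |E_pred - E_act| / E_act."""
    mres = [abs(p - a) / a for p, a in zip(preds, acts)]
    return sum(mres) / len(mres)

if __name__ == "__main__":
    print(mmre([150, 50], [100, 100]))   # hypothetical over- and under-estimate
```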


PRED(n)

Conte et al. suggest MMRE ≤ 25% as an indicator of an acceptable prediction model.

PRED(25) measures the % of predictions that lie within 25% of actual values.

PRED(25) ≥ 75% is a typical target (seldom achieved!)
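A sketch of PRED(n) on hypothetical predictions and actuals (in person-months). Here three of the four predictions fall within 25% of the actual value, so PRED(25) is exactly 75%.

```python
def pred(n: float, preds, acts) -> float:
    """PRED(n): percentage of predictions within n% of the actual value."""
    within = sum(abs(p - a) / a <= n / 100 for p, a in zip(preds, acts))
    return 100 * within / len(acts)

if __name__ == "__main__":
    preds = [95, 130, 210, 48]      # hypothetical predictions (PM)
    acts = [100, 100, 200, 50]      # hypothetical actual efforts (PM)
    print(f"PRED(25) = {pred(25, preds, acts):.0f}%")
```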


Sum of the Squared Residuals

If you are risk averse, SSR is attractive because it penalises large deviations more than small ones:

$SSR = \sum_{i=1}^{n} (E_{pred,i} - E_{act,i})^2$

The mean squared error (SSR / n) can also be computed.
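A sketch of SSR and the mean squared error. The hypothetical example keeps the total absolute error at 20 PM but distributes it differently: one 20 PM miss scores twice as badly as two 10 PM misses, which is the risk-averse penalty described above.

```python
def ssr(preds, acts) -> float:
    """Sum of squared residuals: sum of (E_pred - E_act)**2."""
    return sum((p - a) ** 2 for p, a in zip(preds, acts))

def mse(preds, acts) -> float:
    """Mean squared error: SSR divided by the number of estimates."""
    return ssr(preds, acts) / len(acts)

if __name__ == "__main__":
    steady = ssr([110, 110], [100, 100])   # two 10 PM errors
    lumpy = ssr([120, 100], [100, 100])    # one 20 PM error
    print(steady, lumpy)
```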


A Comparison Case Study

Statistic LSR Robust Median

R-squared 0.28 0.25 0.26

MMRE 0.78 0.62 0.62

Pred (25) 45% 35% 35%

Balanced MMRE 0.84 0.78 0.77


So What’s Going On?

central tendency (mean, median)

spread (variance, kurtosis + skewness)

The $i$th residual is $y_i - \hat{y}_i$.

M. J. Shepperd, M. H. Cartwright, and G. F. Kadoda, “On building prediction systems for software engineers,” Empirical Software Engineering, vol. 5, pp. 175-182, 2000.


Estimation Objectives

Objective | Indicator | Type
Risk averse | sum of squares | spread
Error minimising | median absolute error | spread
Portfolio | total error | centre
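A sketch computing the three indicators for two hypothetical sets of predictions against the same actual efforts. Residuals follow the actual-minus-predicted convention used above. The example shows that the choice of objective can reverse which prediction system looks better: B wins on median absolute error, while A wins on sum of squares and on total error.

```python
from statistics import median

def indicators(preds, acts) -> dict:
    """Indicators for the three estimation objectives."""
    residuals = [a - p for p, a in zip(preds, acts)]   # y_i - y_hat_i
    return {
        "sum_of_squares": sum(r * r for r in residuals),        # risk averse
        "median_abs_error": median(abs(r) for r in residuals),  # error minimising
        "total_error": sum(residuals),                          # portfolio
    }

if __name__ == "__main__":
    acts = [100, 100, 100, 100]                    # hypothetical actuals (PM)
    print("A:", indicators([90, 110, 90, 110], acts))
    print("B:", indicators([100, 100, 100, 140], acts))
```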


5. Summary

Accuracy is a non-trivial concept

No ‘best’ technique

Algorithmic models need to be calibrated

Simple linear models can be surprisingly effective

ANNs need large, not necessarily homogeneous training sets

Evidence to suggest that CBR is often the most accurate and most robust technique


Some Estimation Guidelines

Collect data

Use more than one estimating technique.

Minimise the number of cost drivers / coefficients in a model to facilitate calibration:

smaller, more homogeneous data sets

look for simple solutions first

Exploit any local structure or standardisation.

Remember an estimate is a probabilistic statement (bounds?).

Provide feedback for estimators.


Future Avenues

Great need for useful prediction systems

Consider the nature of the prediction problem

Combining prediction systems

Collaboration with experts

Managing with little or no systematic data


Experts plus … ?

Experiment by Myrtveit and Stensrud using project managers at Andersen Consulting

Asked subjects to make predictions

Found expert + tool significantly better than either expert or tool alone.

What types of estimation system are easiest to collaborate with?

I. Myrtveit and E. Stensrud, “A controlled experiment to assess the benefits of estimating with analogy and regression models,” IEEE Trans. on Softw. Eng., 25, pp. 510-525, 1999.
