
Justin W. Eggstaff, Thomas A. Mazzuchi, Shahram Sarkani

J. W. Eggstaff, T. A. Mazzuchi, and S. Sarkani. “The development of progress plans using a performance-based expert judgment model to assess technical performance and risk.” Systems Engineering, Volume 16, Number 2, 2014.

J. W. Eggstaff, T. A. Mazzuchi, and S. Sarkani. “The effect of the number of seed variables on the performance of Cooke’s Classical Model.” Reliability Engineering & System Safety (2nd revision).


• Annual costs of DoD research & development (R&D) run approximately 50% above original estimates

• Typical delays in weapon systems' initial operational capability (IOC) exceed 20 months

• Weapon Systems Acquisition Reform Act of 2009


• Overall program performance depends on three factors:
  • Cost
  • Schedule
  • Technical

• Technical performance is typically “assumed”

• Poor cost and schedule performance are symptoms, or effects, of poor technical performance

• Current methods for the prediction of


• Technical Measurement
  • Types of technical measures
  • Attributes
  • Technical reviews and audits


• Designed to provide a numerical value of risk by comparing current TPM progress against a desired level of performance, or a performance threshold, predefined by the analyst

• Category A – Smaller the Better (e.g., software errors)

• Category B – Larger the Better (e.g., system range)
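The two categories can be illustrated with a short sketch. The slides do not give the index formula, so the normalized-shortfall form below is an assumption made purely for illustration, and `risk_index` and its parameters are hypothetical names.

```python
def risk_index(current: float, threshold: float, smaller_is_better: bool) -> float:
    """Illustrative risk value in [0, 1]; 0 means the threshold is already met.

    Assumed form (not taken from the slides): the relative shortfall of the
    current TPM value against the analyst's threshold.
    """
    if current <= 0 or threshold <= 0:
        raise ValueError("this sketch assumes positive TPM values")
    if smaller_is_better:
        # Category A, e.g. software errors: risk while current exceeds threshold
        shortfall = (current - threshold) / current
    else:
        # Category B, e.g. system range: risk while current is below threshold
        shortfall = (threshold - current) / threshold
    return max(0.0, shortfall)


# Category A: 40 open software errors against a threshold of 10 (made-up numbers)
print(risk_index(40, 10, smaller_is_better=True))
# Category B: 300 units of range demonstrated against a threshold of 400 (made-up numbers)
print(risk_index(300, 400, smaller_is_better=False))
```

Handling both categories in one function keeps the direction of "better" explicit at the call site, which is the main point of the Category A / Category B distinction.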


• Overall Risk


•  Cooke’s Classical Model is probably the most widely used method for combining expert judgment across a variety of applications

• Uses a set of seed variables to calculate individual expert Calibration and Information scores which, in turn, are used to calculate an expert’s relative weight

•  The experts’ predicted values for a target variable are combined using their individual weights to calculate the decision maker’s assessment of that variable


• Experts express their uncertainty by specifying 5%, 50%, and 95% quantiles for each unknown quantity: a set of seed variables (whose actual realizations are known to the analyst alone) and a set of variables of interest

• The analyst determines the Intrinsic Range, or bounds, for the variable distributions
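As a rough sketch of how the intrinsic range might be computed: take the smallest and largest values among all experts' quantiles (and the realization, for a seed variable), then pad both ends. The 10% overshoot default and the function name `intrinsic_range` are assumptions; the slides do not state the padding rule.

```python
def intrinsic_range(assessments, realization=None, overshoot=0.10):
    """Bounds (L, U) for one variable.

    assessments: list of (q05, q50, q95) tuples, one per expert.
    realization: known true value for a seed variable, None for a target.
    overshoot:   fractional padding on each side (10% is an assumed default).
    """
    values = [q for triple in assessments for q in triple]
    if realization is not None:
        values.append(realization)
    lo, hi = min(values), max(values)
    pad = overshoot * (hi - lo)
    return lo - pad, hi + pad


# Two experts, seed variable realized at 28 (made-up numbers)
print(intrinsic_range([(10, 20, 30), (15, 25, 50)], realization=28))
```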


•  By specifying the 5%, 50%, and 95% quantiles, the expert is specifying a 4-bin multinomial distribution with probabilities p = (0.05, 0.45, 0.45, 0.05) for each seed variable response

•  Let s_i denote the observed frequency of seed variable realizations falling in bin i

•  We may test how well the expert is calibrated by testing the hypothesis H0: s_i = p_i for all i versus Ha: s_i ≠ p_i for some i
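The observed bin frequencies can be computed by checking where each seed realization falls relative to the expert's three quantiles. This is a minimal sketch; the function names are hypothetical, and assigning a realization that equals a quantile to the lower bin is an assumed convention.

```python
P = (0.05, 0.45, 0.45, 0.05)  # theoretical bin probabilities from the quantile scheme


def bin_index(quantiles, realization):
    """Return 0..3: the inter-quantile bin that contains the realization."""
    q05, q50, q95 = quantiles
    if realization <= q05:
        return 0
    if realization <= q50:
        return 1
    if realization <= q95:
        return 2
    return 3


def empirical_frequencies(expert_quantiles, realizations):
    """Observed bin frequencies s_i for one expert over all seed variables."""
    counts = [0, 0, 0, 0]
    for quantiles, x in zip(expert_quantiles, realizations):
        counts[bin_index(quantiles, x)] += 1
    n = len(realizations)
    return [c / n for c in counts]


# One expert, four seed variables, same quantiles each time (made-up numbers)
s = empirical_frequencies([(1, 5, 9)] * 4, [0.5, 3, 7, 12])
print(s)
```

A well-calibrated expert would show frequencies close to `P` over many seed variables; here each bin catches one of four realizations, which is far from (0.05, 0.45, 0.45, 0.05).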


• Test Statistic

• If N (the number of seed variables) is large enough

• Thus the calibration score for the expert is the probability of obtaining a relative information score at least as large as (that is, as bad as or worse than) the one actually observed
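The formulas on this slide did not survive in the transcript. The sketch below assumes the standard formulation of the Classical Model (Cooke, 1991): the relative information I(s, p) = Σ s_i ln(s_i / p_i), the statistic 2·N·I(s, p) treated as chi-square with 3 degrees of freedom for large N, and the calibration score taken as its upper-tail probability. Function names are hypothetical; the 3-degree-of-freedom tail is written in closed form so that only the standard library is needed.

```python
import math

P = (0.05, 0.45, 0.45, 0.05)  # theoretical bin probabilities


def relative_information(s, p=P):
    """I(s, p) = sum_i s_i * ln(s_i / p_i), with 0 * ln(0) taken as 0."""
    return sum(si * math.log(si / pi) for si, pi in zip(s, p) if si > 0)


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def calibration_score(s, n_seeds, p=P):
    """C(e) = P(chi2 with 3 df >= 2 * N * I(s, p)), assuming N is large enough."""
    return chi2_sf_3df(2 * n_seeds * relative_information(s, p))


print(calibration_score([0.05, 0.45, 0.45, 0.05], n_seeds=10))  # frequencies match p
print(calibration_score([0.25, 0.25, 0.25, 0.25], n_seeds=10))  # frequencies far from p
```

The degrees of freedom equal the number of bins minus one, which is why three quantiles lead to a 3-degree-of-freedom test.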


• The relative information for expert e on a variable is
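The expression itself is missing from the transcript. A minimal sketch, assuming the usual Classical Model form with a uniform background measure on the intrinsic range [L, U]: I(e) = ln(U − L) + Σ p_i ln(p_i / (x_{i+1} − x_i)), where the x_i are L, the three quantiles, and U. The function name and the choice of a uniform (rather than log-uniform) background are assumptions.

```python
import math

P = (0.05, 0.45, 0.45, 0.05)  # probability mass in each inter-quantile bin


def information_score(quantiles, lower, upper, p=P):
    """Relative information of an expert's assessment w.r.t. a uniform background.

    quantiles:    (q05, q50, q95) given by the expert.
    lower, upper: intrinsic range of the variable.
    """
    edges = (lower, *quantiles, upper)
    widths = [b - a for a, b in zip(edges, edges[1:])]
    if any(w <= 0 for w in widths):
        raise ValueError("quantiles must be strictly increasing inside the range")
    return math.log(upper - lower) + sum(
        pi * math.log(pi / w) for pi, w in zip(p, widths)
    )


# An expert who simply restates the uniform background carries no information
print(information_score((5, 50, 95), lower=0, upper=100))
# A tighter assessment on the same range scores higher
print(information_score((45, 50, 55), lower=0, upper=100))
```

An expert's overall information score is commonly taken as the average of this quantity over the seed variables.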


•  The total weight for an expert is the normalized product of the calibration score and the information score

•  The weights are optimized by choosing a minimum calibration level α such that if C(e) < α, the expert's weight is set to 0

•  α is selected so that a fictitious expert with a distribution equal to that of the weighted combination of expert distributions would be given the highest weight among experts

•  Final uncertainty distribution: F(x) = Σ_i w_i F_i(x)
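The weighting and combination steps can be sketched as follows, on the usual reading that experts whose calibration score falls below the cutoff α receive zero weight. The search for the optimal α is not shown, and the function names and example numbers are hypothetical.

```python
def expert_weights(calibration, information, alpha=0.0):
    """Normalized weights proportional to C(e) * I(e), zeroed when C(e) < alpha."""
    raw = [c * i if c >= alpha else 0.0 for c, i in zip(calibration, information)]
    total = sum(raw)
    if total == 0:
        raise ValueError("every expert was cut off; choose a lower alpha")
    return [r / total for r in raw]


def combined_cdf(weights, cdfs):
    """Return the decision maker's CDF, F(x) = sum_i w_i * F_i(x), as a callable."""
    return lambda x: sum(w * f(x) for w, f in zip(weights, cdfs))


# Three experts; the second is poorly calibrated and falls below alpha
weights = expert_weights(
    calibration=[0.50, 0.02, 0.30],
    information=[1.0, 2.0, 1.5],
    alpha=0.05,
)
print(weights)

# Combine three toy CDFs (uniform on [0, 1], [0, 2], [0, 4]) with those weights
cdfs = [lambda x, hi=hi: min(max(x / hi, 0.0), 1.0) for hi in (1.0, 2.0, 4.0)]
decision_maker = combined_cdf(weights, cdfs)
print(decision_maker(1.0))
```

The second expert is the most informative of the three but is excluded anyway, which shows how the cutoff lets calibration dominate the weighting.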


• Three reasons for an iterative cross-validation analysis
  • The Classical Model uses a set of seed variables to develop expert weights; an iterative approach is needed
  • The question of the minimum number of seed variables required has not been answered
  • The ongoing debate over the robustness of the Classical Model (performance weights versus equal weights)


• Cooke and Goossens (2008)
  • Examines 45 expert judgment studies compiled over 20 years

• Clemen (2008)
  • Asserts that “in-sample” analysis is biased toward the Classical Model; suggests the use of “out-of-sample”/Remove-One-At-a-Time (ROAT) analysis
  • Selected 14 studies to compare the performance-weighted (PW) decision maker and the equally-weighted (EW) decision maker


• Cooke (2008)
  • Notes that a ROAT approach tends to favor or punish excluded experts and presents a “two-fold” cross validation
  • In 20 of 26 validation runs, the PW outperformed the EW

• Lin and Cheng (2008; 2009)
  • Using out-of-sample analysis, examines the available 45 studies and finds that the PW outperforms the EW, but with degraded performance

• Flandoli et al. (2010)
  • Performs a modified “two-fold” cross validation with 500 combinations of 30-70 splits
  • Results show that Cooke’s model gives the best indication of uncertainty when averaged


• Analysis conducted
  • Comprehensive “Out-of-Sample” analysis
  • One-tailed sign test (Clemen, 2008)

• Data used
  • 55 expert judgment studies compiled over 20 years
  • 63 data sets: 604 experts, 770 seed variables, ~68M judgments


Iteration   Seed Variables Used   Target Variables Evaluated
1           1                     2, 3, 4
2           2                     1, 3, 4
3           3                     1, 2, 4
4           4                     1, 2, 3
5           1, 2                  3, 4
6           1, 3                  2, 4
7           1, 4                  2, 3
8           2, 3                  1, 4
9           2, 4                  1, 3
10          3, 4                  1, 2
11          1, 2, 3               4
12          1, 2, 4               3
13          1, 3, 4               2
14          2, 3, 4               1


Extent of previous cross-validation studies
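The iteration scheme for four variables can be generated by enumerating every non-empty proper subset of the variables as the seed set and evaluating the rest as targets. This is a small sketch with a hypothetical function name; for n variables it yields 2^n − 2 iterations.

```python
from itertools import combinations


def seed_target_splits(n_variables):
    """Yield (seeds, targets) for every non-empty proper subset used as seeds."""
    variables = tuple(range(1, n_variables + 1))
    for k in range(1, n_variables):
        for seeds in combinations(variables, k):
            targets = tuple(v for v in variables if v not in seeds)
            yield seeds, targets


splits = list(seed_target_splits(4))
for iteration, (seeds, targets) in enumerate(splits, start=1):
    print(iteration, seeds, targets)
```

Because the count grows exponentially with the number of variables, a full enumeration becomes expensive for studies with many seed variables.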


Mean Out-of-Sample Combination Scores (Calibration × Information)

Columns 1 to 8: No. of Variables Used to Determine Performance Measure

Study ID   No. of Experts   No. of Variables   DM Type   1        2        3        4        5        6        7        8
MVOSEEDS   77               5                  PWDM      0.3259   0.5579   0.6773   0.8414
MVOSEEDS   77               5                  EWDM      0.0279   0.1154   0.3071   0.6963
A_SEED     7                6                  PWDM      0.1434   0.3312   0.3462   0.3332   0.4439
A_SEED     7                6                  EWDM      0.0072   0.0229   0.0580   0.1260   0.2508
AOTDAILY   7                6                  PWDM      0.0167   0.0294   0.0583   0.1199   0.2271
AOTDAILY   7                6                  EWDM      0.0164   0.0313   0.0586   0.1036   0.1565
FCEP       5                8                  PWDM      0.0028   0.5309   0.7328   0.8917   1.0556   1.0792   1.1396
FCEP       5                8                  EWDM      0.0001   0.0008   0.0038   0.0135   0.0399   0.1059   0.2434
BSWAAL     6                8                  PWDM      0.3811   0.2538   0.3624   0.3958   0.3932   0.3665   0.4860
BSWAAL     6                8                  EWDM      0.2697   0.3142   0.3458   0.3688   0.3862   0.3900   0.4406
DSM-1      10               8                  PWDM      0.1546   0.2075   0.2448   0.3224   0.4849   0.6048   0.6591
DSM-1      10               8                  EWDM      0.2637   0.2939   0.3105   0.3241   0.3403   0.3576   0.4508
MONT1      11               8                  PWDM      0.6249   0.6168   0.5673   0.5964   0.6656   0.6350   0.6423
MONT1      11               8                  EWDM      0.2312   0.2880   0.3497   0.4158   0.4854   0.5734   0.7321
SO3EXPTS   4                9                  PWDM      0.0123   0.1847   0.3236   0.5801   0.7460   0.9834   1.0993   2.1950
SO3EXPTS   4                9                  EWDM      2.9E-5   0.0002   0.0013   0.0063   0.0254   0.0856   0.2407   0.5700
WATERPOL   11               9                  PWDM      0.0115   0.1661   0.4032   0.5544   0.6987   0.8737   0.9985   1.0289
WATERPOL   11               9                  EWDM      0.0033   0.0111   0.0313   0.0687   0.1195   0.1798   0.2624   0.4852

• Single Decision Maker Dominates in 28 of 63 Cases
  • PWDM: 21 Cases
  • EWDM: 7 Cases

• Single Modal Switching in 22 of 63 Cases
  • EWDM gives way to PWDM: 10 Cases
  • PWDM gives way to EWDM: 12 Cases

• Dual Modal Switching (Parabolic) in 11 of 63 Cases
  • PWDM at the extremes: 7 Cases
  • EWDM at the extremes: 4 Cases

• Somewhat Random Switching in 2 of 63 Cases
  • BSWAAL
  • ACNEXPTS
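One way to reproduce these categories from a pair of score rows is to count how often the leading decision maker changes as seed variables are added. The slides do not state the exact classification rule, so the rule below (0, 1, 2, or more switches) is an assumption, as are the function name and the choice to award ties to the EWDM. The scores are rows from the table above.

```python
def switching_pattern(pwdm, ewdm):
    """Classify which decision maker scores higher as seed variables are added."""
    leaders = ["PWDM" if p > e else "EWDM" for p, e in zip(pwdm, ewdm)]
    switches = sum(a != b for a, b in zip(leaders, leaders[1:]))
    if switches == 0:
        return f"{leaders[0]} dominates"
    if switches == 1:
        return f"single switch: {leaders[0]} gives way to {leaders[-1]}"
    if switches == 2:
        return f"dual switch: {leaders[0]} at the extremes"
    return "random switching"


scores = {
    "FCEP": ([0.0028, 0.5309, 0.7328, 0.8917, 1.0556, 1.0792, 1.1396],
             [0.0001, 0.0008, 0.0038, 0.0135, 0.0399, 0.1059, 0.2434]),
    "DSM-1": ([0.1546, 0.2075, 0.2448, 0.3224, 0.4849, 0.6048, 0.6591],
              [0.2637, 0.2939, 0.3105, 0.3241, 0.3403, 0.3576, 0.4508]),
    "AOTDAILY": ([0.0167, 0.0294, 0.0583, 0.1199, 0.2271],
                 [0.0164, 0.0313, 0.0586, 0.1036, 0.1565]),
    "BSWAAL": ([0.3811, 0.2538, 0.3624, 0.3958, 0.3932, 0.3665, 0.4860],
               [0.2697, 0.3142, 0.3458, 0.3688, 0.3862, 0.3900, 0.4406]),
}
for study, (pwdm, ewdm) in scores.items():
    print(study, "->", switching_pattern(pwdm, ewdm))
```

Under this rule BSWAAL comes out as random switching, which agrees with the slide's own listing of that study.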


Accuracy Measures (One-tailed Sign Test)

• Median p-value by data set
  • PWDM is more accurate than EWDM (p ≥ 0.5): 42 of 63 cases
  • EWDM is more accurate than PWDM (p < 0.5): 11 of 63 cases
  • Overall median p-value: 0.74

• Median p-value by number of seed variables
  • PWDM is more accurate in ALL cases
  • PWDM is significantly more accurate in all but two cases
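For orientation, a generic one-tailed sign test can be sketched with the binomial distribution: under the null hypothesis that neither decision maker is more accurate, each comparison is a fair coin flip. The function below computes the standard upper-tail probability. The slide treats larger p-values as favoring the PWDM, which suggests its p-value follows a different tail or convention than the one assumed here, so this sketch should not be read as the study's exact statistic.

```python
from math import comb


def sign_test_p_value(wins, trials):
    """P(X >= wins) for X ~ Binomial(trials, 0.5); ties are assumed dropped beforehand."""
    if not 0 <= wins <= trials:
        raise ValueError("wins must lie between 0 and trials")
    return sum(comb(trials, k) for k in range(wins, trials + 1)) / 2 ** trials


# Hypothetical example: one decision maker is closer to the realization on 8 of 10 targets
print(sign_test_p_value(8, 10))
```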


• Data Used (Appendix B)
  • The data set for this research comes from the unpublished white paper by Coleman, Kulick, and Pisano (1996) on the T45TS Cockpit-21 project
  • Actual data used; simulated for use in the expert judgment model

E-TRI Flow Diagram (Figure 4-3, p. 112)

• Oct-93 Milestone Review: Nov-93 TPM value is predicted

• Nov-93 Milestone Review: TPM value is realized

• Expert weights are determined

• Decision maker’s assessment is calculated
  • Using weighted expert predictions
  • Calculated for all remaining milestones

• Nov-93 updated prediction is presented

• E-TRI for final state (Feb-94) is calculated

• Case Study Data

• System E-TRI for final state (Feb-94) is calculated

QUESTIONS?


• Garvey, P. R., & Cho, C.-C. (2003). An index to measure a system's performance risk. Acquisition Review Quarterly, Spring, 189-199.

• Winkler, R. L. (1968). The consensus of subjective probability distributions. Management Science, 15(2), 61-75.

• Cooke, R. M. (1991). Experts in uncertainty: Opinion and subjective probability in science. New York: Oxford University Press.

• Clemen, R. T. (2008). Comment on Cooke's classical method. Reliability Engineering & System Safety, 93(5), 760-765.

• Coleman, C., Kulick, K., & Pisano, N. (1996). Technical performance measurement (TPM) retrospective implementation and concept validation on the T45TS Cockpit-21 program. Program Executive Office for Air Anti-Submarine Warfare, Assault, and Special Mission Programs, White Paper.
