
Output analyses for single system


Why?

• Often most of the emphasis in a simulation experiment is on model development and programming.

• Very few resources (time and money) are budgeted for analyzing the output of the simulation experiment.

• In fact, it is not uncommon to see a single run of the simulation carried out and the “results” taken directly from it.

• This single run is also of arbitrary length, and its output is considered “true.”

• Since simulation modeling uses random parameters drawn from different probability distributions, the output of a single run is just one realization of these random variables.


Why?

• If the random parameters of the experiment have a large variance, one realization of a run may differ greatly from another.

• This creates a real danger of making erroneous inferences about the system we are trying to simulate.

• From the input analyses, we have seen that a single data point has practically no statistical significance.

• Since we demand a large data set to characterize the input parameters correctly, we should demand the same when analyzing the output of the simulation model.


Reasons for neglect

• Often simulation is considered a complex exercise in computer programming.

• Hence a simulation study often begins with a lot of time spent on heuristic model building and coding.

• Towards the end, the model is run once to get the “answers” about the model.

• However, a simulation experiment is a computer-based statistical sampling experiment.

• This is not realized very easily.

• Hence, if the results of the simulation are to have any significance and the inferences any confidence, appropriate statistical techniques must be used.


Reasons for neglect

• These statistical techniques are required not only for the input parameters but also for the output data; not only for the design of the experiment but also for the analysis of the experiment.

• The second reason for output analysis being neglected is more technical.

• Most of the time, the output data of a simulation experiment are non-stationary and autocorrelated. Hence classical statistical techniques, which require the data to be IID, cannot be applied directly.

• In fact, for many applications, there is no output-analysis solution that is completely acceptable.


Reasons for neglect

• Often the methods required to analyze the output are very complicated.

• At times, output analysis consumes precious computer time.

• If output analysis competes with the model runs themselves for computer time, the winner is always the model run and not the output analysis.

• However, this last point is fast becoming obsolete now that cheaper, faster processors are available.


Typical output process

• Let Y1, Y2, …, Ym be the output stochastic process from a single simulation run.

• Let the realizations of these random variables over n replications be:

$$
\begin{array}{cccc}
y_{11}, & y_{12}, & \ldots, & y_{1m} \\
y_{21}, & y_{22}, & \ldots, & y_{2m} \\
\vdots & \vdots & & \vdots \\
y_{n1}, & y_{n2}, & \ldots, & y_{nm}
\end{array}
$$

• It is very common to observe that within the same run the output process is correlated. However, independence across the replications can be achieved.

• The output analysis depends on this independence.
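The n × m layout of realizations can be sketched in Python. The queueing model used here (delays in an M/M/1 queue via Lindley's recursion), the rates, the seeds, and the names `run_replication` and `run_replications` are all assumptions chosen for illustration; they are not taken from the slides.

```python
import random

def run_replication(m, seed, arrival_rate=0.9, service_rate=1.0):
    """One run: delays in queue of the first m customers of an
    illustrative M/M/1 queue that starts empty (Lindley's recursion)."""
    rng = random.Random(seed)            # separate random stream per replication
    delays, d = [], 0.0
    for _ in range(m):
        delays.append(d)
        service = rng.expovariate(service_rate)
        interarrival = rng.expovariate(arrival_rate)
        d = max(0.0, d + service - interarrival)   # delay of the next customer
    return delays

def run_replications(n, m, base_seed=12345):
    # Row j holds y_j1, ..., y_jm. Entries within a row are correlated;
    # different seeds are used so that rows can be treated as independent.
    return [run_replication(m, base_seed + j) for j in range(n)]

y = run_replications(n=5, m=10)
```

Each row is one replication; statistics are later formed across rows, where independence holds, rather than along a row.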


Transient and steady-state behavior

• Consider the stochastic process Yi as before.

• In many experiments, the distribution of the output process depends to some extent on the initial conditions.

• This conditional distribution of the output stochastic process, given the initial conditions, is called the transient distribution.

• We note that this distribution will, in general, be different for each i and for each set of initial conditions.

• The corresponding probabilities from these distributions are just a sequence of numbers for the given initial conditions.

• If this sequence converges as i → ∞, for any initial conditions, then we call the limiting distribution the steady-state distribution.


Types of simulation

• Terminating simulation

• Non-terminating simulation
  o Steady-state parameters
  o Steady-state cycle parameters
  o Other parameters


Terminating simulation

• A simulation is terminating when there is a “natural” event E that specifies the length of each run (replication).

• If each replication uses a different set of independent random numbers and the same initial conditions, then comparable output parameters from different replications are IID.

• Event E occurs at a time beyond which no useful information can be obtained from the model, or when the system is “cleaned out.”

• E is specified before the start of the experiment, and at times the time at which it occurs could itself be a random variable. Example?


Terminating simulation

• Often the initial conditions of the terminating simulation affect the output parameters to a great extent.

Examples of terminating simulation:

1. Banking queue example: when it is specified that the bank operates between 9 am and 5 pm.

2. Inventory planning example (calculating cost over a finite time horizon).

• Often the conditions specified in the problem can be deceptive, leading us to model the system as a terminating simulation when it is not.

e.g., a manufacturing example where the work in process (WIP) is carried over between shifts.


Non-terminating simulation

• There is no natural event E to specify the end of the run.

• A measure of performance for such a simulation is said to be a steady-state parameter if it is a characteristic of the steady-state distribution of some output process.

• The stochastic processes of most real systems do not have steady-state distributions, since the characteristics of the system change over time.

• On the other hand, a simulation model may have a steady-state distribution, since we often assume that the characteristics of the model do not change with time.



• Consider a stochastic process Y1, Y2, … for a non-terminating simulation that does not have a steady-state distribution.

• Now let us divide the time axis into equal-length, contiguous time intervals called cycles. Let $Y_i^C$ be the random variable defined over the ith cycle.

• Suppose this new stochastic process has a steady-state distribution.

• A measure of performance is called a steady-state cycle parameter if it is a characteristic of $Y^C$.
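As a small sketch of the cycle construction, the Python below splits an output series into equal-length contiguous cycles and takes the mean of each cycle as the per-cycle random variable. Using the cycle mean is only one possible choice, and the function name `cycle_averages` and the sample data are hypothetical.

```python
def cycle_averages(series, cycle_length):
    """Split a series into equal-length contiguous cycles and
    return the mean of each complete cycle."""
    n_cycles = len(series) // cycle_length     # an incomplete final cycle is dropped
    return [
        sum(series[c * cycle_length:(c + 1) * cycle_length]) / cycle_length
        for c in range(n_cycles)
    ]

# Hypothetical output with a repeating pattern of length 4 (e.g. one shift).
pattern = [1.0, 2.0, 3.0, 4.0]
series = pattern * 3 + [1.0, 2.0]
per_cycle = cycle_averages(series, cycle_length=4)
```

In this toy series the raw observations keep cycling and never settle, while the per-cycle values are constant, which is the situation the cycle definition is meant to capture.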



• For a non-terminating simulation, suppose that a stochastic process does not have a steady-state distribution.

• Also suppose that there is no appropriate cycle definition such that the corresponding process has a steady-state distribution.

• This can occur if the parameters for the model continue to change over time.

• In these cases, however, there will typically be a fixed amount of data describing how input parameters change over time.

• This provides, in effect, a terminating event E for the simulation, and, thus, the analysis techniques for terminating simulation are appropriate.


Statistical analyses of terminating simulation

• Suppose that we have n replications of a terminating simulation, where each replication is terminated by the same event E and is begun with the same “initial” conditions.

• Assume that there is only one measure of performance.

• Let Xj be the value of the performance measure in the jth replication, j = 1, 2, …, n. These Xj are IID random variables.

• For a bank, Xj might be the average delay $\sum_{i=1}^{N} D_i / N$ over a day from the jth replication, where $D_i$ is the delay of the ith customer and N is the number of customers served in a day. Note that N itself could be a random variable for a replication.



• For a simulation of a war game, Xj might be the number of tanks destroyed in the jth replication.

• Finally, for an inventory system, Xj could be the average cost $\sum_{i=1}^{120} C_i / 120$ from the jth replication.

• Suppose that we would like to obtain a point estimate and confidence interval for the mean E[X], where X is the random variable defined on a replication as described above.

• Then make n independent replications of the simulation and let Xj be the resulting IID random variable from the jth replication, j = 1, 2, …, n.



• We know that an approximate 100(1 − α) percent confidence interval for µ = E[X] is given by:

$$
\bar{X}(n) \pm t_{n-1,\,1-\alpha/2} \sqrt{\frac{S^2(n)}{n}},
$$

where we use a fixed sample of n replications and take the sample mean $\bar{X}(n)$ and the sample variance $S^2(n)$ from it.

• Hence this procedure is called a fixed-sample-size procedure.
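A minimal sketch of the fixed-sample-size interval in Python, using only the standard library. The helper `t_quantile` is a hypothetical numerical routine (Simpson's rule plus bisection) standing in for a t table, and the ten delay values are made-up illustrative data, not results from the slides.

```python
import math
import statistics

def t_quantile(p, df, steps=400):
    """Quantile of Student's t with df degrees of freedom, for 0.5 < p < 1.
    Bisection on a CDF computed with Simpson's rule (steps must be even)."""
    log_const = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    const = math.exp(log_const) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    def cdf(x):                      # P(T <= x) for x >= 0
        h = x / steps
        total = pdf(0.0) + pdf(x)
        for k in range(1, steps):
            total += (4 if k % 2 else 2) * pdf(k * h)
        return 0.5 + total * h / 3

    lo, hi = 0.0, 1.0
    while cdf(hi) < p:               # widen the bracket until it holds the quantile
        lo, hi = hi, 2 * hi
    for _ in range(60):              # bisection
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def fixed_sample_ci(x, alpha=0.10):
    """Approximate 100(1 - alpha) percent confidence interval for E[X]."""
    n = len(x)
    mean = statistics.fmean(x)               # X-bar(n)
    s2 = statistics.variance(x)              # S^2(n), divisor n - 1
    half = t_quantile(1 - alpha / 2, n - 1) * math.sqrt(s2 / n)
    return mean - half, mean + half

# Hypothetical average delays (one value per replication, n = 10).
delays = [1.53, 1.66, 1.24, 2.34, 2.00, 1.69, 2.69, 2.86, 1.70, 2.60]
lower, upper = fixed_sample_ci(delays, alpha=0.10)
```

The half-length `(upper - lower) / 2` is whatever the n replications happen to give, which is the lack of control over precision that fixed-sample-size procedures have.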



• One disadvantage of the fixed-sample-size procedure based on n replications is that the analyst has no control over the confidence-interval half-length (the precision of $\bar{X}(n)$).

• If the estimate $\bar{X}(n)$ is such that $|\bar{X}(n) - \mu| = \beta$, then we say that $\bar{X}(n)$ has an absolute error of β.

• Suppose that we have constructed a confidence interval for µ based on a fixed number of replications n.

• We assume that our estimate $S^2(n)$ of the population variance will not change appreciably as the number of replications increases.



• Then an expression for the approximate total number of replications required to obtain an absolute error of β is given by:

$$
n_a^*(\beta) = \min\left\{ i \ge n : t_{i-1,\,1-\alpha/2} \sqrt{\frac{S^2(n)}{i}} \le \beta \right\}.
$$

• If $n_a^*(\beta) > n$, then we take $n_a^*(\beta) - n$ additional replications of the simulation. The estimate of the mean E[X] based on all $n_a^*(\beta)$ replications should then have an absolute error of approximately β.
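As a worked illustration with hypothetical numbers (not from the slides): suppose n = 10 replications gave $S^2(10) = 0.31$, and we want an absolute error of β = 0.25 with α = 0.10. The half-length $t_{i-1,\,1-\alpha/2}\sqrt{S^2(n)/i}$ shrinks as i grows, so it is enough to find the first i at which it drops to β or below. The t critical values are standard table values.

```latex
\begin{align*}
i = 15:&\quad t_{14,\,0.95}\sqrt{\frac{0.31}{15}} \approx 1.761 \times 0.1438 \approx 0.253 > 0.25,\\
i = 16:&\quad t_{15,\,0.95}\sqrt{\frac{0.31}{16}} \approx 1.753 \times 0.1392 \approx 0.244 \le 0.25,\\
&\quad\Rightarrow\; n_a^*(0.25) = 16, \qquad n_a^*(0.25) - n = 6 \text{ additional replications.}
\end{align*}
```

The answer is only approximate because $S^2(10)$ is held fixed; the variance estimate will generally move once the extra replications are added.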



Sequential procedure for estimating the confidence interval for µ

• Let

$$
\delta(n, \alpha) = t_{n-1,\,1-\alpha/2} \sqrt{\frac{S^2(n)}{n}}.
$$

1. Make n0 replications of the simulation and set n = n0.

2. Compute $\bar{X}(n)$ and δ(n, α) from the current sample.

3. If δ(n, α) < β, then use $\bar{X}(n)$ as a point estimate of µ and stop.

4. Otherwise replace n with n + 1, make an additional replication of the simulation, and go to Step 2.
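The sequential steps can be sketched in Python as follows. This is an illustration under stated assumptions: `t_quantile` is a hypothetical numerical stand-in for a t table, `max_n` is an added safeguard that is not part of the procedure, and the `replicate` function simply draws a normal random number in place of running a real simulation replication.

```python
import math
import random
import statistics

def t_quantile(p, df, steps=400):
    """Quantile of Student's t (0.5 < p < 1): Simpson-rule CDF plus bisection."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    const /= math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    def cdf(x):                      # P(T <= x) for x >= 0
        h = x / steps
        total = pdf(0.0) + pdf(x)
        for k in range(1, steps):
            total += (4 if k % 2 else 2) * pdf(k * h)
        return 0.5 + total * h / 3

    lo, hi = 0.0, 1.0
    while cdf(hi) < p:
        lo, hi = hi, 2 * hi
    for _ in range(60):
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sequential_procedure(replicate, n0, beta, alpha=0.10, max_n=1000):
    """Add replications one at a time until the half-length delta(n, alpha)
    falls below beta. Requires n0 >= 2."""
    x = [replicate() for _ in range(n0)]                  # Step 1
    while True:
        n = len(x)
        mean = statistics.fmean(x)                        # Step 2
        delta = t_quantile(1 - alpha / 2, n - 1) * math.sqrt(statistics.variance(x) / n)
        if delta < beta or n >= max_n:                    # Step 3 (max_n: added safeguard)
            return mean, delta, n
        x.append(replicate())                             # Step 4, then back to Step 2

rng = random.Random(7)

def replicate():
    # Hypothetical stand-in for one replication of a simulation model.
    return rng.gauss(10.0, 2.0)

mean, delta, n = sequential_procedure(replicate, n0=10, beta=1.0)
```

Unlike the fixed-sample-size approach, the number of replications `n` is itself random here, since it depends on the observed sample variance.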


Choosing initial conditions

• The measures of performance for a terminating simulation depend explicitly on the state of the system at time 0.

• Hence it is extremely important to choose the initial conditions with the utmost care.

• Suppose that we want to analyze the average delay for customers who arrive and complete their delays between 12 noon and 1 pm (the busiest period for the bank).

• Since the bank would probably be very congested by noon, starting the simulation then with no customers present (the usual initial condition for a queuing problem) is not useful.

• We discuss two heuristic methods for this problem.



First approach

• Let us assume that the bank opens at 9 am with no customers present.

• Then we start the simulation at 9 am with no customers present and run it for 4 simulated hours.

• In estimating the desired expected average delay, we use only those customers who arrive and complete their delays between noon and 1 pm.

• The evolution of the simulation between 9 am and noon (the “warm-up period”) determines the appropriate conditions for the simulation at noon.
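A rough sketch of this first approach in Python. The bank is modeled here as a single-server FIFO queue with exponential interarrival and service times; that model, the rates, and the name `noon_hour_average_delay` are assumptions made only for illustration. Time is measured in minutes after 9 am, so noon is 180 and 1 pm is 240.

```python
import random

def noon_hour_average_delay(seed, arrival_rate=1.0, service_rate=1.1,
                            window=(180.0, 240.0)):
    """Simulate from 9 am (empty and idle) to 1 pm and average the delays
    of customers who arrive and begin service inside the window."""
    rng = random.Random(seed)
    t = 0.0                   # arrival clock, minutes after 9 am
    server_free_at = 0.0      # the server is idle when the bank opens
    delays = []
    while True:
        t += rng.expovariate(arrival_rate)
        if t >= window[1]:
            break
        begin_service = max(t, server_free_at)
        server_free_at = begin_service + rng.expovariate(service_rate)
        # 9 am to noon is warm-up: those customers shape the state at noon
        # but are not counted in the estimate.
        if t >= window[0] and begin_service <= window[1]:
            delays.append(begin_service - t)
    return sum(delays) / len(delays) if delays else float("nan")

# One value X_j per replication; different seeds give different replications.
x = [noon_hour_average_delay(seed) for seed in range(5)]
```

Three of the four simulated hours serve only to produce a plausible state at noon.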



First approach

• The main disadvantage of this approach is that 3 hours of simulated time are not used directly in estimation.

• One might propose a compromise and start the simulation at some other time, say 11 am, with no customers present.

• However, there is no guarantee that the conditions in the simulation at noon will be representative of the actual conditions in the bank at noon.



Second approach

• Collect data on the number of customers present in the bank at noon for several different days.

• Let pi be the proportion of these days on which i customers (i = 0, 1, …) are present at noon.

• Then we simulate the bank from noon to 1 pm, with the number of customers present at noon being randomly chosen from the distribution {pi}.

• If more than one simulation run is required, then a different sample from {pi} is drawn for each run, so that the performance measures from the runs are IID.
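The second approach might be sketched as follows. The observed noon counts are made-up data, and the function names are hypothetical; the sketch only shows how {pi} could be estimated and how a fresh starting state could be drawn for each run.

```python
import random
from collections import Counter

def empirical_distribution(noon_counts):
    """Return {i: p_i}, the proportion of observed days with i customers at noon."""
    counts = Counter(noon_counts)
    total = len(noon_counts)
    return {i: c / total for i, c in sorted(counts.items())}

def sample_initial_customers(p, rng):
    """Draw the number of customers present at noon from the distribution {p_i}."""
    values = list(p)
    weights = [p[i] for i in values]
    return rng.choices(values, weights=weights, k=1)[0]

# Hypothetical counts of customers present at noon on 10 different days.
observed = [12, 15, 9, 12, 14, 12, 15, 10, 13, 12]
p = empirical_distribution(observed)

rng = random.Random(2024)
# One independent draw per simulation run.
initial_states = [sample_initial_customers(p, rng) for _ in range(5)]
```

Each run would then start at noon with its own drawn number of customers already in the bank.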


Statistical analysis of steady-state parameters

• Let Y1, Y2, … Ym be the output stochastic process from a single run of a non-terminating simulation.

• Suppose that P(Yi <= y) = Fi(y) → F(y) = P(Y <= y) as i goes to ∞.

• Here Y is the steady state random variable of interest with distribution function F.

• Then φ is a steady-state parameter if it is a characteristic of Y, such as E[Y] or F(y).

• One problem in estimating φ is that the distribution function of Yi is different from F, since it is generally not possible to choose the initial conditions I to be representative of “steady-state behavior.”



• This causes an estimator of φ based on observations Y1, Y2, …, Ym not to be “representative.”

• This is called the problem of the initial transient.

• Suppose that we want to estimate the steady-state mean ν = E[Y], which is generally given as:

$$
\nu = \lim_{i \to \infty} E[Y_i].
$$

• The most serious problem is that the sample mean $\bar{Y}(m)$ of Y1, Y2, …, Ym satisfies:

$$
E[\bar{Y}(m)] \ne \nu \quad \text{for any } m.
$$



• The technique that is most commonly used is warming up the model, or initial-data deletion.

• The idea is to delete some number of observations from the beginning of a run and to use only the remaining observations to estimate the mean. So:

$$
\bar{Y}(m, l) = \frac{\sum_{i=l+1}^{m} Y_i}{m - l}.
$$

• The question now is: how do we choose the warm-up period l?
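For a given l, the truncated sample mean is a one-line computation. The sketch below uses a hypothetical function name and made-up observations whose first few values are inflated by the starting conditions.

```python
def truncated_mean(y, l):
    """Mean of Y_{l+1}, ..., Y_m: the first l observations are deleted."""
    m = len(y)
    if not 0 <= l < m:
        raise ValueError("the warm-up length l must satisfy 0 <= l < m")
    return sum(y[l:]) / (m - l)     # y[l:] holds Y_{l+1}, ..., Y_m (0-based list)

# Hypothetical run: early observations are pulled away from the long-run level.
y = [10.0, 6.0, 4.0, 2.0, 2.0, 2.0]
with_warmup_deleted = truncated_mean(y, l=3)
without_deletion = truncated_mean(y, l=0)
```

Deleting too few observations leaves bias in the estimate; deleting too many throws away usable data and raises the variance.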



• The simplest general technique for determining l is a graphical procedure.

• Its specific goal is to determine a time index l such that E[Yi] ≈ ν for i > l, where l is the warm-up period.

• This is equivalent to determining when the transient mean curve E[Yi] “flattens out” at level ν.



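Replication   Observations
1             Y11    Y12    Y13    …    Y1m
2             Y21    Y22    Y23    …    Y2m
…             …      …      …      …    …
n             Yn1    Yn2    Yn3    …    Ynm
Avg.          $\bar{Y}_1$    $\bar{Y}_2$    $\bar{Y}_3$    …    $\bar{Y}_m$

Here $\bar{Y}_i = \sum_{j=1}^{n} Y_{ji} / n$ is the average of the ith observations across the n replications.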
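The averaging across replications in the table can be sketched as a column mean over a list of rows. The function name and the small data set are hypothetical.

```python
def observation_means(y):
    """Given n replications (rows) of m observations each, return the
    m column averages: the mean of the ith observation across replications."""
    n = len(y)
    return [sum(column) / n for column in zip(*y)]

# Hypothetical output of n = 2 replications with m = 3 observations each.
y = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
]
ybar = observation_means(y)
```

Because the rows are independent replications, each column average has the same mean as a single observation at that index but a variance smaller by a factor of n.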



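• The moving average $\bar{Y}_i(w)$ for a window w is defined as:

$$
\bar{Y}_i(w) =
\begin{cases}
\dfrac{\sum_{s=-w}^{w} \bar{Y}_{i+s}}{2w+1} & \text{if } i = w+1, \ldots, m-w, \\[2ex]
\dfrac{\sum_{s=-(i-1)}^{i-1} \bar{Y}_{i+s}}{2i-1} & \text{if } i = 1, \ldots, w.
\end{cases}
$$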



• We take the moving average of the observation means to smooth out the high-frequency oscillations in the observation means (but leave in the low-frequency oscillations or long-run trend of interest).

• We plot these moving averages and choose as our warm-up period l the value of i beyond which the moving averages appear to have converged.
