2_binomial_monte_carlo.doc

Upload: phuong-ho

Post on 04-Jun-2018


TRANSCRIPT

  • 8/14/2019 2_Binomial_Monte_Carlo.doc


    Lab 2: Using Stata to Do Monte Carlo Experiments

    In this lab, we will:

    Learn how to generate uniformly distributed random numbers.

    Learn how to generate some discrete random variables.

    Illustrate the law of large numbers.

    Introduce the Monte Carlo method of studying probability distributions.

    Apply the Monte Carlo method to the binomial probability distribution.

    We will also introduce the following Stata programming techniques and skills:

    Functions.

    Conditional expressions.

    The assignment operator, "=".

    Scalar variables.

    Using "do" files.

    The techniques we will learn in this lab are very useful for illustrating many concepts of probability and statistics. In addition, we illustrate the basic concept and practice of the so-called Monte Carlo method of analysis or experimentation.

    Generating uniformly distributed random numbers.

    A continuously distributed random variable that is equally likely to take any value between zero and one has a standard uniform probability distribution. Such a variable can be created in Stata with the uniform() function. Generate 10,000 draws from a standard uniform distribution and inspect the results in the browser.

    set obs 10000
    gen u=uniform()
    browse

    The last command can be executed by clicking on the browser icon.

    Stata's uniform random number generator returns a number between zero and one, exclusive of one itself.
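The range claim can be checked directly. The following is a minimal sketch, assuming the commands are run from a do-file or typed one at a time; it uses only the lab's own uniform() function plus the standard summarize and assert commands.

```stata
clear
set obs 10000
* one standard uniform draw per observation
gen u = uniform()
* the mean should be close to 0.5, the minimum at or above 0, the maximum below 1
summarize u
* assert stops with an error if any draw falls outside the stated range
assert u >= 0 & u < 1
```

If the assert line produces no output, every one of the 10,000 draws lies in the stated range.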

    Functions.

    uniform is the name of a function in Stata. Function names in Stata must always be followed by an open parenthesis with no intervening spaces. Why no intervening spaces? Because otherwise Stata will think the name is the name of a variable, and not a function. The pair of parentheses surrounds the argument or arguments of the function. In this case the uniform function has no arguments, but the parentheses are needed anyway. If a function has more than one argument, then the arguments are separated by commas.
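A short illustration of functions with zero, one, and several arguments may help. This sketch uses int() (which appears later in the lab) and max(), a standard Stata function chosen here only as an example of a multi-argument call.

```stata
* no arguments: the parentheses are still required
display uniform()
* one argument: int() drops the fractional part, so this displays 4
display int(4.9)
* several arguments, separated by commas: this displays 7
display max(3, 7, 5)
```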

    Graph the uniformly distributed random variable.

    Graph the random variable using the menu/dialogue window:


    Graphics > ... or bins) to use in constructing the graph. Notice that the density of the random variable is essentially constant throughout its range, which is why the distribution has the name "uniform".

    Generating a discrete random variable: Example simulating the rolls of a die.

    The uniform random number generator is a building block for creating virtually any random variable. We will illustrate this by using it to simulate rolling a die. The following commands simulate twenty rolls of a die. First, however, the number of observations in Stata must be set to twenty, and in order to do this the memory must be "cleared".

    clear

    set obs 20

    gen x=int(6*uniform())+1

    browse
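To see why int(6*uniform())+1 produces die faces, the expression can be unpacked one step at a time. This is a sketch only; the intermediate variable names u, scaled, and truncated are hypothetical and do not appear in the lab.

```stata
clear
set obs 20
gen u = uniform()
* multiplying by 6 stretches the draw to lie between 0 and 6 (never exactly 6)
gen scaled = 6*u
* int() drops the fractional part, leaving one of 0, 1, 2, 3, 4, 5
gen truncated = int(scaled)
* adding 1 shifts the result to the six faces 1 through 6
gen x = truncated + 1
list u scaled truncated x in 1/5
* every simulated roll must be a valid face
assert inlist(x, 1, 2, 3, 4, 5, 6)
```

Because each of the six unit-length intervals between 0 and 6 is equally likely, each face has the same probability of 1/6.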


    The law of large numbers and the frequentist notion of probability

    The limit in the frequentist notion of probability is the law of large numbers: as the sample size or number of trials increases towards infinity, the sample proportion favorable to an event approaches ("settles down to") the probability of the event. We will illustrate this by increasing the number of rolls of the die and noticing that the sample distribution of outcomes settles down to the theoretical discrete uniform distribution, with probability 1/6 for each side of the die.

    In order to do this, repeat the commands above, each time changing the number of observations: 50, then 200, then 10,000, as in the commands below. Also, change the name for each graph as indicated below. The easiest way to do this is to single-click on each command in the Review window, and then edit it in the Command window as necessary.

    clear
    set obs 50
    gen x=int(6*uniform())+1
    histogram x, discrete fraction yline(.166667) xlabel(#6) title(n=50) name(g2, replace)

    clear
    set obs 200
    gen x=int(6*uniform())+1
    histogram x, discrete fraction yline(.166667) xlabel(#6) title(n=200) name(g3, replace)

    clear
    set obs 10000
    gen x=int(6*uniform())+1
    histogram x, discrete fraction yline(.166667) xlabel(#6) title(n=10000) name(g4, replace)

    Finally, view all the graphs together with the following command:

    graph combine g1 g2 g3 g4, title(Law of Large Numbers)
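The same idea can be checked numerically without graphs. The sketch below is an alternative to retyping the commands, not the lab's own method: it loops over the three sample sizes and prints the share of rolls that came up six, which should drift toward 1/6 (about .1667) as n grows. The loop variable n is a hypothetical name.

```stata
foreach n of numlist 50 200 10000 {
    clear
    set obs `n'
    gen x = int(6*uniform()) + 1
    * count the sixes; count leaves the result in r(N)
    quietly count if x == 6
    display "n = `n'   proportion of sixes = " r(N)/`n'
}
```

Running it several times shows that the proportion for n = 50 moves around a good deal from run to run, while the proportion for n = 10000 barely changes.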

    Monte Carlo Estimation of a Probability Distribution

    Over the last few decades, the use of the computer to study the probability distribution of a random variable has become commonplace. The technique is powerful when the theory and/or mathematics of the random process is too difficult to understand or to derive. If one simply knows how the data of a random process are generated (the data generation process, or DGP), then one can use a computer to create a large sample drawn from the unknown distribution. The law of large numbers can then be applied to estimate virtually any aspect of the distribution. We will illustrate this by estimating the binomial probability distribution when $n = 10$ and $\pi = 0.2$. We know what the actual probability distribution is, including its mathematical representation, but pretend that we know nothing more than the assumptions of how the binomial process generates data: that n independent trials each have a 1/5th probability of success.
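For reference, the mathematical representation mentioned above is the standard binomial formula. It is a textbook result stated here for convenience, with $S$ denoting the number of successes; it is not taken from the lab text itself.

```latex
\Pr(S = s) = \binom{n}{s}\,\pi^{s}\,(1-\pi)^{\,n-s}, \qquad s = 0, 1, \dots, n .
% With n = 10 and \pi = 0.2, for example:
\Pr(S = 0) = (0.8)^{10} \approx 0.107 .
```

The Monte Carlo exercise pretends this formula is unknown and recovers the same probabilities from simulated data.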


    In order to provide a concrete context for this illustration, let's assume that you want to know the probability distribution of the number of patients "cured" in a drug trial of 10 treated patients, where the probability of any one patient being "cured" by the drug is 20 percent. You might be interested in knowing such things as: how many patients would you expect to be cured in this drug trial? What is the most likely number of patients to be cured? What's the chance that none, one, or any given number of patients in a trial are cured?

    Don't get the word "trial" in the phrase "drug trial" mixed up with the word "trial" in the phrase "the number of trials in a binomial experiment".

    ... zero) if false. In this case, for each of 10,000 observations, Stata compares the random draw from the standard uniform distribution to p (the number 0.2). If the random draw is less than p, the conditional expression evaluates to 1; else if the random draw is greater than


    or equal to p, the conditional expression evaluates to 0. The result, 0 or 1, is then assigned to the variable x1 for that observation.
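A conditional expression can be tried on plain numbers before applying it to a whole variable. In this sketch the scalar named p holding 0.2 is an assumption, inferred from the lab's later reference to a scalar command; the numbers 0.15 and 0.85 are arbitrary examples.

```stata
* a true comparison evaluates to 1, so this displays 1
display 0.15 < 0.2
* a false comparison evaluates to 0, so this displays 0
display 0.85 < 0.2

* applied to a variable, the comparison is made separately for each observation
clear
set obs 10000
scalar p = .2
gen x1 = uniform() < p
* the mean of x1 is the share of 1s, which should be close to p
summarize x1
assert x1 == 0 | x1 == 1
```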

    This assignment is indicated by the assignment operator, "=", in the statement "gen x1=uniform()<p". Note that this "=" symbol has a different meaning in computer programming than in algebra. In algebra, it asserts that both sides of the equation have the same value (both sides are equal). In computer programming, it means to take the value of what is on its right and give it to (assign it to) what is on its left. For example, in computer programming, the statement "x=x+1" means to increment the variable x by 1, but in algebra this is a nonsensical, false statement.
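The difference between assigning and comparing can be seen with a scalar. This is a small sketch; the scalar name k and its starting value are hypothetical. Stata writes the algebraic "is equal to" test with a double equals sign, "==".

```stata
scalar k = 5
* assignment: take the current value of k (5), add 1, store the result back in k
scalar k = k + 1
* displays 6
display k
* comparison: asks whether k equals 6, and displays 1 because it does
display k == 6
```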

    When x1 takes the value 0, that represents a patient who is not cured; when x1 takes the value 1, that represents a patient who is cured. The interpretation of the variable x1 is that its 10,000 observations represent the outcomes for the first patient in each of the 10,000 drug trials. These, of course, are different people.

    Using "do" files in Stata

    Using "do" files is sometimes a convenient way to do work in Stata. The following will illustrate a typical use of do files.

    (1) In the Review window click on the header "rc". This will separate the commands with errors from the rest of the commands.

    (2) Select the four lines (commands) beginning with "clear" and ending with "gen x1=uniform()<p".

    (3) Right click on them and choose "Send to Do-file Editor".

    (4) This is a simple text editor. You will now create the generate commands for the remaining patients in each trial. Select the generate command and copy it nine times, so there are 10 identical generate commands.

    (5) We will call the other patients x2, x3, etc. Edit the generate commands appropriately. Only the digit immediately following x in each command has to be changed. You should now have 13 lines (commands) in the "do" file, with the last 10 being the generate commands.

    (6) As the last command in the do file, type the following command, which creates a new variable s that is the sum of x1 through x10:

    gen s=x1+x2+x3+x4+x5+x6+x7+x8+x9+x10

    (7) Click the "Save" icon on the toolbar of the do-file editor. A "Save File" type window will open. Point it to your flash drive and type the name "monte" in the file name box. Then click the "Save" button.

    (8) Click the "Do" icon in the Do-file Editor's tool bar.

    (9) Stata will attempt to execute each command in the do file, as if you had typed each in the command window. If there is an error (red type in the Results window), restore the "Stata Do-file Editor" window, fix the command(s) that caused the error, and redo steps (7)-(8).
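Put together, the finished do-file might look like the following sketch. The ten generate commands and the final sum follow the steps above. The set obs and scalar lines are assumptions, inferred from the 10,000 observations and the scalar p referred to elsewhere in the lab, since the original first three commands are not shown.

```stata
clear
set obs 10000
* probability that any one patient is cured (assumed to be stored in a scalar)
scalar p = .2
* one 0/1 outcome per patient; each row is one drug trial of 10 patients
gen x1 = uniform() < p
gen x2 = uniform() < p
gen x3 = uniform() < p
gen x4 = uniform() < p
gen x5 = uniform() < p
gen x6 = uniform() < p
gen x7 = uniform() < p
gen x8 = uniform() < p
gen x9 = uniform() < p
gen x10 = uniform() < p
* number of cured patients in each trial
gen s = x1+x2+x3+x4+x5+x6+x7+x8+x9+x10
```

Each generate command calls uniform() afresh, so the ten patients in a row receive independent draws.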


    Inspect and graph the Monte Carlo estimate of the binomial probability distribution

    Let's look at the Monte Carlo estimate of the probability distribution. Go into Stata's browser and look at the first row. This represents the first drug trial of 10 patients. Which patients had successful outcomes? Which did not? What does the variable "s" represent? It represents the number of patients in this trial with successful outcomes, i.e., the number of successes in 10 binomial trials. Verify this interpretation with the next drug trial or two. The variable "s" is therefore the variable of interest. It is a random variable with a binomial distribution, i.e., $S \sim b(n = 10, \pi = 0.2)$. Let's graph its estimated probability distribution using the menu system:

    Graphics > ...

    s     Percent
    0     10.7
    1     26.8
    2     30.2
    3     20.1
    4      8.8
    5      2.6
    6      0.6
    7      0.1
    8      0.0
    9      0.0
    10     0.0

    The binomialtail() function in Stata (see ...


    and the third is $\pi$, the probability of success in each trial. In short, $\Pr(S \ge s) = \mathrm{binomialtail}(n, s, \pi)$. Check the Monte Carlo estimate of the probability of 4 or more patient cures in a drug trial of 10 patients against the actual value of this probability given by:

    display binomialtail(10,4,.2)

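The comparison can be done in a few lines. The sketch below is self-contained, so it rebuilds s with a loop instead of the ten generate commands; that loop is a compact substitute assumed here for illustration, not the lab's own do-file. The scalar p is likewise an assumption.

```stata
clear
set obs 10000
scalar p = .2
* build the number of cures per trial by adding ten independent 0/1 outcomes
gen s = 0
forvalues i = 1/10 {
    quietly replace s = s + (uniform() < p)
}
* Monte Carlo estimate: share of the simulated trials with 4 or more cures
quietly count if s >= 4
display "Monte Carlo estimate of Pr(S >= 4): " r(N)/_N
* exact value from the binomial distribution, roughly 0.121
display "Exact value:                        " binomialtail(10, 4, .2)
```

With 10,000 simulated trials the two numbers should typically agree to about two decimal places.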
    The Power of "do" files and scalar variables

    To illustrate the power of using "do" files and scalar variables, let's repeat this analysis for $\pi = 0.4$ instead. Simply restore the "Stata Do-file Editor" window and edit the scalar command appropriately, then click the "Do" tool again. Recall the appropriate inspection commands from the Review window to view the results.
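Because the probability lives in a single scalar, only one line changes between runs. The following sketch makes that explicit by looping over both values. It is an illustrative alternative to editing the do-file by hand: the loop variable prob and the compact construction of s are assumptions. The comparison figure n*p is the standard mean of a binomial distribution.

```stata
foreach prob of numlist .2 .4 {
    clear
    set obs 10000
    * the only line whose value differs between the two analyses
    scalar p = `prob'
    gen s = 0
    forvalues i = 1/10 {
        quietly replace s = s + (uniform() < p)
    }
    * compare the simulated average number of cures with the theoretical mean
    quietly summarize s
    display "p = " p "   mean of s = " r(mean) "   theoretical mean n*p = " 10*p
}
```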

    END
