8/14/2019 2_Binomial_Monte_Carlo.doc
Lab 2: Using Stata to Do Monte Carlo Experiments
In this lab, we will:
Learn how to generate uniformly distributed random numbers.
Learn how to generate some discrete random variables.
Illustrate the law of large numbers.
Introduce the Monte Carlo method of studying probability distributions.
Apply the Monte Carlo method to the binomial probability distribution.
We will also introduce the following Stata programming techniques and skills:
Functions.
Conditional expressions.
The assignment operator, "=".
Scalar variables.
Using "do" files.
The techniques we will learn in this lab are very useful for illustrating many concepts of
probability and statistics. In addition, we illustrate the basic concept and practice of the
so-called Monte Carlo method of analysis or experimentation.
Generating uniformly distributed random numbers.
A continuously distributed random variable that is equally likely to take any value
between zero and one has a standard uniform probability distribution. Such a variable
can be created in Stata with the uniform() function. Generate 10,000 draws from a
standard uniform distribution and inspect the results in the browser.
set obs 10000
gen u=uniform()
browse
The last command can be executed by clicking on the browser icon.
Stata's uniform random number generator returns a number between zero and one,
exclusive of one itself.
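A minimal sketch of the same experiment written as a short do-file, assuming a recent Stata release in which runiform() is the documented name of the standard uniform generator (older releases accept uniform(), as used in this lab). The set seed line and the seed value 12345 are not part of the original lab; they only make the draws reproducible.

```stata
* Draw 10,000 standard uniform numbers and summarize them.
clear
set seed 12345          // arbitrary seed, added only for reproducibility
set obs 10000
gen u = runiform()      // each value lies in [0, 1)
summarize u             // mean near .5; min near 0; max just below 1
```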
Functions.
uniform is the name of a function in Stata. Function names in Stata must always be
followed by an open parenthesis with no intervening spaces. Why no intervening spaces?
Because otherwise Stata will think the name is the name of a variable, and not a function.
The pair of parentheses surrounds the argument or arguments of the function. In this case
the uniform function has no arguments, but the parentheses are needed anyway. If a
function has more than one argument, then the arguments are separated by commas.
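A few display commands, run from a do-file, illustrate these rules: zero, one, or several arguments, always inside parentheses that follow the function name directly. The functions shown are standard Stata functions; the particular numbers are only illustrative.

```stata
display sqrt(16)        // one argument; displays 4
display max(3, 7, 5)    // several arguments separated by commas; displays 7
display int(5.7)        // int() truncates toward zero; displays 5
display runiform()      // no arguments, but the parentheses are still required
```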
Graph the uniformly distributed random variable.
Graph the random variable using the menu/dialogue window:
Graphics > … (or bins) to use in constructing the graph. Notice that the density of the
random variable is essentially constant throughout its range, which is why the distribution
has the name "uniform".
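For readers who prefer typed commands to menus, a roughly equivalent command is sketched below. The choice of 20 bins is an assumption, not a value from the lab; the fraction option puts the proportion of draws in each bin on the vertical axis, so each bar should sit near 1/20 = .05.

```stata
* Regenerate the uniform draws so this block runs on its own.
clear
set obs 10000
gen u = runiform()
* Histogram with 20 equal-width bins over the range of u.
histogram u, bin(20) fraction
```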
Generating a discrete random variable: Example simulating the rolls of a die.
The uniform random number generator is a building block for creating virtually any
random variable. We will illustrate this by using it to simulate rolling a die. The
following commands simulate twenty rolls of a die. First, however, the number of
observations in Stata must be set to twenty, and in order to do this the memory must be
"cleared".
clear
set obs 20
gen x=int(6*uniform())+1
browse
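The graph combine command later in this lab refers to a graph named g1, whose histogram command is not shown here. The sketch below assumes it follows the same pattern as the n = 50, 200 and 10,000 graphs. It also shows why int(6*uniform())+1 behaves like a fair die: 6 times a draw in [0, 1) lies in [0, 6), int() truncates that to one of 0 through 5, each with probability 1/6, and adding 1 gives 1 through 6.

```stata
clear
set obs 20
gen x = int(6*runiform()) + 1    // integer from 1 to 6, each equally likely
tabulate x                        // observed counts and percentages of each face
histogram x, discrete fraction yline(.166667) xlabel(1/6) title(n=20) name(g1, replace)
```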
The law of large numbers and the frequentist notion of probability
The frequentist notion of probability rests on a limit, the law of large numbers: as
the sample size or number of trials increases towards infinity, the sample proportion
favorable to an event approaches ("settles down to") the probability of the event. We
will illustrate this by increasing the number of rolls of the die, and noticing that the
sample distribution of outcomes settles down to the theoretical discrete uniform
distribution of 1/6 probability for each side of the die.
In order to do this, repeat the commands above, each time changing the number of
observations to be 50, then 200, then 10,000. Also, change the name for each graph as
indicated below. The easiest way to do this is to single-click on each command in the
Review window, and then edit it in the Command window as necessary.
clear
set obs 50
gen x=int(6*uniform())+1
histogram x, discrete fraction yline(.166667) xlabel(1/6) title(n=50) name(g2, replace)
clear
set obs 200
gen x=int(6*uniform())+1
histogram x, discrete fraction yline(.166667) xlabel(1/6) title(n=200) name(g3, replace)
clear
set obs 10000
gen x=int(6*uniform())+1
histogram x, discrete fraction yline(.166667) xlabel(1/6) title(n=10000) name(g4, replace)
Finally, view all the graphs together with the following command:
graph combine g1 g2 g3 g4, title(Law of Large Numbers)
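Because the four histograms differ only in the number of rolls and the graph name, the same experiment can be written once as a loop and run from a do-file. This is a sketch rather than the lab's procedure; it reuses the option values from the commands above and assumes the first graph uses twenty rolls.

```stata
* Repeat the die experiment for increasing numbers of rolls, one graph per size.
local i = 1
foreach n of numlist 20 50 200 10000 {
    clear
    quietly set obs `n'
    gen x = int(6*runiform()) + 1
    histogram x, discrete fraction yline(.166667) xlabel(1/6) title(n=`n') name(g`i', replace)
    local ++i
}
graph combine g1 g2 g3 g4, title(Law of Large Numbers)
```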
Monte Carlo Estimation of a Probability Distribution
Over the last few decades, the use of the computer to study the probability distribution of
a random variable has become commonplace. The technique is powerful when the theory
and/or mathematics of the random process is too difficult to understand or to derive.
If one simply knows how the data of a random process are generated (the data
generating process, or DGP), then one can use a computer to create a large sample drawn
from the unknown distribution. The law of large numbers can then be applied to estimate
virtually any aspect of the distribution. We will illustrate this by estimating the binomial
probability distribution when $n = 10$ and $p = 0.2$. We know what the actual probability
distribution is, including its mathematical representation, but pretend that we know
nothing more than the assumptions of how the binomial process generates data: that $n$
independent trials each have a 1/5 probability of success.
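For reference, the mathematical representation that the Monte Carlo exercise pretends not to know is the binomial probability mass function. With $n = 10$ and $p = 0.2$ it gives, for example, the mean and the probability of no successes:

$$
\Pr(S = k) = \binom{n}{k} p^{k} (1-p)^{n-k}, \qquad k = 0, 1, \dots, n,
$$

$$
E[S] = np = 10 \times 0.2 = 2, \qquad \Pr(S = 0) = 0.8^{10} \approx 0.1074 .
$$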
In order to provide a concrete context for this illustration, let's assume that you want to
know the probability distribution of the number of patients "cured" in a drug trial of 10
treated patients, where the probability of any one patient being "cured" by the drug is 20
percent. You might be interested in knowing such things as: How many patients would
you expect to be cured in this drug trial? What is the most likely number of patients to be
cured? What's the chance that none, one, or any given number of patients in a trial are
cured?
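Since the true distribution is known in this case, the exact answers to these questions can be computed directly and later compared with the Monte Carlo estimates. A sketch using Stata's binomialp(n, k, p), which returns the probability of exactly k successes in n trials; run it from a do-file.

```stata
display "Expected number cured (n*p): " 10*.2
forvalues k = 0/10 {
    display "Pr(S = `k') = " %6.4f binomialp(10, `k', .2)
}
```

The largest of these probabilities identifies the most likely number of cures.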
Don't get the word "trial" in the phrase "drug trial" mixed up with the word "trial" in the
phrase "the number of trials in a binomial experiment". … 0 (zero) if false. In this case,
for each of 10,000 observations, Stata compares the random draw from the standard
uniform distribution to p (the number .2). If the random draw is less than p, the
conditional expression evaluates to 1; else, if the random draw is greater than or equal
to p, the conditional expression evaluates to 0. The result, 0 or 1, is then assigned to the
variable x1 for that observation.
This assignment is indicated by the assignment operator, "=", in the statement
"gen x1=uniform()<p". Note that this "=" symbol has a different meaning in computer
programming than in algebra. In algebra, it asserts that both sides of the equation have
the same value (both sides are equal). In computer programming, it means to take the
value of what is on its right and assign it to what is on its left. For example, in
computer programming, the statement "x=x+1" means to increment the variable x by 1,
but in algebra this is a nonsensical, false statement.
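A small sketch of both ideas, conditional expressions and the assignment operator, on five observations. It assumes the scalar p = .2 used in this lab and that no variable in memory could be confused with the name p; the variable names u, x1 and y are illustrative only. Run it from a do-file.

```stata
clear
set obs 5
scalar p = .2                 // a scalar: one named number, not a column of data
gen u = runiform()
gen x1 = u < p                // conditional expression: 1 if true, 0 if false
gen y = 1
replace y = y + 1             // assignment: compute y + 1, then store it back in y
list u x1 y
```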
When x1 takes the value 0, that represents a patient who is not cured; when x1 takes the
value 1, that represents a patient who is cured. The interpretation of the variable x1 is
that its 10,000 observations represent the outcomes for the first patient in each of the
10,000 drug trials. These, of course, are different people.
Using "do" files in Stata
Using "do" files is sometimes a convenient way to do work in Stata. The following will
illustrate a typical use of do files.
(1) In the Review window click on the header "_rc". This will separate the
commands with errors from the rest of the commands.
(2) Select the four lines (commands) beginning with "clear" and ending with
"gen x1=uniform()<p".
(3) Right-click on them and choose "Send to Do-file Editor".
(4) This is a simple text editor. You will now create the generate commands for
the remaining patients in each trial. Select the generate command and copy
it nine times, so there are 10 identical generate commands.
(5) We will call the other patients x2, x3, etc. Edit the generate commands
appropriately. Only the digit immediately following x in each command has
to be changed. You should now have 13 lines (commands) in the "do" file,
with the last 10 being the generate commands.
(6) As the last command in the do file, type the following command, which
creates a new variable s that is the sum of x1 through x10:
gen s=x1+x2+x3+x4+x5+x6+x7+x8+x9+x10
(7) Click the "Save" icon on the toolbar of the do-file editor. A "Save File" type
window will open. Point it to your flash drive and type the name "monte" in
the file name box. Then click the "Save" button.
(8) Click the "Do" icon in the Do-file Editor's toolbar.
(9) Stata will attempt to execute each command in the do file, as if you had typed
each in the command window. If there is an error (red type in the Results
window), restore the "Stata Do-file Editor" window, fix the command(s) that
caused the error, and redo steps (7)-(8).
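Instead of copying the generate command ten times, the same do-file can be written with a loop. The sketch below is an alternative to steps (4)-(6), not the lab's own file; the seed value is arbitrary, and runiform() stands in for uniform(). It builds s as a running total, which is also an example of the "x=x+1" style of assignment.

```stata
* monte.do (loop version): 10,000 simulated drug trials of 10 patients each.
clear
set seed 2019               // arbitrary seed, only for reproducibility
set obs 10000               // one observation per simulated drug trial
scalar p = .2               // probability that any one patient is cured
gen s = 0                   // number of cured patients in each trial
forvalues i = 1/10 {
    gen x`i' = runiform() < p       // 1 if this patient is cured, 0 otherwise
    quietly replace s = s + x`i'    // add this patient's outcome to the total
}
```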
Inspect and graph the Monte Carlo estimate of the binomial probability distribution
Let's look at the Monte Carlo estimate of the probability distribution. Go into Stata's
browser and look at the first row. This represents the first drug trial of 10 patients. Which
patients had successful outcomes? Which did not? What does the variable "s"
represent? It represents the number of patients in this trial with successful outcomes, i.e.,
the number of successes in 10 binomial trials. Verify this interpretation with the next
drug trial or two. The variable "s" is therefore the variable of interest. It is a random
variable with a binomial distribution, i.e., $S \sim b(n = 10,\ p = 0.2)$. Let's graph its
estimated probability distribution using the menu system:
Graphics > …
[Table: Monte Carlo estimated distribution of s]
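A typed alternative to the menu, sketched under the assumption that the data are rebuilt here so the block runs on its own from a do-file (skip the first seven lines if monte.do has just been run). The final loop prints each Monte Carlo relative frequency next to the exact binomialp() value.

```stata
clear
set obs 10000
scalar p = .2
gen s = 0
forvalues i = 1/10 {
    quietly replace s = s + (runiform() < p)   // one patient per pass
}
histogram s, discrete fraction xlabel(0/10)
forvalues k = 0/10 {
    quietly count if s == `k'
    display "k = `k':  Monte Carlo " %6.4f r(N)/_N "   exact " %6.4f binomialp(10, `k', p)
}
```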
The binomialtail() function in Stata (see …
… and the third is p, the probability of success in each trial. In short,
$\Pr(S \ge s) = \mathrm{binomialtail}(n, s, p)$. Check the Monte Carlo estimate of the probability of
4 or more patient cures in a drug trial of 10 patients against the actual value of this
probability given by:
display binomialtail(10,4,.2)
The power of "do" files and scalar variables
To illustrate the power of using "do" files and scalar variables, let's repeat this analysis
for $p = .4$ instead. Simply restore the "Stata Do-file Editor" window and edit the scalar
command appropriately, then click the "Do" tool again. Recall the appropriate inspection
commands from the Review window to view the results.
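Because the success probability lives in a single scalar, the whole analysis can also be repeated for several values of p in one pass. A sketch, assuming 10,000 trials of 10 patients as above; it compares the Monte Carlo estimate of Pr(S >= 4) with binomialtail() for each p (for p = .2 the exact value is about 0.1209). This illustrates the idea and is not part of the lab's procedure.

```stata
foreach prob in .2 .4 {
    clear
    quietly set obs 10000
    scalar p = `prob'                  // the only setting that changes between runs
    gen s = 0
    forvalues i = 1/10 {
        quietly replace s = s + (runiform() < p)
    }
    quietly count if s >= 4
    display "p = " %3.1f p ":  Monte Carlo Pr(S >= 4) = " %6.4f r(N)/_N "   exact = " %6.4f binomialtail(10, 4, p)
}
```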
END