Today: Quiz 11: review. Last quiz! Wednesday: Guest lecture – Multivariate Analysis
Post on 08-Feb-2016
Friday: Last lecture: review – bring questions
DEC 8 – 9am
FINAL EXAM 2007
Resampling
Resampling – Introduction
We have relied on idealized models of the origins of our data (e.g., ε ~ Normal) to make inferences
But, these models can be inadequate
Resampling techniques allow us to base the analysis of a study solely on the design of that study, rather than on a poorly-fitting model
• Fewer assumptions – e.g., resampling methods do not require that distributions be Normal or that sample sizes be large
• Generality: Resampling methods are remarkably similar for a wide range of statistics and do not require new formulas for every statistic
• Promote understanding: Bootstrap procedures build intuition by providing concrete analogies to theoretical concepts
Resampling – Why resampling?
Resampling
Collection of procedures to make statistical inferences without relying on parametric assumptions
- Bias
- Variance, measures of error
- Parameter estimation
- Hypothesis testing
Resampling – Procedures
Permutation (randomization)
Bootstrap
Jackknife
Monte Carlo techniques
Resampling
With replacement
Without replacement
Resampling – Permutation
If the number of samples in each group = 10:
– with n groups = 2, the number of permutations = 184,756
– with n groups = 3, the number of permutations > 5,000 billion
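Because the full set of permutations quickly becomes too large to enumerate, in practice a random subset of relabelings is used. A minimal sketch of such a two-group permutation test follows; the data values are made up purely for illustration.

```python
import random

# Permutation test of a difference in two group means (hypothetical data).
# Group labels are repeatedly shuffled; the observed difference is compared
# against the resulting permutation distribution.
random.seed(1)
group_a = [2.1, 2.5, 1.9, 2.8, 2.4]
group_b = [1.4, 1.7, 1.2, 1.9, 1.5]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(group_a) - mean(group_b)
pooled = group_a + group_b
n_a = len(group_a)

n_perm = 10_000
count_extreme = 0
for _ in range(n_perm):
    random.shuffle(pooled)                        # relabel without replacement
    diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
    if abs(diff) >= abs(observed):                # two-sided comparison
        count_extreme += 1

p_value = count_extreme / n_perm
print(f"observed diff = {observed:.3f}, permutation p ~ {p_value:.4f}")
```

Note that the shuffle samples labels without replacement, which is what distinguishes a permutation test from a bootstrap.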
Resampling – Bootstrap
"to pull oneself up by one's bootstraps"
The Surprising Adventures of Baron Munchausen (1781), by Rudolf Erich Raspe
Bradley Efron 1979
Resampling – Bootstrap
Hypothesis testing, parameter estimation, assigning measures of accuracy to sample estimates
e.g., SE, CI
Useful when:
- formulas for parameter estimates are based on assumptions that are not met
- computational formulas are only valid for large samples
- computational formulas do not exist
Resampling – Bootstrap
Assume that the sample is representative of the population
Approximate the distribution of the population by repeatedly resampling (with replacement) from the sample
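The idea above can be sketched in a few lines. This is a minimal non-parametric bootstrap of the sample mean, with made-up data; the bootstrap standard error is just the standard deviation of the resampled means.

```python
import random
import statistics

# Non-parametric bootstrap of the sample mean (hypothetical data).
# The observed sample stands in for the population; we resample from it
# with replacement and recompute the statistic each time.
random.seed(42)
sample = [12.1, 9.8, 11.4, 10.2, 13.0, 9.5, 10.8, 12.4, 11.1, 10.0]

n_boot = 5_000
boot_means = []
for _ in range(n_boot):
    resample = random.choices(sample, k=len(sample))   # with replacement
    boot_means.append(statistics.mean(resample))

# The spread of the bootstrap distribution estimates the SE of the mean
se_boot = statistics.stdev(boot_means)
print(f"sample mean = {statistics.mean(sample):.2f}, bootstrap SE ~ {se_boot:.3f}")
```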
Resampling – Bootstrap
Non-parametric bootstrap: resample observations from the original sample
Parametric bootstrap: fit a particular model to the data and then use this model to produce bootstrap samples
Confidence intervals – Non-parametric Bootstrap
Large Capelin
Small Capelin
OtherPrey
Confidence intervals – Non-parametric Bootstrap
74 lc: %Nlc = 48.3%
76 sc: %Nsc = 49.6%
3 ot: %Not = 1.9%
What about uncertainty around the point estimates?
Bootstrap
Confidence intervals – Non-parametric Bootstrap
Bootstrap:
- 153 balls, each with a tag: 76 sc, 74 lc, 3 ot
- Draw 153 random samples (with replacement) and record tag
- Calculate %Nlc*1, %Nsc*1, %Not*1
- Repeat nboot=50 000 times
(%Nlc*1, %Nsc*1, %Not*1),
(%Nlc*2, %Nsc*2, %Not*2),…,
( %Nlc*nboot, %Nsc*nboot, %Not*nboot)
- Sort the %Ni*b:
  LCL = 1250th %Ni*b (0.025 quantile)
  UCL = 48750th %Ni*b (0.975 quantile)
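The "balls with tags" procedure above translates directly to code. This sketch follows the slide's setup (74 lc, 76 sc, 3 ot; 153 draws with replacement per replicate, percentile-method limits) but reduces nboot from 50 000 to 5 000 so it runs quickly.

```python
import random

# Non-parametric bootstrap CI for the proportion of large capelin (lc)
# in a diet sample of 153 prey items.
random.seed(0)
tags = ["lc"] * 74 + ["sc"] * 76 + ["ot"] * 3
n = len(tags)  # 153

nboot = 5_000
pct_lc = []
for _ in range(nboot):
    resample = random.choices(tags, k=n)      # draw 153 tags with replacement
    pct_lc.append(100 * resample.count("lc") / n)

# Percentile method: sort the bootstrap values and read off the limits
pct_lc.sort()
lcl = pct_lc[int(0.025 * nboot)]
ucl = pct_lc[int(0.975 * nboot) - 1]
print(f"%Nlc = {100 * 74 / n:.1f}%, 95% CI ~ ({lcl:.1f}%, {ucl:.1f}%)")
```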
Confidence intervals – Parametric Bootstrap
Params: α, β
Confidence intervals – Parametric Bootstrap
Params: α*1, β*1
Confidence intervals – Parametric Bootstrap
Params: α*2, β*2
Confidence intervals – Parametric Bootstrap
Params: α*nboot, β*nboot
Params: β*1, β*2, …, β*nboot
Construct a Confidence Interval for β:
1. Percentile method
2. Bias-corrected percentile confidence limits
3. Accelerated bias-corrected percentile limits
4. Bootstrap-t
5, 6, …: other methods
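As a concrete (hypothetical) version of the α, β scheme above: fit a simple linear model y = α + βx + ε by least squares, then repeatedly simulate new responses from the fitted model and refit to obtain the bootstrap distribution of β. All data and parameter values below are made up; the CI shown uses method 1 (percentile).

```python
import random
import statistics

random.seed(7)

def fit_ls(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 2.9, 4.1, 4.4, 5.2, 6.0, 6.8, 7.1]

# Step 1: fit the model to the data -> alpha-hat, beta-hat, sigma-hat
a_hat, b_hat = fit_ls(x, y)
resid = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]
sigma_hat = statistics.stdev(resid)

# Step 2: simulate from the fitted model, refit, collect beta*1 ... beta*nboot
nboot = 2_000
b_star = []
for _ in range(nboot):
    y_sim = [a_hat + b_hat * xi + random.gauss(0, sigma_hat) for xi in x]
    b_star.append(fit_ls(x, y_sim)[1])

# Step 3: percentile-method confidence limits for beta
b_star.sort()
print(f"beta = {b_hat:.3f}, percentile 95% CI ~ "
      f"({b_star[int(0.025 * nboot)]:.3f}, {b_star[int(0.975 * nboot) - 1]:.3f})")
```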
Bootstrap CI – Percentile Method
Bootstrap – Caveats
Independence
Incomplete data
Outliers
Cases where small perturbations to the data-generating process produce big swings in the sampling distribution
Resampling – Jackknife
Tukey 1958
Quenouille 1956
Estimate bias and variance of a statistic
Concept: Leave one observation out and recompute statistic
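The leave-one-out idea can be sketched for the simplest case, the sample mean (data made up for illustration); the standard jackknife formulas for variance and bias are applied to the leave-one-out values.

```python
import statistics

# Jackknife for the sample mean: leave each observation out in turn,
# recompute the statistic, then estimate variance and bias from the
# leave-one-out values.
sample = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.0]
n = len(sample)

theta_hat = statistics.mean(sample)
loo = [statistics.mean(sample[:i] + sample[i + 1:]) for i in range(n)]
loo_mean = statistics.mean(loo)

# Jackknife variance: (n-1)/n * sum of squared deviations of LOO values
var_jack = (n - 1) / n * sum((t - loo_mean) ** 2 for t in loo)
# Jackknife bias: (n-1) * (mean of LOO values - full-sample estimate)
bias_jack = (n - 1) * (loo_mean - theta_hat)

print(f"mean = {theta_hat:.3f}, jackknife var ~ {var_jack:.4f}, "
      f"bias ~ {bias_jack:.2e}")
```

For the mean the jackknife variance reduces exactly to s²/n and the bias estimate is zero, which makes this a useful sanity check.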
Cross-validation – Jackknife
Assess the performance of the model
How accurately will the model predict a new observation?
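Leave-one-out cross-validation answers this question directly: hold out each observation in turn, refit the model on the rest, and predict the held-out point. A sketch with a simple least-squares line and made-up data:

```python
import statistics

def fit_ls(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 2.9, 4.1, 4.4, 5.2, 6.0, 6.8, 7.1]

sq_errors = []
for i in range(len(x)):
    # Leave observation i out, refit, predict it
    x_train = x[:i] + x[i + 1:]
    y_train = y[:i] + y[i + 1:]
    a, b = fit_ls(x_train, y_train)
    pred = a + b * x[i]
    sq_errors.append((y[i] - pred) ** 2)

# Cross-validated mean squared prediction error
cv_mse = statistics.mean(sq_errors)
print(f"leave-one-out CV MSE ~ {cv_mse:.4f}")
```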
Jackknife vs. Bootstrap – Differences
Both estimate variability of a statistic between subsamples
Jackknife provides an estimate of the variance of an estimator
Bootstrap first estimates the distribution of the estimator. From this distribution, we can estimate the variance
Using the same data set:
- bootstrap results will always be slightly different
- jackknife results will always be identical
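The contrast above can be demonstrated directly (with made-up data): two jackknife runs on the same data give identical results, while two bootstrap runs differ slightly.

```python
import random
import statistics

data = [3.1, 4.7, 2.9, 5.3, 4.1, 3.8]

def jackknife_se(xs):
    """Jackknife SE of the mean: deterministic, no random draws."""
    n = len(xs)
    loo = [statistics.mean(xs[:i] + xs[i + 1:]) for i in range(n)]
    m = statistics.mean(loo)
    return ((n - 1) / n * sum((t - m) ** 2 for t in loo)) ** 0.5

def bootstrap_se(xs, seed):
    """Bootstrap SE of the mean: depends on the random resamples drawn."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.choices(xs, k=len(xs))) for _ in range(2000)]
    return statistics.stdev(means)

print(jackknife_se(data) == jackknife_se(data))      # True: deterministic
print(bootstrap_se(data, 1) == bootstrap_se(data, 2))  # False: stochastic
```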
Resampling – Monte Carlo
Stanisław Ulam
John von Neumann
Mid 1940s
“The first thoughts and attempts I made to practice [the Monte Carlo Method] were suggested by a question which occurred to me in 1946 as I was convalescing from an illness and playing solitaires. The question was what are the chances that a Canfield solitaire laid out with 52 cards will come out successfully? After spending a lot of time trying to estimate them by pure combinatorial calculations, I wondered whether a more practical method than "abstract thinking" might not be to lay it out say one hundred times and simply observe and count the number of successful plays.”
Resampling – Monte Carlo
Monte Carlo methods (plural): there is not just one, and no clear consensus on how they should be defined.
Commonality: repeated sampling from populations with known characteristics, i.e., we assume a distribution and create random samples that follow that distribution, then compare our estimated statistic to the distribution of outcomes.
Monte Carlo
Goal: assess the robustness of constant escapement and constant harvest rate policies with respect to management error for Pacific salmon fisheries
Monte Carlo
Spawner-return dynamic model
Constant harvest rate
Constant escapement
Stochastic production variation
Management error
Outputs: average catch, CV of catch
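A heavily simplified sketch of the kind of simulation outlined above; all parameter values (Ricker a and b, noise levels, policy targets) are hypothetical, not taken from the study. A Ricker spawner-return model with lognormal production variation is run under the two policies, each distorted by implementation ("management") error.

```python
import math
import random

random.seed(11)
a, b = 2.0, 1.0 / 1000             # Ricker parameters (hypothetical)
sigma_prod, sigma_mgmt = 0.4, 0.2  # production and management error SDs

def simulate(policy, years=200):
    spawners, catches = 500.0, []
    for _ in range(years):
        # Ricker recruitment with lognormal process error
        returns = spawners * math.exp(a - b * spawners +
                                      random.gauss(0, sigma_prod))
        if policy == "harvest_rate":
            target = 0.5 * returns              # constant harvest rate: 50%
        else:
            target = max(returns - 600.0, 0.0)  # constant escapement: 600
        # Management error: realized catch deviates from the target
        catch = min(max(target * (1 + random.gauss(0, sigma_mgmt)), 0.0),
                    returns)
        catches.append(catch)
        spawners = max(returns - catch, 1.0)
    mean_c = sum(catches) / len(catches)
    sd_c = (sum((c - mean_c) ** 2 for c in catches) / len(catches)) ** 0.5
    return mean_c, sd_c / mean_c                # average catch, CV of catch

for policy in ("harvest_rate", "escapement"):
    mean_c, cv = simulate(policy)
    print(f"{policy}: average catch ~ {mean_c:.0f}, CV of catch ~ {cv:.2f}")
```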
Discussion
Was the bootstrap example I showed parametric or non-parametric?
Can you think of an example of the other case?
So, what’s the difference between a bootstrap and a Monte Carlo?
74 large capelin
76 small capelin
3 other prey
Bootstrap samples:
153 balls, each with a tag: sc, lc, ot
Draw 153 random samples (with replacement) and record the tag
Repeat 50 000 times
Discussion
In principle both the parametric and the non-parametric bootstrap are special cases of Monte Carlo simulations used for a very specific purpose: to estimate some characteristics of the sampling distribution.
The idea behind the bootstrap is that the sample is an estimate of the population, so an estimate of the sampling distribution can be obtained by drawing many samples (with replacement) from the observed sample and computing the statistic in each new sample.
Monte Carlo simulations are more general: basically, the term refers to repeatedly creating random data in some way, doing something to that random data, and collecting some results.
This strategy could be used to estimate some quantity, like in the bootstrap, but also to theoretically investigate some general characteristic of an estimator which is hard to derive analytically.
In practice it would be pretty safe to presume that whenever someone speaks of a Monte Carlo simulation they are talking about a theoretical investigation, e.g. creating random data with no empirical content whatsoever to investigate whether an estimator can recover known characteristics of this random 'data', while the (parametric) bootstrap refers to an empirical estimation. The fact that the parametric bootstrap implies a model should not worry you: any empirical estimate is based on a model. Hope this helps, Maarten – Maarten L. Buis
http://www.stata.com/statalist/archive/2008-06/msg00802.html