Social Statistics
Estimation and complex survey design
Ian Plewis, CCSR, University of Manchester
PARAMETER: Population Mean, Probability, Variance etc.: μ, π, σ²
ESTIMATOR: Sample Mean, Proportion, Variance etc.: x̄, p, s²
ESTIMATE: Usually a number calculated from the observed data.
We combine the estimate and its standard error to make inferences about the parameter.
With simple random sampling, the standard error of x̄ is σ ∕√n, usually estimated by s ∕√n, where n is the sample size.
The standard error of p is usually estimated by √(p(1 − p) ∕ n).
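The two simple-random-sampling formulas above can be sketched in a few lines of Python. The function names and the sample values are illustrative assumptions, not part of the original slides.

```python
import math

def se_mean(values):
    """Estimated standard error of the sample mean under SRS: s / sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance s^2 with the usual n - 1 divisor.
    s2 = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(s2 / n)

def se_proportion(p, n):
    """Estimated standard error of a sample proportion under SRS."""
    return math.sqrt(p * (1 - p) / n)

sample = [2.0, 4.0, 4.0, 6.0]  # made-up observations for illustration
print(se_mean(sample))
print(se_proportion(0.5, 100))
```

Both functions assume independent observations selected with equal probability; they do not allow for weighting or clustering.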
MILLENNIUM COHORT STUDY
Stratification
The population was stratified by UK country - England, Wales, Scotland and Northern Ireland.
For England, the population was then stratified, via the stratification of electoral wards extant on 1 April 1998, into three strata:
1) The 'ethnic minority' stratum: children living in wards which, in the 1991 Census of Population, had an ethnic minority indicator of at least 30%.
2) The 'disadvantaged' stratum: children living in wards, other than those falling into stratum (1) above, which fell into the upper quartile (i.e. the poorest 25% of wards) of the ward-based Child Poverty Index.
3) The 'advantaged' stratum: children living in wards, other than those falling into stratum (1) above, which were not in the top quartile of the CPI. ‘Advantaged’ is therefore a relative term in this context.
For Wales, Scotland and Northern Ireland, there were just two strata: disadvantaged and advantaged.
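The stratification rules above amount to a short decision procedure per ward. The sketch below is one way to write it down; the function name, argument names and the boolean encoding of the Child Poverty Index quartile are assumptions made for illustration.

```python
def assign_stratum(country, ethnic_minority_pct, in_cpi_upper_quartile):
    """Return the stratum label for a ward.

    country: 'England', 'Wales', 'Scotland' or 'Northern Ireland'.
    ethnic_minority_pct: 1991 Census ethnic minority indicator, in percent.
    in_cpi_upper_quartile: True if the ward is among the poorest 25% of
        wards on the ward-based Child Poverty Index.
    """
    # The 'ethnic minority' stratum exists only in England and is checked first.
    if country == "England" and ethnic_minority_pct >= 30:
        return "ethnic minority"
    # Remaining wards are split on the Child Poverty Index.
    if in_cpi_upper_quartile:
        return "disadvantaged"
    return "advantaged"

print(assign_stratum("England", 35.0, True))
print(assign_stratum("Wales", 35.0, True))
```

Checking the ethnic minority rule first mirrors the wording of the definitions, in which strata (2) and (3) exclude wards already placed in stratum (1).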
Clustering
The wish to bring the broader socio-economic context into the
analysis, particularly as represented by the areas or local
neighbourhoods that the children live in, and the need to keep field
costs down, led to the decision to cluster the sample. Moreover,
the chosen method of stratification - by characteristics of electoral
wards - meant that using wards, rather than alternative geographical
aggregates such as postcode sectors, was the most appropriate way
to implement the clustering. In addition, the issues of measuring local
context and reducing fieldwork costs pointed to the advantages of
including all births in selected wards in the sample, rather than sub-
sampling within wards.
The sample is a disproportionately stratified cluster sample. The
disproportionality means that the sample is not self-weighting and so
weighted estimates of means, variances etc. are needed. The
clustering implies that observations are not independent and so
allowance must be made for the dependence so induced when
sampling errors are computed. It was likely that the design effects
for the sample would be greater than one. In other words, the sample
would be somewhat less precise than a simple random sample of the
same size would have been, although this depends on how far the
gains from stratification and systematic selection are offset by the
losses from clustering which, in turn, would vary across measures.
Sampling Fractions and Weights across Strata

| Stratum (h) | Number of wards (N_h) | P_h | Each UK country: f_h | Each UK country: w_h | UK: f_h | UK: w_h | GB: f_h | GB: w_h |
|---|---|---|---|---|---|---|---|---|
| ENGLAND: Advantaged | 5289 | 0.723 | 0.0208 | 1.32 | 0.554 | 2.00 | 0.585 | 1.78 |
| ENGLAND: Disadvantaged | 1853 | 0.253 | 0.0383 | 0.71 | 0.194 | 1.09 | 0.205 | 0.97 |
| ENGLAND: Ethnic | 169 | 0.023 | 0.112 | 0.24 | 0.018 | 0.37 | 0.019 | 0.33 |
| WALES: Advantaged | 345 | 0.557 | 0.067 | 1.77 | 0.036 | 0.62 | 0.038 | 0.55 |
| WALES: Disadvantaged | 274 | 0.443 | 0.182 | 0.65 | 0.029 | 0.23 | 0.030 | 0.20 |
| SCOTLAND: Advantaged | 709 | 0.634 | 0.045 | 1.23 | 0.074 | 0.93 | 0.078 | 0.82 |
| SCOTLAND: Disadvantaged | 409 | 0.366 | 0.073 | 0.75 | 0.043 | 0.57 | 0.045 | 0.51 |
| N.IRELAND: Advantaged | 258 | 0.516 | 0.089 | 1.41 | 0.027 | 0.47 | n.a. | n.a. |
| N.IRELAND: Disadvantaged | 242 | 0.484 | 0.165 | 0.76 | 0.025 | 0.25 | n.a. | n.a. |
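Within each country, the published weights appear to be proportional to the inverse of the sampling fractions. The sketch below checks this for the three English strata using the within-country figures from the table. The normalising constant is not stated in the slides, so the code only compares ratios; the variable names are illustrative.

```python
# Within-country sampling fractions and weights for England, from the table.
sampling_fraction = {"advantaged": 0.0208, "disadvantaged": 0.0383, "ethnic": 0.112}
published_weight = {"advantaged": 1.32, "disadvantaged": 0.71, "ethnic": 0.24}

# Raw (unscaled) weights: the inverse of each sampling fraction.
raw_weight = {h: 1.0 / f for h, f in sampling_fraction.items()}

# If the published weights are proportional to 1 / f_h, this ratio should be
# roughly constant across strata (up to rounding in the table).
ratio = {h: raw_weight[h] / published_weight[h] for h in sampling_fraction}

for h in sampling_fraction:
    print(h, round(raw_weight[h], 2), round(ratio[h], 2))
```

The ratios agree to within a few percent, which is about what the rounding of the published figures allows.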
The weights should be applied when estimating, say, a mean for the sample.
In other words, the weighted mean for variable y is:

$$\bar{y}_w = \frac{\sum_h w_h \sum_i y_{hi}}{\sum_h w_h x_h}$$

where i (i = 1, ..., x_h) indexes the elements in stratum h, so x_h is the sample size for stratum h. The appropriate weights w_h from the table should be used.
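As a minimal sketch of the weighted mean just defined, the function below sums the stratum totals multiplied by their weights and divides by the weighted sample sizes. The data structure and the y values are invented for illustration; only the three weights are taken from the table (the within-country weights for England).

```python
def weighted_mean(strata):
    """Weighted mean over strata.

    strata maps a stratum label to a pair (w_h, ys), where w_h is the
    stratum weight and ys is the list of observed y values in that stratum.
    """
    numerator = sum(w * sum(ys) for w, ys in strata.values())
    denominator = sum(w * len(ys) for w, ys in strata.values())
    return numerator / denominator

# Hypothetical 0/1 outcomes, four sampled elements per stratum.
example = {
    "advantaged": (1.32, [1.0, 0.0, 0.0, 0.0]),
    "disadvantaged": (0.71, [1.0, 1.0, 0.0, 0.0]),
    "ethnic": (0.24, [1.0, 1.0, 1.0, 0.0]),
}

print(weighted_mean(example))
```

With these made-up data the unweighted mean is 0.5, whereas the weighted mean is lower because the over-sampled strata are weighted down.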
Sampling Errors, Proportion of Cohort Members Regarded as Non-white by Stratum and Country

| Stratum | Mean | 95% Confidence Interval | Design Effect | Observations |
|---|---|---|---|---|
| England Advantaged | 0.07 | 0.05 - 0.10 | 9.4 | 4608 |
| England Disadvantaged | 0.17 | 0.11 - 0.22 | 23.1 | 4504 |
| England Ethnic | 0.83 | 0.78 - 0.88 | 11.5 | 2384 |
| Wales Advantaged | 0.02 | 0.01 - 0.03 | 1.5 | 829 |
| Wales Disadvantaged | 0.05 | 0.02 - 0.07 | 5.8 | 1928 |
| Scotland Advantaged | 0.03 | 0.02 - 0.04 | 1.5 | 1141 |
| Scotland Disadvantaged | 0.03 | 0.02 - 0.05 | 2.5 | 1189 |
| Northern Ireland Advantaged | 0.01 | 0.00 - 0.02 | 1.5 | 723 |
| Northern Ireland Disadvantaged | 0.01 | 0.00 - 0.01 | 0.8 | 1198 |
| England | 0.15 | 0.12 - 0.17 | 10.3 | 11496 |
| Wales | 0.03 | 0.02 - 0.04 | 2.9 | 2757 |
| Scotland | 0.03 | 0.02 - 0.04 | 1.8 | 2330 |
| Northern Ireland | 0.01 | 0.00 - 0.01 | 1.4 | 1921 |
| UK | 0.13 | 0.10 - 0.15 | 15.6 | 18504 |
Sampling Errors, Proportion of Natural Mothers who do not have a Longstanding Illness by Stratum and Country

| Stratum | Mean | 95% Confidence Interval | Design Effect | Observations |
|---|---|---|---|---|
| England Advantaged | 0.78 | 0.77 - 0.80 | 1.9 | 4614 |
| England Disadvantaged | 0.78 | 0.76 - 0.80 | 2.4 | 4517 |
| England Ethnic | 0.83 | 0.81 - 0.86 | 2.5 | 2384 |
| Wales Advantaged | 0.80 | 0.77 - 0.82 | 1.1 | 831 |
| Wales Disadvantaged | 0.77 | 0.75 - 0.79 | 0.9 | 1927 |
| Scotland Advantaged | 0.81 | 0.78 - 0.85 | 1.9 | 1143 |
| Scotland Disadvantaged | 0.77 | 0.75 - 0.79 | 0.9 | 1189 |
| Northern Ireland Advantaged | 0.81 | 0.78 - 0.84 | 1.1 | 723 |
| Northern Ireland Disadvantaged | 0.78 | 0.76 - 0.80 | 1.1 | 1196 |
| England | 0.79 | 0.77 - 0.80 | 2.6 | 11491 |
| Wales | 0.78 | 0.77 - 0.80 | 1.1 | 2752 |
| Scotland | 0.80 | 0.78 - 0.82 | 1.6 | 2327 |
| Northern Ireland | 0.80 | 0.78 - 0.82 | 1.1 | 1917 |
| UK | 0.79 | 0.78 - 0.80 | 2.9 | 18487 |
Sampling Errors, Family Income, Natural Mothers by Stratum and Country

| Stratum | Mean | 95% Confidence Interval | Design Effect | Observations |
|---|---|---|---|---|
| England Advantaged | 29547 | 27438 - 31655 | 11.3 | 4345 |
| England Disadvantaged | 17665 | 16502 - 18827 | 7.1 | 4168 |
| England Ethnic | 14654 | 12641 - 16667 | 13.0 | 1973 |
| Wales Advantaged | 25896 | 23629 - 28163 | 2.9 | 780 |
| Wales Disadvantaged | 16696 | 15359 - 18032 | 4.7 | 1841 |
| Scotland Advantaged | 27424 | 24093 - 30754 | 8.3 | 1014 |
| Scotland Disadvantaged | 18023 | 16032 - 20014 | 6.0 | 1132 |
| Northern Ireland Advantaged | 27110 | 24443 - 29778 | 3.5 | 648 |
| Northern Ireland Disadvantaged | 15747 | 14797 - 16697 | 1.8 | 1041 |
| England | 24942 | 23496 - 26389 | 16.9 | 10486 |
| Wales | 21624 | 20092 - 23155 | 6.3 | 2621 |
| Scotland | 23616 | 21458 - 25774 | 8.9 | 2146 |
| Northern Ireland | 21837 | 20012 - 23662 | 5.9 | 1689 |
| UK | 24505 | 23300 - 25710 | 20.6 | 16942 |
Representation of design effects in Stata
MEFF (misspecification effect): ratio of the variance allowing for all aspects of the study design, including all weights, to the variance if the sample were a simple random sample (SRS) of the same size.
MEFT: √(MEFF)
DEFF (design effect): ratio of the variance allowing for all aspects of the sample design to the variance if the sample were an SRS of the same size.
DEFT: √(DEFF)
See Kreuter, F. and Valliant, R. (2007) Stata Journal, 7, 1-21
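To make the relation between DEFF, DEFT and precision concrete, here is a small Python sketch (not Stata output). It inflates the simple-random-sampling standard error of a proportion by DEFT to obtain an approximate design-adjusted confidence interval. The function names are illustrative, and the effective sample size n / DEFF is a standard derived quantity that the slides themselves do not discuss.

```python
import math

def design_adjusted_ci(p, n, deff, z=1.96):
    """Approximate CI for a proportion, inflating the SRS standard error by DEFT."""
    se_srs = math.sqrt(p * (1 - p) / n)  # standard error under SRS
    deft = math.sqrt(deff)               # DEFT is the square root of DEFF
    se_design = deft * se_srs
    return p - z * se_design, p + z * se_design

def effective_sample_size(n, deff):
    """Size of an SRS that would give the same precision as the complex sample."""
    return n / deff

# UK row of the non-white proportion table: mean 0.13, DEFF 15.6, 18504 observations.
low, high = design_adjusted_ci(0.13, 18504, 15.6)
print(round(low, 3), round(high, 3))
print(round(effective_sample_size(18504, 15.6)))
```

The interval computed this way is only broadly comparable with the published one, since the tabulated mean and design effect are rounded.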
Types of populations
The MCS population definition is a finite one; we could in principle
count all its members. The distinction between a finite and an infinite
population is an important one for statistical inference and analysis.
Finite populations are often regarded as samples from an infinite or
super-population or universe.
Inferences about finite populations are essentially descriptive. They
are widely used in sample surveys, especially by official statisticians. A
‘perfect’ Census of Population, in these circumstances, is not a sample: the
resulting inferences are made with complete certainty. Descriptive inferences
from finite populations are often known as design-based inferences.
Often, we are more interested in analytic or model-based inferences. We want to test a hypothesis, for example, or we want to estimate a relation between two or more variables. We are then more interested in an infinite or super-population:
“As a basis for scientific generalizations and decisions, a census is only a sample…A census describes a population that is subject to the variations of chance because it is only one of the many possible populations that might have resulted from the same underlying system of social and economic causes.”
“A sample enquiry is then a sample of a sample, and a so-called 100% sample is simply a larger sample, but is still only a sample.” (quotes from Deming and Stephan, 1941, p.48).
“Suppose we have data obtained by measuring the elements .. in a probability sample ..
drawn from the finite population U. Two important types of inference are as follows:
a. Inference about the finite population U itself.
b. Inference about a model or a superpopulation thought to have generated U.”
(Särndal et al., 1992)
Provided we have selected our sample from the finite population by probability sampling
(the simplest case of which is simple random sampling), we can rely on the idea of
repeatedly sampling in the same way from the population. This generates what is known
as a randomisation distribution, which we can use to make inferences about a parameter
of interest.
A probability sample is one in which each member of the finite population has a known
and non-zero chance of being selected into the sample. In simple random sampling, these
known selection probabilities are equal.
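The idea of a randomisation distribution can be illustrated by simulation: draw many simple random samples from the same finite population and look at how the sample mean varies. The population, sample size and number of draws below are hypothetical choices made only for illustration.

```python
import random
import statistics

def randomisation_distribution(population, n, draws, seed=0):
    """Sample means from repeated simple random samples (without replacement)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [statistics.mean(rng.sample(population, n)) for _ in range(draws)]

# A made-up finite population U of 100 values.
population = list(range(1, 101))
means = randomisation_distribution(population, n=10, draws=2000)

print(statistics.mean(population))  # the finite-population parameter
print(statistics.mean(means))       # centre of the randomisation distribution
print(statistics.stdev(means))      # its spread, an empirical standard error
```

Every member of the population has the same known chance of selection in each draw, so the sample means centre on the population mean.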
A probability sample is not essential for model-based inference.
Instead, we rely on being able to define the statistical or probability
model that generated the data and then using the likelihood of the
parameter given the model and the data for estimation. On the other
hand, a probability sample from a finite population guards against
bias and is usually regarded as desirable.
Finite populations can be defined unambiguously.
Super-populations, although conceptually important, are abstractions
that are less easily defined. They will usually extend the finite
population across space and time.
Classic reference:
Kish, L. (1965) Survey Sampling. New York: Wiley.
More recent reference:
Särndal, C.-E., Swensson, B. and Wretman, J. (1992) Model Assisted
Survey Sampling. New York: Springer.