
How Much Should We Trust Ideal Point Estimates from Surveys?∗

William Marble†
Stanford University

Matt Tyler‡
Stanford University

August 29, 2017

Abstract

In recent years, ideological scaling of the mass public has become a standard method in scholars' toolkit for studying representation, polarization, and public opinion. The assumption that citizens' preferences can be represented in a low-dimensional space is typically adopted uncritically, despite substantial debate about whether most citizens have coherent political opinions. In this paper, we propose a strategy to test how well ideal point models can explain attitudes. The evaluation strategy, which can be derived from the spatial voting model commonly used to motivate ideal point estimators, uses cross-validation to assess the predictive performance of ideal point models. If the model assumptions are satisfied, ideal point models should be able to generate accurate out-of-sample predictions about survey responses. However, we find that these predictions are only marginally better than those from a statistical model that does not include ideal points. In other words, knowing a respondent's answers to N − 1 survey questions does not meaningfully improve our ability to predict their response to the Nth question. In contrast, we find ideal point models perform exceptionally well when applied to roll-call votes in the Senate. Additionally, we find that there is no identifiable subset of the population that appears to respond to survey questions in a manner consistent with the assumptions of the ideal point model. We discuss possible methodological and substantive explanations for the poor model performance in the public. Our results suggest that standard ideal point estimates in the mass public may be misleading measures of political orientation.

∗Preliminary draft prepared for the 2017 meeting of the American Political Science Association in San Francisco; comments are very welcome. For helpful discussions and comments, we thank Justin Grimmer and Jonathan Rodden, as well as participants in our cohort workshop and the Political Economy Breakfast at Stanford.

†[email protected] ‡[email protected]


1 Introduction

Studies of the quality of representation often focus on whether citizens' political opinions are faithfully reflected in the policy preferences and decisions of legislators. One common way of approaching this question is to generate a summary of public opinion and compare it to a similar summary of representatives' positions. The assumption is that the structure of political conflict is fairly low-dimensional: opinions on one issue tend to correlate highly with opinions on the next. Reducing the dimensionality of the problem—e.g., from a battery of questions on disparate issues to a single liberal-conservative score—thus facilitates better inferences.

An increasingly common methodological approach in recent years is to apply ideal point estimation methods to survey data. These methods, originally developed to estimate the preferences of members of legislative bodies (e.g., Poole and Rosenthal, 1997; Clinton, Jackman and Rivers, 2004), generate low-dimensional numerical summaries of citizens' political opinions.1 These studies have generated significant conclusions about the nature of representation in the United States. For instance, Bafumi and Herron (2010) find via joint scaling of representatives and constituents that elected legislators tend to be more extremist than the people they represent. Similarly, Hill and Tausanovitch (2015) estimate an ideal point model using ANES responses from 1956 and 2012 to examine the extent of polarization in the public. They conclude that the public has exhibited only mild polarization over the past half-century, in contrast to Congress. In an application in comparative politics, Pan and Xu (2017) apply ideal point models to survey responses from China to characterize the structure of mass opinions in that country.2

Despite the increasing prevalence of ideal point models, there is no consensus about whether their underlying assumptions hold among the mass public. A significant body of work in political behavior—dating at least to The American Voter (Campbell et al., 1960) and Converse's (1964) "Nature of Belief Systems in Mass Publics"—questions whether most citizens have coherent political opinions at all—let alone opinions that can be meaningfully described by a low-dimensional summary. A prominent view is that most citizens are innocent of ideology: people tend to know little about politics, are unconstrained by elite notions of "what goes with what," and responses to survey questions are at best imperfect measures of true preferences—instead reflecting fleeting considerations and cognitive biases (e.g., Kuklinski and Quirk, 2000; Bartels, 2003; Zaller and Feldman, 1992; Kinder, 2003; Freeder, Lenz and Turney, 2016).

1 Typically, researchers estimate one-dimensional models. However, several studies have estimated models with two or more dimensions; e.g., see Treier and Hillygus (2009); Warshaw and Rodden (2009), and Pan and Xu (2017).
2 For other examples of ideal point estimation using surveys, see Tausanovitch and Warshaw (2014, 2013), and Jessee (2009).

Given this backdrop, how much should we trust estimates of ideal points in the mass public? If survey responses are largely idiosyncratic and unstructured, the resulting ideal point estimates will be uninformative about any particular policy positions. For instance, in a recent critique, Broockman (2016) argues that ideal point models often erroneously conclude that the public is moderate, when instead people tend to hold a number of extreme, but ideologically incongruent, views. As a result, when survey responses are averaged together in a one-dimensional ideal point model, respondents appear moderate.

At the core of this debate is the following question: if we observe responses to N − 1 policy questions, how well can we predict the response to the Nth question? Ideal point models are based on a spatial voting model that assumes that survey responses can be located in an ideological space that is common across questions, and over which respondents have a most preferred location. This implies that knowing the "location" of survey responses and voters' ideal points should allow us to predict responses relatively well. If some assumption of the model is violated, however, we should observe poor predictive performance.

In this paper, we implement this evaluation strategy directly through a cross-validation procedure designed to evaluate how well ideal point models perform. Using responses to over 25 questions on the 2012 ANES, we estimate the out-of-sample predictive error associated with ideal point models of between 1 and 5 dimensions. We compare the predictive performance of these models to that of an intercept-only model—in effect, assuming that the probability of each survey response is the same for all citizens (or, equivalently, that all citizens have the same ideal point). This evaluation allows us to quantify how much information is actually contained in ideal point estimates.


Our primary result is that ideal point models perform poorly in the mass public. When we use ideal point model estimates to predict out-of-sample survey responses, the improvement in predictive performance compared to the intercept-only model is minuscule. In other words, the low-dimensional summary measures produced by ideal point models in the mass public contain very little information about how respondents would answer any particular survey question. This conclusion is robust to including up to 5 dimensions. In contrast, when we apply the same procedure to votes in the Senate, there is a large gain in predictive performance going from an intercept-only model to a one-dimensional model.

We also examine whether any observable covariates are correlated with an increase in predictive performance, which might indicate that the results of ideal point models would be informative for some subset of the population. However, we find little evidence to support this possibility.

Our results suggest caution in applying standard ideal point models to survey data. While the model will generate ideal point estimates, these estimates contain relatively little information about how people will respond to any given question. As such, substantive conclusions based on them may be misleading.

The rest of the paper is structured as follows. In Section 2, we outline the standard spatial voting model and the corresponding statistical model used to estimate ideal points, highlighting particular assumptions that might be violated in the public. In Section 3, we present our validation strategy. Section 4 presents the results. Section 5 discusses the results, and Section 6 concludes.

2 Spatial Models of Policy Preferences

In this section, we outline the traditional ideal point model and the corresponding statistical model.

Ideal point models of voting or survey responses are theoretically derived from a random utility, spatial voting model (Downs, 1957; Black, 1948). In these models, policies $k$ can be represented by a point $x_k \in \mathbb{R}^d$, where $d$ is a small integer (usually one or two). Each political actor $i$ has rational, single-peaked preferences in the policy space that are represented by a utility function $U_i(k) = f(x_k, \theta_i) + \nu_i^k$, where $f(\cdot\,; \theta_i)$ is a function that is uniquely maximized at $x_k = \theta_i$ and strictly decreasing in the distance between the policy proposal and $\theta_i$, and $\nu_i^k$ is a random variable that represents idiosyncratic (non-spatial) utility shocks that voter $i$ gets from policy $k$. In this context, voters who are faced with a choice among options $x_1, x_2, \ldots, x_j$ will vote for the policy that maximizes their utility.

Empirical applications of the spatial voting model in political science typically focus on estimating $\theta_i$ using some observed behavioral data—e.g., votes or survey responses. Analysts make simple parametric assumptions on $f$ and $\nu$ to derive empirically estimable versions of the spatial voting model.

2.1 Assumptions of the Spatial Voting Model

To begin, we explicitly highlight several assumptions of the random utility spatial voting model.

First, and perhaps most controversially, the model assumes that voters have genuine preferences over the choice set. If voters do not have well-formed preferences over choices, then interpreting votes or survey responses as measures of revealed preference is obviously mistaken. Additionally, note that it is entirely possible that voters do have preferences on some issue domains—e.g., conservative religious voters may care deeply about the legality of abortion—but not on other issue domains. The implication of this view is that some survey responses would reflect real policy preferences, while others would merely reflect random or idiosyncratic factors. A priori, however, the analyst cannot know which category a given response would fall into.

Second, the spatial voting model assumes that individual voters can correctly assess the location of policy choices. Even if voters do have legitimate preferences, it is plausible that voters do not have enough information about the choices presented to know which option best corresponds to their latent preference. As a result, observed choices would only noisily reflect "true" preferences. In this case, the signal-to-noise ratio in survey responses would be low, making it difficult (a) to learn what actors' ideal points actually are and (b) to generate accurate and precise predictions of how respondents would answer additional survey questions.

Relatedly, a third assumption is that voters in fact perceive a common policy space. While assumption 2 relates to the information that any given voter possesses, an additional assumption necessary to make interpersonal comparisons based on choice data is that voters agree on the underlying policy space. If some voters consider there to be two relevant dimensions to politics, while others consider there to be four, then the decision process is fundamentally different.

Fourth, the model assumes that voters choose the option that maximizes their utility. The assumption of sincere voting is commonly made in the literature, even when sincere voting may not be an equilibrium of a legislative voting game (Austen-Smith and Banks, 1996).3 Clearly, in the case of survey responses in the public, "payoffs" do not depend on other respondents' answers, making strategic voting implausible. However, the assumption of sincere voting (or, in this case, survey responses) could be violated if, instead of reporting their true preferences, survey respondents engage in expressive responses that reinforce their party's stated position. This behavior would undermine the validity of ideal point estimates, but would require a high degree of sophistication on respondents' part, and might tend to overstate the amount of "ideological constraint" that we observe. On the other hand, there are very few incentives for survey respondents to give carefully considered responses. This might result in a large number of responses that do not reflect true opinions, but instead reflect satisficing or other cognitive shortcuts (Holbrook, Green and Krosnick, 2003).

Mild violations of any of these assumptions might not cause the statistical model to perform poorly. However, it is likely that multiple violations could, in conjunction, cause poor model fit and poor predictive performance.

2.2 Estimating Ideal Points

Given the spatial voting model outlined above, standard practice is to make several simple assumptions on $f$ and $\nu$ to generate a statistical model that yields estimates of the parameters of interest. Because our main application regards survey data with many response options for each question, we present a multinomial logistic regression model here.4

3 Though see Clinton and Meirowitz (2004) for an example of a paper that estimates ideal points assuming the possibility of strategic voting.

In particular, we assume that voters have quadratic utility over choices, such that

$$U_i(k) = -\lVert x_k - \theta_i \rVert^2 + \nu_i^k. \tag{1}$$

Additionally, we assume that the errors $\nu_i^k$ are distributed independently according to a standard type-I extreme value distribution.

Suppose that when answering question $j$, voter $i$ is faced with $K_j$ possible options. Denote voter $i$'s response by $y_{ij}$. Then the probability that voter $i$ picks choice $k$ is

$$\Pr(y_{ij} = k) = \frac{\exp(\beta_{jk}'\theta_i - \alpha_{jk})}{\sum_{l=1}^{K_j} \exp(\beta_{jl}'\theta_i - \alpha_{jl})}. \tag{2}$$

This equation, which is the standard likelihood for a multinomial logistic model, can be derived from the latent utility model in (1). The parameters $\beta_{j1}$ and $\alpha_{j1}$ are normalized to 0 to identify the model. The other $\beta$ parameters measure how much option $k$ "discriminates" between voters with different $\theta$'s (relative to the baseline option $k = 1$), while the $\alpha$ parameters measure the baseline popularity of option $k$ (relative to the baseline).5 The likelihood for the full distribution of data (i.e., all the survey responses or vote choices) is the product of the individual likelihoods over voters and questions.
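To make Equation (2) concrete, the following sketch (in Python with NumPy) computes the response probabilities for a single voter on a single question. This is our illustration rather than the paper's estimation code, and the parameter values are hypothetical; option 0 plays the role of the normalized baseline.

```python
import numpy as np

def choice_probabilities(theta, beta, alpha):
    """Pr(y_ij = k) from Equation (2) for one voter on one question.

    theta: (d,) ideal point of voter i.
    beta:  (K, d) discrimination parameters, one row per response option.
    alpha: (K,) baseline parameters (entering with a minus sign).
    Row 0 is the baseline option, normalized to beta[0] = 0 and alpha[0] = 0.
    """
    utilities = beta @ theta - alpha   # systematic part of each option's utility
    utilities -= utilities.max()       # subtract the max for numerical stability
    expu = np.exp(utilities)
    return expu / expu.sum()

# Hypothetical one-dimensional example with K = 3 response options.
theta = np.array([1.2])                    # a right-of-center respondent
beta = np.array([[0.0], [0.8], [-0.8]])    # option 1 attracts high theta; option 2, low theta
alpha = np.array([0.0, 0.5, 0.5])          # non-baseline options less popular at theta = 0
print(choice_probabilities(theta, beta, alpha))  # three probabilities summing to 1
```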

Clearly, we have made several (possibly consequential) assumptions in converting a theoretical model of voting to a statistical model that can be applied to data. As Clinton (2012) notes, there is not a one-to-one correspondence between theoretical models of voting (or survey response) and the statistical model applied to the data. In particular, the statistical model outlined above will pick up the primary dimensions of variation in the data—whether or not they capture the theoretical construct of "ideology" that analysts are typically interested in. If, instead, the dimension corresponds to geography, income, or some other feature, the $\theta$ parameters (as well as the $\alpha$ and $\beta$ parameters) will reflect that.

4 In the case of binary choices, this model is very similar to the model underlying the NOMINATE family of estimators (Carroll et al., 2009; Clinton and Jackman, 2009). The only difference is that NOMINATE assumes a scaled Gaussian functional form for $f$, rather than a quadratic. Because the Gaussian function is locally quadratic, these differences rarely matter, except for extreme ideal points or proposals.
5 The $\alpha$ and $\beta$ parameters can be expressed in terms of parameters in the spatial voting model. In particular, $\alpha_k = (x_1 - x_k)'(x_1 - x_k)$ and $\beta_k = 2(x_k - x_1)'$, where $x_k$ is the spatial location of the $k$th choice.

3 Validating Ideal Point Models

If the assumptions of the spatial voting model actually hold, then the analyst should feel free to estimate said model and interpret the results as appropriate. Of course, as the adage goes, "All models are wrong, but some are useful." The question is not whether the model is literally true, but whether it generates meaningful quantities that can explain behavior. There are at least two ways the model can fail to be useful. The first is when the underlying theory is sound but the statistical model is misspecified. The second is when the underlying theory explains so little about behavior that the empirical estimates are not useful quantities.

First, the further the posited model is from the true data-generating process, the less meaningful statistical estimates from the model tend to be. In the case of grave misspecification, the model estimates will be nearly meaningless, because they will be poor approximations of the actual quantities of interest.

Alternatively, the model may not be terribly misspecified, but it might not be useful because it explains so little behavior. In this case, we gain essentially nothing from knowing the true values of the model's parameters because they imply almost nothing about behavior. For example, if survey responses are driven almost entirely by irreducible, idiosyncratic error, then knowing someone's "ideal point" tells us little about how she would behave.

Therefore, just positing formal and statistical models is not enough. If the model is poor, maximizing the assumed model likelihood or posterior will still lead to parameter estimates of the best-fitting approximation to the true data-generating process (in the sense of minimizing Kullback-Leibler divergence). And this approximation will still be constrained to the assumed model, even if the assumed model is substantially different from the true data-generating process. Therefore, even if survey responses have nothing to do with ideal points, we will still estimate ideal point parameters when we apply these models to survey data.

Importantly, when fitting a model, estimates do not come with a warning or any other signal about the quality of the approximation being made. In either the case of misspecification or the case of the theory having limited explanatory power, we are liable to over-interpret parameter estimates if we do not validate our models.

The quality of the approximation is especially easy to ignore when the number of parameters being estimated is large—too many to inspect individually. Unfortunately, this creates an opportunity for the researcher to unknowingly over-interpret the results of a poor approximation. In the case of ideal points of the mass public, if survey respondents do not actually answer survey questions based on ideal points, then it is a mistake to interpret their ideal point estimates as anything more than a misspecified model's best effort to please its creator.

Spatial voting models in particular contain a large number of parameters that are not amenable to direct inspection.6 The problem is further compounded by the fact that ideal points are supposed to represent unobserved preferences. Naturally, unobserved information is more difficult to validate because we cannot directly connect it to real, observable outcomes.

6 For example, suppose 2,000 respondents each answer 20 questions with four response options each. In this case, a multinomial one-dimensional ideal point model will have 2,000 ideal point parameters and 2 × 3 × 20 = 120 item parameters, for a total of 2,120 parameters.

Carefully evaluating the quality of parameter estimates from ideal point models requires additional effort beyond what has been done in the literature so far. For example, several authors have shown that ideal point estimates correlate with vote choice or self-reported ideology (e.g., Ansolabehere, Rodden and Snyder, 2008; Jessee, 2009). However, this is not sufficient evidence to conclude that ideal point estimates are all that useful at explaining or modeling survey responses. First, it is not ambitious to expect an estimate based on 20 to 30 policy questions to have some kind of correlation with an issue as polarizing as presidential vote choice.

Second, a correlation analysis is susceptible to outliers in the form of ideological extremists. Even if 90% of survey respondents make no use of ideal points in answering questions, the remaining 10% can easily prop up a statistically significant correlation estimate with more extreme positions.

Finally, these external validation tests offer little in the way of expectations. Suppose ideal points actually do motivate survey responses; how much of a correlation between the ideal point estimates and voting behavior should we then expect? If ideal points don't explain much behavior, how much of a correlation should we expect then? Even if these seemingly important correlations reassure us, these tests do not tell us how useful a model is, and without that we are susceptible to over-interpreting model estimates as far more important than they actually are.

3.1 Using Cross-Validation to Evaluate Ideal Point Model Fit

In the remainder of this section, we describe a rigorous approach to validating the usefulness of ideal point models. The broader idea of our approach, cross-validation, is not new; indeed, it is often used to validate predictive models in machine learning and statistics (for a broader perspective, see Hastie, Tibshirani and Friedman, 2009). However, to our knowledge, it has not yet been applied to determine whether ideal points are particularly important in accounting for answers to survey questions.7

7 Tahk (2005) applies a similar approach to estimate the dimensionality of roll-call voting in Congress.

As described in Section 2, ideal points are supposed to capture information on correlated preferences. Ideal point models can learn from answered questions how an individual would respond to a hypothetical question. For example, if an individual seems to favor decreasing taxes and reducing government intervention, then the ideal point of that individual can be used to predict what that individual will say in response to a question about the national debt. Ideal point models, in this sense, are predictive.

Our validation scheme takes advantage of this predictive quality. For each individual, we hide a subset of their questions—specifically, 20% of them—from model estimation. The estimator only learns the individual's ideal point from the remaining questions the individual answered (80%). Then, we evaluate how consistent the estimated ideal point is with the questions that were held out from estimation. Specifically, we calculate how likely the responses to each holdout question are given the estimated ideal point. That is, for each held-out question, we calculate

$$\Pr(y_{ij} = \text{Value Observed} \mid \text{Model Estimates}). \tag{3}$$

We then aggregate across held-out questions by calculating the average log likelihood of the held-out questions:

$$\frac{1}{N} \frac{1}{J} \sum_i \sum_j \log \Pr(y_{ij}), \tag{4}$$

where $i = 1, \ldots, N$ indexes respondents and $j = 1, \ldots, J$ indexes held-out questions.

If ideal points do motivate survey responses and the ideal point estimator is accurate, then the holdout answers will be highly likely. If ideal points have minimal influence on survey answers, then the likelihood of the holdout answers will be lower.8

We do not use the estimation questions a second time because the ideal point estimator might overfit the observed data, and evaluating on twice-seen data would therefore likely overstate the ideal point's ability to predict. This problem is more pronounced with fewer questions, because each question then contributes more to the ideal point estimate.

We rotate the subsets of questions used for estimation and holdout (5 rotations for 20% hidden), such that each question's likelihood is evaluated exactly once and the question is used for estimation in all other cases. Thus, for each survey answer, we find out exactly how likely it was supposed to occur under the ideal point model. By aggregating the holdout likelihood of each response, we ascertain quantitatively how well ideal points explain the observed data.8

8 We also slightly depart from the literature by using likelihood as the validation metric, rather than classification accuracy. Classification accuracy is not a "strict proper scoring rule," meaning that it is not uniquely maximized by the true model. For instance, classification accuracy cannot distinguish between a model that assigns 60% probability to the observed response and a model that assigns 99% probability to the observed response.


3.2 Validation Strategy

We now establish expectations for the cross-validated likelihood measures described above. If survey respondents select survey answers based on ideal points—in other words, if ideal point models are useful—then ideal point estimates should help explain survey behavior, above and beyond observing simple unconditional response frequencies. In order to give much credence to ideal point estimates, then, it is necessary that models of survey response explain more behavior with ideal points than without.

We explicitly test this idea. We first fit a model of survey response with no ideal points—only intercepts for each survey response option. That is, individual $i$ gives answer $k$ for question $j$ with probability

$$\Pr(y_{ij} = k) = \pi_{jk},$$

where $\pi_{jk}$ is the response option intercept. The maximum likelihood estimate of this intercept is simply the proportion of individuals who gave this response option. We will sometimes abuse terminology and refer to this as a "0-dimensional" model, because there are no dimensions of political conflict included in the model.
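As a quick illustration of this (with hypothetical data rather than the ANES), the 0-dimensional MLE is nothing more than each option's sample share:

```python
import numpy as np

# Ten hypothetical answers to one question with K = 3 response options.
answers = np.array([0, 1, 1, 2, 1, 0, 1, 2, 1, 1])
pi_hat = np.bincount(answers, minlength=3) / len(answers)
print(pi_hat)  # [0.2 0.6 0.2] -- the predicted distribution for every respondent
# Everyone receives the same prediction, which is equivalent to assigning
# all respondents a single common ideal point.
```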

Then, we fit the ideal point models described in Section 2 for $d = 1, 2, 3, 4, 5$ dimensions. Our primary interest is in comparing the holdout likelihood between the 0-dimensional, intercept-only model without ideal points and the 1-dimensional model that does include ideal points. If ideal points are strong determinants of survey responses, then the holdout likelihood should dramatically increase when ideal points are included in the model. If ideal points do not determine much of survey response choices, then including ideal point parameters will have a minimal effect on the explanatory power of the model, and the holdout likelihood will not increase much at all. Analogously, if more dimensions are necessary to explain survey responses, we would observe a corresponding increase in holdout likelihood up to the true dimensionality.


[Figure 1 about here. X-axis: Ideal Point Dimensions (0–4); y-axis: Change in Average Log Likelihood (0.00–0.30); series: True-1D and True-0D.]

Figure 1: 5-fold cross-validation results for ideal point models of various assumed dimensions applied to simulated data sets generated from a model with ideal points ("True-1D") and a model without ideal points ("True-0D"). See text for details. Compared to the synthetic data generated without ideal points, the data generated with ideal points exhibit a surge in average holdout log likelihood with the estimation of ideal points.

In order to demonstrate our theoretical expectations, we simulated data drawn from the aforementioned statistical models. One data set was drawn from an intercept-only model (essentially drawing survey responses at random), while the other was simulated from an ideal point model with one dimension (as described in Section 2). All the parameters of Equation 2 were drawn uniformly from [−2, 2]. In both synthetic data sets, there are N = 250 respondents, each answering 30 questions (roughly the same number of questions as in the real survey data we use below).
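A sketch of how such synthetic data can be generated is below. This is our reconstruction of the setup just described, not the authors' simulation code; in particular, we obtain the intercept-only ("True-0D") process by fixing every ideal point at zero, which makes θ irrelevant in Equation (2).

```python
import numpy as np

rng = np.random.default_rng(1)
N, J, K = 250, 30, 4  # respondents, questions, response options

# Item parameters for Equation (2), drawn uniformly from [-2, 2];
# option 0 is the normalized baseline with alpha = beta = 0.
alpha = rng.uniform(-2, 2, size=(J, K)); alpha[:, 0] = 0.0
beta = rng.uniform(-2, 2, size=(J, K)); beta[:, 0] = 0.0

def simulate(theta):
    """Draw an N x J response matrix given one-dimensional ideal points."""
    Y = np.empty((N, J), dtype=int)
    for j in range(J):
        u = np.outer(theta, beta[j]) - alpha[j]               # N x K utilities
        p = np.exp(u) / np.exp(u).sum(axis=1, keepdims=True)  # Equation (2)
        Y[:, j] = [rng.choice(K, p=row) for row in p]
    return Y

Y_true1d = simulate(rng.uniform(-2, 2, size=N))  # responses driven by ideal points
Y_true0d = simulate(np.zeros(N))                 # intercepts only: theta plays no role
```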

With the synthetic data sets drawn from the true models, we then applied our cross-validation procedure outlined above.9 The results of this synthetic data exercise are summarized in Figure 1.

This simulation confirms our theoretical expectations. When data are drawn from a model with ideal points, including ideal points in the estimation significantly increases the average holdout log likelihood. When data are drawn from a model without ideal points, introducing ideal points has essentially no effect on the average holdout log likelihood. This "difference-in-differences" design will be used in the next section to evaluate the applicability of ideal point models to survey respondents relative to elites in Congress.

9 Details about implementation, including priors and the estimation routine, are outlined in the next section.

4 Data and Results

In this section, we describe the results of applying our validation exercise to ideal point models for the mass public, in the form of the 2012 American National Election Study (ANES). Because ideal point estimates from the public are often interpreted similarly to ideal point estimates taken from Congress,10 we also apply the same evaluation criteria to roll-call votes in the 111th through 114th Senates (2009–2016). If the models apply equally well to both the mass public and political elites, then we should expect similar predictive performance results. To preview our findings: this is not the case. In fact, the predictive performance of ideal points applied to the ANES looks more similar to the "True-0D" synthetic data than to the "True-1D" synthetic data of Figure 1, suggesting that ideal point models are not as suitable for explaining survey responses as has previously been assumed. In contrast, the results for the Senate closely mirror those of the "True-1D" synthetic data.

10 Indeed, leading applications of ideal point estimation of the public involve "joint scaling" of the public and their representatives. E.g., Jessee (2009); Bafumi and Herron (2010); Gerber and Lewis (2004); Saiegh (2015).

4.1 Survey Data

We use the 2012 American National Election Studies data to estimate ideal points of the public. In particular, we use the same questions as Hill and Tausanovitch (2015). The question numbers and official labels are given in Appendix A.11

We chose to use this data set for two reasons. First, it contains questions that a priori seem likely to map closely onto standard notions of American political ideology. For example, question VCF0809: "There is much concern about the rapid rise in medical and hospital costs. Some people feel there should be a government insurance plan which would cover all medical and hospital expenses for everyone. Others feel that medical expenses should be paid by individuals, and through private insurance plans like Blue Cross. . . " Or question VCF0839: "Some people think the government should provide fewer services, even in areas such as health and education, in order to reduce spending. Other people feel that it is important for the government to provide many more services even if it means an increase in spending. . . " Many of the 28 questions touch on a similar theme of government intervention in the economy. Others, such as VCF0878, ask the respondent to place themselves on divisive social issues like abortion. If respondents' preferences are tightly constrained as the spatial voting model assumes, the answers to these questions should be highly correlated.

Second, basing our validation exercise on a leading application of ideal point models to the mass public ensures that our results reflect how such models are typically used.

11 ANES data and questions were obtained from electionstudies.org.

4.2 Analysis Plan

We perform 5-fold cross-validation using models of d = 0, 1, 2, 3, 4, 5 dimensions. Holdout samples are constructed with stratified sampling, so that each individual always has at least 9 questions for the model to use in estimation (the 8 individuals who answered fewer than 10 questions were removed). In reality, the vast majority of respondents answered more than twenty questions—an amount that any useful ideal point model should be capable of exploiting.12 In total, the data include over 2,000 respondents answering 28 questions. Most questions include more than two response options. Therefore, following the exposition in Section 2, we estimate a multinomial logistic ideal point model. However, the results we present below are substantively identical if we first collapse all questions to a binary scale and implement a traditional logistic ideal point model.

To evaluate ideal point model performance for political elites, we apply the cross-validation procedure to roll-call votes from the 111th through 114th Senates, imposing static ideal points within senators across time.13 As with ANES respondents, we perform 5-fold cross-validation using models of d = 0, 1, 2, 3, 4, 5 dimensions. Holdout samples were stratified by senator to ensure that each model was estimated with comparable amounts of voting data for each senator.

12 Later, we use regression to investigate whether the number of questions answered played a role in determining predictive performance. We find that in a bivariate analysis there is a trivial improvement, but the correlation disappears after controlling for other individual-level characteristics.
13 Roll-call votes were compiled by Keith Poole, Howard Rosenthal, and Jeffrey Lewis, and were downloaded from http://voteview.com in April 2016. They were processed using Simon Jackman's pscl R package.
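The sketch below shows one way such stratified holdout samples can be constructed (our illustration, not the authors' code): every respondent's answered items are spread evenly across the folds, so each training set retains roughly 80% of each person's answers. Stratifying the Senate holdout samples by senator works the same way, with roll-call votes in place of questions.

```python
import numpy as np

rng = np.random.default_rng(2)

def assign_folds(answered, n_folds=5):
    """Per-respondent stratified fold assignment.

    `answered` is a boolean N x J matrix of non-missing responses. Spreading
    each respondent's answered items evenly across the folds ensures that no
    respondent's ideal point is ever estimated from only a handful of items.
    """
    folds = np.full(answered.shape, -1)            # -1 marks unanswered cells
    for i in range(answered.shape[0]):
        items = np.flatnonzero(answered[i])
        labels = np.arange(len(items)) % n_folds   # balanced counts per fold
        rng.shuffle(labels)                        # randomize the assignment
        folds[i, items] = labels
    return folds

# Example: 3 respondents, 6 questions, one missing answer for respondent 2.
answered = np.ones((3, 6), dtype=bool); answered[2, 5] = False
print(assign_folds(answered))
```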

4.3 Estimation Procedure

For both data sets, models were estimated using automatic differentiation variational inference (ADVI) in the probabilistic programming language Stan (Kucukelbir et al., 2015). The size and complexity of the models being estimated practically demand the use of approximate inference methods such as ADVI, as Markov chain Monte Carlo (MCMC) sampling methods would be computationally prohibitive under our cross-validation plan. Fortunately, fast estimation of ideal points through variational inference has been shown to be reliable and highly consistent with results obtained from MCMC (Imai, Lo and Olmsted, 2015).14

To identify the models, we constrain the ideal points of d + 1 respondents (Rivers, 2003). We place Normal(0, 1) priors on the ideal points and Normal(0, 5²) priors on the item parameters. Results are not sensitive to the choice of prior.

14 To further verify that ADVI gives similar results to MCMC in estimating ideal point models, we first replicated the findings of Hill and Tausanovitch (2015), which were collected using MCMC. The ADVI results are indistinguishable from those obtained using MCMC.

4.4 Main Results

The main results from our cross-validation exercise are presented in Figure 2. The key result in this figure is that, for the Senate, the improvement over a 0-dimensional (non-ideological) model is dramatic—an increase of about 0.25 in the average log likelihood. The same introduction of ideal points for the mass public, on the other hand, produces a nearly trivial effect—an increase of about 0.07.15 The relative improvement strongly favors the idea that the senators' ideal points contain more information about the actual behavior being studied than the mass public's ideal points.

15 Results using the average likelihood directly instead of the log likelihood are almost exactly the same.


[Figure 2 about here. X-axis: Ideal Point Dimensions (0–4); y-axis: Change in Average Log Likelihood (0.00–0.25); series: Senate and ANES.]

Figure 2: Five-fold cross-validation results for ideal point models of dimensions d = 1, . . . , 4 applied to the 2012 ANES survey questions and the 111th–114th Senate roll-call votes. Roll-call votes appear much more likely after the introduction of ideal points, whereas modeling ideal points has minimal effects on the likelihood of ANES responses. Estimates are very precise, so standard error bars are suppressed for the sake of visual presentation. Results for a 5-dimensional model are not shown, but are generally worse than for the 1-dimensional model.


The introduction of ideal points simply explains far more of voting behavior in the Senate than it does of survey responses on the ANES. While this is not entirely unexpected, the sheer magnitude of this "difference-in-differences" suggests more than just the fact that these are two different groups of people performing different activities. Recalling Figure 1, the behavior of ANES respondents is much more similar to the "True-0D" synthetic data, in which survey responses were more or less given at random. Thus, our cross-validation results suggest that survey respondents behave much more as if they are randomly answering survey questions than as if they are being guided by a spatial choice model. If their behavior is inspired by ideal points, then there is overwhelmingly more "noise" in their responses than has been appreciated. In contrast, Senate voting behavior is much more similar to the "True-1D" synthetic data—a result that is unsurprising given the highly structured nature of legislative politics and the long history of ideal point models in Congress (Poole and Rosenthal, 1997, 1985). Ideal point models are overwhelmingly more applicable to political elites than to the mass public.

It is worth noting that in both groups there is only a trivial amount of improvement when advancing the number of dimensions beyond one. For example, adding a second policy dimension (say, to account for both economic and social preferences) offers only a trivial improvement in the likelihood of responses in either study population. Moreover, a fifth dimension was also estimated but is not shown in Figure 2: its likelihood results were worse than for any of the other nonzero dimensions, and in some folds the performance degraded below the zero-dimensional model. This can occur because the model can overfit intricacies in the training data that are not present in the holdout data for that particular fold.

This suggests that if there are indeed multiple issue dimensions at play in explaining political preferences,16 simple extensions of the standard ideal point model are unsuited to estimating them.

4.5 Further Analysis

To explore our results further, we examine which ANES respondents benefit most in a predictive sense from a 1-dimensional model compared to a 0-dimensional model. At the individual level, we take the per-question average of their holdout likelihoods17 for both zero and one dimensions. We then take the difference between the models to estimate which people benefited the most, in a predictive sense, from the inclusion of ideal points in the survey response model. Formally, the statistic that we use to gauge the improvement for respondent $i$ is

$$\text{Improvement}_i = \frac{1}{J} \sum_{j=1}^{J} \Big[ \Pr(y_{ij} = \text{Value Observed} \mid \text{One-Dimensional Model}) - \Pr(y_{ij} = \text{Value Observed} \mid \text{Zero-Dimensional Model}) \Big].$$

In this case, a positive statistic means that the ideal point model improved that respondent's average likelihood over the zero-dimensional, non-ideal-point model.

16 For example, as discussed by Broockman (2016) and Ansolabehere, Rodden and Snyder (2008).
17 For example, if the probabilities of someone's responses are 50, 55, and 60, then their average likelihood is 55.
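Computed from the holdout probabilities, the statistic is a one-liner; the sketch below (our illustration, with made-up probabilities) makes the bookkeeping explicit.

```python
import numpy as np

def improvement(probs_1d, probs_0d):
    """Per-respondent Improvement_i: the difference in average holdout
    likelihood between the one- and zero-dimensional models.

    Each argument is an N x J matrix whose (i, j) entry is the holdout
    probability that model assigned to respondent i's observed answer on
    question j (NaN where the question was not answered).
    """
    return np.nanmean(probs_1d - probs_0d, axis=1)

# Hypothetical example: two respondents, three questions. The first looks
# "spatial" (the 1D model helps); the second does not.
probs_1d = np.array([[0.60, 0.55, 0.70], [0.30, 0.25, 0.40]])
probs_0d = np.array([[0.40, 0.35, 0.45], [0.30, 0.30, 0.35]])
print(improvement(probs_1d, probs_0d))  # [0.2167, 0.0]
```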


[Figure 3 about here. X-axis: Regression Coefficient (−0.01 to 0.03); rows: Number of questions answered, Hispanic, Asian, Black non-Hispanic, Family income group, Education, Political info, Female, Age, Party (7); series: Bivariate and Multivariate.]

Figure 3: Correlates of improved predictive performance. The unit of observation is the respondent, and the dependent variable is the difference in average out-of-sample likelihood between the one-dimensional and zero-dimensional model. Political information is the ANES interviewer's impression of respondents' political knowledge; education is a 4-category ordinal coding; family income group is a 5-category ordinal coding; for race, the reference category is white respondents. Lines show robust 90% confidence intervals.

We regress (with ordinary least squares) this difference on numerous ANES 2012 individual covariates and present the results in Figure 3. Overall, it does not appear that any covariate is associated with a large gain from the ideal point model. Even the coefficients that are statistically significant are relatively small in magnitude in both bivariate and multivariate estimation. Thus, we hesitate to over-interpret these results.

We also consider who benefits from the addition of ideal points according to their estimated ideal point (estimated using the full sample, not during cross-validation). We should expect performance improvements among those who are most extreme on either end of the spectrum, since such extreme respondents give the model its easiest prediction task. For example, there is usually less doubt about how an extreme conservative would respond than there is in trying to predict what a moderate would say. Figure 4 confirms this intuition with its sharp U-shape, implying that the value of an ideal point model is proportional to the magnitude of one's estimated ideal point. We note, however, that the vast majority of respondents are estimated to have moderate ideal points—meaning that for most respondents, the ideal point model does not substantially increase predictive power.


[Figure 4 about here. X-axis: Estimated ideal point (−3 to 3); y-axis: Increase in likelihood from 1D model (0.0–0.3).]

Figure 4: A plot of the change in average likelihood against (posterior mean) estimated ideal points from the 2012 ANES. Only those with relatively extreme ideal points see much change in average likelihood.


Finally, we note that most ideal points are clustered around zero. Figure 5 plots the distribution of estimated ideal points for 2012 ANES respondents. While this information is also contained in Figure 4, it deserves highlighting on its own. As we will discuss in the next section, ideal points near zero suggest that the respondent's behavior is dominated by the model intercepts rather than by their ideal point. If nearly all respondents have ideal points near zero, then whatever ideal point dimension is being uncovered is being driven by a relatively small subset of the survey population.

In summary, we find evidence that survey responses from the mass public are poorly explained by the use of ideal points. Survey response behavior looks much more similar to data generated from a non-ideal-point model. In contrast, data from political elites are much more amenable to ideal point modeling. This suggests that it is a poor idea to assume that ideal points gleaned from surveys of the mass public are comparable to elite ideal points. If the ideal points obtained from surveys do not actually explain much survey behavior, then what are they? Our brief analysis here suggests that there are no obvious correlates of ideal point predictive power other than political extremism. But most respondents have estimated ideal points near zero, making their survey responses more or less independent across questions.


[Figure 5 about here. Histogram; x-axis: Estimated Ideal Point (−3 to 3); y-axis: Frequency (0–200); label: ANES 2012.]

Figure 5: A histogram of (posterior mean) estimated ideal points from the 2012 ANES. Most values are clustered around zero.


5 Discussion and Implications

We find little evidence that the standard method of ideal point estimation can adequately summarize survey responses in the mass public. A statistical model that omits individuals' ideal points performs about as well in predicting out-of-sample responses as a statistical model that includes an ideal point with 1, 2, 3, or 4 dimensions. This finding implies that a person's responses to one set of survey questions are not very informative about their responses to additional questions, given the ideal point model we estimate. As a result, the estimated low-dimensional ideal point is not particularly informative about any given political opinion.

Our main result suggests that researchers should use caution in interpreting the results of standard ideal point models applied to the public. When evaluated on their own terms, these models do not appear to perform particularly well, so conclusions based on their application may be misleading. Though this conclusion does not depend on the source of the poor performance, it is still worthwhile to assess potential explanations for these findings.

One class of explanations is methodological in nature. It may be that the underlying spatial voting model is a useful approximation of reality, but that problems with either the data or the estimation procedure prevent us from obtaining valid estimates of the theoretical construct of an "ideal point." Another class of explanations is substantive in nature, reflecting concerns outlined in the political behavior literature on non-attitudes and "ideological innocence" (Converse, 1964). In this case, the problem is not the data or the estimation procedure, but a fundamental incongruence between the actual structure of public opinion and the assumptions underlying the spatial voting model.

In this section, we consider what we can learn about these potential explanations. Throughout, we refer back to Section 2 to fix ideas about how each explanation would violate assumptions of either the (formal) spatial voting model or the (statistical) ideal point model. We cannot completely adjudicate between these explanations—and they are not all mutually exclusive—because several have observationally equivalent implications in the context of the standard ideal point model. Nonetheless, we comment on the consistency of our evidence with each.

5.1 Methodological Explanations

First, the statistical model could be performing poorly due to the limited number of questions each respondent answers. It is plausible that as an individual answers more questions, the model is able to learn her ideal point with more precision—which would lead to increased performance. Conversely, if we estimate the model with few questions (in the application above, there are 28 questions), there might not be sufficient information for the model to learn people's ideal points. This problem would be exacerbated if the true structure were multidimensional, because there are more parameters to learn. If this mechanism is at work, the statistical model could perform poorly even if the assumptions of the model are satisfied.

Above, we showed that the number of questions a respondent answered correlates only weakly with the out-of-sample predictive performance of ideal point models, relative to the intercept-only model. This is some preliminary evidence that increasing the number of items in the model will not dramatically improve its performance. However, in ideal point models the estimates of all the parameters—including the question-level difficulty and discrimination parameters—depend on all the other parameters. Thus, it is possible that if all respondents had answered more questions, the model would perform better due to the increased information in the item parameter estimates as well.

We note that we modeled our validation analysis on a typical example in the literature, and used a fairly large number of questions that a priori seem to contain a great deal of information about citizens' ideology. If the model would generate more valid estimates with more questions, it would require researchers to go beyond what is standard practice in the literature.

A second methodological explanation has to do with the inadequacy of surveys for capturing political opinion. Surveys provide few incentives or opportunities to carefully consider the options before responding, and are thus prone to noisy responses.

In terms of the assumptions presented in Section 2, surveys' inadequacy could manifest in violations of the sincere voting assumption or in an inflated variance of the idiosyncratic portion of the utility function. The first case would occur if survey respondents are mostly satisficing, rather than answering questions honestly. However, our evaluation uses data from the ANES, a high-quality face-to-face survey. As such, we should expect relatively little satisficing compared to telephone or internet surveys (Holbrook, Green and Krosnick, 2003), making this explanation for the findings unlikely.

The second case would occur if some or most respondents do answer according to their spatial preferences, but are highly imprecise. In the spatial voting model, this can be represented as a high variance of ν. Theoretically, aggregating multiple measures together should reduce the importance of random noise in survey responses (Achen, 1975; Ansolabehere, Rodden and Snyder, 2008). The entire point of the ideal point model is to extract the underlying spatial preference from the idiosyncratic component of the utility function. In the notation of Section 2, we are trying to disentangle θ from ν.

Still, it is possible that the variance of the noise in survey data is much larger than the variance of the noise in other types of data that might reveal preferences. For instance, Gerber and Lewis (2004) apply an ideal point model to ballot image data from Los Angeles County. Perhaps in that situation it is easier to extract meaningful information from the data, due to the increased stakes associated with a real election compared to a survey. However, if the variance of the noise is so large that a zero-dimensional model fits about as well, it is unclear in what sense it can still be said that ideal points motivate survey responses.

5.2 Substantive Explanations

The second class of explanations focuses on the underlying structure of public opinion. There are several possibilities that could produce the poor performance of the ideal point models that we observe.

First—and most obviously—the public may simply not have policy preferences that can be reduced to a low-dimensional representation. In this scenario, survey responses may represent genuine political opinions (at least on average), but most citizens do not have a sense of “what goes with what” (Converse, 1964). That is to say, citizens may simply have idiosyncratic, but real, preferences across issues that cannot be reduced (see, e.g., Ahler and Broockman, 2016).

One way to conceptualize this possibility in terms of the spatial voting model is to let the dimensionality of the policy space equal the number of issues. If this were the case, the spatial voting model would not necessarily be any more wrong than if the policy space had a low dimensionality, but it would be substantially less useful. If there is a “real” policy space, but each question has its own dimension that is orthogonal to the other dimensions, the ideal point estimates would not tell us any more or less than the raw survey data.
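As a rough illustration (our sketch; the binary response rule and the number of dimensions are stipulated, not estimated from data), simulated responses driven by 29 mutually orthogonal dimensions leave any aggregate of the remaining items uninformative about a held-out item:

```python
# A sketch (illustrative) of the "one dimension per issue" scenario: each
# item is answered sincerely on its own independent dimension, so the other
# items carry no information about a held-out response.
import numpy as np

rng = np.random.default_rng(1)
n, J = 5000, 29
traits = rng.normal(size=(n, J))    # one orthogonal dimension per question
y = (traits > 0).astype(int)        # sincere binary responses

score = y[:, 1:].mean(axis=1)       # one-dimensional summary of 28 items
print(np.corrcoef(score, y[:, 0])[0, 1])   # approximately zero
```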

A similar explanation is simply that the public does not have opinions on all (or most) issues. In this telling, survey responses reflect fleeting whims rather than real preferences. This possibility is consistent with a large body of research in political behavior (e.g., Kuklinski and Quirk, 2000; Zaller and Feldman, 1992; Bartels, 1996).[18]

[18] Of course, there is far from a consensus on this point. See Sniderman and Stiglitz (2012); Lupia (1994); Bullock (2011); Boudreau and MacKenzie (2014); Tomz and Van Houweling (2008) for examples of research contending that voters can map their preferences onto policies in at least some domains.

A final possibility is that voters do have genuine preferences that can be usefully approximated by a low-dimensional model, but that most voters tend to be moderate, with ideal points close to 0. This scenario would imply that an intercept-only model would perform just as well as a model that incorporates ideal points.

To see this, consider that the probability that voter i gives answer k on question j in the intercept-only model is

$$\Pr(y_{ij} = k) = \frac{\exp(\alpha_{jk})}{\sum_{l=1}^{K_j} \exp(\alpha_{jl})},$$

while the probability in the ideal point model is

$$\Pr(y_{ij} = k) = \frac{\exp(\alpha_{jk} + \beta_{jk}\theta_i)}{\sum_{l=1}^{K_j} \exp(\alpha_{jl} + \beta_{jl}\theta_i)}.$$

If the ideal point θi ≈ 0, then the latter reduces to the former, and the out-of-sample predictive performance of the two models should be similar. Where the model makes incorrect predictions, this is simply due to irreducible error rather than a violation of any assumptions.
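The reduction is easy to verify numerically. The sketch below (ours; the parameter values are hypothetical) evaluates both expressions for a three-option question:

```python
# A numerical check (illustrative parameter values) that the ideal point
# model's choice probabilities collapse to the intercept-only model's
# when theta_i = 0.
import numpy as np

alpha = np.array([0.2, -0.5, 0.3])   # intercepts alpha_jk for one question
beta = np.array([1.1, -0.8, -0.3])   # discriminations beta_jk

def choice_probs(theta):
    """Multinomial logit probabilities Pr(y_ij = k) for ideal point theta."""
    expu = np.exp(alpha + beta * theta)
    return expu / expu.sum()

print(choice_probs(0.0))                    # identical to intercept-only probs
print(np.exp(alpha) / np.exp(alpha).sum())  # intercept-only model
print(choice_probs(1.5))                    # diverges for an extreme respondent
```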

Our view is that ideal point models are ill-suited to distinguish between these three explanations, at least without substantial modifications. All three substantive explanations—uncorrelated opinions, non-attitudes, and widespread moderates—imply observationally equivalent predictions about out-of-sample performance in the standard cross-sectional setting.[19]

[19] Our future research will include results exploiting panel studies to attempt to disentangle some of these explanations.

6 Concluding Remarks

For the study of representation, ideal point models offer some of the clearest predictions on a relatively simple mathematical platform. If the two main actors involved, the people and their representatives, could be put on the same quantitative scale, then the study of democracy could be reduced to a simple matter of geometry: how close are the policy preferences of representatives to those they represent?

Unfortunately, our results suggest that the promises of this research paradigm are not easily met. The statistical models involved are unwieldy and their target estimands unobservable, making model validation particularly thorny. In this paper, we present a principled validation technique based on the theoretical model of survey responses that underlies ideal point models. We estimate the out-of-sample predictive performance associated with ideal point models and compare it to that of a simpler model that does not include ideal points. This procedure allows us to assess the extent to which ideal points can “explain” survey responses.
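Schematically, the validation logic can be illustrated as follows (a toy sketch on simulated binary data, not our replication code; a full Bayesian ideal point model is replaced here by a crude one-dimensional stand-in, the respondent's mean response on the held-in items):

```python
# A toy sketch of the cross-validation comparison (simulated data; a crude
# mean-response score stands in for a fitted ideal point model). For each
# question we predict held-out respondents' answers from (a) a score built
# from their other answers and (b) an intercept-only majority-response rule.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n, J = 2000, 29
theta = rng.normal(size=n)                # true ideal points
alpha = rng.normal(scale=0.5, size=J)     # item intercepts
beta = rng.uniform(0.5, 1.5, size=J)      # discriminations, coded same direction
p = 1 / (1 + np.exp(-(alpha + np.outer(theta, beta))))
y = (rng.random((n, J)) < p).astype(int)  # simulated responses

train = np.arange(n) < n // 2             # simple two-fold respondent split
test = ~train
acc_ideal, acc_intercept = [], []
for j in range(J):
    others = np.delete(np.arange(J), j)
    score = y[:, others].mean(axis=1, keepdims=True)   # crude "ideal point"
    clf = LogisticRegression().fit(score[train], y[train, j])
    acc_ideal.append(clf.score(score[test], y[test, j]))
    majority = y[train, j].mean() >= 0.5               # intercept-only rule
    acc_intercept.append(np.mean(y[test, j] == majority))

print(np.mean(acc_ideal), np.mean(acc_intercept))
```

On data generated from a genuinely one-dimensional model, the score-based predictions beat the intercept-only baseline by a wide margin; our finding is that on actual survey responses the analogous gap is marginal.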

By this measure, ideal points estimated from survey responses are not even approximately comparable to ideal points obtained from analyzing roll-call votes. Elite behavior, in the form of roll-call voting by senators, is clearly consistent with the predictions of a one-dimensional ideal point model, and the statistical ideal point models are effective in learning these ideal points. Survey responses, on the other hand, are not explained well by ideal point model estimates. Any given ideal point estimate has little to do with the questions used to estimate it. Consequently, it is a mistake to treat the ideal points of political elites similarly to those estimated for the mass public.

Whether this disconnect is due to surveys being a poor medium for evaluating spatial politics, citizens being unmotivated by spatial politics, or the lack of variation in attitudes in the public, we cannot say. For standard ideal point models, all three of these causes are observationally equivalent. Future work will be responsible for finding new data or statistical techniques that can adjudicate between these hypotheses.

References

Achen, Christopher H. 1975. “Mass Political Attitudes and the Survey Response.” American Political Science Review 69(4):1218–1231.

Ahler, Douglas J. and David E. Broockman. 2016. “Does Elite Polarization Imply Poor Representation? A New Perspective on the ‘Disconnect’ Between Politicians and Voters.”

Ansolabehere, Stephen, Jonathan Rodden and James M. Snyder. 2008. “The Strength of Issues: Using Multiple Measures to Gauge Preference Stability, Ideological Constraint, and Issue Voting.” American Political Science Review 102(2):215–232.

Austen-Smith, David and Jeffrey S. Banks. 1996. “Information Aggregation, Rationality, and the Condorcet Jury Theorem.” American Political Science Review 90(1):34–45.

Bafumi, Joseph and Michael C. Herron. 2010. “Leapfrog Representation and Extremism: A Study of American Voters and Their Members in Congress.” American Political Science Review 104(3):519–542.

Bartels, Larry M. 1996. “Uninformed Votes: Information Effects in Presidential Elections.” American Journal of Political Science 40(1):194–230.

Bartels, Larry M. 2003. Democracy With Attitudes. In Electoral Democracy, ed. Michael MacKuen and George Rabinowitz. Ann Arbor: University of Michigan Press.

Black, Duncan. 1948. “On the Rationale of Group Decision-Making.” Journal of Political Economy 56(1):23–34.

Boudreau, Cheryl and Scott A. MacKenzie. 2014. “Informing the Electorate? How Party Cues and Policy Information Affect Public Opinion About Initiatives.” American Journal of Political Science 58(1):48–62.

Broockman, David E. 2016. “Approaches to Studying Policy Representation.” Legislative Studies Quarterly 41(1):181–215.

Bullock, John G. 2011. “Elite Influence on Public Opinion in an Informed Electorate.” American Political Science Review 105(3):496–515.

Campbell, Angus, Philip Converse, Warren Miller and Donald Stokes. 1960. The American Voter. Chicago: University of Chicago Press.

Carroll, Royce, Jeffrey B. Lewis, James Lo, Keith T. Poole and Howard Rosenthal. 2009. “Comparing NOMINATE and IDEAL: Points of Difference and Monte Carlo Tests.” Legislative Studies Quarterly 34(4):555–591.

Clinton, Joshua D. 2012. “Using Roll Call Estimates to Test Models of Politics.” Annual Review of Political Science 15(1):79–99.

Clinton, Joshua D. and Adam Meirowitz. 2004. “Testing Explanations of Strategic Voting in Legislatures: A Reexamination of the Compromise of 1790.” American Journal of Political Science 48(4):675–689.

Clinton, Joshua D. and Simon Jackman. 2009. “To Simulate or NOMINATE?” Legislative Studies Quarterly 34(4):593–621.

Clinton, Joshua, Simon Jackman and Douglas Rivers. 2004. “The Statistical Analysis of Roll Call Data.” American Political Science Review 98(2):355–370.

Converse, Philip E. 1964. The Nature of Belief Systems in Mass Publics. In Ideology and Discontent, ed. David Apter. New York: The Free Press, pp. 206–261.

Downs, Anthony. 1957. An Economic Theory of Democracy. New York: Columbia University Press.

Freeder, Sean, Gabriel S. Lenz and Shad Turney. 2016. “The Importance of Knowing ‘What Goes With What’.”

Gerber, Elisabeth R. and Jeffrey B. Lewis. 2004. “Beyond the Median: Voter Preferences, District Heterogeneity, and Political Representation.” Journal of Political Economy 112(6):1364–1383.

Hastie, Trevor, Robert Tibshirani and Jerome Friedman. 2009. Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer.

Hill, Seth J. and Chris Tausanovitch. 2015. “A Disconnect in Representation? Comparison of Trends in Congressional and Public Polarization.” The Journal of Politics 77(4):1058–1075.

Holbrook, Allyson L., Melanie C. Green and Jon A. Krosnick. 2003. “Telephone versus Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias.” Public Opinion Quarterly 67(1):79–125.

Imai, Kosuke, James Lo and Jonathan Olmsted. 2015. “Fast Estimation of Ideal Points with Massive Data.” American Political Science Review 110(4):1–20.

Jessee, Stephen A. 2009. “Spatial Voting in the 2004 Presidential Election.” American Political Science Review 103(1):59.

Kinder, Donald R. 2003. Belief Systems After Converse. In Electoral Democracy, ed. Michael MacKuen and George Rabinowitz. Ann Arbor: University of Michigan Press.

Kucukelbir, Alp, Rajesh Ranganath, Andrew Gelman and David M. Blei. 2015. Automatic Variational Inference in Stan. In Neural Information Processing Systems.

Kuklinski, James H. and Paul J. Quirk. 2000. Reconsidering the Rational Public: Cognition, Heuristics, and Mass Opinion. In Elements of Reason: Cognition, Choice, and the Bounds of Rationality, ed. Arthur Lupia, Matthew D. McCubbins and Samuel L. Popkin. Cambridge University Press.

Lupia, Arthur. 1994. “Shortcuts versus Encyclopedias: Information and Voting Behavior in California Insurance Reform Elections.” American Political Science Review 88(1):63–76.

Pan, Jennifer and Yiqing Xu. 2017. “China’s Ideological Spectrum.” Journal of Politics.

Poole, Keith T. and Howard Rosenthal. 1985. “A Spatial Model for Legislative Roll Call Analysis.” American Journal of Political Science 29(2):357–384.

Poole, Keith T. and Howard Rosenthal. 1997. Congress: A Political-Economic History of Roll Call Voting. Oxford University Press.

Rivers, Douglas. 2003. “Identification of Multidimensional Spatial Voting Models.”

Saiegh, Sebastian M. 2015. “Using Joint Scaling Methods to Study Ideology and Representation: Evidence from Latin America.” Political Analysis 23(3):363–384.

Sniderman, Paul M. and Edward H. Stiglitz. 2012. The Reputational Premium: A Theory of Party Identification and Policy Reasoning. Princeton: Princeton University Press.

Tahk, Alexander. 2005. “The Signals in the Noise: A Cross-Validation Approach to Estimating the Dimensionality of Roll Call Voting.” Paper presented at the annual meeting of the American Political Science Association.

Tausanovitch, Chris and Christopher Warshaw. 2013. “Measuring Constituent Policy Preferences in Congress, State Legislatures, and Cities.” Journal of Politics 75(2):330–342.

Tausanovitch, Chris and Christopher Warshaw. 2014. “Representation in Municipal Government.” American Political Science Review 108(3):605–641.

Tomz, Michael and Robert P. Van Houweling. 2008. “Candidate Positioning and Voter Choice.” American Political Science Review 102(3):303–318.

Treier, S. and D. S. Hillygus. 2009. “The Nature of Political Ideology in the Contemporary Electorate.” Public Opinion Quarterly 73(4):679–703.

Warshaw, Christopher and Jonathan Rodden. 2009. “Measuring District-Level Economic and Moral Ideology.”

Zaller, John and Stanley Feldman. 1992. “A Simple Theory of the Survey Response: Answering Questions versus Revealing Preferences.” American Journal of Political Science 36(3):579–616.


A Table of Used 2012 ANES Questions

Question Number   Question Description
VCF0806           R Placement: Government Health Insurance Scale
VCF0809           R Placement: Guaranteed Jobs and Income Scale
VCF0823           R Opinion: Better off if U.S. Unconcerned with Rest of World
VCF0830           R Placement: Aid to Blacks Scale
VCF0838           R Opinion: By Law, When Should Abortion Be Allowed
VCF0839           R Placement: Government Services/Spending Scale
VCF0843           R Placement: Defense Spending Scale
VCF0867a          R Opinion: Affirmative Action in Hiring/Promotion [2 of 2]
VCF0876a          R Opinion Strength: Law Against Homosexual Discrimination
VCF0877a          R Opinion Strength: Favor/Oppose Gays in Military
VCF0878           R Opinion: Should Gays/Lesbians Be Able to Adopt Children
VCF0879a          R Opinion: U.S. Immigrants Should Increase/Decrease [2 of 2]
VCF0886           R Opinion: Federal Spending - Poor/Poor People
VCF0887           R Opinion: Federal Spending - Child Care
VCF0888           R Opinion: Federal Spending - Dealing with Crime
VCF0889           R Opinion: Federal Spending - AIDS Research/Fight AIDS
VCF0894           R Opinion: Federal Spending - Welfare Programs
VCF9013           R Opinion: Society Ensure Equal Opportunity to Succeed
VCF9014           R Opinion: We Have Gone Too Far Pushing Equal Rights
VCF9015           R Opinion: Big Problem that Not Everyone Has Equal Chance
VCF9037           R Opinion: Government Ensure Fair Jobs for Blacks
VCF9040           Blacks Should Not Have Special Favors to Succeed
VCF9047           R Opinion: Federal Spending - Improve/Protect Environment
VCF9048           R Opinion: Federal Spending - Space/Science/Technology
VCF9049           R Opinion: Federal Spending - Social Security
VCF9131           R Opinion: Less Government Better OR Government Do More
VCF9132           R Opinion: Govt Handle Economy OR Free Market Can Handle
VCF9133           R Opinion: Govt Too Involved in Things OR Problems Require
