
Visual Design Effects on Respondents’ Behavior in Web-Surveys.

An Experimental Comparison of Different Input Controls for Scalar Questions and Visual Analog Scales, and a Software Solution.

Dissertation

submitted in fulfilment of the requirements for the academic degree

of Doctor of Social and Economic Sciences

at the Faculty of Political Science and Sociology

of the Universität Innsbruck

submitted by

DI. Mag. Albert Greinöcker

First examiner: Prof. Dr. Gilg Seeber

Second examiner: Prof. Dr. Martin Weichbold

Innsbruck, June 2009


Abstract

The field of visual design effects on respondents’ behavior in online surveys is well researched; however, in several cases the outcomes of different studies have provided contradictory findings. In this thesis, the focus is on experiments dealing mainly with Visual Analogue Scales (VAS) and rating scales. A VAS is an instrument that tries to measure a characteristic or attitude that is believed to range across a continuum of values and is verbally anchored at each end (e.g. strongly agree vs. strongly disagree as such anchors). In survey research, the use of VAS is relatively rare, in part because of operational difficulties (Couper et al. (2006)). Hence, a detailed view of the technical possibilities and pitfalls is given.

Three main studies with the same experimental design (ensuring that occurring effects were reproducible) were carried out, in which six different types of such scales were presented to the interviewees in order to measure the effect of varying appearance and functionality of the controls used for implementing the scales. To run these experiments, software was developed that focuses on good support for Web survey experimentation. The results refer to general fill-out behavior, completion time, dropout, reliability and the usage of extreme points.


Acknowledgements

First, I would like to express my most sincere gratitude to my academic advisor, Professor Gilg Seeber, for his continuous support throughout the Ph.D. program. I deeply appreciate his constructive advice and input, his time, and his ongoing patience. The regular discussions with him have definitely benefited me and greatly motivated me to move forward during the period of writing this dissertation. I would also like to thank Professor Martin Weichbold from Salzburg University for his methodological input and constructive criticism.

Furthermore, I would like to thank Professor Herman Denz (†), Professor Brigitte Mayer and Egon Niederacher from the University of Applied Sciences Vorarlberg, who supported the development and dissemination of the software.

In addition, I would like to thank Mag.ra Annabell Marinell for proofreading the whole work.

Last but not least, I would like to thank the Innsbruck University IT-Service center for providing the infrastructure for the experiments.

Affirmation

I hereby affirm that I wrote the present thesis without any inadmissible help from a third party and without using any means other than those indicated. Thoughts that were taken directly or indirectly from other sources are indicated as such. This thesis has not been presented to any other examination board in this or a similar form, neither in Austria nor in any other country.

———————————————–
Albert Greinöcker


Contents

1 Introduction 9

2 Terminology 12
   2.1 Online Research 14
      2.1.1 Operating Fields and Organizations 14

I Current Literature 15

3 Introduction 16

4 Effects on Visual Analog Scales, Slider Scales, Categorical Scales 17
   4.1 Completion Rate/Breakoffs and Missing Data 19
   4.2 Different Response Behavior 20
   4.3 Reliability 21
   4.4 Categorization 21
   4.5 Use of Midpoint and Extremes 22
   4.6 Actual Position Feedback 22
   4.7 Spacing, Positioning, Disproportion, Shading, Labelling 23
   4.8 Response Time / Completion Time 23
   4.9 Feedback Questions 24

5 General Visual Effects in Online Questionnaires 25
   5.1 Paging versus Scrolling (All-in-one or Screen-by-screen) 25
   5.2 Fancy vs. Plain Design 26
   5.3 Different Alternatives' Representation in Closed-ended Questions 26
      5.3.1 Check-all-that-apply vs. Forced-choice 26
      5.3.2 Grouping of Alternatives 27
      5.3.3 Double or Triple Banking 29
      5.3.4 Response Order Manipulations 29
      5.3.5 Different HTML-Controls Used 30
   5.4 Color Effects 31
   5.5 Modifications on Input Fields for Open-ended Questions 32
      5.5.1 Different Size 32
      5.5.2 Lines in Answer Spaces 33
   5.6 Influence of Images 33
      5.6.1 Personalization - Virtual Interviewer Effects 34
   5.7 Heaping 35
   5.8 Additional Content and Visual Hints 35
      5.8.1 Placement of Instructions and Help Texts 35
   5.9 Date Formats 37
   5.10 Progress Bar 37
   5.11 Sponsor's Logo 39

6 Non-Visual Design Experiments in Online Surveys 40
   6.1 Incentives 40
   6.2 Invitation and First Contact 41
   6.3 Different Welcome Screens 42
   6.4 Length of the Survey 42
   6.5 Time-to-Complete Statement at the Beginning 42

7 Outlook 44
   7.1 Dynamic Forms, AJAX, WEB 2.0 45

II Theoretical Background 46

8 Introduction 47

9 Methodological Theories 48
   9.1 Total Survey Error 48
      9.1.1 Respondent Selection Issues 49
      9.1.2 Response Accuracy Issues 59
      9.1.3 Measurement Error 61
      9.1.4 Statistical Impact of Error 62
   9.2 Reliability 62
   9.3 Validity 63
   9.4 Mode Effects 64

10 Psychological Theories 67
   10.1 The Response Process 67
   10.2 Visual Interpretive Heuristics 68
   10.3 Gestalt Principles 68
      10.3.1 Principle of Proximity 69
      10.3.2 Principle of Similarity 69
      10.3.3 Principle of Pragnanz 69
   10.4 Types of Respondents in Web Surveys 69

III Experiments Conducted by the Author 71

11 Description of the Experiments 72
   11.1 General Experimental Design 72
   11.2 Different Input Controls 75
      11.2.1 Radio Button Scale (radio) 75
      11.2.2 Empty Button Scale (button) 76
      11.2.3 Click-VAS (click-VAS) 77
      11.2.4 Slider-VAS (slider-VAS) 78
      11.2.5 Text Input Field (text) 79
      11.2.6 Dropdown Menu (dropdown) 80
      11.2.7 Differences and Similarities 81
      11.2.8 Technical Preconditions 82
   11.3 Specific Experiments 82

12 Research Questions 85

13 Overall Response 87
   13.1 Overall Input Control Distribution 87

14 Paradata and Additional Information 89
   14.1 Overall Operating System Distribution 90
   14.2 Overall Browser Agents Distribution 90
   14.3 Overall Screen Resolution Distribution 91
   14.4 Overall Distribution of Additional Browser Settings 91
   14.5 Summary and Conclusion 92

15 Demographic Information 93
   15.1 Tourism Survey 93
   15.2 Webpage Survey 95
   15.3 Snowboard Survey 96
   15.4 Differences in Demographic Distributions Across Browser and OS-Versions 97
   15.5 Summary and Conclusion 97

16 Feedback Questions / Subjective Evaluation 98
   16.1 Boring vs. Interesting 98
   16.2 Sufficient vs. Non-Sufficient Number of Scale Intervals 99
   16.3 Easy to Use vs. Complicated 101
   16.4 Summary and Conclusion 104

17 Response Time / Completion Time 105
   17.1 Standardization 105
   17.2 Outlier Detection 105
      17.2.1 Robust Regression 108
   17.3 Learning Effect 111
   17.4 Summary and Conclusion 112

18 Dropout 114
   18.1 Tourism 115
   18.2 Webpage 117
   18.3 Snowboard 119
   18.4 Summary and Conclusion 121

19 Response Distributions 122
   19.1 Comparison by Mean Values 122
   19.2 Compare the Distributions within the Categories 122
   19.3 Midpoint Selection 126
   19.4 Analysis per Question Battery 126
   19.5 Summary and Conclusion 127

20 A Closer Look at the VAS Distributions 128
   20.1 Distributions of the VAS 128
   20.2 Categorization 130
   20.3 Summary and Conclusion 134

IV Software Implemented 136

21 Introduction 137
   21.1 Conventions 137
   21.2 General Software Description 138
   21.3 Overview of Features 138
   21.4 Supported Question Types 139
   21.5 Editor 140
   21.6 Technical Background 142
      21.6.1 Technical Preconditions 143

22 Software Architecture 144
   22.1 QSYS-core 144
      22.1.1 Questionnaire Items 145
      22.1.2 Answers 148
      22.1.3 Paradata Tracking 149
      22.1.4 Storage 150
      22.1.5 Exporting 153
   22.2 StruXSLT 154
      22.2.1 Basic Functionality 154
      22.2.2 Usage 157
      22.2.3 Action Classes 157
      22.2.4 Mapper Classes 159
      22.2.5 Language Independence 161
   22.3 QSYS-Web 161
      22.3.1 Class Hierarchy 161
      22.3.2 Configuration and Installation 163
      22.3.3 Additional Tools 164
   22.4 Utility Classes 165
   22.5 Additional Notes 165
      22.5.1 Quality Control 165
      22.5.2 Software Metrics 165
      22.5.3 Schema RFC for Questionnaires (and Answer Documents) 166

23 Additional Tasks to be Implemented 167
   23.0.4 Federated Identity Based Authentication and Authorization 167
   23.0.5 R Reporting Server 167
   23.0.6 Observation Features 168
   23.0.7 Accessibility 168

24 Evaluation of the Software 169

25 Existing Open Source Online Survey Tools 171
   25.1 Limesurvey (formerly PHPSurveyor) 171
   25.2 FlexSurvey 172
   25.3 Mod_survey 172
   25.4 MySurveyServer 172
   25.5 phpESP 172
   25.6 Rapid Survey Tool (formerly Rostock Survey Tool) 173
   25.7 Additional Web Survey Tools 173
      25.7.1 Tools for HTML Form Processing 173
      25.7.2 Experiment Supporting Frameworks 174
      25.7.3 Tools for Retrieving Paradata 174
   25.8 Conclusion 174


1 Introduction

Internet-based (or simply online) research in general has recently become a new and important topic for empirical, psychological and related research. Therefore, the importance of new insights in this field, as well as of tools for conducting online research projects, has increased. Within the scope of this thesis, work was done on both aspects:

Experiments On the one hand, experiments dealing with online surveys were conducted, in which the effects of different visual designs on respondents in Web surveys were examined. Concretely, different input controls for scalar questions were given to the respondents to see whether any effects emerge from different points of view. In online surveys, the degree of freedom concerning visual design is much higher than for the usual paper and pencil questionnaires, and therefore a closer examination of this specific phenomenon is important. In addition, since experiments with mixed-mode studies proved that there were differences between these modes1, it was partly tested whether insights already established for paper and pencil questionnaires are also valid for, and can thus be applied to, online questionnaires. Paper and pencil questionnaires are only relevant with regard to these questions; comparisons of online and offline questionnaires or other mixed-mode designs are not part of this thesis. Sample selection, a very frequently published topic for online questionnaires, as well as the effects of incentives, pre-notification of potential respondents, and online panels are only considered theoretically as part of the overview of the current status of research within this field when summarizing the state of the art of online survey research in Part I. This summary covers recent experiments on visual design effects on respondents’ behavior in Web surveys. In some cases the results were contradictory, which leads to the assumption that a lot of research is still necessary within this field.

In addition, modifications made to question or answer-alternative texts and comparisons of effects based on Flash technology2 were excluded3. Research into response behavior using online questionnaires offers a lot of possibilities; as a result, a clear definition of the main research focus becomes necessary. The main focus is set on the visual design elements offered by HTML and related technologies, such as Javascript, Java and CSS. The presence of such effects was indicated in several studies, which is also mentioned by Dillman (2007, p.478): “a rapidly expanding body of literature now shows that visual design has effects on the question answering process that go far beyond influencing initial perceptions of what parts of the questionnaire page are relevant and the navigational process through the questionnaire”. Don A. Dillman also confirms the presence of visual design effects in scalar questions: “the effects of visual design are particularly apparent for scalar questions in which respondents are attempting to express their degree of agreement, satisfaction or perhaps favorableness to an idea, using an ordinary scale that attempts to capture both the direction and intensity of feeling” Dillman (2007, p.478). In addition, the relevance for future research is mentioned: “We expect that research in this area will continue over the next several years” Dillman (2007, p.482). Similarly Couper (2000, p.475): “While the importance of question wording in influencing respondent answers is well-recognized, there is a growing literature that suggests that the design of the survey instrument [...] plays an important role”, as well as Couper & Coutts (2004, p.217) (translated): “it is clear that examination of methodological aspects of internet-based surveys should be a fixed element of further research”. It is known that within self-administered surveys, in the absence of an interviewer, the respondent tends to seek information from the instrument itself, which means that visual elements of the questionnaire become even more important.

1 See section 9.4 for the corresponding results.
2 Based on very complex graphical audio-visual representations and multimedia, respectively.
3 As one exception, the number of scale marks of categorical scales can differ between the different questionnaire styles for scalar and semantic differential questions.

Moreover, Witmer et al. (1999, p.158) give a clear statement on the need for further clarification of effects in survey research when the computer environment becomes an additional factor: “computer-mediated research needs specific and carefully designed instruments that not only accommodate but exploit the features of the electronic environment to attract respondents who otherwise may have their fingers on a delete key. Researchers cannot merely import paper and pencil methodologies to online studies, but must adapt them to the electronic environment and create new methods to expand our knowledge in and of computer-mediated communication”. This work should make a contribution to this need. Similarly Bandilla & Bosnjak (2000, p.19) (translated): “It can often be observed that, when conducting Web surveys, the logic concerning visual layout simply follows the printed version. Obviously, when following this approach it is only weakly taken into consideration that the reading behavior of a Web user is not compatible with certain design guidelines of offline questionnaires”. Couper & Miller (2008, p.831) start their article with “A key characteristic of Web surveys is their diversity”, which describes one of the major problems with this survey mode. Many more possibilities are offered to run the surveys, which can cause several problems.

The experiments focused on the use of Visual Analogue Scales (VAS) and other scale types in Web surveys. Respondents were presented with semantic differentials in the form of six different input controls for filling out. The impact of the different controls on dropout, response times and response behavior in general was statistically evaluated. However, technical aspects (e.g. the usage of Java and Javascript) are also discussed. Only a few experiments had already been conducted with VAS and related scale types4, with mostly contradictory outcomes. This probably stems from the fact that a VAS is defined differently across the literature, which means that the VAS used in the experiments behave and look differently; it also shows that the look and feel of these scale controls have a major influence on general response behavior. Although advantages of these scale types are mentioned in the current literature5, they are used relatively seldom in online surveys.

Survey Tool Development On the other hand, this research also includes technical aspects: to provide a basis for running all these experiments, the implementation of software became necessary, focused on the described experimental tasks concerning different visual designs. Therefore, a logically developed concept was essential for the whole software architecture, which strictly separates questionnaire content from its design. Research projects dealing with online research in general and online surveys in particular are often limited by the lack of technical possibilities. In fact, the development of such software, created especially for online experimenting, opens up new opportunities in the field of empirical social (online) research. A lot of software for the creation of questionnaires is available (even free of charge), but suitable software that is available for free (or at an affordable price) and centres on the experimental potentials listed above had to be programmed. So a fully functional and fully featured tool for experimenting with the influence of visual design effects in Web surveys was created and offered to survey conductors to enable further research in this field (and its related topics).

4 Which are reported in detail in chapter 4.
5 See e.g. Couper et al. (2006), p.229.

Furthermore, the software enables the completion of surveys without the need to create one’s own experiments. With this application it is possible to create questionnaires and to run surveys under certain criteria with different levels of participation control6. Even without technical knowledge, such as programming skills or HTML, one can create an online questionnaire. For example, one can select from a list of different question types and customize the question content with an easy-to-use online editor. The software should offer students and researchers the possibility to run Web surveys efficiently and for free. Because of the generic software architecture of the system, it is possible to integrate new experiments and features necessary in future studies. The software is published under an open source license in an attempt to push forward the development of a free, well featured, stable and extendible piece of software. Documentation for both developers and end users is also (in shortened form) part of this PhD7. Moreover, this section also includes an overview of existing (open-source) tools together with evaluation criteria for such software. Additionally, an XML schema was created for storing and exchanging questionnaires efficiently. This would enable the exchange of questionnaires (and parts of questionnaires such as a demographic block) between institutions and would clearly separate content and representation. The schema is not published within this document, but it is accessible at http://www.survey4all.org.

6 An overview of all features and possibilities of the software is described in section 21.2.
7 See part IV.


2 Terminology

Visual Analogue Scales (VAS) are one of the main input controls tested within this work. Therefore, it is necessary to give a clear definition of VAS and a clear demarcation from related scale types. Generally speaking, the VAS is a scale type which measures ratings for semantic differentials. The most suitable and representative description of VAS is given by Couper et al. (2006, p.227f): “the VAS has a line anchored at each end by the extremes of the variable being measured, this can represent a continuum between opposing adjectives in a bipolar scale or between complete absence and the most extreme value in a mono-polar-scale”. A similar definition is given by DeVellis (1991, p.71). This definition serves as a basis and is distinguished below from other definitions and from related scale types.

The previous definitions do not specify a certain mode of how these ratings have to be made on the VAS. Interestingly, in Funke & Reips (2007a, p.70) and Funke & Reips (2008a), a clear boundary between VAS and slider scales is drawn with the argument that inaccuracies result from using the (relatively) broad slider control of a slider scale. The problem here is that this definition comes from a very fixed concept of a slider scale1. It is true that most technologies2 use a relatively broad slider by default, but, as can be seen in figure 11.4, the slider has an apex at the bottom which enables positioning on a pixel level. The precision of the measurement should be as high as possible: “VAS are nearly continuous measurement instruments, each pixel is clickable and results in a raw value” (Funke & Reips (2008a), adapted from Funke & Reips (2007a)). This definition places a great deal of importance on the assignment of a measurement point to exactly one pixel.

In general, the question format used most often by questionnaire designers is the scalar question. DeVellis (1991, p.8f) defines measurement scales as “measurement instruments that are collections of items intended to reveal levels of theoretical variables, not readily observable by direct means [...]. We develop scales when we want to measure phenomena that we believe to exist because of our theoretical understanding of the world, but which we cannot assess directly”. How scales should be created, mainly with regard to visual aspects, in order to achieve the best results is part of this work.

GRS Another important type of measurement scale closely related to the VAS is the Graphical Rating Scale (GRS). As mentioned in Couper et al. (2006, p.228), the distinction between VAS and GRS is blurred. This is how the terminology is used within this paper: “the key distinction is not how the scale is presented to the respondents but how they indicate their responses. We will use the term VAS to indicate a scale on which a respondent directly marks his or her position on the scale, whereas discrete choice measures require the respondent to first select a number or adjective and then indicate that preference”.

1 Here, the slider scales do have a slider, but only values at a few tick points can be selected, which is then categorized as being “more similar to a radio button scale”.
2 E.g. Java, when using a slider within an Applet.

Here are some definitions of GRS:

• “The GRS adds verbal descriptors along the line and sometimes also check marks dividing the line into distinct segments” (Couper et al. (2006, p.229)).

• Cook et al. (2001, p.700) describe an unnumbered graphic scale (as part of graphic rating scale formats) as a scale which presents a continuous line between two antonyms. Respondents are then asked to draw a mark through the continuum at the point most indicative of their views regarding the antonyms. The responses are scored by measuring the distance (e.g. in millimetres) from the left end of the continuum to the line drawn through it.

Here are some definitions and properties of simple discrete measurement scales:

• “A likert-type scale explicitly presents its scoring metrics. When a scale uses numerous score intervals, participants are told how many scale points there are, and they not only can but are expected to accommodate these intervals within their conscious thinking” (Cook et al. (2001), p.700).

• “[...] respondents select a number or adjective that most closely represents their positions on the scale. The number of scale points may be relatively small (e.g. 5, 7 or 9) or large, as in the case of the 101-point feeling thermometer” (Couper et al. (2006, p.228)). Interestingly, Schönemann et al. (2003, p.1171) describe the feeling thermometer as a “Visual Analogue Scale (VAS) shown as a thermometer”3.

Finally, as a summary, an attempt was made to put all of the above definitions and distinctions together without any inconsistencies. These are the properties of a VAS as used for the experiments within this thesis (a minimal implementation sketch follows the list):

• The scale consists of a simple, continuous line

• Only labels exist at the two extreme marks

• No labels, tick marks or number labels are placed on the scale

• No feedback about the actual position is given to the respondent

• Measures are as fine-grained as possible (ideally one measurement point per pixel, but this is not mandatory)

• Accurate positioning must be possible

• The way marks can be set on the scale is not fixed: clicking on the scale at a specific position works as well as using a slider (which means, for this definition, that slider scales are also VAS as long as the other properties are fulfilled)
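To make these properties concrete, the following sketch shows a minimal click-VAS built with plain HTML DOM scripting: a bare line, labelled only at the extremes, with no tick marks and no positional feedback, where the raw value is derived from the horizontal pixel offset of the click. This is an illustrative sketch only, not the software developed for this thesis; the rescaling of the raw pixel offset to a 0-100 range and the element names are assumptions made for the example.

```javascript
// Minimal click-VAS (illustrative sketch, not the thesis software): a plain
// horizontal line, labelled only at the extremes, with no tick marks and no
// positional feedback. The raw value is the horizontal pixel offset of the
// click, rescaled to 0-100 so it does not depend on the rendered width.
function createClickVAS(leftLabel, rightLabel, onAnswer) {
  var row = document.createElement('div');
  row.style.cssText = 'display:flex;align-items:center;font-family:sans-serif';

  var left = document.createElement('span');
  left.textContent = leftLabel;
  var right = document.createElement('span');
  right.textContent = rightLabel;

  var line = document.createElement('div');            // the scale itself
  line.style.cssText = 'width:400px;height:2px;margin:0 8px;padding:10px 0;' +
                       'background:#000;background-clip:content-box;cursor:crosshair';

  line.addEventListener('click', function (e) {
    var rect = line.getBoundingClientRect();
    var offsetPx = e.clientX - rect.left;               // clicked pixel on the line
    var value = 100 * offsetPx / rect.width;            // raw value, rescaled to 0-100
    onAnswer(value);                                    // no marker is drawn: no feedback
  });

  row.appendChild(left);
  row.appendChild(line);
  row.appendChild(right);
  return row;
}

// Usage (run after the DOM is available): append one VAS item and log the answer.
document.body.appendChild(
  createClickVAS('strongly disagree', 'strongly agree', function (v) {
    console.log('raw VAS value: ' + v.toFixed(1));
  })
);
```

Recording one value per pixel (or a rescaled equivalent) is what gives the VAS its nearly continuous measurement level; drawing a marker at the clicked position would turn this into a control with positional feedback, which the definition above deliberately excludes.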

Methodologically, the advantage is that data from a VAS reaches the desired level of an interval scale. A good discussion of the pros and cons of VAS compared to other scale types can be found in Flynn et al. (2004). Whether such a fine-grained scale as the VAS is needed in all cases is not clear: “the meaningfulness of 100 categories is seriously diminished, given that research has demonstrated that individuals can only efficiently discriminate between a maximum of seven categories when processing sensory information” (Flynn et al. (2004, p.52)).

3 For the use of feeling thermometer scales in the social sciences, see e.g. Noelle-Neumann & Petersen (2000, p.149) and von Kirschhofer-Bozenhardt & Kaplitza (1975).


2.1 Online Research

Because Online Research is the overarching topic of this thesis, a general description of this research field is given in this section. In this context, the internet is the major field of research. A good overview of all parts of this research field is given in German by Welker et al. (2005). Advantages of online data collection are discussed in van Selm & Jankowski (2006) and Granello & Wheaton (2004).

2.1.1 Operating Fields and Organizations

In this chapter, a definition of the research area of Online Research is attempted, together with a description of the core operating fields. There are a few important institutions in this field that provide a lot of useful information:

• DGOF 4 (Deutsche Gesellschaft für Online-Forschung). This institution annually organizes the most important Online Research conference in Europe, namely General Online Research5. Participating gives a good overview of the state of the art of online research. DGOF has also released an edited volume (Welker & Wenzel (2007)) containing fundamentals and case studies.

• a.o.i.r6 (Association of Internet Researchers; AoIR). a.o.i.r is an international academic association dedicated to the advancement of the cross-disciplinary field of internet studies. AoIR hosts a free, open access mailing list with over 2000 subscribers and organizes an annual internet research conference, one of the premier academic conferences in this field.

• Web Survey Methodology7 is an important resource for getting general information about current research on Web surveys. Publications are listed, events are announced and existing survey software is mentioned.

• German Internet Research List (gir-l) is a discussion forum aimed at anybody in German-speaking countries interested in Online Research8.

4 http://www.dgof.de
5 http://www.gor.de
6 http://aoir.org
7 http://www.websm.org
8 More information can be obtained at http://www.online-forschung.de


Part I

Current Literature


3 Introduction

In the following, an overview of the current status of research in the field of visual design effects in online surveys is given. Experiments with visual design are considered to be all visual manipulations of the presented questionnaire which could influence the respondent’s behavior, such as the presence or absence of additional signs and information, different colors, different positioning of content or different input controls used for answering. The outcomes of the experiments described below may vary depending on the question type1. A list and description of different question types (classified by the question’s content) can be found in Holm (1975a, p.32ff). The same is valid for differentiations by person and situation, of which Mummendey (2003, p.41ff) gives a good overview. All experiments are categorized into several sections. Some experiments described come up in more than one section, because several different sub-experiments were run within one experiment, which could possibly have caused some side effects.

Firstly, a summary of already accomplished experiments and findings regarding VAS is given, because VAS are the main focus of this work. This should serve as a basis for the experiments conducted in this thesis. The design of the experiments was influenced by these findings, and concrete results were checked with regard to their reproducibility.

In addition, literature on general visual design effects in online surveys is discussed. For the experiments conducted, it is also necessary to take these effects into consideration to avoid possible side effects. Furthermore, some effects described in this section, such as the numbering or labelling of scale items, play a role in the experiments. For the sake of completeness, literature on non-visual design effects, like the effects of incentives or the length of the survey, is also covered.

Finally, an outlook is given, which mainly deals with the effects of new technologies. The most striking new technology, which is already integrated in many Web pages and would be suitable for use in online surveys, is AJAX.

1 As regards content and other criteria, which may in some cases be the reason for contradictory findings.


4 Effects on Visual Analog Scales, Slider Scales, Categorical Scales

In this chapter an overview of the state of the art is given, which is particularly important for this thesis since the central experiments deal with different scale types, especially VAS. Several studies that deal with this topic but focus on different aspects have already been published. These studies and their findings are also mentioned in this thesis in the appropriate subchapters.

The summary given here concentrates on the application in Web surveys. For an overview of VAS applied in paper and pencil questionnaires, see e.g. Flynn et al. (2004) as well as Hasson & Arnetz (2005), where, amongst other things, advantages and disadvantages of VAS and likert scales are discussed. Mode effects of VAS in comparison to those of likert-type responses are discussed and empirically confirmed in Gerich (2007).

In recent studies, similar experiments with different outcomes were completed; e.g. Couper et al. (2006) discuss the effectiveness of VAS in Web experiments. For this purpose, a slider-VAS written in Java with 101 scale points was compared to common HTML controls such as radio buttons and numeric entries in a text input field (21 scale points), together with other variations, such as the presence or absence of numeric feedback or a midpoint. The response distributions for the Java VAS did not differ from those of the other scale types. Furthermore, the VAS had higher rates of missing data, noncompletion, and longer completion times. These findings could partially be explained by technical difficulties.

A few other studies in this field which deal with similar experiments should also be mentioned, e.g.: Heerwegh & Loosveldt (2002a) compared radio buttons and dropdown boxes; Walston et al. (2006) compared three different item formats for scales: radio buttons, bigger buttons and a graphical slider; Hasson & Arnetz (2005) compared VAS and likert scales for psychosocial measurement; Christian (2003) examines the influence of visual layout on scalar questions. Additionally, three presentations on research regarding VAS were given at the 9th General Online Research (GOR) Conference 2007 in Leipzig (Lütters et al. (2007), Reips & Funke (2007), Thomas & Couper (2007)). The most innovative experiment presented at this conference was reported by Lütters et al. (2007), who compared a new kind of scale (the so-called sniperscale) to a graphical slider based on Java technology as well as to a classical representation based on radio buttons. The sniperscale was based on Flash technology and enabled the respondents to shoot at the scale in order to select an item (the mouse pointer was a crosshair).

Funke & Reips (2008a) presented an experiment in which data and paradata from three scales were compared, namely visual analogue scales (VAS), slider scales (SLS) and radio button scales (RBS). The design was as follows: “Respondents of a 40 item personality inventory [...] were randomly assigned to either a VAS with 250 possible values, an SLS with 5 discrete values, or a 5 point RBS”. No initial marker was present on the scales. The VAS could only be clicked, but the SLS marker could be clicked or slid. It is important to mention that the slider scales used in these experiments had tick marks and only these marked points on the scale could be selected. A similar (self-selection) study with VAS was reported in Funke & Reips (2008b), which focussed on respondent burden, cognitive depth and data quality; there, VAS were compared to radio button scales.

One empirical test of interval-level measurement reported in Funke & Reips (2007c)1 varied the length of VAS (50, 200 and 800 pixels) to check whether data collected with VAS are equidistant, using a self-selected student sample (n=355). On each of the VAS, positions had to be located (e.g. 50%, and uneven portions like 67% or 33%). As a result, there is strong evidence that data collected with VAS are equidistant, at the level of an interval scale. On average the difference from a linear relationship was 3.2 percentage points, ranging from 2.8 for the medium VAS to 3.9 for the shortest VAS. As length has no great effect, VAS should be robust to differences in appearance due to different screen sizes. As a general remark: when using pixels as units, the length is displayed differently on computers according to the screen resolution set within the operating system.

Cook et al. (2001) compared the Cronbach’s alpha coefficients of scores for scales administered on the Web using a 1-9 radio button format, in which respondents used the mouse to toggle their responses on 41 items (i.e. the analog of a likert-type scale), with Web-administered unnumbered graphic scales. Furthermore, sliders (1-5; 1-9; 1-100) were used as input controls. 3987 respondents were assigned the radio button format and 420 the sliders.

In Walston et al. (2006) an experiment is described within a Web site user satisfaction survey, where recruitment was done directly via the Web page. Different controls for scale questions were used, namely radio buttons, labelled buttons and a slider bar. The slider had tick marks corresponding to five labelled response options, but the respondent could drag the slider bar to any position along the scale. Each control had the same item labels.

Mathieson & Doane (2003) discuss the benefits of fine-grained scales and consequently compare a radio button likert scale to a so-called fine-grained likert scale, which contains the same labels as the radio button scale but additionally makes it possible to click on positions between these anchor points. This scale is implemented in Javascript (with HTML tables behind it). When clicking on a cell of the table, the actual image is exchanged with a higher one to mark the current selection. The question posed in this experiment is whether respondents use the additional points offered between the anchor points. “Since the scale had 150 clickable points, there were many more responses on the anchor points than would have been predicted by chance if the probability of selecting each clickable point was equal. It seems that, although there is substantial use of points between the anchors, respondents are still attracted to the anchor points” (Mathieson & Doane (2003, p.7)). Respondents who chose an on-anchor point also tended to use on-anchor points for the rest of the items. This was also valid for off-anchor points.
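The on-anchor vs. off-anchor analysis referred to in the quotation can be made concrete with a small sketch. The numbers used below (150 clickable points, 7 equally spaced anchors) and the helper names are assumptions for illustration and are not taken from Mathieson & Doane (2003).

```javascript
// Classify a clicked point on a fine-grained scale as on-anchor or off-anchor,
// and compute the chance baseline the quoted comparison refers to (the share
// of on-anchor responses expected if every clickable point were equally likely).
function anchorPositions(numPoints, numAnchors) {
  var positions = [];
  for (var i = 0; i < numAnchors; i++) {
    positions.push(Math.round(i * (numPoints - 1) / (numAnchors - 1)));
  }
  return positions;
}

function isOnAnchor(clickedPoint, anchors) {
  return anchors.indexOf(clickedPoint) !== -1;
}

var anchors = anchorPositions(150, 7);   // e.g. [0, 25, 50, 75, 99, 124, 149]
console.log(isOnAnchor(75, anchors));    // true
console.log(anchors.length / 150);       // chance baseline: about 4.7%
```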

van Schaik & Ling (2007) compared likert scales (7-point) and VAS with 103 undergraduate psychology students. For the VAS, previously defined design principles were either applied or violated. The tests were held under laboratory conditions. Responses to the VAS statements were given by dragging a slider along the scale. The slider always started at the middle of the scale. The response format did not include subdivisions, although strongly agree and strongly disagree were presented at either end of the scale.

1 The key information is also given in Reips & Funke (2007).

Flynn et al. (2004) also conducted an offline study with 112 psychology students comparing VAS and 7-point likert scales. They used a within-subjects design to compare the data equivalence of likert scales (LS) and VAS.

4.1 Completion Rate/Breakoffs and Missing Data

One of the most important tasks for Web surveys is to keep respondents’ burden as low as possible in order to minimize dropout. Below, a summary of the comparison of the different controls with regard to dropout is given.

Funke & Reips (2008a) found a tendency for VAS to perform better concerning dropout than SLS and RBS (but this finding was not statistically significant). Similar results were reported by Funke & Reips (2008b), who compared VAS to radio button scales and found a higher proportion of missing data for VAS. However, results contradicting the previous findings were reported in Funke (2005), who observed a higher dropout rate, more lurkers and more nonresponse for VAS in comparison to categorical scales. The same VAS controls were used in all of these experiments.

Walston et al. (2006) took a look at dropout in different phases of the survey. When comparing the initial exit (when the browser window was closed or the I decline to take this survey button near the top of the survey page was pressed2), “the most striking difference in the outcomes is that 80.2% of those receiving a slider bar survey exited the surveys as compared to 61.6% (buttons), 62.5% (graphic radio) and 64.7% (plain radio) under the other appearance/item format conditions”.

The results regarding missing data in Couper et al. (2006) are the following: VAS had higher rates of missing data3 and higher breakoff rates (χ2 = 26.54, df=2, p<.001) compared to the other controls; these results may have something to do with technical difficulties linked to Java Applets. An explanation for the high missing data rates of the VAS was the long time it took the Applet to load, and that respondents clicked the next button in error before it had appeared. Healey (2007) conducted a similar experiment and found no significant differences.

In another experiment by Couper, Traugott & Lamias (2004, p.373ff), respondents were asked to rate comments on a 10-point scale either by clicking on a radio button next to a number or by entering a number in a text entry box. Here, more missing data could be found for text input fields than for radio buttons; respondents receiving the text input field version were more likely to leave the box blank. Naturally, more invalid responses were also given in the input field version because of the absence of an integrity check (each entry was allowed). Heerwegh & Loosveldt (2002a, p.477) discovered higher completion rates for radio buttons in comparison to dropdown boxes as input controls (even though these differences were not statistically significant).

2 The type of scale input was visible on the page where the button was displayed.
3 About twice as high as for the other controls; particularly for the first item.

When offering an innovative implementation of a scale (the sniperscale), diminished dropout was observed, as reported in Lütters et al. (2007): “compared to Java slider and classical radio button scale, the sniper scale fulfilled the task to keep the respondent’s attention during the entire interview, which means the dropout was low compared to the other approaches”.

One conclusion of all these results is that Java Applets are possibly not the right technology for generating VAS, because of technical preconditions which need to be fulfilled on all client machines.
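One practical way to reduce the risk described above (respondents clicking the next button before a slow-loading scale control has appeared) is to keep navigation disabled until the control reports that it is ready. The sketch below only illustrates this idea; it is not part of the cited studies or of the thesis software, and the element id and timeout value are assumptions.

```javascript
// Keep the "next" button disabled until the scale control has finished
// initializing, so an item cannot be skipped simply because its control
// has not appeared yet. Element id and timeout are illustrative assumptions.
var nextButton = document.getElementById('next-button');
nextButton.disabled = true;

function onScaleReady() {
  // to be called by the scale control (e.g. an Applet or script) once it is usable
  nextButton.disabled = false;
}

// Fallback: re-enable navigation after 10 seconds so a control that fails to
// load never blocks the whole questionnaire.
setTimeout(onScaleReady, 10000);
```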

4.2 Different Response Behavior

In a study by Flynn et al. (2004), participants tended to rate higher on the scale when using a likert scale, and lower when using a VAS. A two-way repeated measures ANOVA revealed a significant main effect of the response format. Similar results were found by Funke & Reips (2008a), who also observed lower mean scores for VAS compared to slider and radio button scales. In contrast, Thomas & Couper (2007) reported higher means for the VAS in comparison to a GRS. In a study similar to the previously mentioned ones, Couper et al. (2006) compared VAS to radio button scales and found no difference in means. In addition, no differences within the response distributions for the VAS compared to those of the other scale types under control (radio buttons and text input fields) could be found. Distributions across the three input types were remarkably similar and no significant effect of input type could be found (MANOVA, Wilk’s lambda). Regarding variance (between persons/within question and same person/different questions), all tests were non-significant. This is equally true for the range of scores.

In one experiment by Christian & Dillman (2004), a five-point likert scale was compared to a number box. On the five-point likert scale only the extreme points were verbally labelled and the remaining points only got numbers; for the number box, a number between one and five had to be entered. As a result, the use of the number box significantly increased the mean for each of the questions tested. When the respondents were given the chance to correct their original answers, 10 percent of the answer box respondents4 scratched out the answers to at least one of the questions and provided a different answer. Most of these errors occurred because respondents reversed the scale on the answer box version, which means 4 was swapped with 2 and 1 with 5.

In Stern et al. (2007), this experiment was repeated (with a modest modification: a don’t know response was included), and the findings of the original study were supported. An additional finding was that those respondents who received the number input box version were significantly more likely to provide a don’t know response (Stern et al. (2007, p.124)). But the main reason for repeating the experiments was to see if there were stronger differences in certain demographic groups (age greater or less than 60 years; college degree or not; gender). The effect described above could also be found in all demographic groups except men. One conclusion is that demographic information (whether respondents are older or younger than 60, or whether respondents have finished college, respectively) does not have any influence.

4 Compared to one percent of the polar point scale respondents.

4.3 Reliability

Cook et al. (2001) found the highest alpha coefficients for radio buttons (compared to slider-VAS). The more items were used, the higher the coefficient. Contradictory findings are reported in Flynn et al. (2004), where higher alpha values were found for the VAS. In Funke & Reips (2008a), test-retest reliability was measured for radio button scales, slider scales and VAS. The highest scores were found for VAS. Similar findings were reported in Funke & Reips (2008b), where radio button scales were compared to VAS. In this case the VAS also reached the highest score, and so it was concluded that VAS were used in a more consistent way. This was also observed in Funke & Reips (2007c), where a 5-point categorical scale with radio buttons was compared to VAS. Additionally, lower variance compared to categorical scales was reported in Funke (2005).

4.4 Categorization

This section provides a short summary of strategies for categorizing VAS values using different transformations. Funke & Reips (2006) conducted two experiments: in the first one, 667 participants were randomly chosen to rate 16 items under 3 different conditions. The only difference between the experimental conditions consisted in the applied rating scales: either a 4-point categorical scale, an 8-point categorical scale or a VAS was presented. Systematic differences in the distribution, especially concerning the extreme categories, were found when applying a linear transformation (VAS had higher frequencies in the extreme categories). Transformation with reduced extremes (see figure 4.1) led to greater accordance between VAS and categorical scales than linear transformation. The difference between measurement with VAS and categorical scales was systematic.

Figure 4.1: Transformation with reduced extremes
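To illustrate the two transformation strategies, the following sketch maps a VAS value in the range 0-100 onto k categories, once with equally wide bins (linear transformation) and once with narrower outer bins (transformation with reduced extremes). The concrete bin widths used here (outer categories half as wide as inner ones) are an assumption made for illustration; Funke & Reips (2006) do not prescribe these exact widths.

```javascript
// Linear transformation: k equally wide categories over the 0-100 range.
function linearCategory(value, k) {
  var cat = Math.floor(value / (100 / k)) + 1;  // 1-based category index
  return Math.min(cat, k);                      // value == 100 falls into category k
}

// Transformation with reduced extremes: the two outer categories cover only
// half the width of an inner category, so fewer values land in the extremes.
function reducedExtremesCategory(value, k) {
  var innerWidth = 100 / (k - 1);               // (k-2) inner bins plus 2 half-width outer bins
  if (value < innerWidth / 2) return 1;
  if (value >= 100 - innerWidth / 2) return k;
  return Math.floor((value - innerWidth / 2) / innerWidth) + 2;
}

// A VAS value of 80 counts as extreme under the linear transformation
// (category 4 of 4) but not under the reduced-extremes transformation.
console.log(linearCategory(80, 4), reducedExtremesCategory(80, 4)); // 4 3
```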


Subsequently, in the second experiment, the space between the two extreme categories and the adjoining ones was decreased for the radio buttons. The outcome was that, when using categorical scales with reduced extremes, the frequencies of the extreme categories decreased. In both studies, no information about statistical significance was given.

In Funke & Reips (2008a) as well as in Funke & Reips (2007c)5, the findings from Funke & Reips (2006) were revalidated: in both, the frequencies at the extreme points of the VAS used were higher6. Similar experiments with the same outcome were reported in Funke (2005), Funke (2004) and Funke (2003). In contradiction to these findings, Couper et al. (2006) observed a significantly higher use of extreme values for the radio and numeric input versions compared to the VAS version. The problem is that the two VAS used were difficult to compare: on the first VAS, clicking was necessary to position a mark on the scale; on the second, a slider was used for positioning.

4.5 Use of Midpoint and Extremes

Whether it is useful to provide a midpoint response in scalar question formats is a highly debated topic. Couper et al. (2006) found no statistically significant differences between using and not using a midpoint, but radio buttons and numeric input fields showed a higher use of values around the midpoint. One interesting finding was that when no real midpoint was on the scale (1-20), 10 was selected more often in the input fields where numbers had to be entered. One explanation is that 10 seems to best represent the midpoint.

van Schaik & Ling (2007, p.18f) compared radio button likert scales with VAS. After converting the VAS to the 7-point scale, differences in frequencies for the middle neutral response category were within 8% between the two response formats, with more neutral answers for VAS than for likert scales. Interestingly, respondents believed that likert scales lead to a bias towards neutral answers. The results for the extreme values were similar: participants believed that likert scales might lead to the avoidance of extreme responses. After conversion of the VAS, differences in frequencies for the extreme lowest response category were within 3% between the two response formats, and 2% for the highest response category.

4.6 Actual Position Feedback

The provision of feedback regarding the actual position on the VAS (e.g. via a tooltip or labeled positions) had a significant effect on respondents' behavior. Couper et al. (2006) found significant differences between means when comparing VAS with and without feedback (in 2 questions under control). Another interesting effect was found: when feedback was given on the VAS, rounded values (heaping) were used more often. These findings are statistically significant and suggest that providing feedback may negate some of the advantages of using a continuous measurement input device such as the VAS.

5 n=576; 5, 7, and 9 categories were used.
6 All experiments which used a VAS were generated with http://VASGenerator.net

4.7 Spacing, Positioning, Disproportion, Shading, Labelling

In Tourangeau et al. (2007), two experiments were carried out in order to investigate how the shading of the options in a response scale (7-point Likert scale) affected the answers to the survey questions. The first experiment varied two factors. The color scheme of the shading came in two versions: the first ranged from dark to light blue, the second from dark red to dark blue (the middle point was white). In addition, the labelling of the scale points was varied: (a) verbal labels only, (b) numerical labels only, (c) verbal labels plus numerical labels ranging from -3 to 3, (d) same as (c) but with numerical labels ranging from 1 to 7. The second experiment was a replication of the first one, with new questions and different respondents. The general result was that when the end points of the scale were shaded in different hues, the responses tended to shift toward the right end of the scale, compared to the scales with both ends shaded in the same hue. When verbal labels were used, the color effect vanished. The way the points were numbered also influenced response behavior. In the case of numerical labels ranging from -3 to 3 instead of 1 to 7, responses were pushed towards the high end of the scale. It would be interesting to repeat the experiment with two different color schemes using different colors to check if the color red, which is a special color as mentioned in Gnambs (2008), has any influence.

Yan (2005, p.75ff) also reported the results of a Web experiment in which the numerical values on a scale were manipulated (0 to 6 vs. -3 to 3). Significant effects were found for three of four items, where the mean ratings were higher for the (-3 to 3)-version, which confirms the findings of Tourangeau et al. (2007).

4.8 Response Time / Completion Time

Couper et al. (2006) report longer completion times for VAS (170.6 sec) compared to radio buttons (124.8 sec) and numeric input fields (153.8 sec). Similar results were reported in Cook et al. (2001), who also observed longer response times for slider controls (VAS). Additionally, at the end of the questionnaire, respondents were asked to estimate how long they thought the survey took. VAS had the highest mean value (17.49 min) compared to radio buttons (16.41 min) and numeric input fields (16.69 min). In Heerwegh & Loosveldt (2002a, p.481), differences were found when comparing radio buttons and dropdown boxes, where, as expected, radio buttons led to shorter response times. As a possible reason it was argued that people were less familiar with dropdown boxes. This experiment was carried out with 3rd year students. Interestingly, this effect could not be replicated with people from a public internet database. Response times were tracked on the client side.

Tourangeau et al. (2007) found a correlation between the type of scale point label and completion time: when items were fully labelled (compared to when only the anchor points were labelled) it took respondents longer to answer, regardless of whether the scale points were numerically labelled or not. Presumably the additional time was needed to read the verbal labels.

When comparing text input fields and radio buttons, as reported in Couper, Traugott & Lamias (2004, p.375), no statistically significant differences concerning time could be found, but in Funke (2005) higher response times were found for VAS when compared to categorical scales. The sniperscale (a Flash-based scale where the respondent had to shoot at the scale items), as reported in Lütters et al. (2007), took respondents longer than a slider implemented in Java or a radio button scale.

4.9 Feedback Questions

In Walston et al. (2006, p.285ff), respondents were provided with five feedback questions and asked to rate five qualities of the survey concerning different item formats (radio button, button, slider VAS), whereby the following extreme points were offered: attractive vs. unattractive, worthwhile vs. a waste of time, stimulating vs. dull, easy vs. difficult, satisfying vs. frustrating. In all pairs, the slider VAS had the highest mean values, which in this case reflects badly on VAS. However, none of these comparisons reached statistical significance. In Funke & Reips (2008b), VAS were compared to radio button scales in 2 studies, whereby higher response times were measured for VAS; interestingly, however, response times were underestimated more often when using VAS (respondents were asked how long they thought the questionnaire had taken).

van Schaik & Ling (2007, p.18f) reported respondents' preferences when comparing radio button Likert scales and VAS. Here, a significant majority preferred the Likert response format. Based on openended questions, the following advantages and disadvantages were mentioned for the two response formats: ease and speed of use was named as an advantage of both; clarity of response was an advantage of the Likert scale, and the degree of choice an advantage of the VAS. As disadvantages, the difficulty of mapping a judgement to a 7-point numerical scale and response set were named for the Likert format, and a lack of clarity, consistency and usability for the VAS.

5 General Visual Effects in Online Questionnaires

5.1 Paging versus Scrolling (All-in-one or Screen-by-screen)

The question dealt with in this section is whether the entire questionnaire should be visible at once, or whether each question should be on a separate screen. The advantage of the first approach is that the respondent can answer questions faster, as there are no additional loading times for each separate page and question. Furthermore, the first approach corresponds to the format of paper and pencil questionnaires, which people are more familiar with. Respondents can also very easily see what they previously filled out. A general recommendation on how many questions should be placed on one page is given in Crawford et al. (2005, p.52): "As standard we recommend for Web-based surveys, when the capability exists, is to provide only so many questions on one screen as fit without requiring the respondent to scroll to see the navigation buttons". The consequences of selecting a certain style are discussed subsequently, when the results of concrete experiments are presented.

In Weisberg (2005, p.123), the screen-by-screen approach was preferred for several reasons: "This enables the computer programmer to control the flow of questions instead of requiring respondents to follow complicated skip patterns themselves. The screen-by-screen approach also facilitates sophisticated survey experiments in which different half-samples (split-ballot-experiments) are given different question wordings, question frames, and/or question orders". Additionally, improved observability is given when the pages consist of single questions (e.g. time tracking or the exact point where dropout occurred). Furthermore, questions are presented in much more isolation to the respondents. Individual screen construction techniques provide less context than people normally have for answering questions, which is especially problematic when people are asked a series of related questions (Dillman et al. (1998)). In addition, if backtracking to previous answers was not enabled for screen-by-screen solutions, it also happened that respondents lost their sense of context. In some cases, however, scrolling to the next question had to be prohibited due to order effects. A hybrid version, consisting of blocks of questions which are presented all at once, is methodologically more similar to the scrolling mode.

Peytchev et al. (2006) checked the differences in response behavior concerning paging versus scrolling in an experiment with undergraduate students. No significant differences between the versions or in break-off rates were observed. Concerning the overall completion times, the scrolling version took significantly longer.

A related question is whether there is a higher correlation among items when they are placed together.

In Couper, Traugott & Lamias (2004, p.372) and Couper et al. (2001), higher correlations1 could be observed (but were statistically not significant). Furthermore, the effect of multi- versus single-item screens on item-missing data was examined. As a result, nonresponse decreased when the multi-item screen version was used, because this version was less burdensome to respondents. In a paper by Toepoel et al. (2008), paging versus scrolling was not compared directly; instead, the presentation of scale items on one screen was compared to presentation on two screens. No evidence was found that correlations between the items were higher when items were presented on a single screen than when presented on 2 screens. Moreover, no evidence was found that placing all items on one screen increases nonresponse, and no differences in filling-out times could be found.

5.2 Fancy vs. Plain Design

In this section, a comparison between very simple and graphically complex designs in Web surveys is provided. As Dillman et al. (1998, p.3f) showed in their experiments, very complex surveys with advanced graphical designs (regarding colors, images) can be counterproductive. This is due to the higher complexity of the Web page compared to plain presentations without graphics and with only black and white colors. The plain version also takes less time to complete2, and higher completion rates were observed for it. This stands in stark contrast to the warnings that can, for example, be found in Andrews et al. (2003, p.190), that "poorly designed Web based surveys encourage novice Web users to break off the survey process".

5.3 Different Alternatives' Representation in Closedended Questions

5.3.1 Check-all-that-apply vs. Forced-choice

Check-all-that-apply questions list all possible alternatives and the respondents can select the appropriate ones (in Web surveys this is usually realized with checkboxes). When deciding on a forced-choice format, respondents are explicitly asked about each alternative and indicate with radio buttons whether or not it should be selected.
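The interpretability difference between the two formats also shows up in how the answers are typically recorded. The following sketch is illustrative Python with hypothetical option labels (not data or code from the cited studies): in a check-all record an absent option is ambiguous, whereas a forced-choice record keeps "no" and item nonresponse apart.

```python
# Check-all-that-apply: only the checked options are stored. An option that is
# missing here may mean "does not apply", "undecided" or simply "overlooked";
# the data cannot tell these cases apart.
check_all_response = {"checked": ["newspaper", "radio"]}

# Forced-choice: every option carries an explicit answer, so "no" and
# "not answered" remain distinguishable.
forced_choice_response = {
    "newspaper":  "yes",
    "radio":      "yes",
    "television": "no",
    "internet":   None,   # item nonresponse, visible as such
}


def endorsed(response: dict) -> list:
    """Return the options endorsed by the respondent, regardless of format."""
    if "checked" in response:                                        # check-all record
        return list(response["checked"])
    return [opt for opt, answer in response.items() if answer == "yes"]


print(endorsed(check_all_response))      # ['newspaper', 'radio']
print(endorsed(forced_choice_response))  # ['newspaper', 'radio']
```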

A general principle for the use of check-all-that-apply questions is stated by Dillman et al. (1998, p.13)3: "be cautious about using question structures that have known measurement problems on paper questionnaires, e.g. check-all-that-apply [...]". The drawback of using check-all-that-apply with very long lists is that people often satisfice, i.e. they check answer choices until they think they have satisfactorily answered the question. "Considerable evidence exists that people often do not read all of the answer choices before going on to the next question" (Dillman et al. (1998, p.13)). This effect increases when scrolling becomes necessary to see all alternatives. Another principle can be found in Dillman (2007, p.62ff): "eliminate check-all-that-apply question formats to reduce primacy effects", which means that items listed first are more likely to be checked. In addition, it is difficult to figure out the real reason for leaving a checkbox unchecked: does the option not apply to the respondent, is he or she neutral or undecided about it, or was the option simply overlooked?

1 Measured by Cronbach's alpha coefficient.
2 Although some arguments about response times are outdated, e.g. in some cases the initial loading times are not as relevant as they were in the past due to high-speed internet connections.
3 And similarly in Dillman (2007, p.398f).

A comparison of these two question formats in two Web surveys and one paper survey is described in Smyth et al. (2006a). The general finding was that "respondents endorse more options and take longer to answer4 in the forced-choice format than in the check-all format" (on average 4.1 versus 5.0 options were selected). Regarding duration, an interesting difference could be found: "Respondents who spent over the mean response time on check-all questions marked significantly more answers on average than those who spent the mean response time or less (5.6 vs. 3.7). [...] In contrast, forced choice respondents using greater than the mean response time did not mark significantly more options for most questions than their counterparts who used the mean response time or less (5.2 vs. 5.0)" (Smyth et al. (2006a, p.72)). Additionally, it was found that the use of the more time-intensive forced-choice mode had no influence on nonresponse. Furthermore, when neutral alternatives, such as don't know, were provided in the forced-choice format, the third (neutral) category did not draw responses from the yes category for either question. In Stern et al. (2007), these experiments were repeated and the results were consistent.

Smyth et al. (2008) compared (forced-choice) yes or no answers on the telephone with check-all-that-apply answers on the Web. Recent experimental research has shown that respondents to forced-choice questions endorse significantly more options than respondents to check-all questions. This was also confirmed across modes for this study. Within the Web mode, the forced-choice question format yielded higher endorsement of options than the check-all format: overall, the forced-choice format yielded an average of 4.74 of the options endorsed (42.3% of them) and the check-all format an average of 4.19 (38.3%). A similar effect was found within the telephone survey (4.44 (41.3%) vs. 3.87 (37.2%)). Additional comparisons showed that the forced-choice format performs similarly across telephone and Web modes (Smyth et al. (2008, p.108)).

Thomas et al. (2007) examined how these response format modes (among others, forced-choice and check-all-that-apply) could affect behavioral intention measurement. 56,316 U.S. respondents (aged 18+) participated in a Web survey. The forced-choice mode yielded significantly higher likelihood endorsement frequencies across behaviors than check-all-that-apply.

5.3.2 Grouping of Alternatives

Grouping of alternatives in closedended questions can be achieved in several ways. The results of experiments in which lines, spacing and additional headlines for these groups were manipulated are subsequently reported:

Effects of grouping of alternatives in closedended questions are reported in Smyth et al. (2006b) and Smyth et al. (2004). Three versions of presenting the alternatives of closedended questions were assigned to the respondents:
(1) an underlined heading was placed above each of two subsets of three response options arranged in a vertical format;
(2) the same as version 1, but with an additional message in the question text: Please select the best answer;
(3) all choices were placed in a single vertical line with no indication of sub grouping (which means no headings and no additional spacing between groups).

4 At minimum 45 percent longer and on average two and a half times longer.

The results of these experiments indicated that the use of headers and spacing influenced answer choices: respondents to the grouped version not only chose more response categories than the respondents to the version with no sub grouping, but were also more likely to select at least one answer from each of the sub groupings (Smyth et al. (2006b, p.11)). Interestingly, the effect is stronger in fact-based questions than in opinion-based questions. Similar findings were reported in Healey et al. (2005), where it was found that when response options were placed in close graphical proximity to each other and separated from other options, respondents perceived visual sub groups of the categories. This increased the likelihood that they selected an answer from each sub group5.

In Tourangeau et al. (2004), two experiments were carried out comparing methods for including nonsubstantive answer categories (like don't know and no opinion responses) along with substantive scale responses. In the first experiment, two versions were compared: (1) nonsubstantive options were presented simply as additional radio buttons; (2) a divider line was placed to separate the scale points from the nonsubstantive options. In the second experiment, additional spacing was added to segregate the scale points from the nonsubstantive options. As a result, the mean values of the substantive answers were higher when there was no separation between the five scale points and the nonsubstantive options. As a conclusion to these findings, the following recommendation was given: "As a practical matter these results suggest that nonsubstantive response options (if they are offered at all) should be clearly separated from the substantive ones. This strategy may have the drawback of calling attention to the nonsubstantive answers, producing higher rates of item nonresponse" (Tourangeau et al. (2004, p.376)). The reason for the second phenomenon may be the effect created when sub grouping is used (the nonsubstantive items form a group when they are separated), so that the respondent tends to select one item per group, as found by Smyth et al. (2006b)6. Similar findings can be found in Christian & Dillman (2004): experiments with equal versus unequal spacing between response boxes in closedended questions were conducted. Significant results could be found in nominal scale questions. The alternative which was more set off from the others was selected more often, as it was possibly seen as one independent group, and thus the findings of this study support the findings of the studies described above.

Another experiment carried out by Tourangeau et al. (2004) also works with the spacing between the verbally labelled ordinal items of a closedended question, but not to check group effects. Tourangeau et al. (2004) examined what happened when the answer categories for an item were unevenly spaced and, as a result, the conceptual midpoint for an item did not coincide with the visual midpoint. Spacing between the items was only reduced for items located right of the conceptual midpoint. When using uneven spacing, 63.4% of the respondents chose answers from the right side of the scale, where there were bigger spaces between the items, compared to even spacing (58.3%). This shows that not only the verbal label attached to a scale point, but also its position in relation to the visual midpoint, indicates which specific value the scale point is supposed to represent.

5 See section 10.2 for the near means related principle.
6 Who had also enabled multiple selection.

Tourangeau et al. (2004, p.387) also checked the near means related heuristic by examining whether there are stronger interconnections among items that are displayed on a single screen than among those displayed on separate screens, which would boost the correlation among them. Eight items had to be rated on the same 7-point response scale. As expected, the responses to the eight items were more highly correlated when the items were presented in a grid on a single screen (Cronbach's alpha of 0.621) than when the eight items were presented in two grids on separate screens (Cronbach's alpha of 0.562).
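Cronbach's alpha, the internal-consistency coefficient reported in the grid comparison above and used for the correlation comparisons in section 5.1, can be computed directly from a respondents-by-items matrix with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the sum score). The sketch below is a generic Python illustration with made-up ratings, not code or data from the cited studies.

```python
import numpy as np


def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for a (respondents x items) matrix of ratings."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_var = scores.var(axis=0, ddof=1)        # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the respondents' sum scores
    return k / (k - 1) * (1.0 - item_var.sum() / total_var)


# Toy example: 5 respondents rating 8 items on a 7-point scale.
ratings = [
    [4, 5, 4, 5, 4, 4, 5, 4],
    [2, 2, 3, 2, 2, 3, 2, 2],
    [6, 6, 5, 6, 7, 6, 6, 5],
    [3, 4, 3, 3, 4, 3, 3, 4],
    [5, 5, 6, 5, 5, 6, 5, 5],
]
print(round(cronbach_alpha(ratings), 3))
```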

5.3.3 Double or Triple Banking

When using an ordinal scale, respondents make their decision on a (more or less) implied continuum; when double or triple banking is used, this continuum is interrupted, which makes it harder for the respondent because some sort of mental transformation has to be performed first. When a nominal scale is used, there should not be any differences, because no ordering exists at all.

Christian & Dillman (2004) conducted two experiments: one with linear layout versus triple banking of an ordinal scale and the second with linear layout versus double banking of an ordinal scale. It was found that respondents are more likely to select responses from the top line (primacy effect) in the nonlinear version, which is statistically significant for the triple bank version. Healey et al. (2005) found that when double banking and no banking were compared, double banked items showed 6% more checked items per respondent than non-banked items, but this finding was not statistically significant.

As a recommendation, a box should be placed around the categories in order to group them as all being relevant for the question (Dillman et al. (1998, p.12)).

5.3.4 Response Order Manipulations

Stern et al. (2007, p.127f) describe experiments dealing with the order of alternatives in two closedended questions. In one version, the response options started with the high end and in the other version with the low end, which made it possible to check for primacy effects. For the first question (How often do you travel more than 100 miles outside the area?) the reversal of response categories did not result in differences in response distributions. In contrast, the reversal of response options for the second question (How often do you use an internet connection to access the Web for e-mail?) resulted in significant differences in response distributions. When everyday was first in the list it was selected at much higher rates than when it appeared last (56.5% and 37.5%, respectively). The assumed reason was the similarity of the response options everyday and nearly everyday: in version 1, where nearly everyday appeared below everyday, 15.6% of respondents chose it; whereas in version 2, where it appeared before everyday, 28.8% of respondents chose this response item. An extension of the experiment described above was the inclusion of a don't know response option. The items were reversed as in the experiment above, but the don't know option appeared at the end in both versions. It was expected that the don't know response would be chosen less often when the options were ordered from most positive to most negative, because in this ordering respondents could quickly find an option that fits them. One significant finding was that when the options started with the negative categories, the don't know category was more likely to be chosen than when the response options began with the positive categories (23.4% and 14.5%). In the same paper, experiments with response order manipulations in a ranking question were carried out. For this purpose, two versions of a question that asked respondents to rank eight problems facing the area from the largest to the smallest were created. The response options that appeared in the first and last two positions showed the largest effects. The middle categories seemed unaffected by the reversal.

Another experiment reported in Tourangeau et al. (2004, p.380ff) varied the response options. Three versions were offered: (1) response options were presented in an order that was consistent with the top is first characteristic7, which means the top option was one of the endpoints and each of the succeeding options followed in order of extremity; in version (2), items were mildly inconsistent, only two options were exchanged; and in version (3), items were ordered inconsistently. As a result, the hypothesis that "when the scale options do not in fact follow their conceptual rank order, it will slow respondents down (and possibly affect their answers)" was confirmed by this experiment. Furthermore, the distribution of responses was affected, particularly for the option it depends: "The proportion of the respondents selecting the it depends option dropped dramatically when that option came at the bottom of the list than when it came in the middle or at the top of the list".

Hofmans et al. (2007) conducted a Web experiment with 156 higher educated participants, whereby the scale orientation was manipulated. The items were rearranged into 8 subsets; subsets 1 and 4 appeared with a decremental scale and the items in subsets 5 and 8 were scored on an incremental scale. Two randomly selected items from each of subsets 1, 4, 5, and 8 were repeated in subsets 5, 8, 1, and 4 respectively, so that these items were filled out twice but with reversed scales. The main effect of orientation was non-significant, which means that the orientation (incremental or decremental) of the scale had no impact on the average values of the ratings.

Galesic et al. (2008) introduced a new method to directly observe what respondents do and do not look at during the filling-out process by recording their eye movements8. The eye-tracking data indicated that respondents do in fact spend more time looking at the first few options in a list of response options than at those at the end of the list. Malhotra (2008) found that response order effects are even stronger when short completion times are observed.

5.3.5 Different HTML-Controls Used

Couper, Tourangeau, Konrad & Crawford (2004) compared the use of a list of radio buttons, a dropdown box and a drop box (or select field) for selecting alternatives in closedended questions, together with certain variations like reversing the order of the alternatives and varying the number of initially visible items. They attempted to check the visibility principle: "options that are visible to the respondent are more likely to be selected than options that are not (initially) visible" (Couper, Tourangeau, Konrad & Crawford (2004, p.114)). The visibility principle was confirmed: visible items were more likely to be selected. Concerning response times, no differences were found.

7 See section 10.2 for further explanation.
8 This method is called eye-tracking.

A similar experiment was carried out by Reips (2002a), who compared a radio button scale with a dropdown box (10 scale points with numerical labels were presented to the respondent). Two versions of the questionnaire, one in English and one in German, were created. The German link was sent to the German Internet Research List (gir-l) to check for possible expert effects. No differences concerning the answer distributions could be found. Additionally, the labels were varied from (-5/+5) to (0/10). Interestingly, the means of the ratings for the (-5,+5)-labels were much higher in the English (and non-gir-l) version. It seems as if people on the expert list were much more aware of the well-known effect of avoiding negative numbers, as described in detail in Schwarz et al. (1991) as well as in Fuchs (2003, p.30f). A possible explanation for the shift to higher values when using (-5,+5)-labels instead of (0,10)-labels is given by Schwarz et al. (2008, p.20): "When the numeric values of the rating scale ranges from 0 to 10, respondents inferred that the questions refer to different degrees of success with not at all successful marking the absence of noteworthy achievements. But when the numeric values ranged from -5 to +5, with 0 as the middle alternative, they inferred that the researcher had a bipolar dimension in mind, with not at all successful marking the opposite of success, namely the presence of failure".

van Schaik & Ling (2007, p.19f) report the results of a comparison of radio buttons and dropdown boxes (both with 7 scale points) in an experiment with 127 undergraduate psychology students. In regard to completion times, radio buttons were significantly faster (109.7 vs. 127.64 seconds). Moreover, respondents changed their answers more often when using radio buttons. Couper, Tourangeau, Konrad & Crawford (2004) found that the primacy effect was largest with drop boxes (compared to radio buttons and dropdown boxes), where five options were initially displayed (compared to all being visible).

Heerwegh & Loosveldt (2002b) conducted two experiments with university students comparing radio buttons and dropdown boxes. In both experiments, the response rates did not significantly differ from each other across conditions9. Within the second experiment, significantly lower (p<0.01) dropout was produced when radio buttons were used (percent respondents retained: 96.6% versus 86.4%). Furthermore, in the first experiment, a no answer category was also available for both controls. No significant difference concerning the selection of this item was observed. Download times between these two controls differed significantly in the first experiment (not in the second), and showed that the radio button controls allow faster filling out.

5.4 Color Effects

Since color in general is extensively used in Web design, this section will deal with some recommendations on how colors should be applied and which problems can occur when color is inappropriately used. Unfortunately, only a few experiments deal with color effects.

9 It has to be mentioned that the controls were visible the moment respondents logged on to the survey.

There are some colors which have a special influence on respondents, e.g. red (see Gnambs (2008), who gives a psychological explanation and a more detailed discussion). Two experiments with the color red in a General Knowledge Test (GKT) are subsequently described: in the first, the color of a progress bar was manipulated (black, green, red), and in the second, the colors of buttons were manipulated (red vs. blue vs. red/blue (red only on the first screen)). Color cues in red resulted in significantly lower scores on the GKT for men and women in study one. In study two, a slight increase in GKT scores under the red condition was observed for women.

The Web Content Accessibility Guidelines10 suggest that color should not be "used as the only visual means of conveying information, indicating an action, prompting a response, or distinguishing a visual element", since this would exclude respondents with visual handicaps from being able to complete surveys. However, using color for transporting information in Web pages is dangerous in any case. Dillman (2007, p.383) exemplified the possible use of color in surveys by allowing respondents in his experiment to choose their preferred color for various articles of clothing through the selection of the correspondingly colored button. He mentions, however, that "although such items are attention getting, it is important to recall what concept is being measured. Because colors may appear quite different on various respondents' screens due to variations in operating systems, browsers and display settings, the blue selected or rejected by one respondent may be quite different to that viewed by another". Even when several displays are attached to one computer, colors appear differently when the screens are not calibrated accordingly.

Some combinations of background and text color make it impossible for people with color blindness to read the contents of the questionnaire at all (Dillman (2007, p.382)). Generally, Crawford et al. (2005, p.48) recommended that surveys should not contain background colors, since they may create problems when the contrast is too great and the text becomes difficult to read. Jackob & Zerback (2006, p.19) are of a similar opinion and recommended the use of standard colors and fonts.

5.5 Modifications on Input Fields for Openended Questions

5.5.1 Different Size

Christian & Dillman (2004) carried out experiments with the size of input fields for openended questions. They doubled the size of the openended answer space in one version of each of three questions. Varying the amount of answer space on openended questions influenced both the number of words and the number of themes provided by respondents. For all three questions, the larger space produced longer answers with a significantly greater number of words. In addition, the longer answers generally contained more topics (Christian & Dillman (2004, p.68)). Stern et al. (2007) repeated this experiment and found consistent results. Moreover, there were no significant findings when comparing demographic groups. However, there was a tendency for respondents who were either over 60 years of age, or who had less than a college degree, or who were women, to provide responses that were one to two words longer than those of their comparison group, regardless of the size of the box. Weisberg (2005, p.122) also mentioned this effect: "people are more likely to give a long answer when a large space is left blank after the question and less likely when questions are tightly packed on the form of the screen".

10 See Caldwell et al. (2008).

5.5.2 Lines in Answer Spaces

The Christian & Dillman (2004) experiment described in section 5.5.1 was modified by adding lines to the openended answer spaces. It was expected that in the version with lines more detailed answers would be given, but no significant results were found.

5.6 Influence of Images

Images in general play a major role in Web design and thus are also integrated in Web questionnaires. Couper, Tourangeau & Kenyon (2004, p.257f) distinguish three types of picture usage in Web surveys:

1. Questions for which images are essential (such as questions on recall of an advertisement, brand recognition questions, questions on magazine readership).

2. Questions in which images supplement the question text, whether the images are intended as motivational embellishments or as illustrations of the meaning of the question.

3. Questions in which the images are incidental (providing branding, an attractive background, etc.)

Both Couper, Conrad & Tourangeau (2007) and Tourangeau, P.Couper & Conrad (2003) carried out experiments aiming to research the consequences of image availability on response behavior in a relatively extreme experimental design. They both experimented with visual context effects, namely the effect that pictures of a healthy woman exercising versus a sick woman in a hospital bed have on self-rated health, whereby the size and placement of the images were varied. In this case, the pictures were much more than a simple embellishment. The response scale was a 5-point fully labelled radio button scale ranging from excellent to poor. The general outcome was that people consistently rated their own health lower when exposed to the picture of the sick woman. Interestingly, in 2 of 3 experiments there was no effect when the picture was placed within the header region (the authors mentioned banner blindness11 as a possible reason for this). The conclusion was that images can systematically affect responses when their content has relevance to the survey question. Crawford et al. (2005, p.48) recommended a similarly careful use of images when they state: "Care must be taken to using graphics that will not influence how respondents respond to the survey question. For example, if a survey were in support of an evaluation of a program geared to connect police officers with children in the community, it would not be a good design to include a logo that showed a friendly-looking police officer smiling at the respondent throughout the survey".

Witte et al. (2004) report the results of a National Geographic survey tracking support for or opposition to the protection of endangered species (animals). One instrument provided a photographic image of the species in question and the other used simple text. As expected, strong support for protection increased when respondents received a photograph. This effect was found more often with male respondents. Additionally, picture quality played an important role.

11 The fact that people ignore banners on top of the pages, despite their being designed to be attention getting (Couper, Conrad & Tourangeau (2007, p.628)).

Deutskens et al. (2004) report on a further study concerned with product evaluations and experimental modifications in two versions of a questionnaire: (1) simply a list with article names and (2) all articles with names and a picture. Interestingly, the visual representation of the questionnaire had a significantly lower response rate (19.0%) than the textual presentation (21.9%). This was possibly due to the additional time needed by the browser to download and align the images correctly, which resulted in a higher burden for the respondent. Another effect was that when respondents got the textual representation, they were more likely to select don't know as an answer than in the visual representation.

Couper, Tourangeau & Kenyon (2004) experimented with the use of photographic images to supplement question texts. In a travel, leisure and shopping questionnaire, respondents were given 4 versions: (1) no pictures, (2) a picture of a low frequency instance (which means a picture showing an action people seldom do), (3) a picture of a high frequency instance, and (4) both pictures. Respondents were asked about the frequency of certain actions in their daily life, such as listening to recorded music in the past week. For this question, the low frequency instance was listening to the hi-fi, and the high frequency instance was listening to the car radio. As a result, for four of the six topics the means for the high and low frequency conditions differed significantly (p<.01) from each other. It was observed that when the picture of the high frequency instance was shown, respondents reported a higher average than when shown the picture of a low frequency instance. In addition, there was no correlation between the experimental treatment and the age, sex and education of the respondents.

5.6.1 Personalization - Virtual Interviewer Effects

Interviewer effects in live interviews are well known and documented. This section describes Web survey experiments in which an interviewer (or researcher) was present in the form of photos.

In an experiment reported by Tourangeau, Couper & Steiger (2003), which deals with interviewer gender effects in Web surveys, images of (1) a male researcher, (2) a female researcher, or (3) the study logo were displayed on the questionnaire. This was varied in a second study, where a text message from the (male or female) researcher was additionally displayed or not. Furthermore, questions about the roles of men and women were asked. Concerning socially desirable responses, with the exception of the effect of personalization on gender attitudes, no results were reported where the effects reached statistical significance. The authors expected respondents of both sexes to report the most pro-feminist attitudes when the questionnaire contained pictures and messages of the female investigator and the least pro-feminist attitudes when the program displayed the pictures and messages of the male investigator. This pattern was apparent in the scale means by condition and reached statistical significance, but the effect was much smaller than that found for live interviews in other studies. One idea behind displaying the photos was to humanize the questionnaire process, but there was no evidence that adding humanizing features improved response or data quality in the survey.

Krysan & Couper (2006) conducted a Web experiment focusing on the effects of the interviewer's race. A representative sample of white respondents was asked. The race of the interviewer and social presence versus mere presence (4 pictures) were manipulated using images of black and white people. Only one of the four attitude scales, namely the stereotype scale12, showed a significant (p<0.01) effect in regard to race. When white respondents were presented with images of African Americans (regardless of mere versus social presence), their endorsement of negative stereotypes of African Americans was lower (3.11) than when presented with images of whites (3.32). In addition, the mean of the control condition - which had no images of people - was closer to the mean for the white image condition. No overall effect of social versus mere presence could be found (Krysan & Couper (2006, p.21)).

5.7 Heaping

Several researchers have observed that both openended questions where numerical values have to be entered and closedended questions using scales with numerical labels pose a similar difficulty: when an answer to a numerical question with a big range has to be given and the respondent does not know the exact value, rounded answers are given. The respondents create their own grouped answer categories. One example would be age heaping at multiples of 5 (Tourangeau et al. (2000, p.233)). These rounded values are used when it is difficult (or impossible) for respondents to come up with an exact answer. The difficulty may arise from imprecise encoding of the information in memory, from indeterminacy of the underlying quantities, or from the burden of retrieving numerous specific pieces of information. In each of these cases, the use of rounded values may reflect problems in the representation of the quantity in the question (Tourangeau et al. (2000, p.235)). Heaping introduces bias into survey reports for two reasons: (1) round values are not evenly spaced, because the distances between successive round values increase as the number gets larger, with the consequence that more values will be rounded down than rounded up; (2) respondents may not round fairly but instead follow a biased rounding rule, which means they do not necessarily round to the nearest number, but instead characteristically round up or down13.
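A simple way to make heaping visible in collected data is to compare the share of answers that fall on round values with the share one would expect if integer answers were spread evenly. The sketch below is a minimal, generic diagnostic in Python with invented example values; it is not a procedure taken from Tourangeau et al. (2000) or any other cited study.

```python
def heaping_share(values, base=5):
    """Share of answers that are multiples of `base`.

    If integer answers were spread evenly, roughly 1/base of them would land on
    such multiples; a clearly larger share points to heaping (rounded answers).
    """
    rounded = [int(round(v)) for v in values]
    return sum(1 for v in rounded if v % base == 0) / len(rounded)


# Invented age reports: 9 of the 12 values are multiples of 5.
ages = [30, 35, 35, 37, 40, 40, 40, 42, 45, 50, 50, 61]
print(heaping_share(ages, base=5))   # 0.75, far above the ~0.2 expected without heaping
```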

5.8 Additional Content and Visual Hints

Christian & Dillman (2004) added an arrow to direct respondents toward answering a subordinate openended question, where detailed information should be provided. The researchers took care that the arrow was placed inside the foveal view14. This arrow significantly increased the percentage of eligible respondents answering the subordinate question (unfortunately, it increased the percentage of ineligible mentions to the same extent).

5.8.1 Placement of Instructions and Help Texts

Respondents might be more willing to request a definition or explanation from a Web based resource than from a human interviewer, which makes this section important for Web surveys. It has also often been observed that when respondents obtain clarification for possibly ambiguous questions, response accuracy improves dramatically.

12 Other scales: discrimination, racial policies, race-associated policies.
13 For a more detailed description of these effects, see Tourangeau et al. (2000, p.238f).
14 See section 10.3.1 for further information.

Christian & Dillman (2004) varied the location of instructions for yes/no questions by placing them before or after the response categories. The instructions contained information on how to skip the question if a certain condition was not fulfilled. More people skipped the question (26.2% versus 4.8%) when the instructions were located before the response options, because they were more likely to see and thus read the instructions before answering. Additionally, placing the instruction after the alternatives introduced confusion, because some respondents appear to have applied the instructions to the following question as well: item nonresponse also increased for that successor question.

Two experiments within this field were conducted by Conrad et al. (2006): a questionnaire about lifestyle with rather complicated phrases (food nutrition concepts) was used, including helpful definitions. Three ways to retrieve the definitions were provided: (1) a one-click interface, where respondents needed to click on a phrase; (2) a two-click interface, where an initial click displayed a list of terms from which respondents could select one by clicking; (3) a click-and-scroll interface, where clicking on a hyperlinked term displayed a glossary (an alphabetic list) of all definitions, of which only the first few were initially visible, so that scrolling was necessary to see the others. In general, respondents rarely requested clarification (13.8% of those who answered the experimental questions) and the number of requests was sensitive to the amount of effort (here: the number of clicks). Due to these findings, another experiment was carried out in which the definition hovered above the text when the mouse was simply moved over a certain term. This resulted in an increase to 22.4% requesting definitions (Conrad et al. (2006, p.259)). While it is clear that more respondents request more definitions with rollovers than by clicking, it could be that many of the rollover requests were less deliberate than requests registered by clicking.

In Conrad et al. (2007), additional help texts were provided by clicking on a highlighted text, which was meant to improve the comprehension of questions. As a result, respondents who could obtain clarification provided substantially more accurate answers than those unable to obtain clarification. In particular, respondents reliably provided more accurate answers when they could click for clarification than when no clarification was available. The overall goal was to bring features of human dialogue to Web surveys. A more extreme possibility of bringing human factors to self-administered Web surveys is to employ interviewer avatars15.

Galesic et al. (2008) conducted an eye-tracking experiment which tried to find out under which conditions definitions of certain items regarding the questions are read or not. The eye-tracking data revealed that respondents were reluctant to invest any effort in reading definitions of survey concepts that were only a mouse click away, which suggests that definitions and help texts should be located next to the corresponding questions or question alternatives.

15 See chapter 7 and section 5.6.1 for more information.

5.9 Date Formats

Christian et al. (2007) and Smyth et al. (2004) conducted experiments concerned with the influence of given date formats on response behavior. A version with equal-sized month and year answer spaces was compared to one where the month space was about half the size of the year space. In addition, the effects of using word labels versus symbols (like MM YYYY) to indicate the number of digits respondents should use when answering were checked.

As a result, when the month box was about half the size of the year box, respondents were significantly more likely to report the date in the desired format. While reducing the size of the month box did not significantly impact how respondents reported the month, it did significantly increase the likelihood that respondents reported the year using four digits. The use of the symbols M and Y for month and year had the effect that the number of symbols determined the number of digits entered by the respondents. Furthermore, a slight improvement in giving the correct format was observed when the two input fields were connected to each other (no space between the fields). Locating the symbols MMMM below the answer spaces, as opposed to locating them to the right of the answer spaces, resulted in a significantly larger proportion of respondents using a correct four-digit format to report the year. This shows that explanation texts should be located as close as possible to the described input fields.

5.10 Progress Bar

A progress bar is an instrument which shows the respondent the actual position within the questionnaire (e.g. how many questions or pages remain, or what percentage of the whole questionnaire has already been filled out). Generally speaking, a progress bar should motivate respondents to finish the survey and not quit a few questions before reaching the end, and is therefore an attempt to increase response rates. A progress bar is also called a progress indicator or point of completion (POC) indicator. The progress bar is a feature that was first introduced for Web surveys because, as Conrad et al. (2003) outline, it had not been necessary for questionnaires before: "paper questionnaires inherently communicate information about respondents' progress: the thickness of the yet-to-be-completed part of the booklet provides immediate and tangible feedback to the respondent about how much work remains". When branching and additional questioning dependent on previously answered questions are heavily used, calculating the current progress becomes difficult. When displayed to the respondent, this value can then lead to confusion.

The most interesting study running an experiment on the influence of progress bars on dropout is reported in Heerwegh & Loosveldt (2006a). One group of respondents had a progress bar displayed and the other group did not. As a result, the group with no progress bar had a higher, but not statistically significant, dropout rate. When a progress bar was present, item nonresponse (here: the proportion of unanswered questions) was lower. The necessity of repeating the study with a population of non-students is mentioned in the discussion part of the paper. Furthermore, similar results were reported by supersurvey (Hamilton (2004)) in an experiment which emphasized the positive effect of a progress bar on completion rates.

Contrary to these findings, in Crawford et al. (2001, p.156) the progress bar appeared to have a negative effect: "Among those who started the survey, 74.4% completed it when no progress indicator was present compared with 68.5% of those who received the progress indicator version". In Crawford et al. (2005, p.49), an additional attempt at an explanation is given: "When respondents see a progress indicator, we believe they extrapolate the time they have taken thus far in the survey and decide on how long the survey will take overall. In a survey that begins with burdensome questions that may take longer to answer than average, such an interpretation may result in an evaluation of burden that is too high". Heerwegh (2004b) conducted an experiment with 2520 university students and found no significant effect of a graphical progress bar16 on dropout. Healey et al. (2005) report on experiments which provided one group of respondents with a progress bar and the other group without. None of the differences were statistically significant, so no support for the efficacy of the progress bar was found. The same is true for the results of a similar experiment reported in Couper, Traugott & Lamias (2004, p.370f) and Couper et al. (2001), where the differences also did not reach statistical significance. Interestingly, the average time to complete the survey was significantly longer with the progress indicator than without (22.7 compared to 19.8 minutes). Two explanations were given by the authors: (1) download times increased because additional resources had to be downloaded for the progress bar; (2) respondents who received the progress indicator took more care over their answers.

In Conrad et al. (2005), the progression of a (textual) progress indicator was modified, so that three versions were presented to the respondents: (1) progress following a linear function, which means the current page position is divided by the total number of questions; (2) fast-to-slow: this was achieved by dividing the log of the current page by the log of the final page; and (3) slow-to-fast: this was achieved by dividing the inverse log of the current page by that of the final page. As a result, break-off rates varied with the speed of the progress indicator. Respondents were more likely to break off under slow-to-fast feedback (21.8%) than under fast-to-slow feedback (11.3%). This also affected respondents' judgement of the duration of the task, which was asked at the end of the questionnaire: fast-to-slow respondents estimated that it took fewer minutes to complete than respondents in the other groups. An extended experiment varied how often the progress bar was displayed: (1) always on, as in the previous experiment; (2) intermittent: at nine transaction points in the questionnaire; (3) on demand: displayed when respondents clicked a link labelled show progress. In addition, the speed of progress was varied as in the previous experiment. Concerning the impact of the speed of progress, the results replicated what was observed in the previous experiment, but the frequency of progress feedback did not reliably affect response rates.
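The three progress-feedback regimes can be written down directly from the description above. The sketch below (illustrative Python) uses the linear and fast-to-slow formulas as stated; for the slow-to-fast condition the phrase "inverse log" is ambiguous, so the mirror image of the fast-to-slow curve is used here as one plausible reading rather than as the exact function of Conrad et al. (2005).

```python
import math


def progress_linear(page: int, total: int) -> float:
    """Current page divided by the total number of pages."""
    return page / total


def progress_fast_to_slow(page: int, total: int) -> float:
    """log(current page) / log(final page): large gains early, small gains late."""
    return math.log(page) / math.log(total)


def progress_slow_to_fast(page: int, total: int) -> float:
    """Mirror image of the fast-to-slow curve -- one plausible reading of the
    'inverse log' feedback: small gains early, large gains late."""
    return 1.0 - math.log(total - page + 1) / math.log(total)


total = 20
for page in (1, 5, 10, 15, 20):
    print(page,
          round(progress_linear(page, total), 2),
          round(progress_fast_to_slow(page, total), 2),
          round(progress_slow_to_fast(page, total), 2))
```

Printed for a 20-page questionnaire, the fast-to-slow column runs well ahead of the linear one in the first half while the slow-to-fast column lags behind it, which matches the perception effects reported above.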

In general, it can be stated that one should look at the properties of the questionnaire before deciding whether to employ a progress bar or not; e.g. the overall length of the questionnaire could be crucial for this decision: "In case of extensive surveys containing a large number of questions, it seems to be recommended not to inform the respondent about his individual progress because it may be particularly demotivating if the progress indicator indicates no or only marginal progress. Therefore, it may be recommended to add a progress-bar to short surveys but to remove it in longer ones because otherwise dropout rates are likely to increase" (Jackob & Zerback (2006, p.12)).

16 Grey bar with blue as progress color; the percentage completed was written below.

5.11 Sponsor’s Logo

On some questionnaires, the logo of the sponsor or the operator (like a research institute, university or company) is displayed. The effects of these logos are discussed in this section. Walston et al. (2006) observed the effects of governmental and non-governmental sponsoring and found that sponsorship had an effect (at least when combined with a fancy design) when the survey was marked as governmentally sponsored. Moreover, Heerwegh & Loosveldt (2006a) observed an effect on dropout when a sponsor's logo (the logo of the university) was displayed, since the logo led to a (statistically non-significant) break-off rate reduction. As a possible reason, the authority principle was mentioned, since the respondents may have held the organization behind the logo in high esteem. Heerwegh (2004b)'s experiment found no effect on the completion rate when the university logo was displayed on a university student questionnaire.

It is hard to make general statements about the influence of a sponsor's logo because the context is very important in this case. "If the respondents hold the organization in high esteem, then repeating the logo on each survey screen could decrease break-off rates. Conversely, if the respondents do not hold the organization in high esteem, or even have serious doubts about its legitimacy, then a logo might produce the opposite effect" (Heerwegh & Loosveldt (2006a, p.196)).

6 Non-Visual Design Experiments in Online Surveys

Of course, other experiments not dealing with visual design effects have been reported in recent publications. Some of these results are discussed in the following sections. The influence of the high hurdle technique, and of confidence and security (e.g. offering https as the protocol), are topics not dealt with in this chapter but present additional possibilities for running experiments with Web surveys.

6.1 Incentives

Deutskens et al. (2004) offered respondents different incentives (depending on the length of the questionnaire assigned) in their study: (1) a voucher of € 2 or € 5 for an online book and CD store; (2) a donation of a maximum of € 500, whereby respondents could choose between the WWF, Amnesty International, or a cancer association; and (3) a lottery, in which respondents had the chance of winning one of 5 vouchers of € 25 or € 50, respectively. As a result, vouchers and lotteries had a higher response rate (22.8%) than the donation to a charity group (16.6%), where 61% chose the cancer association, 25% the WWF and 15% Amnesty International.

Bosnjak & Tuten (2003) conducted experiments with prepaid and promised monetary incentives. The positive effects of prepaid monetary incentives in mail surveys are well known. Due to new technological services (e.g. PayPal), which enable money to be transferred to people online in advance, this incentive mode is now also possible for Web surveys. The results indicate that prepaid incentives ($2) in Web surveys seem to have no advantages concerning the willingness to participate, actual completion rates, and the share of incomplete response patterns when compared with post-paid incentives. Furthermore, post-paid incentives show no advantages in comparison to when no incentives are given. Finally, compared to no incentives, participation in prize draws ($50 and four $24 prizes) increased completion rates and also reduced various incomplete participation patterns.

Göritz (2006a) took a closer look at the use of cash lotteries as incentives, whereby various experiments with different forms of cash lotteries were conducted. As an overall result, cash lotteries compared to no incentives did not reliably increase response or retention; neither did it make a significant difference whether one large prize or multiple smaller prizes were raffled. In the master thesis of Rager (2001), a survey was announced on a Web page either as a questionnaire or as a lottery. Those who got the lottery announcement had a significantly higher rate of completion (58.1% compared to 27.2%).


Birnholtz et al. (2004) carried out an experiment in fall 2002 with three different incentive modes: (1) a $5 bill sent with the survey instructions via first-class mail; (2) a $5 gift certificate code for amazon.com sent with the survey instructions via first-class mail; or (3) a $5 gift certificate code for amazon.com sent with the survey instructions via e-mail. Results show that $5 bills led to significantly higher response rates than either of the gift certificates (57% for cash vs. 36% for the gift certificates, with 40% for the paper and 32% for the e-mail invitation). This finding was statistically significant (p<.01) and suggests that cash is a superior incentive for an online survey, even with technologically sophisticated respondents. This may be due to the perceived limitations, delayed payoff, or reduced visibility of online gift certificates.

In summary, as noted in Bosnjak & Tuten (2003, p.216), the success with prize draws and cash lotteries will possibly depend on cultural factors, which could explain the different findings in Göritz (2006a) and Bosnjak & Tuten (2003).

6.2 Invitation and First Contact

Quintano et al. (2006) compare invitation modes to a Web survey, either via telephone or via e-mail. The study found that willingness to be a respondent increased when the initial contact was made by telephone. Whitcomb & Porter (2004) tested complex graphical e-mail designs and their effect on survey response. Respondents were contacted with one of six e-mail designs that varied in format (text vs. HTML), color of background (white vs. black), and graphical design (simple vs. complex). As a result, when using a black background, response rates were lower (9.2%, compared to 12.6% for white as background color). With regard to the e-mail format, participants who were sent the HTML e-mail with a white background and a simple header were more likely to respond to the survey than participants mailed the bare-bones text message, with a difference in response rates of 3.6%. When using complex graphical headers in the HTML format, response rates were also lower than with a simple header (9.9% compared to 11.9%).

Heerwegh et al. (2004) conducted experiments on the personalization of e-mail invitation letters. The first experiment, on the topic attitudes towards marriage and divorce, had two conditions: (1) no personalization as control condition, where respondents were addressed as Dear Student, and (2) personal address, e.g. Dear John Smith, in the e-mail message itself. As a result, the response rates differed significantly (49.1% for the no-personalization condition versus 57.7%). No personalization effects on social desirability bias could be found. The second study had the same study design and reaffirmed the positive effect of personalization on the response rate in Web surveys; the effect of personalization again reached statistical significance. The average score on the debriefing question to which degree did you feel at ease to honestly and sincerely respond to the questions? was significantly higher in the impersonal salutation group than in the other group, indicating that the personalization group felt less at ease honestly reporting opinions, behavior or facts (average scores on a 1 to 5 point scale of 4.13 versus 4.01). Furthermore, in sex-related questions, responses differed between these two experimental groups: when a personal salutation was used in the e-mail, the number of sexual partners reported increased (only respondents who had ever been in a sexual relationship were included in this analysis). To sum up, the effect of personalization was so pronounced (an increase of 8.6 percentage points) that it seems worthwhile to consider personalizing e-mail contacts whenever possible. Heerwegh & Loosveldt (2006b) also tested the hypothesis that personalization would induce a social desirability bias. Secondly, they carried out an additional test of the effect of personalization on Web survey response rates, with the result that personalization significantly increased the response rate.

6.3 Different Welcome Screens

This section discusses the influence of the first page visible to the respondent, which should contain introductory information for the whole survey. Healey et al. (2005, p.6) carried out an experiment with 837 respondents and analyzed which influence the visibility of the first question on the screen did or did not have. The outcome was that having the full question visible had little or no effect on whether or not respondents decided to complete the question or continue with the survey. Another recommendation comes from Dillman et al. (1998): “Begin the Web questionnaire with a question that is fully visible on the first screen of the questionnaire, and will be easily comprehended and answered by all respondents”.

6.4 Length of the Survey

Deutskens et al. (2004) varied survey length in an experiment where the long version took about 30 to 45 minutes and the short version about 15 to 30 minutes to finish. As expected, the short version of the questionnaire had a significantly higher response rate, with 24.4% compared to 17.1%. The analysis of the number of don’t knows in the long and in the short version revealed that there were proportionally more don’t know answers in the longer version (statistically significant with p<0.05). There were also more semi-completed questionnaires in the longer version.

Ganassali (2008) carried out experiments with the length of the questionnaire in combination with interaction effects (repetition of previous answers and some types of forced answering), where a short version had 20 and a long one 42 questions, with interesting results. The length was not mentioned prior to completion, only on the first screen via a page number indicator. Interestingly, the longer questionnaire obtained longer textual responses in openended questions (78 words compared to approximately 60 words, which is 25% less). Again surprisingly, respondents who got the longer survey reported significantly higher satisfaction than those who got the short questionnaire (with a score of 6.75 versus 6.10) (Ganassali (2008, p.28f)). One would have to take a look at the questions to see if there are any side effects.

6.5 Time-to-Complete Statement at the Beginning

Walston et al. (2006) found that a shorter time-to-complete statement positively affected the decision to begin the survey (14.4% for 5 minutes vs. 10.8% for 15 minutes), as well as a general trend for the response rate. Similar results were found by Crawford et al. (2001, p.153): those who were informed that the survey would take 8 to 10 minutes to complete had a lower nonresponse than those who were told it would take 20 minutes (63.4% vs. 67.5%), but the 20-minute group had a lower rate of breakoff once they started the survey. Similar outcomes were reported in Heerwegh (2004b), where a vague length statement (as short as possible) produced a significantly higher login rate (66.5%) than the more specific length statement (approximately 20 to 25 minutes) (62.8%). The vague length statement did not produce higher break-off rates than the specific length statement.


7 Outlook

Couper (2005, p.487) identifies five general technology-related trends in survey research:

1. The move from interviewer-administered to self-administered surveys: an interesting new approach is IVR, “the acronym for Interactive Voice Response, which is a data collection technology in which the computer plays a recording of the question to the respondent over the telephone and the respondent indicates the response by pressing the appropriate keys on his or her telephone keypad” (Steiger & Conroy (2008)); additional information on IVR can be found in de Leeuw (2008b, p.255ff).

2. The move from verbal (written or spoken) inputs and outputs to visual and haptic and/or sensorimotor inputs and outputs. Concerning auditory communication, apart from advantages such as cost reduction, the capture of verbal inputs could allow the analysis not only of the selected response to a particular question but could also assist in the analysis of the certainty with which the respondent holds that view, based on the verbal qualifiers used in responding, or even extracted from other nonverbal qualities of the vocal response (Couper (2005, p.489)).

Applications which employ audio-visual (multimedia) communication are video-CASI, or the use of videos as stimulus material. Mayntz et al. (1978, p.114ff) describe an interview as a social situation in which interviewing turns out to be a special form of social interaction. It is not clear whether interviewer avatars can easily substitute this situation and which possible side effects could result. Some research on this has been reported in Fuchs (2008) and Gerich (2008). Adding multimedia to Web surveys makes it more attractive for respondents to complete the survey, but nevertheless the use of multimedia content needs additional steps to avoid excluding e.g. visually handicapped persons (see more on this in Zunicá & Clemente (2007) and Caldwell et al. (2008)).

Since the early days of Web surveying it was recognized that computer-mediated surveying could potentially enrich studies with multimedia stimuli such as graphics, pictures, spoken word or other sounds. In fact, however, these possibilities have only seldom been put into action. In recent years Web surveys have been enriched by graphics and pictures, some of which were content-bearing. Methodological evaluations have shown that these pictures can have a serious impact on the perceived question meaning and thus on the responses provided. An evaluation of this technology with regard to unit non-response, social desirability and social presence is given by Fuchs & Funke (2007). The use of images as visual communication is currently making its way into mainstream survey research, because the use of full-color images and photographs is a technically trivial and relatively inexpensive undertaking (for research results in this field, see section 5.6).

Another new form of computer-assisted data collection is the use of touch screen terminals, as reported in Weichbold (2003) and Weichbold (2005). A touch screen is a display which can detect the location of touches within the display area, in the case of surveys usually performed with the human hand.



3. The move from fixed to mobile information and communication technology, for data collectors and for respondents (mobile phone). Small devices like Personal Digital Assistants (PDAs) were used by interviewers, e.g. for household screening in several large-scale surveys in the United States.

4. The move from discrete surveys to continuous measurement. Because of new technological possibilities, diary surveys can be conducted more easily.

5. The move from data only, to data and metadata, and also to paradata (concrete examples of paradata collection can be found in chapter 14).

7.1 Dynamic Forms, AJAX, WEB 2.0

In the following section, a description of new upcoming technology trends in survey research is given:

Dynamic forms is the generic heading for dynamic text fields and dynamic lists, two innovative ways of reactive data collection in self-administered online surveys. These Web 2.0 techniques are described in Funke & Reips (2007b), where it is shown how to combine the advantages of openended and closedended question formats. When using dynamic text fields, after beginning with an entry, suggestions for the most probable word are offered in an area below the field; with each new letter these suggestions are re-adapted. By using dynamic lists, even questions with large numbers of response categories can be brought into a hierarchical order and can be answered like closedended questions. At first, the respondent sees only a single table with very general categories; as soon as one of these categories is selected, more specific choices appear in a second table. The underlying technology which enables this functionality is called AJAX (Asynchronous JavaScript and XML) and is typically mentioned within the field of Web 2.0. These two methods have not been examined in survey research yet. It would be interesting to start experiments regarding their influence on data quality or the cognitive processes underlying response behavior.
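As an illustration of how such a dynamic text field could be wired up on the client side, the following sketch fetches suggestions asynchronously after each keystroke and renders them below the field. It is not taken from Funke & Reips (2007b); the element ids and the /suggest endpoint are hypothetical, and the modern fetch API is used for brevity where older AJAX implementations would have used XMLHttpRequest.

```typescript
// Minimal sketch of a dynamic text field: each keystroke triggers an
// asynchronous request, and the returned suggestions are shown below the
// input without reloading the page (the core idea behind AJAX).
const field = document.getElementById("occupation") as HTMLInputElement;
const suggestionBox = document.getElementById("suggestions") as HTMLUListElement;

field.addEventListener("input", async () => {
  const query = field.value.trim();
  if (query === "") {
    suggestionBox.innerHTML = "";
    return;
  }
  // Hypothetical server endpoint returning a JSON array of suggestion strings.
  const response = await fetch(`/suggest?q=${encodeURIComponent(query)}`);
  const suggestions: string[] = await response.json();
  // Re-adapt the suggestion list with every new letter.
  suggestionBox.innerHTML = suggestions.map((s) => `<li>${s}</li>`).join("");
});
```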



Part II

Theoretical Background


8 Introduction

Of course it is important to embed the experiments in the general theory of survey research. The methodological as well as the psychological theory will be given, together with terminological definitions such as reliability and validity.

The main methodological concept is the total survey error. Even though a general overview of all kinds of errors which can occur when conducting a survey will be given, the focus is set on those errors which can occur when running Web surveys in general and on those which are important for the experiments in particular. Reducing these errors increases survey quality. The most important errors for this work are:

• Nonresponse at the unit level (see chapter 9.1.1.4): when running Web surveys, this error is often caused by technical problems. In the experiments some advanced technologies are used, and it should be checked whether these increased this particular error.

• Dropout (see chapter 9.1.2.2): one of the main questions is how different styles of questions influence dropout. This is evaluated in chapter 18 and is closely related to respondent’s burden (see chapter 9.1.2.3).

• Measurement error (see chapter 9.1.3): this error is central for the experiments and concerns the evaluations given in chapters 19 and 20.

Differences between survey modes (mainly differences between paper-and-pencil and online) are also discussed. Because a lot of findings already made for offline questionnaires should also be applied to Web surveys, it is important to identify possible mode effects.

Additionally, some psychological theories are given, namely the individual steps involved in the response process, and some visual interpretive heuristics together with gestalt principles, which serve as a good basis for visually designing (Web) questionnaires.

Finally, the whole research topic is embedded in online research. Therefore, this topic is described in more detail. Additionally, all major institutions dealing with online research are mentioned.


9 Methodological Theories

This chapter summarizes methodological theories relevant for conducting surveys in general and Web surveys in particular, and how to reach high survey quality by reducing survey error. All sources of error which can occur in Web surveys are discussed in the following, with a focus on possible problems when carrying out surveys on the Web.

9.1 Total Survey Error

The total survey error covers all errors which bias the results of surveys and is systematically split up into several specific error types. The experiments carried out for this dissertation should help to diminish particular errors. Sample surveys are subject to four major sources of error, and each must be taken into consideration in order to obtain valid and reliable sample estimates. A good overview of all survey errors is given by Weisberg (2005). The book The Total Survey Error Approach defines the total survey error as follows: “The total survey error approach is based on analyzing the several different sources of error in surveys and considering how to minimize them in the context of such practical constraints as available money” (Weisberg (2005, p.16)). Subsequently, all errors are described and their minimization briefly discussed, but the focus will be set on those important for the experiments in this thesis. An error in this case is a mistake made within the whole surveying process; it refers to the difference between an obtained value and the true value for the larger population of interest.

Biemer & Lyberg (2003, p.34ff) give another definition of the total survey error: “the quality of an estimator of a population parameter is a function of the total survey error, which includes components of error that arise solely as a result of drawing a sample rather than conducting a complete census, called sampling error components, as well as other components that are related to the data collection and processing procedures, called nonsampling error components”. Sampling errors result from selecting a sample instead of the entire population, and nonsampling errors are the result of system deficiencies. According to Biemer & Lyberg (2003, p.38ff), nonsampling errors can be split up into 5 major sources and potential causes: specification, frame, nonresponse, measurement and processing error. All of the above will be described in more detail in the following chapters together with error reduction strategies. A particular focus will be placed on those errors which play a major role for design effects in online surveys.

Weisberg (2005, p.19ff) gives a slightly different categorization of these error types, whereby the source of the error is foregrounded (the relevant ones will be discussed in the following chapters; those not relevant are described directly within the following enumeration):

• Respondent selection issues



– Sampling error
– Coverage error
– Nonresponse error at the unit level

• Response accuracy issues

– Nonresponse error at the item level
– Measurement error due to respondents: this comes up when respondents do not give the answers they should, according to the researcher’s intentions. This can be the result of wording problems or questionnaire issues such as question order effects or question context effects.

– Measurement error due to interviewers: all experiments carried out and described deal with self-administered surveys, so no further description is given here (for further information on the interviewing style debate, see Weisberg (2005, chap.4)).

• Survey administration issues

– Postsurvey error: this error occurs after the interviews are conducted (e.g. within the data editing stage). For Web surveys, the potential of creating this kind of error is smaller because the data matrix is generated automatically. Therefore no manually entered data errors (e.g. coding or data-entry errors) or automated reading errors, e.g. using OCR (Optical Character Recognition), can occur (see Weisberg (2005, chap.11) for further information). In the terminology of Biemer & Lyberg (2003), this error is called processing error.

– Mode effects: see section 9.4
– Comparability effects: “comparability issues arise when different surveys are compared. Surveys taken by different organizations, in different countries, and/or at different times are often compared, but the differences that are found are not necessarily meaningful” (Weisberg (2005, chap.13), where a more detailed discussion of equivalence limits can also be found).

In other classifications, item and unit nonresponse are more closely linked, but here they fall into different superordinate groups. “Nonresponse can occur at two levels, the unit, by which we mean a person or household (though it can also be an institution such as a business or school); and the item, which is an individual question in our questionnaire” (Czaja & Blair (1996, p.181f)). This clear separation makes even more sense for Web-based surveys because the reasons for unit and item nonresponse differ even more in this mode. Nonresponse (particularly item nonresponse) plays a major role for this thesis, as the effect of different designs on the dropout rate is part of the analysis.

9.1.1 Respondent Selection Issues

9.1.1.1 Probability and Non-Probability Web Surveys

Before discussing respondent selection issues, the difference between probability and non-probability Web surveys must be discussed. The latter group has no sampling, which has several consequences for the generated error, because in most cases the self-selection error is also added. From the aspect of sample selection, there are several types of probability and non-probability Web surveys (this enumeration and description is taken from Manfreda & Vehovar (2008, p.265), with slight modifications):

Probability Web surveys (often perceived as scientific surveys) are performed on a probability sample if units that are obtained from a sampling frame satisfactorily cover the target population. There are several types of probability surveys:

1. List-based surveys of high-coverage populations: these lists consist of samples of e.g. students, members of organizations etcetera, whereby all have access to the Web and where a sampling frame with satisfactory contact information is available.

2. Surveys on probability pre-recruited lists or panels of internet users: this sample of internet users is pre-recruited with a sampling method like telephone surveys on a random sample of households or using random-digit dialing.

3. Surveys on probability panels of the general population: in this case, not only pre-recruitment was done, but also the hardware and software equipment needed for participation in several Web surveys was provided.

4. Web surveys as an alternate option in mixed-mode surveys: a probability sample of respondents can be given the opportunity to choose a Web questionnaire among the available survey modes, or the researcher has the possibility to allocate a part of the sample to the Web mode (for possible mode effects as a drawback of this method, see section 9.4).

Intercept surveys should be placed somewhere in between: systematic sampling is used to intercept visitors of a particular Web site. Respondents are supposed to be representative of visitors to that site, who constitute the target population. According to Manfreda & Vehovar (2008, p.265), this placement is based on two reasons: (1) it is true that, when taking a look at the log files, a list of all visitors can be generated; the problem is that these visitors cannot be uniquely assigned to a person, because it is likely that one person stands for several visitors (even cookies cannot help out here). (2) Visitors who do not visit the page while the questionnaire is online are lost.

Nonprobability Web surveys (often perceived as non-scientific surveys) do not have a probability sample of units obtained from a sampling frame covering the target population satisfactorily. In some cases (e.g. volunteer opt-in panels) probability sampling may be used, however the sampling frame is not representative of the target population. There are several types of such Web surveys:

1. Web surveys using volunteer opt-in panels (also called access panels): some controlled selection of units from lists of panel participants is used for a particular survey project. These lists are basically a large database of volunteer respondents; opt-in here means self-inclusion. The problem is (at least for some surveys) that the internet is not structured in a way that allows researchers to construct well-defined sampling frames, which would be a complete list of internet users that can be used to draw probability samples with known characteristics. One attempt to solve these problems is the employment of internet-based access panels. One problem with panels is that they consist of volunteers, and it is impossible to determine how well these volunteers represent the general population (see de Leeuw (2008b, p.250ff) for further information).

2. Web surveys using purchased lists: these lists typically consist of e-mail addresses purchased from a commercial provider, usually obtained either by specific computer programs searching for e-mail addresses on Web sites or by participants’ self-inclusion. Usually, they have neither access restrictions nor control over multiple completions.

3. Unrestricted self-selected Web surveys: open invitations on different Web sites, but also in online discussion groups and traditional media.

4. Online polls are similar to the group above and are more for entertainment purposes and as forums for opinion exchange, like public polls or question of the day polls.

For categories 1, 3 and 4, self-selection error can cause problems in Web surveys where no access limitations are given. Because of the effect of these volunteer respondents, “under these conditions, inference from survey respondents to any larger population through inferential statistics is scientifically unjustified” (Dillman & Bowker (2001, p.3)). The reason for selecting this recruitment mode is in most cases cost considerations. Self-selection is a form of nonprobability sampling and should be used with caution.

Faas & Schoen (2006) examined the effects of self-selection by running an experiment based on a comparison of online and offline surveys conducted in the context of the German federal election 2002 (i.e. a topic on which most respondents, if interested, will have a relatively clear opinion). One of the online versions used self-selection as the recruitment mode; for the other interviews, an online access panel was used and offline face-to-face interviews were conducted. As is often the case when self-selection is used, those more interested and possibly involved in a topic were overrepresented. “If respondents are recruited using procedures that permits self-selection, results can be expected to be biased both in terms of marginal distributions and in terms of associations among variables” (Faas & Schoen (2006, p.179f)). One reason for this is that “advertisements for online surveys are not distributed equally among Web sites; rather it depends upon the subject of the survey on which Web site ads will be placed: information about political surveys will be found more frequently on Web sites with political content than on sports sites”. This effect is verified by the empirical findings: “the self-selected respondents are highly interested in politics, highly interested in the campaign, more polarized, are even more certain to vote and - most strikingly - almost one in four of them is a party member” (Faas & Schoen (2006, p.179f)). It is also possible that people who are strongly connected to a certain party, such as party members, fill out the questionnaire multiple times to influence the results in their favour. It is true that in this case self-selection does not lead to the desired results, but for other types of surveys this mode can nevertheless make sense (e.g. when users of a certain Web page are the target population). Another finding reported by Faas & Schoen (2006, p.839) demonstrates the dangers associated with self-selection: the marginal distributions of those who participated in the unrestricted online survey show that these people are substantially younger (33 years on average), better educated (76% have university entrance diplomas) and more often male than female (78% were male). This follows the actual figures of internet users.



As a common strategy, when marginal distributions differ between two samples, weighting is applied. Faas & Schoen (2006, p.187) had empirically grounded doubts concerning the conducted study: “As with distortions of marginal distributions, bias in associations is not reduced substantially when data are weighted socio-demographically”. Similar thoughts are mentioned in Dever et al. (2008, p.57). Loosveldt & Sonck (2008, p.93ff) carried out an evaluation of the weighting procedures for an online access panel survey. They come to the result that “weighting adjustment had only a minor impact on the results and did not eliminate the differences”.

9.1.1.2 Sampling Error

The first concrete error discussed here is the sampling error. Subsequently, a few different definitions are given: “Sampling error arises from the fact that not all members of the frame population are measured. If the selection process was repeated, a slightly different set of sample persons would be obtained” (Couper (2000, p.467)). Or in other words: sampling error “is the result of surveying a sample of the population rather than the entire population” (Dillman & Bowker (2001, p.2)). In contrast to the coverage error, here every person in the frame population has a nonzero chance of being part of the sample. Representativeness is an important term in this context: “Even when interviewing a sample, the survey researcher normally wishes to be able to generalize beyond the people who were sampled, sometimes even to people in other locations and at other time points. That is possible only when the sample is representative of the larger population of interest” (Weisberg (2005, p.225)). Weisberg (2005, p.231ff) gives descriptions of different sampling techniques. In general, a distinction can be made between probability sampling, where there is a known chance for every element in the sampling frame to be selected for the sample, and the more problematic non-probability sampling, where the chance of being selected is not known, which causes potential biases and means that the sampling error cannot be estimated (strategies to reduce sampling error can be found in Dillman (2007, chap.5)).

9.1.1.3 Coverage Error

Coverage error can be defined as a function of the mismatch between the target population and the frame population, which are the actual entities from the target population with a positive probability of inclusion in the survey. In other words, coverage error is “the result of all units in a defined population not having a known nonzero probability of being included in the sample drawn to represent the population” (Dillman & Bowker (2001, p.2)).

Coverage error is mainly associated with the sampling frame, the actual set of units from which the sample will be taken: “The sampling frame in a survey is the list from which the sample is selected. Frame error is the error that can arise when the elements in the sampling frame do not correspond correctly to the target population to which the researcher wants to make inferences” (Weisberg (2005, p.205f)). Weisberg (2005, p.205) sees coverage error as the most important frame error and defines it as the mathematical difference between a statistic calculated for the population studied and the same statistic for the target population. Weisberg (2005, p.205) also gives a good example: “A simple example of a coverage problem involves sampling from phone books, because they exclude people with unlisted numbers who are certainly part of the target population”. When applying this error to Web surveys, the problem that arises is that there are people who do not have access to the internet (or, e.g. for some special surveys, do not have an e-mail address to receive an invitation letter) and are automatically excluded.



Similarly, Couper (2000, p.467) states: “Coverage error is presently the biggest threat to inference from Web surveys, at least to groups beyond those defined by access to or use of the Web”. This statement was even more true when it was published in 2000 than it is now, because this effect has decreased as more people have gained access to the Web (for concrete figures see the demographic differences paragraph below). It is still present, however, and will not vanish completely in the future. But additionally, even if everyone who is part of the target population had internet access, the difficulties of constructing a frame to select a probability sample of such persons would be daunting (Couper (2000, p.467)). There are also demographic differences between those who have access to the internet compared to those who do not (e.g. concerning income, residence and education): “Handling the coverage problem by weighting underrepresented groups would assume that rural low-income people with internet access are like rural low-income people without that access, which would be a risky assumption” (Weisberg (2005, p.213)).

Demographic Differences The demographic differences between people who have access to the internet and those who do not are well documented. For example, Couper & Coutts (2004, p.221) give some figures for Germany, Switzerland and Austria concerning this problem and list the variables with problematic differences. These are enriched with current figures for Austria (source: Statistik Austria at http://www.statistik.at/web_de/statistiken/informationsgesellschaft/ikt-einsatz_in_haushalten), as far as available:

• Age: in 2004 in Germany, 85% of people aged 14-24 lived in a household with internet access, compared to 32% of those older than 54. In Switzerland, 78% of those aged 20-29 used the internet regularly, compared to 31% of those older than 49. In Austria, the percentage of those who had used the internet during the last 6 months varied between 81% (aged 16-24) and 10% (aged 65-74). When looking at the figures for 2008, both groups have increasing percentages: 91.8% (aged 16-24) and 25.5% (aged 65-74).

• Gender: in Germany in the first quarter of 2004, 63% of men and 53% of women used the internet. In Switzerland as well as in Austria the proportion of male internet users is even higher (63% compared to 46% in Switzerland, 60% compared to 48% in Austria; for Austria, these are figures from 2003, where people were asked if they had used the internet during the last 12 months). In 2008, there was still a gap in Austria: 77.2% male users compared to 65.3% female users (the question was whether they had used the internet within the last 3 months). The gap is even more extreme for those aged 55-74 years (50.3% versus 29.2%), but almost no gap exists for those younger than 35 years.

• Education: in Switzerland in 2004, 81% of those with a university degree used the internet several times a week, compared to 54% of those who had finished an apprenticeship or professional school. In Austria, 82% of those with a university degree had used the internet during the last year, compared to 24% of those who had completed compulsory schooling and 21% of those without compulsory schooling. In Austria in 2008, 95% of higher educated people had used the internet in the last three months, compared to 46.2% of those with secondary education.

• Income: in Germany, 87% of households with an income of 2600 € or more had access to the internet in 2004; the proportion of households with an income of less than 1300 € is only 34%. In Switzerland, 81% of people with an income of SFR 10.000 or more use the internet several times a week; for people with an income equal to or lower than SFR 4000, the proportion is 25%. Unfortunately, no figures concerning income are available for Austria.

Additional figures concerning Germany can be taken from Ehling (2003), e.g. for what purposes the internet is used and why some households do not have internet access, as well as from Scheffler (2003), where the percentage of internet users for all European countries in 2003 is given, and from Wolfgang Bandilla (2003), where e.g. European countries are compared. Current and detailed figures for Germany can be taken from the (N)Onliner Atlas (http://www.initiatived21.de/fileadmin/files/08_NOA/NONLINER2008.pdf).

As a concrete example of the effects of different demographic distributions, Bandilla & Bosnjak (2003) compared a traditional written survey with a Web-based survey using a CATI (Computer Assisted Telephone Interviewing) pre-recruited panel of internet users in a mixed-mode study. When taking a look at the distribution of the respondents, there are disproportionately more males (66.1% compared to 48.2% for the paper questionnaire), and on average they are younger and better educated than respondents from the general population sample, which makes the comparability of the two results questionable. As already mentioned for nonprobability Web surveys, weighting possibly cannot solve this problem: “Taking the characteristics gender, age and education into account, the adjustment produces weighting factors for several online respondents exceeding the factor 5” (Bandilla & Bosnjak (2003, p.238)). For example, Bandilla & Bosnjak (2003)’s Web survey had 40.4% of respondents under 29, compared to 16.4% in the paper survey. Similar problems with internet sample diversity were reported by Best et al. (2001, p.138f).

Couper, Kapteyn, Schonlau & Winter (2007) report experiences from an internet survey of persons 50 years old and older, where about 30% answered (via telephone) that they use the internet, and of these 73% expressed willingness to participate in a Web survey. A subset was sent a mailed invitation to participate in a survey and 78% completed the survey, which is relatively high for Web survey response rates. The authors imply that noncoverage (in this case a lack of access to the internet) appears to be of greater concern than nonresponse (in this case unwillingness to participate) for representation in internet surveys for this age group.

Demographic differences are not the only problem, since there may be other differences between Web users and those who do not use the internet: “Even if internet users matched the target population on demographic characteristics such as sex, income, and education, there is still likely coverage bias because they may well differ on other characteristics” (Lohr (2008, p.102)). Self-selection is often a mode used by Web surveys, which causes general problems with coverage (this problem was also discussed for the sampling error): “Coverage cannot be determined in samples that consist of volunteers, such as internet surveys in which a Web site invites visitors to click here to participate in our online survey” (Lohr (2008, p.102)).

Subsequently, three different sources of coverage error are given, taken from Weisberg (2005, p.217ff), where related additional information can also be retrieved:



1. Ineligibles What is commonly meant by coverage error is undercoverage (or underrepresentation), but there is overcoverage as well: “In some circumstances, individuals not in the target population are included in the sampling frame and are not screened out of the sample. These ineligibles can also systematically differ from the members of the target population” (Lohr (2008, p.100)). Another example would be the inclusion of businesses in a household survey. To avoid these situations (which bias the results), a good strategy is to ask screening questions at the beginning of the interview, which is not always an easy task. Another option would be to purchase eligible sampling frames, which is currently very common, e.g. from internet panel institutions, and can also have certain drawbacks as well as sources of bias (for a discussion, see e.g. Sikkel & Hoogendoorn (2008, p.485ff)).

2. Clustering This frame mismatch comes up when groupings of units are listed as one in the sampling frame, e.g. when household surveys are conducted, the unit being sampled is actually the household, even when the researcher really wants to interview only one person in the household (the choice of whom to interview in a household is discussed in Weisberg (2005, p.245ff)). Another possible situation is when a phone number is shared by multiple households; the chance of these households being selected diminishes when it is divided by the number of households. If the number of households is known, weighting within the selection process could be a strategy against this error.

3. Multiplicity (Duplicates) Multiplicity describes the situation when a case appears more than once in a sampling frame, thus giving it a higher chance of selection than other cases. “It often occurs in list samples when the sampling frame is based on several different lists, as when using lists from multiple environmental organizations that have partially overlapping membership, giving those people a higher chance of selection. It would also be a problem in sampling from lists of e-mail addresses, since it is common for people to have multiple e-mail addresses” (Weisberg (2005, p.221f)). When it is not possible to remove these duplicates in advance, one strategy could be weighting based on the reciprocal of the case’s multiplicity.
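As a simple illustration of such a multiplicity adjustment (not taken from Weisberg (2005); the data structure and numbers are invented):

```typescript
// Hypothetical sketch: each sampled case is down-weighted by the reciprocal
// of its multiplicity, i.e. the number of frame entries (e.g. e-mail
// addresses) that point to the same person.
interface SampledCase {
  id: string;
  multiplicity: number; // frame entries referring to this person
}

function multiplicityWeight(c: SampledCase): number {
  return 1 / c.multiplicity;
}

// A respondent listed under three e-mail addresses receives weight 1/3,
// compensating for the threefold chance of selection.
console.log(multiplicityWeight({ id: "case-1", multiplicity: 3 })); // 0.333...
```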

Reduction of Coverage Error in Web Surveys Again, Weisberg (2005, p.213) mentions two general strategies which attempt to avoid coverage error:

(1) Usage of probability-based telephone sampling methods to recruit people with internet access for an internet panel. People who agree to participate are sent e-mail requests to fill out particular surveys, with access controlled through passwords.

(2) “Offer free Web access to respondents who are originally contacted using telephone sampling procedures to obtain a probability sample of the full population” (Weisberg (2005, p.213)). This strategy is applied, e.g. with WebTV, by http://www.knowledgenetworks.com.

Of course, systematic data organization becomes a major issue when dealing with samples. Dillman (2007, p.198ff) stresses the importance of a good concept when creating and maintaining sample lists.



9.1.1.4 Nonresponse Error at the Unit Level

A further problem found in Web surveys is unit nonresponse: “Unit nonresponse occurs when some people in the designated sample are not interviewed” (Weisberg (2005, p.159)). In other words, the consequence is that (unit) nonresponse error is “the result of nonresponse from people in the sample, when, if they had responded would have provided different answers to the survey question than those who did respond to the survey” (Dillman & Bowker (2001)).

Weisberg (2005, p.160) sees three types of (unit) nonresponse, which can also be applied to Web surveys:

(1) Noncontact refers to situations in which designated respondents cannot be located, e.g. if an invitation letter cannot be delivered via e-mail. Lynn (2008, p.41) lists some reasons for unit nonresponse in Web surveys: “For invitation-only surveys, where a preselected sample of persons is sent (typically by e-mail) an invitation to complete the questionnaire, noncontact can be considerable. This can be caused by incorrect or out-of-date e-mail addresses, by the recipient’s e-mail system judging the e-mail to be spam and therefore not delivering it, or by the recipient judging the e-mail to be spam and not opening it”. In the case of recruitment via popup on Web pages, noncontact occurs when popup blockers are activated or Javascript is turned off.

(2) Incapacity is when the designated respondent is incapable of being interviewed, due to the technological inability to deal with an internet survey (e.g. when certain necessary technologies like Java are not installed or activated on the respondent’s computer).

(3) Noncooperation occurs when the designated respondent refuses to participate in the study. This form is likely to be non-ignorable if people are not willing to participate in a survey because of its topic, for example if people with conservative social values were less willing to answer a survey on sexual behavior.

Some theoretical approaches which try to explain the important points of survey participation are summarized in Weisberg (2005, p.165):

(1) Social exchange theory: “This theory assumes that people’s actions are motivated by the return people expect to obtain. The exchange concept emphasizes that the survey must have value to the respondents in order to interest them. Social exchange theory is based on establishing trust that the rewards for participation will outweigh the costs”.

(2) Altruism, when one views survey participation as being helpful for a stranger.

(3) Opinion change is another psychological approach, which tries to convince the person that the survey topic is salient, relevant, and of interest to the respondent.

Weisberg (2005, p.172f) summarizes findings about demographic correlates of cooperation as follows:

(1) Age: “Younger people are more willing to participate if they are successfully contacted, but they are less likely to be at home, which results in underrepresentation of young people in some surveys”. This statement is true for telephone surveys, but for internet-based surveys respondents can choose for themselves when to fill out the questionnaire. Additionally, younger people make up a higher percentage of those with internet access (as was already shown in section 9.1.1.3). Similar observations were made in the study by Bech & Christensen (2009), where a significantly lower response rate was observed in the Web-based survey compared to an equivalent postal survey. In this study, individuals were randomly allocated to receive either a postal questionnaire or a letter with a Web link to an online version of the same questionnaire.

(2) Gender: “surveys routinely under represent men, because men are less likely to be at home and more likely to refuse”. This is again true for telephone surveys, but possibly not for internet-based surveys, because more men have internet access than women. It is, however, also true for internet studies that males are more likely to refuse (Weisberg (2005, p.174f)). For (3) race and (4) education, only less consistent results can currently be found.

There have been some efforts to decrease nonresponse, e.g. Thomas M. Archer (2007) systematically examined 99 Web-based surveys to find out which characteristics were significantly associated with increasing the response rate. Thirteen Web deployment characteristics and nine Web-based questionnaire survey characteristics were examined, with the following main outcomes:

(1) Increasing the total days a questionnaire is left open, with two reminders, may significantly increase response rates. It may be wise to launch in one week, remind in the next week, and then send the final reminder in the third week.

(2) Potential respondents must be convinced of the potential benefit of accessing the questionnaire.

(3) Do not be overly concerned about the length or detail of the questionnaire - getting people to the Web site of the questionnaire is more important for increasing response rates. Additionally, the fact that no instructor or interviewer is available can in some cases cause break-off, because the respondents do not know how to fill out the questionnaire or do not know how to work with the input controls.

Weisberg (2005, p.130) stresses the problem with (unit) nonresponse data: “The bias cannot be estimated since the nonrespondent mean is not known. The bias problem is most serious when people who would belong to one answer category are most likely not to answer that question, as when people opposing a candidate from a racial minority do not indicate their voting intentions in a preelection survey”. The meta-analysis of 59 methodological studies in Groves & Peytcheva (2008) tried to estimate the magnitude of nonresponse bias.

Reduction of Nonresponse Error at the Unit Level Czaja & Blair (1996, p.182f) give some strategies for the reduction of unit nonresponse (for paper questionnaires), which can also be applied to Web questionnaires: “The options available to prevent unit nonresponse include designing interesting and nonburdensome questionnaires; using effective data collection procedures, such as advance letters and well-crafted introductions or cover letters”.



Specific to Web surveys is the extent of unit nonresponse due to technical problems, like browser problems or slow connection times (Weisberg (2005, p.189)). Furthermore, the lack of technical skills (when respondents do not know how to deal with certain input controls necessary for participating in a survey) can be a barrier to participation. Because of this, it is important to consider the simplicity of the survey and the use of standard technologies. Any additional technological risk factors which could cause trouble within the respondent’s browser should be avoided.

In the case of Web surveys, technical problems (together with the technical equipment in general, as also mentioned in Vehovar et al. (2002, p.235)) could have an important influence on this rate, as was documented for the experiments described in chapter 14. It is relatively difficult for Web survey software implementers to be responsive to all possible combinations of different browsers, browser settings, operating systems and possibly also custom networking conditions as they are given in some companies or institutions. This risk increases with the technical complexity of the survey (e.g. when using multimedia elements or technologies not installed by default within each browser, like Flash). Technical problems can also be a relevant factor for item nonresponse.

After applying all strategies for reducing nonresponse error, there will in most cases still be some nonresponse error. According to Weisberg (2005, p.193ff), there are basically two statistical solutions to (unit) nonresponse:

1. Weighting: “survey respondents are sometimes weighted in a different manner so that the obtained sample better resembles the population of interest”. Methods for doing so are:

(1) Probability of response weights, where “observations are weighted inversely to their ease of acquisition, with people who were easiest to interview being weighted less than the hardest”. The general assumption is that those who are hard to reach are similar to those who could not be reached.
(2) Weighting class adjustment, which uses variables that were involved in the sample design and calculates weights as the reciprocal of their response rates (a small illustrative sketch is given after this list).
(3) Poststratification adjustment differs from weighting class adjustment in that information about nonrespondents is not available, so these weights are based on populations instead.

2. Modelling nonresponse: this strategy attempts to model the nonresponse process. “These methods are controversial in that they make assumptions about the nonresponse and they are difficult to employ because they must be estimated separately for each variable of interest rather than simply deriving a single weight [...]. They require having some predictors of participation in the survey, including at least one predictor that affects participation but not the dependent variable of interest”.
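To make the weighting class adjustment mentioned under (2) concrete, the following sketch computes nonresponse weights as the reciprocal of each class’s response rate. It is purely illustrative: class labels, counts and function names are invented and not taken from Weisberg (2005).

```typescript
// Hypothetical sketch of a weighting class adjustment: within each class
// defined by a sample-design variable, respondents receive the reciprocal of
// that class's response rate as a nonresponse weight.
interface WeightingClass {
  label: string;
  sampled: number;   // units drawn into the sample for this class
  responded: number; // units that actually completed the survey
}

function nonresponseWeights(classes: WeightingClass[]): Map<string, number> {
  const weights = new Map<string, number>();
  for (const c of classes) {
    const responseRate = c.responded / c.sampled;
    weights.set(c.label, 1 / responseRate); // reciprocal of the response rate
  }
  return weights;
}

// Example: a class with a 40% response rate gets weight 2.5.
const w = nonresponseWeights([
  { label: "age 16-24", sampled: 200, responded: 80 },
  { label: "age 65-74", sampled: 200, responded: 150 },
]);
console.log(w.get("age 16-24")); // 2.5
```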



9.1.2 Response Accuracy Issues

9.1.2.1 Nonresponse Error at the Item Level

In contrast to the unit nonresponse discussed in the previous section, here the respondent has already decided to participate, but does not fill out the survey completely, for several reasons. “Nonresponse error arises through the fact that not all people are willing or able to complete the survey” (Couper (2000)). According to Weisberg (2005, p.131ff), there are three different types of item nonresponse:

1. Don’t know or no-opinion responses: in most cases, don’t know will simply mean that the person lacks the information needed to answer the question, but this is not always the case: don’t know answers are often a form of satisficing, as they are an easy option to move on to the next question. It is also possible that respondents did not understand the question. Derouvray & Couper (2002) explore alternative designs for such uncertain responses; e.g. it was found that a reminder prompt decreases item-missing-data rates.

2. Item refusal: refusals are relatively rare, even in questions about political voting and sexual behavior. Interestingly, the question that is most often refused is the income question. It is relatively difficult to find out whether the respondent refused to answer in Web surveys (unless an alternative refuse to answer is given).

3. Not ascertained: this can occur if skipping of questions is allowed. In Web surveys, so-called navigational errors can come up, e.g. when the submit button is accidentally pressed again too fast once the new page has already loaded (in paging mode). It can also happen when the structure of the questionnaire (and its branching) is not obvious. These problems can be avoided by clearly separating single questions and by using the paging mode when the questionnaire contains many branches.

Biemer & Lyberg (2003, p.112ff) found concrete possible reasons for not answering all items:

• The respondent deliberately chose not to respond to a question because it was difficult to answer or the question was sensitive.

• The questionnaire was complicated, and if the mode is self-administered, the respondent might overlook the instructions or certain questions, or the respondent might simply exit the response process because it was boring, frustrating, or time consuming.

• The questionnaire contains openended questions, which increases the risk for item nonresponse.

• The respondent (or the interviewer) makes a technical error so that the answer to a specific question has to be deleted.

• The questionnaire is too long.

Reduction of Nonresponse Error at the Item Level Two strategies for minimizing item nonresponse are given in Czaja & Blair (1996, p.182f): “First, let the respondents know why the question is necessary. Some respondents will want to know, for example, why some of the demographic questions (e.g. age, race or income) are needed. Often a simple response will suffice. [...] Second, remind respondents that all their answers are completely confidential and will never be linked to their name, address or telephone number”. For Web surveys in particular it is essential to track as much paradata as possible in order to be able to decide at an early stage whether these problems have technical origins, and to intervene.
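As a minimal illustration of what such client-side paradata tracking could look like, the following sketch captures per-page timing and basic browser information and sends it along with each page submission. This is a generic sketch, not the software developed for this thesis; the /paradata endpoint, element selector and field names are hypothetical.

```typescript
// Hypothetical sketch: record when a questionnaire page was opened and, on
// submission, send the elapsed time plus basic browser paradata to the server.
const pageOpenedAt = Date.now();

function collectParadata(pageId: string) {
  return {
    pageId,
    completionTimeMs: Date.now() - pageOpenedAt, // time spent on this page
    userAgent: navigator.userAgent,              // browser identification
    screenWidth: window.screen.width,            // rough display information
    javascriptEnabled: true,                     // trivially true if this code runs
  };
}

document.querySelector("form")?.addEventListener("submit", () => {
  // Fire-and-forget transfer; sendBeacon survives the page navigation.
  navigator.sendBeacon("/paradata", JSON.stringify(collectParadata("page-07")));
});
```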

9.1.2.2 Dropout

A special form of item nonresponse is dropout: the respondent stops filling out the questionnaire, which means that from a certain point on there are no more answers available for that unit. “Dropout is mainly a problem if it is systematic: some participants selectively drop out, for example, people who dislike the subject that an internet study is about. If the systematic dropout coincides with an experimental manipulation, then the study in many cases is severely compromised” (Reips (2002b, p.242)). The largest proportion of dropout occurs in the first part of the questionnaire (Ekman et al. (2007)). Because of this, questions at the beginning should have a low burden. The positive effect of this strategy was experimentally confirmed by Ekman et al. (2007): they gave respondents two versions of the questionnaire, one with easy questions at the beginning and one with hard questions at the beginning, and the dropout rate for the version with easy questions at the beginning was lower.

The high-hurdle technique stands in contrast to this strategy: the idea behind this technique is to artificially create a higher burden at the beginning of the survey in order to filter out less motivated respondents. Göritz & Stieger (2008) carried out two experiments in which participants had to wait for the first page of the study to appear on the screen. The expectation was that those who continued would be more highly motivated, so that data of higher quality would be produced and dropout would be lowered. Against all expectations, the dropout rate and the quality of the data remained independent of the loading time; in this case, artificially delaying the loading of the first page was counterproductive. It was also found that questions about personal information should be placed at the beginning of an internet study because this lowers dropout rates.

In the experiments conducted as part of this thesis, the effects of the different input controls on dropout were also evaluated. As mentioned e.g. by Lynn (2008, p.41), breakoffs are typically higher for Web surveys than for other survey modes, but they can be reduced by good design. For concrete results, see chapter 18. In Ganassali (2008, p.28), it was hypothesized that a short questionnaire would, amongst other data quality improvements, produce less dropout and a higher fill-out rate.

9.1.2.3 Respondent’s Burden

When running the experiments described in part III, the burden of the particular input control used is an important influence, particularly on item nonresponse, and will be considered in several aspects of the data analysis.

Here a definition of respondent burden as taken from Biemer & Lyberg (2003, p.107) is given: “one important correlate of nonresponse is the burden the respondent perceives in completing the questionnaire and other survey tasks. It is widely accepted that if a survey request is perceived as interesting and easy, the likelihood of obtaining cooperation will increase”. Examples of burden are: the length of the interview, the pressure the respondent might feel when confronted with questions, and also the number of survey requests the respondent receives within a certain time period. For Web surveys, additional points are also considered as burden, namely the time needed for filling out with a certain input control (or in some cases loading times) and the necessity to learn how certain input controls are used. Additionally, Galesic (2006, p.315) states that: “The effect of burden might be cumulative whereby the burden experienced at each question is a function of both specific characteristics of that question and burden experienced while answering the preceding questions”. According to Funke & Reips (2008b), respondent's burden can be measured with the actual and perceived response time23.

Incentives An increasingly popular approach for dealing with noncooperation is giving respondents incentives to participate. A good overview of the state-of-the-art usage of incentives in Web studies is provided by Göritz (2006b). Weisberg (2005, p.133) even recommends incentives for answering single questions to minimize don't know answers. However, the question is whether data quality is really improved when following this strategy.

Galesic (2006) attempted to shed light on respondents' decisions to drop out by registering their momentary subjective experiences throughout the whole survey. Furthermore, the length of the questionnaire was announced and the type of incentive was manipulated. Additionally, it was analyzed whether dropouts could be attributed to changes of interest and burden experienced while answering survey questions. Characteristics of the respondent were taken into consideration, and the formal characteristics of the questions (such as position, whether a question is open or closed, and how many questions appear on one page) were also relevant for this analysis. Announced length, respondent's age24 as well as block-level interest significantly affected the risk of dropout, while incentives, gender, education and work-related education had no significant effect. The subjective experience of interest had a high influence on dropout: in this study, the respondents with above-median interest had a 40% lower risk of dropout than the respondents with below-median interest. Similarly, the respondents with above-median burden experience had a 20% higher risk of dropout than the respondents with below-median experienced burden.

9.1.3 Measurement Error

“Measurement error simply stated is the deviation of the answers of respondents from their true values on the measure” (Couper (2000, p.12)). Dillman & Bowker (2001, p.2) mention poor question wording, poor interviewing, survey mode effects and/or some aspects of the respondent's behavior as the main causes of this error. The bias caused by this error is relatively hard to measure because the true value is normally not known. Again, the risk of higher measurement error is greater in self-administered surveys and even greater in Web surveys because of additional influences like design and the wrong use of input controls. Because of this it is even more important for Web surveys to keep the survey instruments as simple as possible (even if this comes at the expense of design).

The experiments accomplished in this thesis strongly focus on reducing this kind of error, whereby the focus was placed on scale questions. Therefore, one source of this kind of error can be the survey instrument itself. Experimental design and results are described in part III.

23Which is just one factor of many, see e.g. Hedlin et al. (2008, p.301): “Measurement of response burden tends to focus on response time although response burden as perceived by respondents is not determined by time alone”

24Older respondents were more likely to complete the questionnaire

9.1.4 Statistical Impact of Error

Statisticians distinguish between two general types of errors:

1. Systematic Error: “Systematic error is usually associated with a patterned error in the measurement. This patterned error is also known as bias. An example of systematic bias would be measuring the average age from a survey if older people were less willing to be interviewed” (Weisberg (2005, p.19ff)). As the overview above shows, there are a lot of possible sources of bias, which directly influence the statistical results and measures (e.g. the mean).

2. Random Error: “Random error is error that is not systematic. For example, if people do not think hard enough about a question to answer it correctly, some may err on one side and others may err in the opposite direction, but without systematic tendency in either direction” (Weisberg (2005, p.19ff)). In contrast to systematic error, which influences the mean value, random error has a mean of zero and thus does not affect the mean of a variable (but it does increase the variance of the variable), as sketched below.
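To make the distinction explicit, here is a minimal sketch of the usual decomposition (the notation is chosen for illustration only and is not taken from Weisberg): let y_i be the observed value for respondent i, \tau_i the true value, b a constant systematic bias and e_i a random error with mean zero, so that

    y_i = \tau_i + b + e_i, \qquad E(e_i) = 0 .

The bias b shifts the expected value, E(\bar{y}) = \bar{\tau} + b, while the random error leaves the mean unchanged but, assuming e_i is independent of \tau_i, inflates the variance: \operatorname{Var}(y_i) = \operatorname{Var}(\tau_i) + \operatorname{Var}(e_i).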

Although a general overview was given, the discussion of total survey error is not the main topic here25.

9.2 Reliability

When talking about measurement instruments, reliability is often mentioned as a quality criterion for the instrument. “The reliability of a measure has to do with the amount of random error it contains. The reliability of a measure can be thought of as the extent to which repeated applications would obtain the same result” (Weisberg (2005, p.19ff)). This definition focuses more on the repeatability of measures than on reliability based on correlations between scale scores.

Internal consistency An important term when talking about reliability is internal consistency, which is a commonly used psychometric measure for assessing survey instruments and scales. “It is applied not to single items but to groups of items that are thought to measure different aspects of the same concept. Internal consistency is an indicator of how well the different items measure the same issue” (Litwin (1995, p.21)). Or in other words: “A scale is internally consistent to the extent that its items are highly intercorrelated. High interitem correlations suggest that the items are all measuring the same thing” (DeVellis (1991, p.25)). This means that there has to be a strong link between the scale items and the latent variable26. Internal consistency is typically equated with Cronbach's coefficient alpha, which is widely used as a measure of reliability27.
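For reference, the standard form of the coefficient is reproduced here (the text above only points to DeVellis (1991, p.26ff) for the formula):

    \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right),

where k is the number of items in the scale, \sigma_i^2 is the variance of item i and \sigma_X^2 is the variance of the total scale score.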

25For detailed information on Web survey error, consider Manfreda (2001); for a general discussion of the total survey error, see Weisberg (2005), Biemer & Lyberg (2003), Vehovar et al. (2002) and Dillman & Bowker (2001)

26Which is “the underlying phenomenon or construct that a scale is intended to reflect” (DeVellis (1991, p.12))

27See DeVellis (1991, p.26ff) for an explanation and the formula


Split-half One technique used to measure reliability is split-half 28. The dataset is split into two parts, which are then compared with each other. When using this method, it has to be ensured that both subsets have the same properties; otherwise it is not advisable to simply split the dataset in the middle. Alternative approaches are odd-even reliability (which compares the subset of odd-numbered items to the even-numbered items), balanced halves (important item characteristics are identified and used as splitting criteria), and random halves (items are randomly allocated to one of the two subsets) (DeVellis (1991, p.34f)).

Test-retest Ideally, test-retest correlations are calculated in the case that two identical repeated measurements exist (one person has filled out the questionnaire twice) (Mummendey (2003, p.76)) and the same conditions prevailed, which is seldom the case. “It is measured by having the same set of respondents complete a survey at two different points in time to see how stable the responses are. It is a measure of how reproducible sets of responses are. Correlation coefficients [...] are then calculated to compare the two sets of responses [...]. In general r values are considered good if they equal or exceed 0.70” (Litwin (1995, p.8)). Again, when applying this approach, it must be ensured that between the two (or more) measurements no environmental conditions have changed (temporal stability).
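The correlation coefficient referred to here is typically the Pearson coefficient (an assumption made for this illustration; the quoted passage does not name it explicitly):

    r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}},

where x_i and y_i are respondent i's scores at the first and second measurement and n is the number of respondents who completed both.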

9.3 Validity

“The validity of a measure is whether it measures the construct of interest” (Weisberg (2005, p.19ff)), or similarly Svennsson (2000, p.420): “Validity refers to the operational definitions of the variable, and can be defined as the extent to which an instrument measures what it is intended to measure”. Validity is closely related to systematic bias: variables that are measured with systematic bias are not valid. A description of the validation process is given in Spector (1992, p.46) and focuses mainly on the validity of scales: “validation of a scale is like the testing of a theory, in that its appropriateness cannot be proven. Instead, evidence is collected to either support or refute validity. When a sufficient amount of data supporting validity is amassed, the scale is (tentatively) declared to be construct valid. Users will accept the theoretical interpretation of what it represents”.

According to DeVellis (1991, p.43ff), there are three types of validity 29:

1. Content validity is the extent to which a specific set of items reflects a content domain. Content validity is easiest to evaluate when the domain is well defined. The issue is more subtle when measuring attributes such as beliefs, attitudes, or dispositions, because it is difficult to determine exactly what the range of potential items is and when a sample item is representative. The assessment of content validity (compared to face validity) should be carried out by reviewers who have some knowledge of the subject matter. “The assessment of content validity typically involves an organized review of the survey's contents to ensure that it includes everything it should and does not include anything it shouldn't” (Litwin (1995, p.35)).

28More information can be found e.g. in Mayntz et al. (1978, p.65)

29Litwin (1995, p.35) adds a fourth type, face validity, which is based on a cursory review by untrained judges to see whether they think the items look ok to them


2. Criterion-related validity means that an item or scale is required only to have an empirical association with some criterion. In other words, it is the degree of effectiveness with which the scale predicts practical issues. Spector (1992, p.47f) describes how to achieve criterion-related validity (mainly for scales): “A criterion-related validity study begins with the generation of hypotheses about relations between the construct of interest and other constructs. Often a scale is developed for the purpose of testing an existing, well-developed theory. In this case the scale can be validated against the hypotheses generated by the theory”. If no theory and therefore no hypotheses exist, some theoretical work must be done in advance to generate hypotheses about the construct. Litwin (1995, p.37ff) distinguishes between two main components: (1) concurrent validity, which requires that the survey instrument in question is judged against some other method that is acknowledged as a standard for assessing the same variable, and (2) predictive validity, which is the ability of a survey instrument to forecast future events, behavior, attitudes or outcomes. Predictive validity can be calculated as a correlation coefficient between the initial test and the secondary outcome.

3. Construct validity is directly concerned with the theoretical relationship of a variable (e.g. a score on some scale) to other variables. It is the extent to which a measure behaves as expected with regard to established measures of other constructs. This means that when we have a (theoretical) variable which is related to other constructs, then a scale which purports to measure that construct should bear a similar relationship to measures of those constructs. “It is a measure of how meaningful the scale or survey instrument is when in practical use” (Litwin (1995, p.43ff)).

The most frequently used measurement parameters to measure validity are correlation coefficients as well as factor analysis, but the concrete application depends on the type of validity that is being measured. Related to this topic is the sensitivity of a measurement instrument. “Sensitivity (or responsiveness) is the ability of a scale to discriminate among various systems, user populations or tasks. In order to be useful, an instrument needs to be sensitive, that is, it needs to have the power to detect differences that are expected to exist” (van Schaik & Ling (2007, p.4)).

9.4 Mode Effects

There are many different modes of conducting surveys: self-administered with paper-and-pencil questionnaires, face-to-face, telephone, mail and internet. This section focuses on the comparison of self-administered (in contrast to interviewer-administered) questionnaires and will mainly compare paper-and-pencil with internet-based versions. The effects or differences between these two modes are subsequently discussed. It is difficult to study whether the results of a survey would vary had a different mode been used, because there are multiple sources of differences between modes. The biggest problems with online questionnaires, compared to other (postal) ways of delivering them, are the low response rates and poor data quality (Healey et al. (2005)), which lead to measurement and nonresponse errors. In addition, different computer skills (which may also depend on age and education) may cause a certain bias.

A further consideration in comparison to paper questionnaires is that computer logic is added to questionnaire logic. All information entered has to be committed30, in most cases by pressing a (submit) button. It therefore becomes necessary to give some guidance when running Web surveys, but this important information is often forgotten when creating them. “Meshing the demands of questionnaire logic and computer logic creates a need for instructions and assistance, which can easily be overlooked by the designer who takes for granted the respondent's facility with computer and Web software” (Dillman (2007, p. 358f)).

There is one fundamental distinction between designing for paper and for the internet: “the paper questionnaire designer produces a questionnaire that gives the same visual appearance to the designer as to the respondent. However, in cases of both e-mail and Web surveys the intentions of the designer for the creation, sending and receiving of the questionnaire are mediated through the hardware, software and user preferences. The final design, as seen by the creator, is sent to the respondent's computer, which displays the questionnaire. It is not only possible but likely that the questionnaire seen by the respondent will not be exactly the same as that intended by the creator, for several reasons. The respondent may have a computer with a different operating system, different kind of Web browser, or even a different release of browser software. Respondents may also choose a different setting for the screen configuration, and may choose to view the questionnaire as a full or partial (tiled) screen. Of course it should be attempted to make these questionnaires appear as similar as possible across technologies” (Dillman (2007, p.361)).

Additionally, browsers display some HTML controls (e.g. radio buttons) in their own special way, and partly different default fonts are used. Questionnaires are displayed in a manner that is well known to the respondent, which is in some cases more desirable than an identical presentation for all respondents. It is also possible to use custom CSS settings, which would override those delivered with the questionnaire; this can modify the visual design dramatically. Of course some general problems must be avoided: similar colors for text and background, and differences in the relative distances between scale categories and alternatives for closed-ended questions; text alignment must be equal across all browsers; and all content must be initially visible (e.g. all scale categories or alternatives for closed-ended questions31). But generally speaking, software developers and Web designers these days usually know the common pitfalls and difficulties when creating cross-browser solutions, so the troubles described above can be avoided quite easily. Nevertheless, pretesting the questionnaire (ideally with pretesters using very heterogeneous technology) becomes even more important.

Hardware also includes different input devices, e.g. scroll mice32, laptop keyboards compared to standard keyboards, or, in the case of visually handicapped people, braille reader devices, which impose special technical preconditions (such as the handling of nested tables) for content to be properly readable. Furthermore, the use of advanced technologies in addition to HTML (like Javascript, Java, Flash, ...) can cause difficulties, because some parts of the questionnaire might not be displayed correctly or not displayed at all.

30Technically this is not compulsory when using AJAX, but respondents expect this behavior of a questionnaire, so it is recommended

31For methodological consequences, see the results of systematic experiments in chapter 5

32In Healey (2007), effects of using scroll mice when the questionnaire contains dropdowns are affirmed

Converting a paper questionnaire into a Web-based questionnaire while retaining data comparability between internet and paper-based responses is a challenging task for survey designers. Even minor changes in the layout, color, fonts and other formatting of questions between these two modes can convey different expectations about the kind of answer required. Because both modes are self-administered, the respondent takes all cues into account, not only the textual ones. Thus different interpretations of questions are possible.

Additionally, usability issues must be considered when designing internet forms, as the task of responding to an internet questionnaire differs in significant ways from the task of responding on paper (Potaka (2008, p.1)). Filling out should be made as easy and clear as possible, with low burden for the respondent.

9.4.0.1 Mixed Mode

There are studies which make it possible to fill out the questionnaire in multiple modes (e.g. a paper questionnaire and also an online version). When choosing such a survey design, all mode effects discussed above must be taken into consideration. The advantage of such a survey design is that comparisons to determine the extent of such mode effects become possible. Weisberg (2005, p.293) notes another advantage concerning coverage: “they are particularly useful in overcoming noncoverage problems, since the coverage involved in one mode is likely to be different from the noncoverage in the other mode”. A drawback is that they of course raise the costs and the administrative effort.

Shih & Fan (2007) report the results of a meta-analysis comparing response rates and mode preferences in Web/mail mixed-mode surveys, with the overall result that mail surveys are preferred over Web surveys, but with variation of mode preference across the studies. Because mode differences can possibly change the context of the survey dramatically, the article by Smyth et al. (2008) on context effects may be helpful. Additionally, Heerwegh & Loosveldt (2008) investigated the differences in data quality between a face-to-face and a Web survey, where the hypothesis that Web surveys produce lower data quality was supported. An interesting study dealing with the effects of mode and question sensitivity is provided by Kreuter et al. (2008), who compare CATI, IVR and Web surveys with the result that there are differences between modes. Additional discussion about mixed-mode data collection can be found in de Leeuw (2005), de Leeuw (2008a), de Leeuw & Hox (2008), Duffy et al. (2005), Denscombe (2006), Fricker et al. (2005), McDonald & Adam (2003), Meckel et al. (2005), Voogt & Saris (2005), Dillman et al. (2008), Dillman & Christian (2005a) and Dillman & Christian (2005b).


10 Psychological Theories

In this chapter, psychological processes relevant for filling out questions are documented. Differences in response behavior may have psychological origins, possibly even more so for Web-based questionnaires than for other types of questionnaires, as different and additional cognitive processes are involved. Dillman et al. (1998, p.6) identify differences between the logic of using a computer and the logic of filling out a questionnaire as one reason for different response behavior between paper-and-pencil and online surveys. Hand and eye focus positions are set on different places when using a computer.

10.1 The Response Process

Participants respond to interviewer-administered surveys in four basic steps (Tourangeau et al. (2000, p.8ff)):

1. Comprehension: comprehension encompasses such processes as attending to the question and accompanying instructions, assigning a meaning to the surface form of the question, and inferring the question's point, that is, identifying the information sought. It is necessary to remember that respondents of online surveys do more than simply read the texts. Although question wording plays a major role, visual design elements are also important1. Biemer & Lyberg (2003, p.129) mention context effects which could influence the comprehension of the questions: “A context effect occurs when the interpretation of a question is influenced by other information that appears on the questionnaire”.

2. (Information) Retrieval: the retrieval component involves recalling relevant information from long-term memory.

3. Judgment: Couper, Tourangeau, Konrad & Crawford (2004, p.114) distinguish between two modes of selection used by respondents for closed-ended questions:

a) Serial processing model: respondents either have a pre-existing answer to the question and search serially through the list of response options until they find that answer, or they do not have a pre-existing answer but make a dichotomous judgement about the acceptability of each option, stopping as soon as they encounter an acceptable answer. A linear relation between response times and the position of the selected answer is expected when following this model.

b) Deadline model: according to this model, respondents allocate a certain amount of time to answering a question and select the best answer they have considered up to the time when the deadline expires.

1See part I for concrete findings concerning such influences

Satisficing and primacy effects play an important role in this process step. “[...] respondents begin checking answers and go down the list until they feel they have provided a satisfactory answer. Although certain respondents will read and consider each answer, others will not. As respondents proceed down the list they may feel that enough has been done. Thus, the items listed first are more likely to be checked” (Dillman (2007, p.63f)).

4. Response: Biemer & Lyberg (2003, p.125) additionally see a step before these, namely Encoding and Record Formation, whereby knowledge is obtained, processed, and either stored in memory or recorded physically. “For example, for a respondent to answer accurately a question regarding the behavior of a household member in a survey, the behavior must first be observed and committed to memory so that it can be recalled during the interview”.

There are additional effects besides verbal language effects, namely numeric language (numbers in the queries and answer categories), graphical language (size, spacing, and location of information on the page), and symbolic language (e.g. arrows and answer boxes) (Christian & Dillman (2004)).

10.2 Visual Interpretive Heuristics

Tourangeau et al. (2004, p.370f) as well as Tourangeau et al. (2007, p.94f) distinguish between five special visual interpretive heuristics that respondents follow when evaluating the visual layout of survey questions. Each heuristic assigns a meaning to a spatial or visual cue. The five heuristics are2:

1. Middle means typical: this means that respondents will see the middle item in an array (or the middle option in a set of response options) as the most typical, which can cause the respondents to see the middle point as the anchor point (e.g. as the mean value for the population) for their own judgements.

2. Left and top means first: this interpretive principle reflects the reading order of English (and most western languages) and is therefore a cultural phenomenon. The topmost or leftmost position in a closed-ended list will thus be seen as one of the two extreme endpoints (the same principle is valid for the rightmost and bottommost positions).

3. Near means related: this heuristic means that respondents expect items that are physically near each other on the screen to be related conceptually (e.g. items on the same screen versus items on separate screens). This heuristic is related to the gestalt principle of proximity, which is described in a section below.

4. Up means good: this heuristic is related to the second one and means that, with a vertically oriented list, the top item or option will be seen as the most desirable.

5. Like means close: this means that items that are visually similar will be seen as closer conceptually. This heuristic is anticipated by the gestalt law of similarity described below.

10.3 Gestalt Principles

A few principles from gestalt psychology can also be applied to the visual layout of (Web) questionnaires, like the principle of similarity (objects of the same size, brightness, color or shape are more easily seen together), the principle of proximity (objects close to each other are grouped together), and the principle of pragnanz.

2Additional information can also be found e.g. in Schwarz et al. (2008)

10.3.1 Principle of Proximity

The Gestalt grouping principles suggest that placing instructions for respondents within the foveal view as well as visually grouping them with the corresponding answer space using proximity should increase the number of respondents who comply with the instruction (Christian et al. (2007, p. 121)). This phenomenon is known as the principle of proximity and is described in other words in Smyth et al. (2006b, p.8): “we tend to group things together based on the distance of their spatial separation. In other words, we will see items that are close to one another as a group and items that are distant as separate. One way to achieve this in Web surveys is to use space”. For example, using greater space between questions than between the stem of a query and the accompanying response options creates grouping based on proximity and clarifies the boundaries between questions (Smyth et al. (2006b, p.9)).

10.3.2 Principle of Similarity

The same principle is valid for visual similarity (e.g. when similar or equal colors are used): “when two options are similar in appearance, respondents will see them as conceptually closer than when they are dissimilar in appearance” (Tourangeau et al. (2007, p.91)). This concept is also part of the underlying theory used by Tourangeau et al. (2000). Similarly, Smyth et al. (2006b, p.8) state that “respondents are more likely to mentally group images that appear alike”. Similarity can be established through several means such as font, shape and color.

10.3.3 Principle of Pragnanz

The principle of pragnanz states that figures with simplicity, regularity, and symmetry are easier to perceive and remember (Dillman (2007), Smyth et al. (2006b, p.8) and Smyth et al. (2004, p.3)).

10.4 Types of Respondents in Web Surveys

To fix the terminology concerning the status of response, a short description of all types is subsequently given. Additionally, a graphical representation of the observable response patterns, which leads to a differentiation between seven processing types3, is given in figure 10.1 (see footnote 4):

3For a detailed description of these 7 types and their individual motivation, take a look at Bosnjak & Tuten (2001, p.6) or Bandilla & Bosnjak (2000, p.21).

4Taken from Bosnjak & Tuten (2001, p.5)


Figure 10.1: User patterns in Web surveys

1. Complete responders are those respondents who view all questions and answer all questions.

2. Unit nonresponders are those individuals who do not participate in the survey. There are two possible variations to the unit nonresponder. Such an individual could be technically hindered from participation, or he or she may purposefully withdraw after the welcome screen is displayed.

3. Answering dropouts consist of individuals who provide answers to those questions displayed, but quit prior to completing the survey.

4. Lurkers view all of the questions in the survey but do not enter any answers to the questions. “Lurkers are potentially easy to persuade to complete an already started Web questionnaire compared to a nonresponder” (Ekman et al. (2007, p.6)).

5. Lurking dropouts: represent a combination of points 3 and 4. Such a participant views some of the questions without answering, but also quits the survey prior to reaching the end.

6. Item nonresponders view the entire questionnaire, but only answer some of the questions.

7. Item nonresponding dropouts: represent a mixture of points 3 and 6.


Part III

Experiments Conducted by the Author


11 Description of the Experiments

Experiments in three different surveys were conducted which offered respondents six different input controls for answering rating scale questions. All participants were randomized to the different scale controls, and the assigned control remained the same for the whole questionnaire. Although there would have been many possibilities for varying each of the controls (e.g. labelling scale points or modifying the width between scale points), only one version per control existed. All questions under experiment were organized as question batteries, ensuring that the scale controls were used for each sub question. All experimental questions had two verbal anchors (two extremes), one on each side, and the respondent had to place a concrete vote between these two extremes. To get ideal support from the survey tool, custom-made software was developed by the author, which satisfies all these needs with little effort1.

11.1 General Experimental Design

The general composition of the experiments is as follows: when an interviewee opens the Web page containing the questionnaire, the questions are presented in a certain visual design which was randomly selected. The probabilities for each control type were fixed before starting the experiment. This design is called a split-ballot experiment2, which is generally used for detecting instrument effects. After a control has been assigned to each of the respondents, the groups can be compared and analyzed with statistical methods. Respondents were not informed about the experiments conducted behind the ostensible questionnaire, to avoid a Hawthorne effect3.
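Purely as an illustration of this kind of randomized assignment (this is a sketch, not code taken from QSYS, and the probabilities shown are invented for the example), drawing one of the six controls with fixed probabilities could look as follows in Javascript:

    // Sketch: split-ballot assignment of one input control with fixed probabilities.
    // The control names match the short names used later in this chapter; the
    // probabilities are purely illustrative and must sum to 1.
    var CONTROLS = [
        { name: "radio",      p: 0.20 },
        { name: "button",     p: 0.20 },
        { name: "click-VAS",  p: 0.15 },
        { name: "slider-VAS", p: 0.15 },
        { name: "text",       p: 0.15 },
        { name: "dropdown",   p: 0.15 }
    ];

    function assignControl(controls) {
        var r = Math.random();          // uniform random number in [0, 1)
        var cumulative = 0;
        for (var i = 0; i < controls.length; i++) {
            cumulative += controls[i].p;
            if (r < cumulative) {
                return controls[i].name;
            }
        }
        return controls[controls.length - 1].name; // guard against rounding errors
    }

    var assignedControl = assignControl(CONTROLS); // kept fixed for the whole questionnaire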

1For detailed information about usage and design of the software, see part IV

2For a more detailed description and discussion about this method, see Noelle-Neumann & Petersen (2000, p.192)

3For an explanation see Diekmann (1999, p.299)

4Which means in this case pressing the next button to go to the next question

5Which was also mentioned by Heerwegh (2003), extended from Heerwegh (2002)

Paging Although the software can display multiple questions on one page, the paging mode (one question per page) was used because of the increased observability for the experiments. Thus it became easier to determine the question where dropout occurred as well as the time needed to fill out one question. A client-side time-tracking instrument was used to measure the time from when the page was initially loaded until the data was submitted4, to avoid bias caused by data transfer, by the time needed for loading and processing the answers on the server, and to remove any possible effect of differential download speeds5. Reips (2002c, p.248) has similar arguments, especially in regard to dropout reports: “If all or most of the items in a study are placed on a single Web page, then meaningful dropout rates can not be reported”. For further information on the paging versus scrolling (all questions on one screen) design and the effects of this decision, see the experiment conducted by Peytchev et al. (2006). The drawback of this design was higher completion times, which in turn lead to higher dropout rates. For more information on the results of already accomplished experiments dealing with paging versus scrolling, see section 5.1.
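The client-side time measurement described above could be sketched roughly as follows (the form id and the hidden field name are invented for this illustration and are not taken from the actual software; the script is assumed to be placed after the form):

    // Sketch: measure the time from page load until the next button is pressed
    // and send it to the server in a hidden form field together with the answers.
    var pageLoadedAt = new Date().getTime(); // captured when the page (and this script) loads

    document.getElementById("question-form").addEventListener("submit", function () {
        var elapsedMs = new Date().getTime() - pageLoadedAt;
        document.getElementsByName("fillout_ms")[0].value = elapsedMs;
    });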

Question Batteries All questions under experiment were organized as batteries, which means that one question consisted of multiple sub questions. This made it easier for the respondents to answer more quickly and made it obvious to the respondent that these questions logically belonged together. Putting each sub question on a single page would have resulted in a higher burden due to loading times and the time needed for initial orientation when a page is loaded. There are certain drawbacks of this design, such as context effects, difficulties when attempting to track the exact time needed for a sub question, and, in the case of dropout, difficulties in determining the last successfully filled out sub question6. An example of a context effect is that respondents tend to take the first answer of a question battery as some kind of reference, which influences the answering of the remaining questions.

Mandatory Answering Answering all questions was mandatory, which meant that if the next button was pressed before the question on the page had been answered sufficiently, feedback was given directly in the form of a soft prompt within the client's browser explaining the lack of response, and the data was not delivered to the server. More thoughts on the possibilities of real-time validation of required responses can be found in Peytchev & Crawford (2005, p.239). Validation was performed on the client side via Javascript to display the feedback to the respondent immediately. When Javascript was disabled, the check was performed on the server and the previous question was displayed again if an answer had not been submitted successfully. This approach was met with criticism7: it may lead to higher dropout, particularly when non-standard input controls are used. However, in all 3 surveys it was desired by the sponsors. No default selection or slider positioning was provided, as recommended by Reips (2002b, p.246). Additionally, no midpoints were marked.
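A minimal sketch of such a completeness check (assuming, purely for illustration, that the sub questions of a battery are radio groups named q1, q2 and q3 and that the page contains a form with id "question-form" and a prompt element with id "soft-prompt"; none of these names stem from QSYS):

    // Sketch: block submission and show a soft prompt until every sub question
    // of the battery has been answered.
    function batteryIsComplete(form, groupNames) {
        for (var i = 0; i < groupNames.length; i++) {
            if (!form.querySelector('input[name="' + groupNames[i] + '"]:checked')) {
                return false; // at least one sub question is still unanswered
            }
        }
        return true;
    }

    document.getElementById("question-form").addEventListener("submit", function (event) {
        if (!batteryIsComplete(event.target, ["q1", "q2", "q3"])) {
            event.preventDefault(); // data is not delivered to the server
            document.getElementById("soft-prompt").textContent =
                "Please answer all questions on this page.";
        }
    });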

Instructions To ease the filling out process, a simple instruction text on the use of the controls currently assigned to the respondent was displayed on top of each question battery. The position did not vary, as this was not the focus of these experiments8.

Technical Preconditions For the assignment and proper use of the different input controls, it was important to determine which technical preconditions had to be fulfilled by the respondent's browser. This can be illustrated with the following example: if Javascript was disabled, which was the case for approximately 2% of the respondents9, some of the controls could not work properly. Thus, it was detected beforehand whether Javascript was enabled within the client's browser, and if not, a control which worked without this technology was assigned at random. This approach should minimize technical side effects.
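One common way to perform such a check (shown here only as a sketch under the assumption of a hidden field named "js_enabled" with the default value "0"; this is not the author's actual implementation) is to let a small script set a flag that reaches the server with the first request, so that only controls without a Javascript precondition are eligible when the flag is missing:

    // Sketch: the start page contains a hidden field "js_enabled" defaulting to "0";
    // this script flips it to "1", so the server knows before assigning a control
    // type whether Javascript is available in the client's browser.
    window.onload = function () {
        var flag = document.getElementsByName("js_enabled")[0];
        if (flag) {
            flag.value = "1";
        }
    };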

6Similar arguments can be found in Reips (2002c) and Reips (2002b)

7E.g. Reips (2002c, p.248) argues that “it is potentially biasing to use scripts that do not allow participants to leave any items unanswered”

8Christian & Dillman (2004, p. 61) did however research this in their experiments

9See chapter 14

Schwarz & Reips (2001, p.78) discuss possible problems when Javascript is disabled: “Missing or turned off Javascript compatibility can have disastrous consequences for online research projects. This is particularly abundant whenever Javascript functions interact with other factors that have an influence on motivation for participation or dropout. Participation may be low, dropout rates may be high, and participants' behavior may be systematically biased”.

VAS As special controls, Visual Analogue Scales (VAS) were implemented and compared to the other controls. For definitions and properties of VAS, see chapter 2; for screenshots and descriptions of their concrete behavior, see the corresponding section below, which describes all the controls. VAS are relatively new in online survey research; nevertheless, a few experiments have already been accomplished, with somewhat contradictory findings (see chapter 4 for more information on this).

Feedback To retrieve direct feedback from the respondents, a few questions were placed at the end of the questionnaire, such as (translated from German): (1) “Is the scale fine-grained enough to express your own opinion exactly?”; (2) “Are the controls too complicated in general?”; (3) “Does it take too long to understand the use of the instruments?”. These individual respondent impressions were very important, because dropout could partially be explained through the feedback on the different controls. Unfortunately, those who quit the survey did not reach these questions, and their feedback would have been even more interesting.

Considerations regarding Design When thinking about all the different possible settings on a respondent's local PC, it becomes obvious that simply providing different designs is not enough; it is also important that the designs remain comparable. It is not feasible to make the questions look the same for all configurations and browser settings, due to e.g. the different symbols and looks used for input controls within the different browsers. There are a few settings which cannot be controlled and which would significantly change the appearance of the questionnaire. As an example, the Web page designer cannot influence the consequences of using custom style sheets (CSS), which can be set for browsers like Firefox. Regarding the cell size of and spacing between the scale points, Dillman & Bowker (2001) report possible problems with distances between points which change as a result of different screen resolutions or switching to full-screen mode. Questions in the tourism and snowboard questionnaires had a yellow background with black text; the webpage survey had a white background with black text, which was desired by the sponsors of the surveys.

Standards The 16 standards for Web experimenting as defined by Reips (2002c), as well as the points mentioned in Reips (2002c), Reips (2002b), Andrews et al. (2003), Crawford et al. (2005), Kaczmirek & Schulze (2005), Kaczmirek (2005), Lumsden & Morgan (2005) and in a very early document by Dillman et al. (1998), were taken into consideration and were implemented where applicable to the concrete experiments.

Navigation No button enabling respondents to quit the survey was provided, for the simple reason that respondents were always able to quit the survey by simply closing their browser (or the browser's tab). Responses were stored for each screen, so that such a button was not necessary. After answering each page, respondents had to take an explicit action to proceed (namely pressing the next button) to reach the next page. The next button was displayed in the bottom left corner of the screen, as recommended by Crawford et al. (2005, p.56). It was not possible to return to the previous question, since the correction of previously given answers would not have been favourable for the experiments.

Technical Infrastructure For all surveys, the online survey software QSYS (in a very early version) was used, which was developed as part of this thesis10. For the database, Oracle version 10g was used, hosted by the Central Information Technology Service of Innsbruck University, which greatly supported the whole dissertation project. To avoid data loss, a complete backup was created every day. The application itself runs on a Web application server belonging to the IT-Center of Innsbruck University, which provided system stability. To encrypt the information transported via the internet and thus provide confidentiality, https was used as the protocol.

11.2 Different Input Controls

To get an impression of the different looks of the controls, see the screenshots below together with a short description of their use. Only one example is given for each question type, but the controls look the same in all other questions. For each control, a short name is given in brackets, which will be used for simplicity within the evaluation chapter. For all controls, 10 scale points were used, except for the slider-VAS (200 scale points) and the click-VAS (20 scale points). Where the selectable items stand isolated with spaces between them (as is the case for the radio and button controls), the focus was placed on using equal spaces between these items to avoid negative side effects as e.g. reported in Tourangeau et al. (2004, p.379f).

11.2.1 Radio Button Scale (radio)

The most commonly used control elements for such question types are radio buttons. Radio buttons appear as a row of small circles; each circle corresponds to a response option. Recent studies have already pointed out the advantages of these types of control elements11. The advantages of this type are easy usability (just one single click is necessary to rate) and familiarity: radio buttons are frequently used on Web pages in general, and they resemble the equivalent paper-based input fields (in most cases). They are also recommended by standards for Web surveys, e.g. in Crawford et al. (2005, p.55): “For single response questions, radio buttons should be used for respondent input”. Funke & Reips (2008a) even categorize this type as Radio Button Scales (RBS).
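Purely for illustration (the markup of the actual questionnaire is not reproduced here; the container id and question name are invented), a 10-point radio button scale for one sub question could be generated like this:

    // Sketch: build a horizontal 10-point radio button scale for one sub question.
    function buildRadioScale(containerId, questionName, points) {
        var container = document.getElementById(containerId);
        for (var value = 1; value <= points; value++) {
            var input = document.createElement("input");
            input.type = "radio";
            input.name = questionName; // one radio group per sub question
            input.value = value;
            container.appendChild(input);
        }
    }

    buildRadioScale("subquestion-1", "q1", 10); // 10 scale points, as in the experiments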

10See part IV for a detailed description of the software and its capabilities

11For detailed information, see chapter 4


Figure 11.1: Screenshot of a sample of a radio question

A possible (unproven) drawback could be the small size of the button and the necessity to move the mouse exactly onto the small circle, which could lead to higher burden (with the possible consequence of higher dropout rates). It was attempted to find cross-browser solutions for all controls, so that they would look the same in all browsers as far as possible. In the case of radio buttons this is impossible, because each browser uses different visualizations of these control elements, which can also lead to side effects12.

11.2.2 Empty Button Scale (button)

To avoid the possible drawback of the size of the radio buttons, bigger buttons were used for this control. They behave in the same way radio buttons do (with the difference that these buttons turn red when selected and gray when not selected), so the clickable action area for each response option is much larger than with radio buttons. Technically, these are normal buttons used within HTML forms, but with adapted styling using CSS. Javascript was used to change the color of the buttons when clicking on them. Again, one simple click was sufficient to answer a (sub) question with this kind of control.
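The red/gray toggling described above could be realized roughly as follows (a sketch; the container id and the hidden field holding the selected value are invented):

    // Sketch: empty button scale - clicking one of the buttons marks it red,
    // resets the others to gray and stores the selected value in a hidden field.
    function initButtonScale(containerId, hiddenFieldName) {
        var buttons = document.getElementById(containerId).getElementsByTagName("button");
        for (var i = 0; i < buttons.length; i++) {
            buttons[i].addEventListener("click", function (event) {
                event.preventDefault(); // buttons inside a form would otherwise submit it
                for (var j = 0; j < buttons.length; j++) {
                    buttons[j].style.backgroundColor = "gray"; // reset all buttons
                }
                event.target.style.backgroundColor = "red";    // mark the selection
                document.getElementsByName(hiddenFieldName)[0].value = event.target.value;
            });
        }
    }

    initButtonScale("scale-q1", "q1"); // container holding 10 gray buttons with values 1-10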

12E.g. Welker et al. (2005, p.21) mentions a shadow effect for radio buttons in some browsers, which can have an influence on response behavior


Figure 11.2: Screenshot of a sample of a button question

11.2.3 Click-VAS (click-VAS)

With the radio and button controls, there are still spaces between the buttons. When there are no spaces on the scale, as is the case with this control type, it would possibly give the respondent the feeling that they are rating on a continuous scale. Technically, Javascript was again used to simply exchange pictures when clicking on them; one picture was used for each scale item, and red was again used to accentuate the selected scale item. This control possibly best fits the definition of a VAS given by Funke & Reips (2007a). However, it is important to mention that one scale point had a width of more than one pixel, which was a requirement implied by Funke & Reips (2007a). No tick marks or labels were present on the scale, to ensure a real continuum13.

13For a definition (a list of properties) of VAS as they are utilized in these experiments, take a look at section 2

77

Page 78: Visual Design Effects on Respondents Behavior in Web Surveys

11 Description of the Experiments

Figure 11.3: Screenshot of a sample of a click-VAS question

This approach had one technical disadvantage: some browsers offer image drag-and-drop functionality, used to easily copy pictures from a webpage to a local directory on the computer. Thus, when some respondents accidentally did not click on the items but tried to slide over the control (with the left mouse button pressed), this feature was activated and the mouse pointer changed to drag-and-drop mode, which could have confused the respondents. A better solution would have been to use simple div boxes as scale items, which change background color when clicked. Nevertheless, this behavior in all likelihood did not have any influence on data quality; possibly respondents were only confused when this occurred for the first time. Again, one simple click was sufficient to make the selection.
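The div-based alternative mentioned above could look roughly like the following sketch (20 adjacent boxes, one per scale point; the element names are invented, and no images are involved, so the drag-and-drop side effect cannot occur):

    // Sketch: click-VAS built from adjacent div boxes instead of images.
    function buildClickVas(containerId, hiddenFieldName, points) {
        var container = document.getElementById(containerId);
        for (var value = 1; value <= points; value++) {
            var box = document.createElement("div");
            box.style.cssText = "display:inline-block;width:15px;height:20px;background:lightgray;";
            box.setAttribute("data-value", value);
            box.addEventListener("click", function (event) {
                var boxes = container.getElementsByTagName("div");
                for (var i = 0; i < boxes.length; i++) {
                    boxes[i].style.background = "lightgray"; // reset all scale points
                }
                event.target.style.background = "red";       // accentuate the selected point
                document.getElementsByName(hiddenFieldName)[0].value =
                    event.target.getAttribute("data-value");
            });
            container.appendChild(box);
        }
    }

    buildClickVas("vas-q1", "q1", 20); // 20 scale points, as used for the click-VAS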

11.2.4 Slider-VAS (slider-VAS)

This control consists of a VAS with 200 scale points, each of which was selectable. The user had to position a slider between the two anchor points. When starting to answer a question with this slider control, the initial position of the slider was in the middle of the scale. To make it possible to distinguish between skipping a question and deliberately placing the slider in the middle of the scale, the slider had to be moved. There were other approaches: e.g. Couper et al. (2006) used a slider which was not automatically present on the scale for their experiments, and it was necessary to initially click on the horizontal bar to activate (and see) the slider. This had several advantages (e.g. no influence on response behavior by the starting position), but it was possibly harder for the respondents to understand how to use this control; when the slider is shown, it becomes clearer how to handle it. Unfortunately, the initial positioning had two side effects: (1) because respondents had to move the slider initially, some moved away from the midpoint (mostly to the right) although the desired position to be selected would have been the midpoint, which is why there is a small gap at the midpoint for this control14. (2) A midpoint could be found by the respondents by taking a look at the slider position underneath. This is problematic, because a condition for all 6 controls was that no position on the slider was marked.

14Take a look at figure 20.1 to see the effect

Figure 11.4: Screenshot of a sample of a slider-VAS question

Technically, a simple (configurable) Java Applet was used to implement the slider. To communicate with this Applet, a Javascript callback function was passed to the Applet at initialization time, which was notified each time the slider position was modified. This value was written to a hidden form field within the page via Javascript and delivered to the server when the next button was pressed. The reason why the Applet did not communicate directly with the server was better integrability into the existing structures of the survey software.
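The Javascript side of this Applet integration might have looked roughly like the following sketch (the callback and field names are invented; the Java Applet itself, which would receive the callback name as an initialization parameter, is not shown):

    // Sketch: callback invoked by the slider applet whenever the slider position
    // changes; the value is kept in a hidden form field and is only sent to the
    // server when the next button is pressed.
    function onSliderMoved(position) {
        // position: an integer between 1 and 200 reported by the applet
        document.getElementsByName("q1_slider")[0].value = position;
    }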

11.2.5 Text Input field (text)

This control uses simple text form input fields (customized with CSS) where a number has to be entered to rate. The line with the scale positions is displayed above the input field. This is the only control where, additionally, the keyboard has to be used to give the answer. To avoid invalid input, a client-side integrity check was performed before data was sent to the server. If anything other than a number between 1 and 10 was entered, a soft prompt at the top of the questionnaire was displayed in red, and data submission to the server was blocked until only valid numbers were entered. This was necessary to keep the data clean, but it could also have increased the respondent's burden.
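Such a client-side integrity check could be sketched as follows (element ids are invented and the real prompt was worded in German):

    // Sketch: allow submission only if every text field contains an integer
    // between 1 and 10; otherwise show a red soft prompt and block the submit.
    function textAnswersAreValid(form) {
        var fields = form.querySelectorAll('input[type="text"]');
        for (var i = 0; i < fields.length; i++) {
            if (!/^([1-9]|10)$/.test(fields[i].value.trim())) {
                return false;
            }
        }
        return true;
    }

    document.getElementById("question-form").addEventListener("submit", function (event) {
        if (!textAnswersAreValid(event.target)) {
            event.preventDefault(); // block data submission to the server
            var prompt = document.getElementById("soft-prompt");
            prompt.style.color = "red";
            prompt.textContent = "Please enter a number between 1 and 10 for every sub question.";
        }
    });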


Figure 11.5: Screenshot of a sample of a text question

A negative effect could have been caused by the autocomplete function (based on previously entered values) offered by most of the common browsers (e.g. Firefox) when using text input fields with the same names, which could have led to identical answers for sub questions.
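One simple countermeasure, not described in the text above but supported by common browsers, is to switch off autocompletion for these fields:

    // Sketch: disable the browser's autocompletion for all text inputs of the page
    // so that previously entered values are not suggested for later sub questions.
    var textInputs = document.querySelectorAll('input[type="text"]');
    for (var i = 0; i < textInputs.length; i++) {
        textInputs[i].setAttribute("autocomplete", "off");
    }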

11.2.6 Dropdown Menu (dropdown)

This control is similar to the text control, but here dropdown boxes were used to select the appropriate position on the scale instead of a text input field. As in all other controls, no default selection was given15; instead, simply an empty item was shown. Providing a default selection is, for example, mentioned in Weisberg (2005, p.123) as an important point to avoid, since that alternative may otherwise inadvertently be recorded as the given answer. To see all possible alternatives (which means the values 1-10), no scrolling was necessary; all were initially visible after clicking on the control. In contrast to the other controls, an initial click was necessary to see all alternatives.
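For illustration only (names invented), a dropdown with an empty initial entry, so that no scale value is preselected, could be generated like this:

    // Sketch: dropdown control with an empty first entry so that no default
    // scale value is inadvertently recorded.
    function buildDropdown(containerId, questionName, points) {
        var select = document.createElement("select");
        select.name = questionName;
        select.appendChild(document.createElement("option")); // empty default item
        for (var value = 1; value <= points; value++) {
            var option = document.createElement("option");
            option.value = value;
            option.textContent = value;
            select.appendChild(option);
        }
        document.getElementById(containerId).appendChild(select);
    }

    buildDropdown("subquestion-1", "q1", 10); // values 1-10, all visible after one click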

15This is especially recommended for dropdown boxes which is e.g. mentioned by Crawford et al. (2005, p.55)


Figure 11.6: Screenshot of a sample of a dropdown question

Two steps were necessary for answering: clicking on the dropdown box and selecting the desired entry. As found in previous experiments, this control seems to be the most problematic one. E.g. one principle in Dillman (2007, p.392) is to use dropdown boxes sparingly. Healey (2007) found that measurement errors can occur with this control in combination with scroll mice. Dillman & Bowker (2001) observed not knowing what to do with a dropdown menu as a reason for respondent frustration and so a possible source of higher dropout rates. Similarly, Crawford et al. (2005, p.55) recommend that list boxes should in general only be used when the list of responses exceeds twenty.

11.2.7 Differences and Similarities

All styles have different properties, which can influence results. It is important to evaluate all controls on this level, because the likelihood of satisficing increases with task difficulty and the effort needed to answer. In table 11.1, all properties of the controls are listed at a glance.

Feedback in this case means whether any numerical feedback is given, so that the respondent can precisely enter a number, as opposed to the controls where the choice is placed without any numerical feedback as an informative basis16. Cook et al. (2001, p.700) also argue why no feedback concerning the interval count should be given on graphic scales: “When a scale uses numerous score intervals, participants are told how many scale points there are, and they not only can but are expected to accommodate these intervals within their conscious thinking”.

16For more on numerical feedback in scale questions, see Schwarz et al. (1991)

81

Page 82: Visual Design Effects on Respondents Behavior in Web Surveys

11 Description of the Experiments

control      feedback   continuous   input devices
radio        no         no           mouse
button       no         no           mouse
click-VAS    no         yes          mouse
slider-VAS   no         yes          mouse
text         yes        no           mouse, keyboard
dropdown     yes        no           mouse

Table 11.1: Different properties of all input controls

11.2.8 Technical Preconditions

Because some of the controls only worked when a certain technology was enabled within the client's browser, these prerequisites were checked before starting the questionnaire, and the control was assigned according to this information (e.g. when Javascript or Java was disabled, only controls which did not depend on these technologies were assigned; controls were drawn randomly until one was found whose technical preconditions were all fulfilled). Table 11.2 shows the technical preconditions for all controls:

control      interaction mechanism                               precondition(s)
radio        one click                                           Javascript
button       one click                                           Javascript
click-VAS    one click                                           Javascript
slider-VAS   click to activate and slide                         Java, Javascript
text         one click or tab key pressed; enter text            none
dropdown     two clicks (initiating and selecting), mouse-move   none

Table 11.2: Technical preconditions and interaction mechanisms for all input controls

11.3 Specific Experiments

Three independent studies were accomplished. An overview of each survey's experimental variables is given in tables 11.3, 11.4 and 11.5 (the difference between the two question types is simply that an interval matrix additionally has a sub question (or statement) on the left side of each rating scale). The wording of all three surveys was in German, so in later chapters, when concrete questions are referenced, a translation is provided.

Here, a short overview of the three studies is given (the short names will be used when describing the results of the experiments):

1. tourism: the most important study for this thesis was a research project conducted at Salzburg University dealing with students' attitudes towards alternative tourism. For recruitment, all students of Salzburg University received an invitation letter. Additionally, a link was placed on the Innsbruck University webpage to also reach students from Innsbruck17. This is the only survey which offered some sort of incentive: at the end of the questionnaire, an individual traveller's type was inferred from the answers given.

Ten question batteries with an overall 72 sub questions were available. In the following, the positions and types of the questions are listed, together with the number of sub questions for each battery:

position   question type           no. of sub questions
6          interval matrix         7
9          interval matrix         12
12         interval matrix         6
13         interval matrix         6
15         semantic differential   12
16         interval matrix         11
17         interval matrix         6
36         semantic differential   4
42         semantic differential   4
43         semantic differential   3

Table 11.3: Experimental variables - tourism

2. webpage: after the re-launch of the Innsbruck University Web page, a survey was conducted to receive feedback about the new design and functionality. All students, employees and alumni of Innsbruck University were invited to participate via e-mail. The survey consisted of 12 scale questions (a total of 32 sub questions). This survey was only available online for a short period of time. Subsequently, an overview of the positions and numbers of sub questions of these questions is given:

position   question type           no. of sub questions
2          interval matrix         1
3          interval matrix         4
4          interval matrix         1
5          interval matrix         5
6          semantic differential   3
8          interval matrix         1
10         interval matrix         1
12         interval matrix         8
14         interval matrix         1
16         interval matrix         3
24         interval matrix         1
36         semantic differential   3

Table 11.4: Experimental variables - webpage

17See the concrete distributions in the demographic description section 15

3. snowboard: the third survey was part of a diploma thesis written at Innsbruck University. The topic was market research within the field of snowboarding. A different recruitment strategy was chosen here: several postings were placed in relevant snowboard forums. Again, an overview of all questions under control with the number of sub questions is given in the table below (15 questions with a total of 79 sub questions):

position   question type           no. of sub questions
10         semantic differential   19
11         semantic differential   19
14         interval matrix         8
21         semantic differential   6
22         semantic differential   2
23         semantic differential   4
24         semantic differential   4
25         semantic differential   2
26         semantic differential   3
27         semantic differential   3
28         semantic differential   2
29         semantic differential   2
31         semantic differential   1
35         semantic differential   1
42         semantic differential   3

Table 11.5: Experimental variables - snowboard

Within this survey, an additional experiment was run with different styles for the ranking of closed-ended alternatives, whereby six different controls for ranking alternatives were implemented. Because the number of respondents was too low to find differences between controls, this experiment will be repeated in a survey with more participants, and its results are not reported within this thesis.

Invitation letter In the case of the first two surveys, an invitation letter was sent which contained a PIN number enabling all recipients to enter the survey and serving as an access control to prevent uninvited respondents from taking part. The PIN had to be entered on a login page. One idea was to integrate the PIN directly as a parameter into the URL, so that no manual entering would have been necessary and the respondent would land directly on the start page of the questionnaire, but this solution had the drawback of increasing the length of the URL, which could cause problems with some e-mail clients when line breaks become necessary within the invitation letter. Heerwegh & Loosveldt (2002c) found in an experiment that manual login procedures did not decrease response rates and, as a positive effect, increased the overall degree of data quality (lower dropout rates), which led to the decision to use the manual login procedure. The PIN login mode does not prevent respondents from filling out the questionnaire multiple times, but if each respondent had his/her own PIN, anonymity could no longer be assured, which was very important for the first two surveys. The letter simply contained a short description; no personal salutation was used. HTML was used instead of plain text in order to integrate the link to the questionnaire (respondents just had to click on the link to go to the login page of the questionnaire).

12 Research Questions

The main hypotheses will strongly focus on VAS. The main research questions discussed in these experiments are as follows:

Dropout Concerning completion rate and breakoffs, the hypothesis is that the higher the complexity of an input control, the higher the dropout will be. This was e.g. observed by Couper et al. (2006). Complexity in this case refers to the number of actions necessary to answer the question and also the completion time needed for answering.

Another influencing factor is how well-known the input controls are to the respondents. Concretely, this would mean that radio and dropdown would have lower dropout than the other controls, because these are standard HTML input controls.

Concerning VAS, contradictory findings were reported in the summary in section 4.1. It seems that results strongly depend on the type of VAS used in the experiments. Experiments with VAS more similar to click-VAS tended to result in lower dropout rates. In contrast, in experiments where VAS similar to slider-VAS were used, higher dropout rates were reported compared to the other scales.

Response Distributions Three main questions should be treated in these experiments:

(1) Is any category on the scale preferred in one control group compared to the others? In particular, it should be checked whether there are any differences in the selection of extreme values (minimum, maximum) and the midpoint when comparing the different scales.

(2) Are there any general differences when comparing the means of the scales? Experiments dealing with these effects are summarized in section 4.2, where a slight tendency towards lower values for VAS can be observed.

(3) Does the numeric feedback have any influence on the response behavior? Some numeric feedback is given for text and dropdown, because numbers are shown on the drawn scale. The question is whether the distributions of these two differ from those of the other input control types.

Response Time / Completion Time Can any differences be found between the different controls regarding the time needed for filling out the questions and dropout? The general hypothesis is that, on average, scales which involve fewer steps to arrive at an answer also lead to shorter processing times. The controls requiring the most steps for answering are dropdown and text, so it should be shown that the use of one of these two input controls results in higher response times compared to the other controls. This was also reported in similar experiments. In addition to the general advice to avoid dropdown boxes, Healey (2007), for example, found that dropdown menus in general led to longer response times. It should be shown whether this effect can be reproduced.

The focus is particularly on response times when using VAS compared to the other controls. Section 4.8 showed contradictory findings with regard to completion times, which results from the different definitions of VAS in the experiments conducted. An interesting question is whether the two VAS, which are relatively different in terms of technology, appearance and feel, behave similarly regarding their response times.

Learning Effect Another hypothesis is that, when custom controls1 are used, it takes comparatively long to finish the question for the first time, but relative completion time decreases from question to question due to a learning effect.

Categorization/Transformation click-VAS and slider-VAS both use different numbers as scale marks compared to the 4 other controls. To enable comparability, these values have to be transformed. Funke & Reips (2006)2 compared two possible ways of transformation: linear transformation (equal intervals form one category) and transformation with reduced extremes (intervals at the extreme points are smaller than the other intervals). They found that transformation with reduced extremes led to higher correspondence with the scale marks used in radio buttons. This experiment will be carried out again to see whether these findings can be reproduced.
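To make the two ways of transformation concrete, the following R sketch maps VAS values onto 10 categories; the value range (0-100) and the interval borders are illustrative assumptions, not the exact borders used by Funke & Reips.

# Illustrative transformation of VAS values (assumed range 0-100) to 10 categories.
to_linear <- function(x) {                      # (1) ten equally wide intervals
  cut(x, breaks = seq(0, 100, by = 10), labels = 1:10, include.lowest = TRUE)
}
to_reduced_extremes <- function(x, edge = 5) {  # (2) narrower intervals at both ends
  inner <- seq(edge, 100 - edge, length.out = 9)
  cut(x, breaks = c(0, inner, 100), labels = 1:10, include.lowest = TRUE)
}

vas <- c(0, 3, 12, 49, 50, 96, 100)
data.frame(vas, linear = to_linear(vas), reduced = to_reduced_extremes(vas))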

Feedback Questions The feedback questions3 asked at the end of the questionnaire should give an impression of how the interviewee feels about the assigned input control concerning usability, attractiveness and sufficient number of scale points. Two experiments reported by Walston et al. (2006, p.285 ff) and van Schaik & Ling (2007, p.18f), which both had similar questions, found that the use of VAS was not rated highly by respondents. Concerning interestingness, it is expected that common HTML controls get a score nearer to the boring pole, but these controls are also expected to achieve higher results with regard to ease of use.

Usage of Java Technology It will be evaluated whether Java controls are generally suitable for online questionnaires, despite the drawback of long initial loading times. Concerning the Java version of the VAS (slider-VAS), previous studies such as Couper et al. (2006) reported technical problems using this type of slider. It will be checked whether these technical difficulties can also be observed and minimized as necessary. Experiments carried out for this thesis had very detailed paradata tracking in order to examine whether technical problems occurred more often when a certain operating system or Web browser was used. This point is strongly related to dropout.

In an experiment conducted by Couper et al. (2006) it was necessary to activate the VAS by clicking. A different strategy was chosen for slider-VAS within these experiments, and this thesis aims at checking whether displaying the slider without the necessity of clicking had any effect.

1Which means controls not written in plain HTML
2And also in Funke (2004) and Funke (2003)
3See section 16 for a detailed description of the feedback questions.

13 Overall Response

In table 13.1, information about the overall response for all three surveys is provided, whereby the number of respondents for each experiment is shown together with the number and proportion of those who filled out at least one question. Generally, the proportion of respondents who completely filled out the questionnaire is relatively low (53.11% on average). The number of overall participants in the snowboard survey is also low (402 participants), which is why this survey will partially be excluded from analysis.

                           tourism          webpage          snowboard
Participation in general   1262             1538             402
Lurkers                    (11.73%) 148     (4.81%) 74       (26.62%) 107
At least one question      (88.27%) 1114    (95.19%) 1464    (73.38%) 295
completed                  (60.30%) 761     (55.27%) 850     (43.78%) 176

Table 13.1: Overall response for all 3 surveys

13.1 Overall Input Control Distribution

Due to technical problems when starting the webpage survey, input controls are not equally distributed. See tables 13.2, 13.3 and 13.4 for the input control distribution for all 3 surveys, including those who finished the survey and those who did not. Another technical problem may have caused an inappropriately high number of break-offs for the slider-VAS within the tourism survey. Unfortunately, it could not be determined whether those who quit the questionnaire had technical difficulties or simply did not know how to use the control.

control       overall portion   not completed    completed
radio         15.89%            40 (22.60%)      137
button        17.41%            50 (25.77%)      144
click-VAS     14.36%            31 (19.38%)      129
slider-VAS    21.01%            127 (54.27%)     107
text          14.72%            55 (33.54%)      109
dropdown      16.61%            50 (27.03%)      135

Table 13.2: Input control distribution for the tourism survey

control       overall portion   not completed    completed
radio         8.67%             35 (27.56%)      92
button        8.40%             33 (26.83%)      90
click-VAS     9.08%             42 (31.85%)      91
slider-VAS    9.36%             61 (44.53%)      76
text          30.81%            237 (52.55%)     214
dropdown      33.67%            206 (41.78%)     287

Table 13.3: Input control distribution for the webpage survey

control       overall portion   not completed    completed
radio         15.93%            15 (31.91%)      32
button        20.34%            18 (30.00%)      42
click-VAS     17.97%            14 (26.42%)      39
slider-VAS    19.66%            37 (63.79%)      21
text          8.81%             16 (61.54%)      10
dropdown      17.29%            19 (37.25%)      32

Table 13.4: Input control distribution for the snowboard survey

14 Paradata and Additional Information

When conducting online surveys, a lot of additional information can be collected about the filling-out process itself, apart from the answers given to the questions. This information is called paradata. First, some thoughts on which kind of information can be retrieved for further analysis: even if the user takes part quasi-anonymously, some technical parameters can be retrieved automatically. All of these parameters were tracked when running the concrete experiments; some of them could only be read when Javascript was enabled within the client's browser. Heerwegh (2003, p.361) notes one drawback of client-side paradata collection: “One drawback of client-sided paradata is that they will be lost if the respondent does not submit the Web page”. This problem was solved by employing AJAX technology: when the page containing the question was loaded, the client-side collected paradata was sent directly to the server and not together with the other form data containing the answers to the questions.

Subsequently an overview of collected paradata is given:

• The kind of Web browser used: this information was tracked on the server side by reading out information from the HTTP header. Numerous different browsers are available; here only the most commonly used ones were classified (IE6, IE7, Mozilla, Safari and Opera).

• The Web browser settings currently in use: e.g., it was checked whether Javascript, Java and Cookies were enabled.

• The operating system used: again, this information was extracted from the HTTP header on the server side.

• The screen resolution used: the resolution can be inferred from the window size of the browser if maximized, which is read via Javascript within the client's browser.

• (Geographical) pinpointing using the user's IP address: from this it is possible to derive whether the respondent filled out the questionnaire at home, in a public venue (like libraries or internet cafes) or at work.

• Duration: how long did respondents take to fill out the questions? To exactly measure the time between the respondent receiving a question and a response being sent, client-side time tracking was implemented with Javascript1.

• Referrer: the referrer holds the page the respondent came from. This information is only meaningful if links were placed in different locations on the Web2.

Theoretically, much more information could be tracked, such as mouse moves over the Web page, previously written and deleted text in text input fields, previously selected radio buttons, etc., but this information was not primarily of interest for the experiments described.

1The software used for these experiments can be downloaded from http://survey4all.org. For similar approaches concerning time tracking, see Heerwegh (2002), Heerwegh (2003) or Heerwegh (2004a)

2As was the case for the snowboard survey

14.1 Overall Operating System Distribution

One of the most time-intensive tasks when generating Web pages in general is ensuring cross-browser compatibility. To make sure that no side effects concerning browser and operating system usage occurred, these variables were also taken into consideration. This should indicate whether there are more (particularly initial) break-offs when a certain browser or operating system is used. Table 14.1 shows the distribution of these variables for all 3 surveys. As expected, the vast majority of the respondents used Windows as their operating system, followed by a Mac operating system (like OS X). Only a few used Linux.

           tourism                      webpage                      snowboard
           not completed   completed    not completed   completed    not completed   completed
Windows    337             728          564             799          107             163
Mac        15              26           29              24           11              10
Linux      1               7            21              27           1               3

Table 14.1: Use of operating systems for all three surveys

Concerning the distribution of operating systems, no noticeable problems could be observed. More detailed research on this topic would be possible (e.g. which Windows versions or Linux distributions were used), but this is not of further interest in this work.

14.2 Overall Browser Agents Distribution

An overview of the most common browser agents used by the respondents in the three surveys is given in table 14.2:

           tourism                      webpage                      snowboard
           not completed   completed    not completed   completed    not completed   completed
IE 6       124             302          190             310          25              33
IE 7       83              156          115             176          28              31
Firefox    108             279          216             280          51              101
Opera      21              2            30              3            9               0
Safari     8               17           18              14           4               8
others     9               5            45              67           2               3

Table 14.2: Use of browser agents for all three surveys

As can be seen for all three surveys, it seems that there were technical problems with the Opera browser. Opera users had problems when loading the new page containing the next question after pressing the next button, because the page with the already answered question was cached and not newly loaded, since the URL did not change3. The completion rate is significantly lower than with other browsers. This is awkward but, because of the low proportion of respondents using Opera, not critical for further analysis.

3This behavior is reported in several forum entries and has been solved in the current version of QSYS

14.3 Overall Screen Resolution Distribution

In addition, respondents' screen resolutions were extracted. The vast majority of all participants used 1024x768, 1280x800 and 1280x1024. No noticeable problems were found in any of the three surveys. Table 14.3 shows the most common resolutions for all three surveys. Unfortunately, for the tourism survey, technical problems prevented tracking of such paradata for about half of the respondents.

            tourism                      webpage                      snowboard
            not completed   completed    not completed   completed    not completed   completed
1024x768    149             365          84              169          35              78
1280x800    66              175          37              49           17              23
1280x1024   76              117          72              140          36              38

Table 14.3: Screen resolutions as used by the respondents for all three surveys

14.4 Overall Distribution of Additional Browser Settings

The technical parameters of the clients' machines were also tracked, similarly to what is described in section 14.3. This information was read with Javascript running within the client's browser. It should be mentioned that reading out the flag Java enabled within the client's browser did not automatically mean that Java worked properly on the client's machine. A better way to determine the correct functioning of Java is to run a small invisible Applet within the Web page, which supplies information on whether initialization was successful or not4.

In the tourism survey, 1.3% did not have Cookies enabled, 2.7% had disabled Java (or the machine had no Java installation) and 1.3% had turned off Javascript. The snowboard survey had higher rates of disabled technical preconditions: 2.3% with no cookies, 3% with no Java and 2.3% with no Javascript. These are higher rates concerning Java compared to other online surveys, such as the one reported by Couper et al. (2006), where 1.7% did not have Java enabled. Table 14.4 shows all additional browser settings for all three surveys at a glance5.

                               tourism           webpage           snowboard
technical condition            n.c.     c        n.c.     c        n.c.     c
cookies enabled      false     7        8        —        —        4        3
                     true      346      753      235      420      115      173
java enabled         false     14       17       —        —        4        5
                     true      339      744      218      406      115      171
Javascript enabled   false     7        8        —        —        4        3
                     true      346      753      236      420      115      173

Table 14.4: Distribution of browser settings for all three surveys (c = completed; n.c. = not completed)

4This strategy is implemented in the current version of QSYS
5Again, the temporary technical problems of the webpage survey set all technical conditions to false, which is why this information is not available for this survey.

14.5 Summary and Conclusion

These figures indicate the heterogeneity of the combinations of browsers and browser settings used, together with the different operating systems. Except for trouble with the Opera Web browser, no technical problems can be inferred from the frequency tables given above. Only a few respondents had turned off Javascript and Java in their browsers, and even these cases were not lost for further analysis because they were assigned input controls which work without these technologies. Only a few turned cookies off within their browsers. Cookies are used to store a session variable, which is needed to assign the answers given to a certain question. Disabling them had no influence on the proper functionality of the survey (because URL rewriting6 was enabled). Paradata tracking in general is an important advantage of online surveys compared to paper-and-pencil questionnaires. It would be possible to extend the tracking abilities of the software for further experiments.

6When using URL-rewriting, the session id was encoded within the URL for each request

15 Demographic Information

Different demographic information was collected for all 3 surveys. Unfortunately, in all surveys, this information was only collected at the end, so it is not available for respondents who dropped out before. It is true that the aim of the experiments was not to make statements about, e.g., students of Salzburg University in general, but it makes sense to know about the demographic structure of the sample used for this evaluation (e.g. young people may behave differently when working with a VAS). For example, Stern et al. (2007) mention that findings concerning visual effects are limited to their samples. Since many existing studies have used random samples of college students1, it would make sense to take a look at differences between the students' and the employees' responses within the tourism survey.

Concerning all results presented in this chapter, the number of participants differs between tables due to dropout. In other studies, dropouts were erased from the dataset. Data cleaning was not carried out for the experiments in this thesis, in order to increase the number of cases.

15.1 Tourism Survey

Not only students from Salzburg University (n=678) participated in the survey, but also a small number from Innsbruck University (n=88). Faculty and field of study were also asked, but these variables were not included in the analysis here.

About 80% of all respondents specified Austria as their country of origin. Of those with other countries of origin, 116 came from Germany and 19 from Italy. About 75% of all respondents were female. The mean value for age was 24.34 and for semester 6.42. For the distribution of both, take a look at the boxplots in figures 15.1 and 15.2.

1See chapter 4

Figure 15.1: Age distribution of respondents - tourism survey

Figure 15.2: Semester distribution of respondents - tourism survey

About 85% filled out the questionnaire at work (here it is not clear whether students answered "at work" when at the university, because university was not available as an answer option).

Another interesting consideration was the interviewees' mood during the filling-out process. The respondents were asked about their mood at the beginning and at the end of the questionnaire and answered by means of 5 smileys shown in a list. A screenshot of this question is given in figure 15.32.

Figure 15.3: Smileys used for indicating the mood of interviewees - tourism survey

2For more information on research concerned with the influence of mood as a possible determinant of respondent's behavior refer to Bachleitner & Weichbold (2007)

These variables can also be combined with the experiment's scale variables. In table 15.1, the absolute change of mood from the beginning to the end of the questionnaire is given for each scale control (1 means that mood improved by one step, -1 means that mood degraded by one step):

controls      -4      -3      -2      -1      0       1       2
radio         0.00    0.00    0.13    1.31    14.06   2.23    0.26
button        0.00    0.00    0.13    1.58    13.67   3.29    0.26
click-VAS     0.00    0.00    0.26    1.58    12.75   2.37    0.00
slider-VAS    0.00    0.00    0.00    1.31    10.12   2.10    0.53
text          0.13    0.00    0.13    1.18    11.04   1.71    0.13
dropdown      0.00    0.13    0.00    1.97    12.88   2.37    0.39

Table 15.1: Overall portion of mood changes for all controls - tourism survey

No significant differences could be found concerning mood changes when comparing the different input controls.

15.2 Webpage Survey

In this survey, both students and employees were invited to participate. Table 15.2 provides an overview of the distribution of the respondents' relation to the university (e.g. student, employee, alumni). 96 respondents were both students and employees. 30% of the respondents were employees and 65% were students.

relation    finished
employee    299
student     636
partner     4
external    7
alumni      28

Table 15.2: Respondent’s relation to the university (multiple answers possible) - webpage survey

Additionally, the distribution of demographic information (age and gender) for the webpage survey is shown in tables 15.3 and 15.4. Among those who finished the survey, a higher rate of male respondents can be found in the employees category compared to students (57% vs. 49%). Furthermore, employees are significantly older than students.

gender    finished    employees    students
male      424         172          310
female    421         132          327

Table 15.3: Gender distribution - webpage survey

Figure 15.4: Age distribution of respondents - snowboard survey

age        finished    employees    students
under 18   4           3            2
18-25      441         49           424
26-30      174         76           134
31-40      130         92           62
41-50      51          47           4
51-60      26          24           2
over 60    15          9            6

Table 15.4: Age distribution - webpage survey

15.3 Snowboard Survey

Only a few demographic questions were asked, namely gender, origin and age: 129 respondents were male and 47 female. 130 participants were from Germany, 32 from Austria and 7 from Switzerland. The age distribution of the respondents can be seen in figure 15.4. The mean age in this survey is 22.33.

15.4 Differences in Demographic Distributions Across Browser and OS Versions

According to Funke & Reips (2005), there are differences in the demographic distributions of respondents depending on the Web browser used. For example, a higher percentage of women use Internet Explorer compared to alternative browsers (75% versus 44%). Furthermore, it was found that people using alternative browsers had a higher level of education. Even response times differ when comparing Internet Explorer and alternative browsers.

The clearest demographic difference between users of certain browsers in all three surveys is the difference between men and women. As reported in Funke & Reips (2005), a far higher percentage of Internet Explorer users are female compared to those who use an alternative browser (tourism: 82.43% vs. 64.80%; webpage: 57.43% vs. 37.28%; in both surveys the findings were highly significant). A similar effect was found when comparing the users of operating systems. Women used Windows more frequently than any other operating system: tourism: 77.19% (Windows), 42.31% (Mac), 14.29% (Linux); webpage: 50.95% (Windows), 40.00% (Mac), 25.00% (Linux). Concerning age, no significant differences with regard to browser and operating system usage were found.

15.5 Summary and Conclusion

In all three surveys, the majority of the participants were students; however, in the webpage survey, employees and alumni of the university also participated (65% of the respondents of this survey were students). As a consequence, respondents were relatively young (the majority between 20 and 25 years). Respondents were equally distributed concerning gender, except in the snowboard survey, where the majority were male (73%). In the tourism survey, the current mood of the respondents was asked at the beginning and the end of the survey. It was seen that the input control types did not have any influence on mood changes. When comparing demographic differences in the use of browsers and operating systems, it was found that female respondents had a higher tendency to use Internet Explorer and Windows.

It is always important to take a look at the demographic distribution of the respondents in order to know for which population the results of the experiments are valid and applicable. Because the majority of the respondents are relatively young and strongly related to the university, it would be interesting to repeat the experiments with older people.

16 Feedback Questions / Subjective Evaluation

At the end of each survey, respondents were asked to answer three feedback questions so as to get a direct evaluation of the different input controls. Because respondents did not know about the experiments, the input controls were not explicitly mentioned in the questions to be evaluated; instead, the questions referred to the appearance of the questionnaire as a whole. The strategy was to get a more general impression and not a rational evaluation of the scale control elements. The question wording was slightly different in all 3 surveys (translated from German here), which complicated comparison. Generally, no significant differences between the scale controls could be found for the snowboard survey; for this reason, no results are reported for this questionnaire. To answer these questions, the assigned input control was used. To enable comparison for those controls with more than 10 scale points, linear transformation was applied.

16.1 Boring vs. Interesting

The idea behind this question was to get an impression of how innovative and interesting a certain control is rated. The concrete wording was the following (translated):

• tourism, snowboard: how would you evaluate the online questionnaire itself? Unfortunately, the formulation of this question was not very precise, so some respondents may have misinterpreted its meaning (possibly some thought of the content of the questionnaire, especially for the first sub question).

• webpage: how did you like the design and technical realization of the online questionnaire compared to other online questionnaires? This question refers to the input controls, and therefore clearer effects were expected for this survey.

Within the webpage survey, when asking how interesting the design of the questionnaire was, the following conspicuities can be found: the dropdown and text controls are more often classified as boring than other controls, followed by the radio control. click-VAS, slider-VAS and button controls are more likely to be rated as interesting. This means that questionnaires with custom controls are more interesting to the respondents than those with well-known standard controls. This finding is highly significant and was verified with a Kruskal-Wallis test applied to pairwise comparisons. Here the differences between the mean values for the input controls and the overall mean are presented in descending order: click-VAS 0.99, slider-VAS 0.94, button 0.61, radio -0.15, text -0.37 and dropdown -0.42. Highly significant differences (p < 0.001) were observed between the controls click-VAS and slider-VAS vs. radio, text and dropdown. Highly significant differences could also be found between button vs. text and button vs. dropdown. The same effect was found in the tourism survey, even though it was not as clear-cut as in the webpage survey1. Here the button and click-VAS controls have a higher interest rate. Subsequently, the differences from the overall mean are given: button 0.42, click-VAS 0.19, slider-VAS 0.00, radio 0.00, text 0.04 and dropdown -0.60. For a graphical illustration of these results, take a look at figures 16.1, 16.2 and 16.3.

Figure 16.1: Comparison of results of feedback question 1 (boring=1 vs. interesting=10) - webpage survey
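As a rough sketch of such a test in R (the data frame and variable names are illustrative, not those used for the thesis data): an overall Kruskal-Wallis test followed by pairwise rank-based comparisons with a multiplicity correction.

# Illustrative data: one interestingness rating (1-10) per respondent plus the assigned control.
set.seed(1)
fb <- data.frame(
  control     = factor(rep(c("radio", "button", "click-VAS",
                             "slider-VAS", "text", "dropdown"), each = 80)),
  interesting = sample(1:10, 480, replace = TRUE)
)

kruskal.test(interesting ~ control, data = fb)            # overall test
pairwise.wilcox.test(fb$interesting, fb$control,          # pairwise comparisons
                     p.adjust.method = "holm")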

16.2 Sufficient vs. Non Sufficient Number of Scale Intervals

To ask the respondents whether they felt that the number of scale intervals presented (dependent on each input control) was sufficient or not, the following two anchor points were presented, whereby the respondent had to give an evaluation between these extreme points (translated): scaling of the rating questions is sufficient vs. scaling of the rating questions is not sufficient.

Interestingly, no tendency could be observed that controls with more than 10 intervals (namely click-VAS and slider-VAS) were evaluated as having sufficient intervals. Nevertheless, the overall differences were significant (Kruskal-Wallis rank sum test; Kruskal-Wallis χ2=22.42, p<0.001). The detailed results indicate that for those controls where the scale points could be seen at first view (either through labeling or because the spaces between the items allowed counting them easily), the respondents tended to answer that these scale points were sufficient. Compared to the overall mean, dropdown was nearest to the sufficient extreme, with -0.30 (tourism) and -0.32 (webpage); at the other extreme were click-VAS (with 0.29 (tourism) and 0.72 (webpage)) and slider-VAS (with 0.10 (tourism) and 0.39 (webpage)2).

1Which is a possible result of the imprecise wording
2Differences between these extremes are highly significant

Figure 16.2: Comparison of results of feedback question 1 (boring=1 vs. interesting=10) - tourism survey

Figure 16.3: Line diagram comparing results of feedback question 1 (boring=1 vs. interesting=10) for all three surveys

Figure 16.4: Comparison of results of feedback question 2 (sufficient=1 vs. insufficient=10 number of scale points) - webpage survey

The reason why those who had the dropdown input control assigned gave the best score for a sufficient number of intervals probably has to do with the way the intervals were selected. Possibly the respondents thought that it would not be feasible to put more than 10 intervals into the dropdown box. Concerning the results of click-VAS and slider-VAS, it is hard to determine where the origin of this phenomenon lies, but these are the only two scale types where the number of scale points could not be established easily, because no feedback was given and no item counting was possible; it was thus possibly not obvious to the respondent which scaling was meant, because the control appeared as a continuum. For a graphical illustration of these results, take a look at figures 16.4, 16.5 and 16.6.

16.3 Easy to Use vs. Complicated

The best evaluation regarding ease of use was given by those who were assigned the radio input, followed by click-VAS and button. Here the differences from the overall mean (webpage: 2.70; tourism: 1.91) are given: -0.65, -0.65, 0.53 (webpage); -0.29, -0.14 and -0.06 (tourism). The worst evaluations regarding ease of use were given to the slider-VAS and text input controls, followed by dropdown. Overall differences between input controls were highly significant. For the tourism survey, however, no significant differences could be observed, most likely because of the imprecise question wording. For a graphical illustration of these results, take a look at figures 16.7, 16.8 and 16.9. These results correlate with the steps necessary for filling out a question with a certain input control (see table 11.2).

Figure 16.5: Comparison of results of feedback question 2 (sufficient=1 vs. insufficient=10 number of scale points) - tourism survey

Figure 16.6: Line diagram comparing results of feedback question 2 (sufficient=1 vs. insufficient=10 number of scale points) for all three surveys

Figure 16.7: Comparison of results of feedback question 3 (easy=1 to use vs. complicated=10) - webpage survey

Figure 16.8: Comparison of results of feedback question 3 (easy=1 to use vs. complicated=10) - tourism survey

Figure 16.9: Line diagram comparing results of feedback question 3 (easy=1 to use vs. complicated=10) for all three surveys

16.4 Summary and Conclusion

Custom controls are, as expected, more likely to be rated as interesting than standard HTML controls. When asking whether the input controls offered a sufficient or insufficient number of scale points, no consistent tendency was found. The problem here possibly lies in the inaccurate wording of the question, so the respondents did not know exactly what was meant by it. It would be possible to repeat the experiments with clearer statements, possibly even mentioning the intention of the experiments at the end of the questionnaire when the input controls are to be evaluated.

The third question aimed at evaluating the ease with which the different input controls could be used. The best results were achieved by radio, click-VAS and button. This result corresponds with the steps required to give the answer and was therefore not surprising.

Controls such as dropdown and text were rated badly in both feedback questions one and three. Therefore, input controls of this kind should be avoided if possible. The only meaningful application for these two input controls would be scales with many (labelled) points on the scale, which was not part of this thesis.

17 Response Time / Completion Time

In this section, a comparison of the controls with regard to completion time is given. The response time or completion time is available for each question and was tracked on the client side. The time needed for filling out is an important factor, because it gives evidence of burden, which could lead to dropout. Time values could only be tracked when Javascript was enabled in the client's browser.

17.1 Standardization

Unfortunately, for questions containing multiple sub questions, the time needed to fill out a single sub question could not be determined. For this reason, the response time was standardized by the number of sub questions + 11. This standardization enabled better comparability between questions, but it must be mentioned that this approach is just an approximation of the real values. The real values are very hard to determine because each question is different (e.g. different lengths of the question text, some questions needing more time to think about, ...). In the following, when talking about a question, a sub question is meant.
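As a small illustration in R (variable names and numbers are invented), the standardization simply divides the recorded time for a whole battery by the number of sub questions plus one:

battery <- data.frame(position = c(6, 9), duration = c(42.1, 65.8), n_sub = c(7, 12))
battery$std_duration <- battery$duration / (battery$n_sub + 1)  # "+ 1" for the question text itself
battery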

17.2 Outlier Detection

As a first step when analyzing the durations of the questions under experiment, outliers have to be detected and eliminated (or weighted). See tables 17.1, 17.2 and 17.3 for some statistical parameters which refer to all respondents (outliers were included; all questions were standardized by the number of sub questions + 1):

position   mean    median   sd      min     max
6          6.89    5.61     7.29    0.31    160.12
9          5.35    4.39     4.94    0.18    76.40
12         8.70    6.96     10.72   0.15    239.88
13         9.01    7.54     7.60    0.15    138.86
15         8.18    6.99     6.86    0.16    117.26
16         5.26    4.42     4.99    0.08    110.21
17         5.87    4.79     6.13    0.15    98.88
36         8.21    6.72     7.58    1.36    122.35
42         6.77    5.93     6.98    1.72    123.72
43         6.27    5.53     3.21    0.57    35.69

Table 17.1: Basic parameters concerning duration - tourism survey (in seconds)

1Which stands for the question text of the whole question itself.

In the tourism and webpage surveys, respondents were detected who had response times for the experimental questions that were below 0.1 seconds2, which is not realistic. This could arise from technical difficulties or from the respondents' use of an automated form-filling function, as offered e.g. by the Web Developer plug-in for Firefox; the latter is the most reasonable explanation, because identical fill-out patterns were found for these cases. These respondents were removed from further analysis.

position   mean    median   sd      min     max
2          16.27   11.71    38.18   2.52    1064.42
3          8.05    6.44     10.20   0.36    227.25
4          13.61   10.08    20.18   0.44    515.27
5          8.42    6.92     9.32    0.94    184.15
6          10.36   7.61     17.47   1.02    476.78
8          13.21   9.76     29.53   1.71    820.66
10         12.42   8.03     48.35   1.53    1150.43
12         11.46   7.33     13.16   0.26    171.08
14         15.91   11.88    14.38   1.67    165.61
16         9.73    6.72     13.48   1.12    223.97
24         17.61   10.85    26.87   2.98    362.89
30         8.44    7.28     8.38    0.35    147.81

Table 17.2: Basic parameters concerning duration - webpage survey (in seconds)

position   mean    median   sd      min     max
10         6.55    5.17     5.66    1.29    54.63
11         4.79    3.89     3.35    0.82    27.51
14         4.39    3.95     2.03    1.66    14.71
21         5.00    4.51     2.07    1.01    14.28
22         10.98   8.77     9.22    1.09    79.06
23         7.46    4.00     19.00   1.01    246.70
24         5.29    4.57     3.20    1.08    25.32
25         6.51    5.86     3.27    1.08    27.95
26         4.48    4.04     1.92    0.78    16.45
27         4.69    3.71     4.01    1.06    31.46
28         7.43    6.09     5.16    1.20    47.53
29         10.15   5.65     46.91   1.11    626.42
31         10.20   8.99     4.94    2.39    32.41
35         5.82    5.32     2.45    2.60    14.55
42         6.36    5.99     2.60    0.64    22.93

Table 17.3: Basic parameters concerning duration - snowboard survey (in seconds)

2For details, take a look at the minimum values in tables 17.1, 17.2 and 17.3

Outliers could occur as a result of, e.g., respondents leaving the browser window with the survey open but turning to another task and continuing later. Also, those who just clicked through the survey without seriously answering the questions should be excluded, which means outlier detection should be carried out on both sides: those who took too long and those who did not take long enough. In the literature, different strategies for excluding outliers before completion time comparisons can be found:

1. Absolute time: filling out questions takes longer or shorter than a given absolute time limit (e.g. longer than 1 hour or shorter than 5 minutes for the whole questionnaire). This strategy was chosen e.g. for experiments by Heerwegh & Loosveldt (2002a), Healey (2007), Tourangeau et al. (2007) and Tourangeau et al. (2004, p.382), who set an absolute time limit for filling out the entire questionnaire.

2. Relative time: questions are excluded when they take the same respondent much longer to fill out than other questions (the number of sub questions, or even the number of characters per sub question text or the level of difficulty of the question, can also be taken into consideration), e.g. Couper et al. (2006, p.242): “On a particular question, if a respondent spent more than six times the average time he or she spent on the other questions, this response time was removed from the analysis”. The major problem with this strategy is that in some cases it is hard to compare questions with each other concerning their expected filling-out time.

3. Compare to the time needed by other respondents and exclude accordingly, as e.g. in Heerwegh (2002, p.15): “Outliers at the item level were defined as response times smaller than mean minus 2 times standard deviation, or greater than mean plus 2 times standard deviation”.

4. All observations falling above (or below) the 75th percentile (or 25th percentile) of the time spent on any given screen or question ± 1.5 times the interquartile range of the time spent on the screen (Crawford et al. (2001, p.161)).

Because questions from different experiments are hard to compare with each other, e.g. when a different number of items exists and question texts differ in length, strategy 2 may not be the right choice. The same is true for strategy 1: it is hard to define upper and lower borders to extract the outliers when questions differ in the criteria mentioned before.

Figure 17.1 provides an example of outlier detection when applying strategy 3 (± 2 * standard deviation). The figure shows a density plot of a question from the tourism survey3 together with the detected outliers. There is no visible cut-off or gap between those identified as outliers and non-outliers; the boundary between them is fluid. As an average over all questions, 16.3 outliers were removed from the tourism survey, 17.1 from webpage and 5.6 from snowboard. As can be seen in this figure, there was even an outlier who completed the question after about 1.5 minutes; in other questions there were responses for one block after more than 2 minutes. Another problem of this strategy became obvious: because all time measures were right-skewed, no exclusion of those whose completion times were too short was carried out. Those who only clicked through and marked answer alternatives at random should also be filtered out, which could easily be achieved with outlier detection on the left side of the distribution. This is an advantage of Web-based questionnaires, as in other modes this detection would not be possible4. Secondly, the border chosen for the right side is relatively arbitrary.

3Question number 17, again standardized by subquestion + 1
4See also Funke & Reips (2007a, p.62f)
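A minimal R sketch of strategy 3 at the item level (the durations are invented); with right-skewed times the lower bound typically falls below zero, so in practice only the right tail is flagged:

d  <- c(3.2, 4.8, 5.1, 5.5, 6.0, 7.4, 9.9, 58.3, 91.0)  # standardized durations (illustrative)
lo <- mean(d) - 2 * sd(d)
hi <- mean(d) + 2 * sd(d)
d[d < lo | d > hi]                                       # flagged as outliers (here: only 91.0)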

Figure 17.1: Sample density plot of question number 17 with outliers - tourism survey

The extraction of the real outliers is an impossible task, because it can never be assessed whether the respondent is currently busy with the questionnaire or, e.g., with responding to an e-mail. Even employing Javascript to check whether there are simultaneous activities on the computer would not give any clarification: it is only possible to track activities inside the browser, and, furthermore, an absence of computer activity, such as mouse movement, would not necessarily mean that the respondent is not working on the questionnaire at that moment, e.g. during the phase of question comprehension5. To make sure that no technical difficulties or other influences affecting the controls were responsible for the outliers, the distributions of the outliers across input controls were also checked. When doing so, no noticeable problems were found. But it should be taken into consideration that it is dangerous to exclude outliers in an automatic manner, see e.g. Faraway (2005, p.68f). As an alternative method when comparing durations between the different controls, using robust regression methods is a good choice in this case, because the estimators are not so strongly affected by the outliers.

17.2.1 Robust Regression

Detecting outliers the usual way is a procedure which either accepts or rejects items. Robust regression follows a different approach; according to Rousseeuw & Leroy (2003, p.8), “it is by looking at the residuals from a robust (or resistant) regression that outliers may be identified, which usually cannot be done by means of the LS (least squares) residuals”. Therefore, diagnostics and robust regression really have the same goals, only in the opposite order: when using diagnostic tools, one tries to delete the outliers and then to fit the good data by least squares, whereas a robust analysis first wants to fit a regression to the majority of the data and then to discover the outliers as those points which possess large residuals from that robust solution. When using robust regression, weights are determined for each item which can be used as a factor for further analysis. The most common function for performing robust fitting of linear models in R is the rlm function, whereby fitting is done using an M estimator with the Huber method. This method is well described in Faraway (2005, p.98ff). Fitting is done by iterated re-weighted least squares (IWLS). For the weighting, all time measures are taken into consideration, as well as the dependent variable, which can be seen as a benefit of this procedure.

5See more about the phases of the response process in chapter 10.1
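A sketch of this weighting step in R (illustrative data and names, not the thesis code): rlm from the MASS package fits the log completion time on the control type and returns IWLS weights that flag suspicious durations.

library(MASS)

set.seed(2)                                            # illustrative data, roughly log-normal
resp <- data.frame(
  control  = factor(rep(c("radio", "button", "click-VAS",
                          "slider-VAS", "text", "dropdown"), each = 50)),
  duration = rlnorm(300, meanlog = 1.6, sdlog = 0.4)
)
resp$duration[c(3, 77, 151)] <- c(120, 95, 200)        # simulated extreme cases

fit <- rlm(log(duration) ~ control, data = resp)       # M estimation, Huber psi by default
resp$weight <- fit$w                                   # weights from the IWLS iterations
subset(resp, weight < 0.5)                             # strongly down-weighted responses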

As a demonstrative example, take a look at figure 17.2, which presents a sample question battery taken from the middle section of the tourism survey. Colored marks show the weights of the different input controls, and it can be seen that the weights are calculated differently between controls. It can also be seen that there is no direct association between duration and weight. The weights in this example are based on a single regression model with the logarithm of the duration of only one question and the control type.

Figure 17.2: Density plot with weights of question number 17 - tourism survey

To quantify the differences, the results of the comparison of the durations for this question after removing the outliers6 can be seen in figure 17.3 and table 17.4. Although many outliers were removed, differences between the controls concerning completion time are relatively clear. The patterns which can be extracted from these two sources can also be found within the other questions. When comparing to the overall mean (5.034 seconds) and median (4.71 seconds) for this question, three groups can be found:

6All weights smaller than 0.5 were eliminated, which were 38 items in this case

1. text and dropdown are more than 0.4 seconds slower than average.

2. slider-VAS lies in the middle, close to the mean value.

3. button and radio are faster than average. The fastest control for this question was click-VAS, but this control was added to this group because its time measurements lay closer to button and radio in the other questions.

Figure 17.3: Boxplots comparing the response times across input controls for question 17 of the tourism survey

control       mean    median
button        4.74    4.53
dropdown      5.70    5.32
slider-VAS    5.05    4.82
click-VAS     4.37    3.96
radio         4.95    4.55
text          5.47    4.95

Table 17.4: Basic parameters concerning duration for question 17 of the tourism survey (in seconds)

The findings of Heerwegh & Loosveldt (2002a), who simply compared radio buttons and dropdown menus, were confirmed by the experiments reported here. In some other experiments7, faster download times for dropdown boxes were observed, but this obviously had no effect in these experiments: given the average speed of internet connections, the minimal increase in HTML code8 produced by radio buttons did not have an effect.

7E.g. Heerwegh & Loosveldt (2002b, p.1)

A summarized overview of all three surveys is given in table 17.5, which contains the same kind of results as the detailed view of the single question above. In this table, multiple robust regression was used, with control type and question number as independent variables and the logarithm of the duration as the dependent variable for each survey. Pairwise comparisons were carried out using Tukey's HSD test9. All highly significant results are listed in the table together with their direction: a “+” sign means that the control in the row took longer to complete than the one in the column (e.g. dropdown was slower than button) and a “-” sign means that filling out took longer with the control in the column.

             button    dropdown   slider-VAS   click-VAS   radio    text
button                 -|-|-      -|-|.        .|-|+       .|.|+    -|-|-
dropdown     +|+|+                +|.|.        +|.|+       +|+|+    .|.|.
slider-VAS   +|+|.     -|.|.                   +|+|+       +|+|+    -|.|.
click-VAS    .|+|-     -|.|-      -|-|-                    .|.|.    -|.|-
radio        .|.|-     -|-|-      -|-|-        .|.|.                -|-|-
text         +|+|+     .|.|.      +|.|.        +|.|+       +|+|+

Table 17.5: Overview of duration comparisons for all three surveys
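One way to approximate this overall comparison in R is sketched below (the data are invented, and the thesis may have treated the robust weights differently): a robust fit over control and question, removal of heavily down-weighted cases, and pairwise Tukey HSD comparisons of the controls on the remaining data.

library(MASS)

set.seed(3)                                                       # illustrative data
times <- expand.grid(control  = c("radio", "button", "click-VAS",
                                  "slider-VAS", "text", "dropdown"),
                     question = paste0("q", 1:4),
                     rep      = 1:30)
times$duration <- rlnorm(nrow(times), meanlog = 1.6, sdlog = 0.4)

fit_r <- rlm(log(duration) ~ control + question, data = times)    # robust fit for the weights
kept  <- droplevels(subset(times, fit_r$w >= 0.5))                # drop down-weighted cases

fit_a <- aov(log(duration) ~ control + question, data = kept)
TukeyHSD(fit_a, which = "control")                                # pairwise control comparisons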

17.3 Learning Effect

Figure 17.4 offers an example of a learning effect, calculated for the tourism survey. The effect is similar for the webpage survey (figure 17.5). These figures show the percentage of each input control in the overall time needed for filling out a certain question (e.g. when taking all mean values of a certain question, the percentage of the mean value of the text input control for question number 6 is 17%). Only questions under experiment are shown. Mean values were used for this line plot; the figures should show whether the share of the sliders increases or decreases over the course of the questionnaire.
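The share plotted in these figures can be computed along the following lines (a toy R example with invented mean durations; all names are illustrative):

set.seed(4)
m <- expand.grid(question = paste0("q", c(6, 9, 12)),          # questions under experiment
                 control  = c("radio", "button", "click-VAS",
                              "slider-VAS", "text", "dropdown"))
m$mean_dur <- runif(nrow(m), 4, 9)                             # invented mean durations

tot <- aggregate(mean_dur ~ question, data = m, FUN = sum)
m   <- merge(m, tot, by = "question", suffixes = c("", "_total"))
m$share <- 100 * m$mean_dur / m$mean_dur_total                 # percentage per control and question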

For the custom input controls, especially for slider-VAS, van Schaik & Ling (2007, p.7) see the “difficulty in (learning to) use because of lack of indication of intermediate points (only end points are displayed)” as a disadvantage of VAS in general. This is one possible explanation; the second is that the two VAS controls are simply not very commonly used on Web pages and therefore respondents need some orientation time (to see how the controls have to be handled). Another explanation for the longer response times when using slider-VAS could be initial loading times10, but when testing the survey with this control, only minimal delays were observed, because the Applet containing the control was relatively small (it was simply programmed to select a value from a slider and did not contain any additional program logic).

8As always, this depends on the way it is written
9For more details, see Faraway (2005, p.186f)

10This was e.g. reported by Couper et al. (2006)

Figure 17.4: Percentage of time needed for each control per question - mean values - tourism survey

Figure 17.5: Percentage of time needed for each control per question - mean values - webpage survey

17.4 Summary and Conclusion

Outlier detection using robust regression generated better results than the other approaches and served as a good basis for further analysis. When comparing the response times of the input controls, it was observed that text and dropdown take more time than slider-VAS and much more time than button, radio and click-VAS. This confirms the hypothesis that response time correlates with the number of steps necessary to answer the question. Because the response times of the two VAS differ, it can be concluded that the VAS format in itself does not influence the response times and that the number of steps for answering is more important. Furthermore, the hypothesis can be confirmed that response times for custom controls are initially (for the first questions) longer than for questions at the end, when comparing them to the standard input controls. This learning effect is most obvious for the slider-VAS.

It would not make sense to give a general recommendation for the input controls with faster response times, because the other controls also have advantages. It possibly depends on the type of question: if difficult questions are asked and there is time to think about the answers, the time needed for filling out is not so important; if easy questions are asked, the chance to give quick answers and move on to the next question is possibly more important.

18 Dropout

Dropout in general occurs more often in Web surveys than in other survey modes1. Therefore it is important to establish the possible factors which motivate respondents to quit the questionnaire. It should be evaluated whether the use of certain input controls can diminish dropout. Consequently, the assignment of certain input controls and the corresponding dropout rates will be analysed. Ganassali (2008, p.25) gives a definition of the dropout rate: “The dropout rate represents the frequency of the respondents who started the survey and finally did not end it. An exit after viewing only the first screen of the questionnaire is considered as dropout as well”2.

All questions in all three surveys were mandatory, so if a question was not filled out satisfactorily, a soft prompt was displayed, which may have annoyed respondents and led to dropout. Unfortunately, whether respondents received a prompt before quitting the questionnaire was not tracked (because every question was mandatory, a message was displayed when one part of the question was not filled out). These messages possibly increased respondent frustration and thus led to survey termination.

A typical effect in online questionnaires is that dropout is more likely at the beginning of the questionnaire than at the end [3], which was also the case for all three surveys under experiment. When looking at all three Kaplan-Meier curves (figures 18.1, 18.2 and 18.3), very high dropout occurred at the first question when a slider-VAS was used. This is probably due either to technical problems or to the high level of burden linked with using the slider-VAS. Walston et al. (2006) reported a significantly higher dropout for sliders, and Java technology was also used there, which would support the theory that technical problems are the reason for higher dropout rates. Reips (2002c, p.248) shares this opinion: "Sophisticated technologies and software used in Internet research [...] may contribute to dropout. Potential participants unable or unwilling to run such software will be excluded from the start, and others will be dropped, if resident software on their computer interacts negatively with these technologies". Funke & Reips (2008b) and Funke & Reips (2008a) did not report any technical problems during their experiments (as faced in the Java-based experiments described here), which strengthens the assumption that the difficulties were based on technical issues and not on the use of VAS itself. For their experiments, employing Javascript was sufficient.

For all further analysis, lurkers were excluded and only those who showed initial cooperation (i.e. those who filled out at least one question) were included. Unfortunately, due to conceptual mistakes, it cannot be determined whether dropout occurred at text blocks placed between the questions or at the successor question.

[1] E.g. Lynn (2008, p.41).
[2] In the methodological theory section, there is a section on dropout in general (see section 9.1.2.2), which provides the theoretical basis for the subsequent analysis.

[3] This was also observed by Weichbold (2005, p.221), Galesic (2006, p.317) and Hamilton (2004).


To statistically evaluate this question, survival analysis was used [4]. In the following, Kaplan-Meier survival curves are shown to compare the dropout rates between the input controls, together with tables quantifying the dropout at the questions under control (the columns are the positions of the questions where dropout occurred).

18.1 Tourism

In the tourism survey (see figure 18.1 and table 18.1), the technical problems concerning the slider-VAS appear dramatic; only about 55% of the participants survived the first question under experiment. Concerning the click-VAS, only 3 dropouts were observed (these occurred within the first two questions). This is the lowest dropout rate, especially compared to the other controls, and it was closely followed by the button and radio controls (which behaved almost identically within this survey, with a survival rate of about 92%). Higher dropout can be observed when dropdown is used. The text control has the highest dropout rates, which narrowly fails to reach statistical significance in the Cox regression when compared to the button control. As expected, about 10% dropout at the second question confirms previous findings that most dropout can be observed at the beginning of the questionnaire.

[4] For the usage of survival analysis in R (http://cran.r-project.org), see Everitt & Hothorn (2007, p.143pp).


Figure 18.1: Survival times for the tourism survey comparing input controls

control/pos     6      9     12     13     15     16     17     16     42     43
button        2.58   1.03   1.03   0.00   1.55   0.52   0.00   0.00   0.00   0.00
dropdown      1.62   1.62   1.08   0.54   2.70   1.62   0.54   1.08   0.00   0.54
slider-VAS   29.06   0.85   0.00   2.14   2.14   0.43   0.43   0.00   0.00   0.00
click-VAS     1.25   0.00   0.00   0.00   0.62   0.00   0.00   0.00   0.00   0.00
radio         0.56   0.56   1.13   1.13   2.82   0.00   0.00   0.00   0.00   0.00
text          0.00   3.66   1.22   1.22   3.05   1.22   1.22   0.00   0.61   0.00

Table 18.1: Dropout questions - tourism survey (in percent for each control)


18.2 Webpage

The webpage survey presented a similar situation (see figure 18.2 and table 18.2). Button, radio and click-VAS were nearly identical. A highly significant difference from the button control can be observed for the text (p<0.01) and slider-VAS controls. There was also a significant difference between the button and dropdown controls (p<0.05).

The largest dropout rate in the middle section of the questionnaire was observed when respondents reached question 12. Specific to this question was that each of the eight sub-questions was linked to a screenshot. One reason for the high dropout may be the time-intensive process of clicking on each link to reach the screenshot and then having to return to the questionnaire to rate the image. Another reason may be that respondents were confused because multiple browser pages were open. Interestingly, the highest dropout rate occurred when the text input control was used. It is possible that the likelihood of dropout increases when entering answers is made more complex through the need for an additional input device (namely the keyboard).


Figure 18.2: Survival times for the webpage survey comparing input controls

control/pos     2      3      4      5      6      8     10     12     14     16     24     30
button        4.07   4.07   0.81   1.63   1.63   0.81   0.00   7.32   0.81   1.63   0.00   0.00
dropdown      4.87   5.48   1.42   3.85   1.22   1.42   0.81   9.33   1.01   1.42   0.00   0.61
slider-VAS   19.71   3.65   2.19   2.19   0.00   0.73   0.73   6.57   0.00   3.65   0.73   0.73
click-VAS     3.76   4.51   0.75   0.00   1.50   1.50   2.26   4.51   2.26   1.50   0.00   0.00
radio         3.15   5.51   0.00   2.36   0.79   0.00   0.79   6.30   0.00   2.36   0.00   0.00
text          6.87   3.55   1.77   3.55   3.10   1.11   0.44  16.85   1.11   1.77   0.44   0.22

Table 18.2: Dropout questions - webpage survey


18.3 Snowboard

Similar behavior can be observed for the snowboard survey (see figure 18.3 and table 18.3 [5]). Again, click-VAS shows the best results.

Figure 18.3: Survival times for the snowboard survey comparing input controls

[5] Missing columns here only contain zeros.


control/pos    10     11     14     21     25     27
button        5.00   3.33   0.00   0.00   0.00   0.00
dropdown      5.88   5.88   0.00   0.00   0.00   1.96
slider-VAS   36.21   5.17   0.00   0.00   0.00   0.00
click-VAS     5.66   3.77   0.00   0.00   0.00   0.00
radio         4.26   4.26   2.13   0.00   2.13   2.13
text         11.54   7.69   0.00   3.85   0.00   0.00

Table 18.3: Dropout questions - snowboard survey

Table 18.4 gives an overview of the outcome of the pairwise comparisons between input controls for all three surveys. The signs in a cell stand for the three surveys (tourism, webpage, snowboard). A plus sign means that the input control listed in the row led to a significantly (p<0.05) higher dropout (i.e. a higher hazard rate) in the particular survey compared to the input control given in the column, a minus sign indicates the opposite, and a dot shows that no significant difference between the input controls could be found. To create this table, a Cox proportional hazards model was used for all three experiments, with the non-experimental variables censored. The results for the slider-VAS are relatively clear in this table, but it is again important to mention that these effects possibly result from technical difficulties with the (Java based) input control.

            button    dropdown   slider-VAS   click-VAS   radio     text
button                .|-|.      -|-|-        +|.|.       .|.|.     .|-|-
dropdown    .|+|.                -|-|-        +|+|.       .|+|.     .|-|.
slider-VAS  +|+|+     +|+|+                   +|+|+       +|+|+     +|.|.
click-VAS   -|.|.     -|-|.      -|-|-                    .|.|.     -|-|-
radio       .|.|.     .|-|.      -|-|-        .|.|.                 -|-|.
text        .|+|+     .|-|.      -|.|.        +|+|+       +|+|.

Table 18.4: Overview of dropout for all three surveys - paired comparisons

Slider-VAS, dropdown and text led to higher dropout than the other controls. This stands in contrast to button, click-VAS and radio, which form the group of controls leading to lower dropout.

Finally, an overall overview of the dropout rates [6] for all three surveys is given in table 18.5. Dropout rates counting only dropout at controlled questions are given in brackets. Even though a big portion of dropout occurs at controlled questions, there is also a (varying) portion of dropout at uncontrolled questions, which could, however, also be affected by the input type of the preceding questions. The table indicates that the lowest dropout rates can be observed for the button, click-VAS and radio controls in all three surveys.

[6] How many of those who started the survey did not complete it.


             tourism            webpage            snowboard
button       27.87% (6.71%)     26.84% (22.78%)    30.00% (8.33%)
dropdown     29.17% (11.34%)    41.38% (31.44%)    39.20% (13.72%)
slider-VAS   55.15% (35.05%)    44.53% (40.88%)    63.80% (41.38%)
click-VAS    19.98% (1.87%)     31.56% (22.55%)    26.41% (9.43%)
radio        23.70% (6.20%)     28.35% (21.26%)    31.94% (14.91%)
text         34.77% (12.20%)    51.86% (40.78%)    61.56% (23.08%)

Table 18.5: Overview of dropout for all three surveys - dropout rates

18.4 Summary and Conclusion

The most obvious point which can be seen in the Kaplan-Meier curves is the high dropout for the slider-VAS. One important research question was to check the usability of Java Applets within questionnaires. Because the slider-VAS is based on this technology, it can be implied that the use of advanced technologies like Applets is at least critical, because it seems that some respondents did not have Java installed properly. Avoiding these technologies is therefore recommended, unless it is possible to ensure the installation and configuration of the respondents' PCs, as would e.g. be the case for employee attitude surveys.

When looking at tables 17.5 and 18.4, similarities apart from the slider-VAS can be found, which means that there is a slight correlation between the time needed to complete the filling-out process with a certain control (and the number of steps necessary for filling out) and the dropout rate. This shows that a higher burden for the respondents directly results in higher dropout rates, which confirms one of the hypotheses of this work. The controls with the smallest dropout were click-VAS, button and radio.


19 Response Distributions

A central question is whether there are differences in the response distributions themselves across the input controls. For all these analyses, the evaluation questions at the end of each survey [1] were not included. In the following, comparisons of the means across input controls are carried out, and the distributions of the answer categories themselves are examined.

19.1 Comparison by Mean Values

As a first step, a comparison of the mean values across input types was prepared for all three surveys. To enable comparison, panel normalization [2] was applied to all input control values, which resulted in values between 0 and 1. Each question was treated separately. Table 19.1 contains the mean value of the differences from the overall mean for each question. When taking a look at these values, no consistent trend can be found. In the tourism survey (and similarly for the webpage survey), dropdown tends towards the left pole. Slider-VAS has a tendency towards the right pole in the webpage survey. The results of a MANOVA for the tourism and webpage surveys confirm these findings and bring clearer results: dropdown had significant differences (p<0.05) to all other scale types [3], with a tendency towards the left pole (highly significant differences (p<0.01) to click-VAS and button in both surveys). Text has a similar tendency compared to button and radio (tourism), as well as compared to click-VAS and slider-VAS (webpage).

             tourism   webpage   snowboard
button        0.0175    0.0044     0.0045
dropdown     -0.0180   -0.0156    -0.0035
slider-VAS    0.0030    0.0246     0.0054
click-VAS    -0.0055    0.0133    -0.0158
radio         0.0106   -0.0130     0.0282
text         -0.0076   -0.0136    -0.0188

Table 19.1: Deviations from mean per sub-question (unit: normalized (0-1) scale point)
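A minimal sketch of the panel (min-max) normalization used above, following the formula given in the footnote ((V - V_min)/(V_max - V_min)); the scale ranges are those of the controls described earlier, the concrete answer values are illustrative:

```java
// Minimal sketch of min-max ("panel") normalization: every raw answer is mapped
// to [0, 1], so controls with different numbers of scale points become comparable.
public class PanelNormalization {

    static double normalize(double value, double min, double max) {
        return (value - min) / (max - min);
    }

    public static void main(String[] args) {
        System.out.println(normalize(7, 1, 10));    // a 10-point control: ~0.667
        System.out.println(normalize(140, 1, 200)); // the 200-point slider-VAS: ~0.698
    }
}
```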

19.2 Compare the Distributions within the Categories

To get a deeper understanding of how response behavior differs between the controls in the individual answer categories, further analysis was conducted. As an initial data analysis step, paired differences between the input controls for each answer category were analyzed by creating contingency

[1] See chapter 16.
[2] V_stand = (V_i - V_min) / (V_max - V_min)
[3] Except radio, where a strong difference (p<0.001) was only observed in the webpage survey.


tables for each case; in each table, the two input controls being compared formed one dimension and whether the item was selected or not selected the other.

All crosstabs with cell values smaller than 5 were eliminated. For each of the remaining tables, a χ² value was calculated, and those with highly significant differences were taken and summarized. See table 19.2 for an example of one concrete sub-question in which two input controls were compared for one category. This crosstab results in a p-value of 0.0052 using Fisher's Exact Test. This adds a point to the score, because the difference is highly significant.

To express the direction of the difference, the odds ratio was taken as a measure [4]. An odds ratio greater than one means that there is a tendency for the control listed in the upper row to be selected more often for that category, compared to the control given in the lower row. In the example in table 19.2, the resulting odds ratio of 3.09 means that there is a tendency for dropdown to be selected more often for this category. The results are summarized in figures 19.1 and 19.2 [5][6].

style        selected   not selected
dropdown        25          101
slider-VAS       9          113

Table 19.2: Example of a 2x2 table for questionnaire tourism, question 9, sub question 11, com-parison of dropdown and slider-VAS, category 5
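For illustration, the cross-product odds ratio of such a 2x2 table can be computed directly (a minimal sketch; the 3.09 reported above may differ slightly because of rounding or because of the averaging of odds ratios mentioned in the footnote):

```java
// Minimal sketch: cross-product odds ratio for a 2x2 table like table 19.2.
public class OddsRatioDemo {

    static double oddsRatio(double a, double b, double c, double d) {
        // a, b = first control (selected, not selected); c, d = second control
        return (a * d) / (b * c);
    }

    public static void main(String[] args) {
        // values from table 19.2: dropdown 25/101, slider-VAS 9/113
        System.out.printf("%.2f%n", oddsRatio(25, 101, 9, 113)); // ~3.11
    }
}
```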

[4] The mean value of the odds ratios for a certain contingency table with highly significant differences.
[5] The snowboard survey had too few observations to get meaningful results, so its results are not shown.
[6] As a remark, slider-VAS and click-VAS were abbreviated to slider and click to save space within the figures.


Figure 19.1: Significant differences between input controls for each scale item - tourism survey


Figure 19.2: Significant differences between input controls for each scale item - webpage survey

Figures 19.1 and 19.2 show the number of highly significant differences in the tables described in table 19.2, with the odds ratio given in brackets for each category. The results of these two surveys are not identical in all points, but they go in the same direction. The most obvious differences could be found when comparing text and dropdown with the other controls, particularly at the midpoint (one has to remember that text and dropdown are the only two input controls where numerical feedback is given). When there is no real midpoint available, the impending question is: which category is more likely to be selected, 5 or 6? On examining the tourism survey, it became clear that for these two controls, 5 is more often selected as the midpoint if numerical feedback is given. Similarly, in the webpage survey, the results showed that 6 was selected more often when no numerical feedback was given.


19.3 Midpoint Selection

To examine the preference for scale item 5 over 6 for the input controls with numeric feedback, the ratio of use of 5 over 6 was calculated and compared across the input controls. See table 19.3 for the median values of the 5-over-6 ratios for each of the three surveys. The table makes it easy to see that for the dropdown and text input controls, scale item 5 was used more often than 6. Additionally, for the webpage survey, there was a general preference for scale item 5. In relation to the other input controls, the slider-VAS had a higher use of scale item 6, because initial movement was necessary and respondents possibly moved the slider (mostly in writing direction) out of the (linearly reduced) scale item region 5 into region 6. A similar approach can be found in Couper et al. (2006, p.239), who compared radio buttons with text input and found that respondents tended to choose 10 over 11 when using the text input field on a scale ranging from 1 to 20. One possible conclusion would be that the use of a real midpoint makes sense when working with scales.

             tourism   webpage   snowboard
button         0.69      1.67      1.00
dropdown       1.50      2.47      2.00
slider-VAS     0.64      1.24      1.00
click-VAS      1.00      1.20      1.38
radio          0.91      1.57      1.00
text           1.25      2.67      2.00

Table 19.3: Median values of the ratio of use of 5 over 6 for all three surveys, compared by input controls

19.4 Analysis per Question Battery

In this section, the answer distributions of each battery were examined. The approach was as follows: for each individual, parameters like span (max - min), min, max, standard deviation and mean were calculated for each item battery and compared across input controls using analysis of variance with input control and battery (variable name) as predictors. For paired comparisons between input controls, the Tukey HSD test was used.
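A minimal sketch of the per-respondent battery summaries described above (span, min, max, mean and standard deviation), assuming one respondent's answers to a battery are already collected in an array; the values are illustrative:

```java
import java.util.Arrays;

// Minimal sketch: per-respondent summary statistics for one item battery,
// as used as responses in the analysis of variance described above.
public class BatterySummaryDemo {
    public static void main(String[] args) {
        double[] answers = {4, 5, 5, 6, 4, 7};   // illustrative answers of one respondent

        double min = Arrays.stream(answers).min().getAsDouble();
        double max = Arrays.stream(answers).max().getAsDouble();
        double mean = Arrays.stream(answers).average().getAsDouble();
        double var = Arrays.stream(answers).map(a -> (a - mean) * (a - mean)).sum()
                / (answers.length - 1);           // sample variance
        double sd = Math.sqrt(var);

        System.out.printf("span=%.1f min=%.1f max=%.1f mean=%.2f sd=%.2f%n",
                max - min, min, max, mean, sd);
    }
}
```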

In lists of scale questions, such as the question batteries under experiment, respondents sometimes tend to choose more or less the same points and pick only a very narrow range of responses out of all the possibilities. This behavioural pattern is called use of subscales or non-differentiation. It shows a lack of interest and a weak level of effort when answering. Variety in the answers should increase the degree of involvement in the survey (Ganassali (2008)). It should be examined whether any differences between the input controls concerning these measures can be found. Concerning the range of answers (span) each individual had within one question battery, the radio input control had the smallest range in the tourism and the webpage surveys. This was significant (p<0.05) compared to the slider-VAS (which had the highest range in the linearly reduced version) for the tourism survey, and compared to the click-VAS (which had the highest value for this survey) and the dropdown input control for the webpage survey. Additionally, in this survey, a significant difference between text input and click-VAS was found.


When looking at the minimum values selected within one battery, the radio input control had the highest values in the tourism survey, with significant differences compared to click-VAS, dropdown and text. In contrast, the results for the webpage survey were very different: radio had the lowest values, together with the text input control. Here, slider-VAS had the highest values, with significant differences to text, radio and dropdown. The unequal distribution of input controls assigned to respondents in the webpage survey heavily biased the results concerning minimum, maximum and span in general; controls which were underrepresented were e.g. more likely to have higher minimum values than other controls.

19.5 Summary and Conclusion

A simple comparison of the mean values across input controls showed no clear trend, only that dropdown (and, more moderately, text) had a tendency towards the (lower) left pole. This is possibly because of the need to move the mouse down (through the dropdown list) to reach the (higher) right pole. Similar results were found when analysing the input controls per battery. It would make sense to repeat these experiments (possibly with a different population).

On examining the single categories, it was found that for the input controls with numerical feedback (such as dropdown and text), category 5 was selected more often. None of the input controls had a real midpoint; however, when 10 scale points were used, 5 was often interpreted as the midpoint if numbers were shown on the scale. This becomes even more obvious when the ratio of categories 5 and 6 is calculated, as shown in table 19.3. To avoid this effect, a real midpoint should be used.


20 A Closer Look at the VAS Distributions

20.1 Distributions of the VAS

In this chapter, possible strategies for transforming the 200 scale items of the slider-VAS and the 20 categories of the click-VAS control to 10 categories, to enable comparability between the controls, are presented. Before categorization strategies are discussed, a closer look at the distributions of the two VAS controls (compared to all other controls) might be useful. Figure 20.1 presents bar plots showing the distribution of a summary of all variables of the tourism survey for the slider-VAS, the click-VAS, and a summary of all controls with 10 scale points. The purpose of this figure is not direct comparison [1], but to show special properties of the two VAS. Aggregation is suitable in this case to show the effects described below more clearly.

[1] Which would not be advisable because of bias when taking the distributions of all variables together.


Figure 20.1: Comparison of the categories slider-VAS, click-VAS and others - tourism survey

Here, it can be seen that the extreme values 1 and 200 of the slider-VAS were selected much more often than the non-extreme values. It can be implied that this was done to make sure that only the most extreme category was selected, or that even more extreme categories would have been necessary. Another effect which can be observed for the slider-VAS is that the points around the midpoint were disproportionally seldom selected compared to the values to the right of the midpoint. One possible reason for this is that all questions were mandatory and the slider was initially positioned in the middle. To distinguish between non-respondents and those


who really wanted to select the midpoint, the slider had to be moved initially. Possibly those who wanted to select the midpoint just moved the slider a bit away from the midpoint and did not move it back. Interestingly, more respondents moved the slider initially to the right, possibly because this is the reading direction. van Schaik & Ling (2007) conducted a similar experiment, in which the VAS slider was also initially positioned in the middle, but no such effects were reported.

A similar, but less intense, effect concerning the higher selection of the extreme points can also be seen in the bar plots of the click-VAS. The second point which can be seen for this input control is the need for a middle category (which does not exist for the 10-scale-item controls): categories 10 and 11 were disproportionally higher than their direct neighbours. All effects mentioned above could be replicated for the other two surveys. To quantify these effects, see table 20.1, where the ratios of adjacent categories are compared. For all three surveys, the highest deviations from 1 can be found at the extreme points and at the midpoint.

             1/2    3/4    5/6    7/8    9/10   11/12   13/14   15/16   17/18   19/20
tourism      1.43   1.05   1.18   1.03   0.46   1.46    0.97    0.87    1.05    0.60
webpage      1.61   0.92   1.16   0.85   0.73   1.76    0.85    1.25    1.01    0.62
snowboard    1.59   1.09   1.08   1.31   0.39   1.80    0.98    1.06    0.84    0.61

Table 20.1: Ratios of the adjacent categories for the click-VAS control

20.2 Categorization

As described in section 4.4, Funke & Reips (2006) found that linear categorization was not the most suitable strategy for transforming fine-grained scales (in this case the slider-VAS) to coarse-grained scales, but rather a categorization with reduced extremes [2]. These findings cannot be directly supported by the information gained from the experiments conducted here. A problem in finding meaningful categorizations came from the overrepresentation of the two extreme points 1 and 200.

The following transformations were applied to the results of the slider-VAS (a code sketch of these mappings follows the list):

• linear: linear transformation (which means 10 categories of size 20)

• reduced1: reduced extremes with 16 scale points at each end, four categories of 20 scale points in the outer areas and four categories of 22 scale points in the middle area.

• reduced2: reduced extremes with 12 scale points at each end and 22 scale points for each of the remaining eight categories.
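A minimal sketch of these three mappings from the 200 slider positions to 10 categories (category sizes as listed above; the helper is an illustration, not the original implementation):

```java
// Minimal sketch of the three categorization strategies listed above.
// Each strategy is described by its 10 category sizes (summing to 200);
// categorize() maps a slider position (1..200) to a category (1..10).
public class SliderCategorization {

    static final int[] LINEAR   = {20, 20, 20, 20, 20, 20, 20, 20, 20, 20};
    static final int[] REDUCED1 = {16, 20, 20, 22, 22, 22, 22, 20, 20, 16};
    static final int[] REDUCED2 = {12, 22, 22, 22, 22, 22, 22, 22, 22, 12};

    static int categorize(int sliderValue, int[] sizes) {
        int upper = 0;
        for (int cat = 0; cat < sizes.length; cat++) {
            upper += sizes[cat];
            if (sliderValue <= upper) {
                return cat + 1;
            }
        }
        throw new IllegalArgumentException("value outside 1.." + upper);
    }

    public static void main(String[] args) {
        System.out.println(categorize(200, LINEAR));   // 10
        System.out.println(categorize(17, REDUCED1));  // 2 (first category ends at 16)
        System.out.println(categorize(12, REDUCED2));  // 1
    }
}
```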

To check which transformation suited best at the extreme points, the following test was carried out: for all variables of a questionnaire, the portions of the extreme values after applying the different transformation strategies were retrieved. These values were compared to a control which natively used a 10-point scale [3]. Thus, for all transformations, comparisons were made with the radio

[2] Which means that at both extremes, fewer scale points from the finer-grained scale are added to the extremes of the coarse-grained scale.

[3] Because the most established input control for such scales is the radio control, this was taken for comparison.


control for all variables (except the 3 evaluation questions at the end [4]). The comparison was simply performed with a χ² test, whereby only the lower and upper extreme points were taken into consideration [5].
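For illustration, the χ² statistic of such a 2x2 table (extreme point selected vs. not selected, categorized slider-VAS vs. radio) can be computed from the cross-product formula; a minimal sketch with made-up counts:

```java
// Minimal sketch: chi-square statistic for a 2x2 table
// (rows: categorized slider-VAS vs. radio; columns: extreme point selected vs. not).
// The counts are made up for illustration.
public class ChiSquare2x2Demo {

    static double chiSquare(double a, double b, double c, double d) {
        double n = a + b + c + d;
        double num = n * Math.pow(a * d - b * c, 2);
        double den = (a + b) * (c + d) * (a + c) * (b + d);
        return num / den;
    }

    public static void main(String[] args) {
        // e.g. 40 of 160 slider-VAS answers vs. 20 of 150 radio answers at the extreme
        double chi2 = chiSquare(40, 120, 20, 130);
        System.out.printf("chi-square = %.2f%n", chi2); // compare against 3.84 (p<0.05, df=1)
    }
}
```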

survey    trans      count   sig (lower)   hsig (lower)   sig (upper)   hsig (upper)
tourism   linear       68         2              0              1              0
tourism   reduced1     68         3              1              3              0
tourism   reduced2     68         3              2              4              2
webpage   linear       28         1              0              1              1
webpage   reduced1     28         3              0              0              4
webpage   reduced2     28         5              0              0              5

Table 20.2: Number of significant differences when using different categorization strategies of the slider-VAS control

After removing the evaluation questions, 68 comparisons were left for evaluation in the tourism survey and 28 in the webpage survey. Table 20.2 indicates that the more the extreme categories were reduced, the more significant differences between the categorized slider-VAS and the radio control resulted at both extremes. Sig here means a significance level of p<0.05 and hsig a significance level of p<0.01. The same holds for the webpage survey. As a conclusion, the theory of reduced extremes as observed in the studies by Funke & Reips (2006) is not applicable to the webpage and tourism surveys. No significant differences at all regarding the extreme points were found for the snowboard survey.

Although linear categorization is the most suitable for the two surveys, there are still significant differences with regard to the radio input control. To find out exactly which category sizes are most suitable for the two extreme categories, an attempt was made to calculate the best category sizes based on the distributions of all controls with 10 scale items. This approach possibly has the drawback of yielding results which are only suitable for the respective surveys. Figure 20.2 explains the approach of calculating the best extreme category sizes.

[4] For details, see chapter 16.
[5] Which results in a 2x2 table with selected and not selected as columns.


Figure 20.2: Compare calculated categorization with linear categorization - tourism survey

Here, the black line is the distribution function of the slider-VAS control aggregated over all tourism questions. On the y-axis, the portions of the 10 scale categories are drawn. To achieve the same portions for the slider-VAS, the intersection with the distribution function is taken and projected vertically onto the x-axis, where the cut-points can be read off and taken as ideal category sizes. For comparison with the linear transformation (20 scale items each), green lines are drawn.
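A minimal sketch of this quantile-matching idea (assuming the aggregated counts of the 10-point controls and the raw slider-VAS values are available as plain arrays; the tiny inputs are purely illustrative and this is not the original implementation):

```java
import java.util.Arrays;

// Minimal sketch of the cut-point derivation described above: the cumulative shares
// of the 10-point controls are matched against the empirical distribution of the
// slider-VAS (values 1..200); the slider value at each cumulative share becomes a cut-point.
public class CutPointDemo {

    static int[] cutPoints(int[] tenPointCounts, int[] sliderValues) {
        int[] cuts = new int[tenPointCounts.length - 1];
        double totalTen = Arrays.stream(tenPointCounts).sum();
        int[] sorted = sliderValues.clone();
        Arrays.sort(sorted);

        double cumShare = 0;
        for (int cat = 0; cat < cuts.length; cat++) {
            cumShare += tenPointCounts[cat] / totalTen;
            int idx = (int) Math.ceil(cumShare * sorted.length) - 1;  // empirical quantile
            cuts[cat] = sorted[Math.max(0, idx)];
        }
        return cuts;
    }

    public static void main(String[] args) {
        int[] tenPointCounts = {5, 8, 10, 12, 15, 15, 12, 10, 8, 5};  // illustrative
        int[] sliderValues = {1, 1, 14, 25, 33, 48, 60, 77, 90, 101, 105, 120,
                              133, 140, 152, 160, 171, 180, 199, 200};
        System.out.println(Arrays.toString(cutPoints(tenPointCounts, sliderValues)));
    }
}
```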

For this particular survey, it was observed that the lower extreme category was bigger than 20 and the upper extreme category was almost 20. In the webpage survey, both extreme categories had size 26 and were therefore in both cases bigger than if a linear transformation had been applied. Category 5 was bigger than category 6 in both surveys. In the snowboard survey, the lower extreme category had size 20, the upper one size 29 [6]. Additionally, in all 3 surveys,

[6] But again, this should be considered in regard to the small number of respondents.


the left middle category (5) was bigger than the right one (6), which fits the results reported in previous chapters.

Figure 20.3: Compare linear categorization points of the slider-VAS with 10-scale controls - tourism survey

Figure 20.3 gives a similar view of the outcomes of the same questionnaire. Here, the distribution function of the slider-VAS is again presented, this time in green. In red, the portions of the 10-scale-point controls are given. The top x-axis contains the differences between the 10-scale controls and the slider-VAS control. This figure only gives a different view of the same task visualized in figure 20.2.


Figure 20.4: Compare Boxplots for all cut-points - tourism survey

Figure 20.4 shows the distributions of the deviations from linear categorization for all cut-points of the tourism survey. A value greater than zero indicates that the cut-point for the slider-VAS was higher than if linear categorization had been used (i.e. higher than a multiple of 20), which means the cut-point was nearer to the right extreme. It can be seen that there are many outliers located far away from the median. Because of this, it is hard to find ideal cut-points.

20.3 Summary and Conclusion

On examination of the slider-VAS distribution, it can be seen that categories 1 and 200 are overrepresented. One possible explanation for this phenomenon is that the number of scale items for this input control is too high, because no fine nuances are needed when answering at the two extremes: respondents just wanted to express that they agreed or disagreed.


In the click-VAS distribution, categories 10 and 11 were selected more often than their direct neighbours, which shows a tendency towards the (non-existent) midpoint. These results suggest that offering a midpoint category is recommended. A similar, but weaker, effect concerning the extreme points, as described above for the slider-VAS, can also be found for the click-VAS.

Several categorization strategies were evaluated for the slider-VAS, and it seems that linear reduction is the most suitable one. The attempt to find better cut-points than those of the linear categorization did not lead to satisfying results because of differences and contradictions between the questions in all three surveys.


Part IV

Software Implemented


21 Introduction

The software itself (the conceptual and technical work) is an essential part of this dissertation. For this reason, a short description of its functionality and technical background is given. The software is published as open source; the intention behind this is to create a community of developers who help to stabilize and improve the tool and apply it to their own needs in such a way that the community can also benefit from these additional implementations. These descriptions should also serve as a short introduction to how the software can be extended [1]. The experiments described in part III were run with an early version of the current software. For an example of the current version and additional information, visit http://www.survey4all.org.

21.1 Conventions

An attempt was made to abide by a few general conventions in regard to the software:

• Only open source tools were used for development, running, administering, documenting and all other tasks related to the software itself.

• The software was published as open source with all its components and subprojects.

• The software development process should be driven by the end users, which means that the needs of the end users (those who use this tool to run surveys) should be considered. This is a good test of the extensibility of the software architecture and an interesting experiment regarding the direction in which development will go.

• Nevertheless, QSYS will always focus on being an online survey tool; there is no intention to integrate e-learning capabilities or data analysis features. Of course it is desirable to create such functionality, but the focus should be kept on separating this functionality into external projects, with QSYS serving as a core system offering basic functionality via Web services.

The software is already published as open source (under the GNU General Public License (GPL) [2]; see St. Laurent (2004) for basic principles of open source software licensing in general and a detailed explanation of the GPL on pages 35-48). Visit http://www.survey4all.org or http://qsys.sourceforge.net for further information [3].

[1] E.g. add question types and access modes for the respondents, or modify the styling of the Web page.
[2] See: http://www.gnu.org/licenses/
[3] Holtgrewe & Brand (2007) provide a sociological approach to open source, strongly related to the shift in labor in general.


21.2 General Software Description

QSYS is a Web based survey software intentionally developed for running experiments dealing with visual design effects on respondents' behavior. In the current version, the software includes an easy to use online editor supporting the creation, accomplishment and administration of online surveys. Users can develop an online questionnaire without any previous knowledge of HTML.

As online survey software constitutes a competitive market, one has to ask why another product is needed in this field. In contrast to other commercial and non-commercial products, QSYS gives users and developers access to the source code and the permission to create derivative works from the original survey software. This allows the adaptation and extension of the software to customers' needs and the installation of the complete package without any license fees. Additionally, there is no available software which provides good support for visual design experiments in Web surveys; QSYS attempts to bridge this gap. Of course it would have been much easier to use an existing package (preferably an open source package) and customize it according to the special needs of the experiments. But the existing packages have one major drawback: the separation between content, style and design of the questionnaire is not strictly implemented. The consequence would have been to copy each questionnaire as often as different stylings are needed and assign the styles to the copies, which is not an optimal approach. The second motivation was simply to write a well structured and extendable open source survey development tool, publish it as open source, and see what happens.

The development process focused on the design of a reliable and extendable software architecture. Therefore, new question types, extensions of existing question types, and the style and appearance of questions can be created and adapted quite easily without breaking with existing software design concepts. Moreover, the questions' visualization is strictly separated from the questionnaire's content, so each survey can be presented in an individual style with low demand on resources. QSYS supports all common question types together with a few innovative ones (like image map questions), which are customizable as well, and a diverse range of participation modes. Additionally, features like PDF and XML export are implemented. The elaborate software architecture allows rapid extension, customization and embedding into a proprietary infrastructure.

21.3 Overview of Features

This section provides an overview of all the QSYS features:

• Several different question types are supported with various configuration possibilities.

• For all questions and alternative texts, a WYSIWYG editor is offered to the questionnaire designer to easily assign the desired styling, so changing font types and adding links, tables, images and all other elements supported by HTML is possible without any knowledge of Web technology [4].

[4] See http://tinymce.moxiecode.com/ for further information.


• Questions can be displayed in a paging mode (one page per question) or in a scrolling mode (multiple questions on one page). When the scrolling mode is applied, page separators support the categorization of questions into logical units.

• Branching based on the answers to closedended questions is possible.

• Questions can be marked as mandatory to give feedback to the respondent when the question is not answered sufficiently.

• PDF export creates an offline version of the questionnaire (e.g. for mixed mode studies).

• XML export serves as an exchange and archiving possibility (answers of respondents can also be exported as XML). When a survey creator is familiar with the XML schema used for questionnaires, survey creation or e.g. repeated modifications can be done very efficiently by directly editing the XML document. It is even possible to write a custom editor tailored to one's needs.

• Paradata tracking is implemented, e.g. for exporting the time needed for filling out one question together with browser and operating system information. In addition, the export of the IP address for each participant can be enabled. Storage can be turned off or masked [5] to assure privacy, because this feature has to be used with caution: "Organizational researchers need to be sensitive to the confidentiality and security issues associated with the use of Internet survey tools, as participants may be unwilling to provide candid responses to a survey if their anonymity is not ensured" (Truell (2003, p.35)). A minimal masking sketch is given after this feature list.

• Language independence is ensured, as language tokens are stored externally. As a result, a simple translation of these tokens adapts the software to a further language.

• An optional summary of all answers for one respondent can be offered at the end.

• The questionnaire’s completion can be limited to a certain period.

• Advanced interviewee restriction modes (all can participate, common PIN code, user/password list, members of certain LDAP [6] groups, ...).

• The entire software is published under an open source license allowing extension and customization to distinct needs.

• A common Servlet engine is sufficient to install QSYS on a server. Not even a database is needed (but it is possible to use one).

• A status (DEVELOPMENT, PRETEST, OPERATIONAL, DISABLED), stored together with the given answers, supports the user in distinguishing between different phases of the survey.

• Data can be exported as CSV (which can be imported into all statistical analysis tools including Excel), SPSS (currently sps-files are generated) and native XML.
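As referenced in the paradata item above, IP masking can be illustrated with a minimal sketch following the 138.232.xxx.xxx example from the footnote (an assumption for illustration, not the original QSYS implementation):

```java
// Minimal sketch of IPv4 masking as illustrated in the footnote (138.232.xxx.xxx):
// only the first two octets are kept, so an institution may still be identifiable
// but not an individual machine.
public class IpMaskingDemo {

    static String maskIp(String ip) {
        String[] octets = ip.split("\\.");
        if (octets.length != 4) {
            return "xxx.xxx.xxx.xxx";          // unexpected format: mask everything
        }
        return octets[0] + "." + octets[1] + ".xxx.xxx";
    }

    public static void main(String[] args) {
        System.out.println(maskIp("138.232.17.5")); // 138.232.xxx.xxx
    }
}
```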

All these tasks are described in more detail in section 21.5.

21.4 Supported Question Types

QSYS supports a wide range of question types (all allow several variations):

[5] To export only the parts necessary to identify e.g. an institution rather than an individual person, which would for example look something like this: 138.232.xxx.xxx
[6] Lightweight Directory Access Protocol


• Closedended questions.

• Dichotomous questions: similar to closedended questions. Here, the respondent can select one of two alternatives.

• Pictogram questions: similar to closedended questions, but with pictures instead of textual alternatives.

• Closedended ranking questions: alternatives have to be brought into the right order.

• Question matrix: multiple questions with the same selection of alternatives.

• Question matrix with column grouping: integrating multiple matrices with the same subquestions in one view.

• Semantic differential or VAS: a Visual Analogue Scale (VAS) constitutes a measurement instrument measuring a characteristic or attitude believed to range across a continuum of values. The VAS is verbally anchored at each end, e.g. very good vs. very bad (several sub questions are supported, and the anchor points can be set for each sub question individually).

• Interval question matrix: just like a semantic differential, except that for this type the labeling of the anchor points is the same for all sub questions.

• Interval questions: similar to the interval question matrix, but without sub questions (the main question itself is rated).

• Openended questions: for this type, the size of the input field can be varied; number and date fields are also supported.

• Openended ranking questions: presenting multiple openended input fields for one question to the user.

• Openended question matrix: presenting an openended input field for each sub question of the matrix to the user.

• Image map questions: respondents can select a certain region of an image map (e.g. a geographical map).

• Text blocks and page separators: these are workflow elements, not questions, used to separate parts of the questionnaire and to add HTML code between questions.

21.5 Editor

Generally, the CMS area of which the editor is a part is split up into groups. Each group has a group administrator with a password used for logging in. One group can contain multiple questionnaires, which are all listed and ready for editing on the group administrator's overview page.

The editor for creating and customizing the survey is easy to use and is built with AJAX technology. This has the positive effect that pressing a submit button and waiting for the browser to reload the whole page becomes unnecessary. A simple click on e.g. a checkbox or button is sufficient to send the edited data to the server for storage (feedback is given on whether the action was performed successfully or an error occurred). This drastically speeds up entering and editing questionnaire content.


Figure 21.1: A sample view on the questionnaire editor

In figure 21.1, the main editor window is shown: on the left, a navigation bar lists all questions together with an abbreviation of the question type. This is done to give an overview of the whole questionnaire and to enable easy navigation. A full view mode for the navigation bar is offered, in which the bar occupies the whole screen so that the full question texts are visible. This should ease and speed up exchanging, copying and removing one or more questions. To add a new question, simply select the desired type and the position where the question should be added from the tab on the left. In addition, page separators can also be added this way.

The main section of the editor view is split up into different tabs. The tab common question settings is the same for all question types and is used to enter the general question text as well as a description text [7]. Furthermore, a text to be displayed after the question can be specified. As additional settings for each question, attributes like the mandatory status of the question and the general successor of the question (used for branching) can be set.

All other tabs depend on the question type which is currently being edited. There is a tab for question type settings, where question type specific editing can be done (e.g. for question batteries, column width and alignment can be set). All tabs to the right are also question-dependent. For example, when a question battery is edited, additional tabs like row

[7] This text should not be part of the question itself, but should give hints on how the question should be filled out. It is usually displayed under the question text itself, in smaller letters.


and column appear, where sub questions and alternatives for the battery can be added. The last tab gives a preview, which is available for all question types: the currently edited question is displayed as it will be presented to the interviewee.

The following basic settings can be applied to the whole questionnaire:

• Short link: when the link to the questionnaire is published (in invitation letters, e-mails or news groups), it is beneficial to have a link with only a few, short parameters (to avoid e.g. line break problems with some e-mail clients).

• Whether the IP address should be stored: when anonymity plays a major role for a survey, IP addresses can be made anonymous (which is the default setting) or masked.

• Number of questions per page: it is possible to choose between one question per page, all questions on one page, and the use of separators (which can be added in the main editor view as described above).

• Whether a progress bar should be offered to the respondent.

• Whether a summary of the entered data should be offered to the respondent after filling out.

• A logo can be uploaded which is displayed on each page of the questionnaire.

• A date range can be specified within which the questionnaire is active. If the current date is outside this interval, an appropriate message is displayed when accessing the questionnaire and participation is prevented.

21.6 Technical Background

In this chapter, an overview of the technical background and the technologies used is provided. In addition, the technical preconditions necessary to install and run the software are listed.

The application is based on Java Servlet technology. For the creation of the application, the Jakarta Struts framework [8] was applied as controller (in a slightly unconventional way). XML plays a major role on all layers of the software architecture. All questionnaires and additional information are stored as native XML documents. To render these questionnaires as HTML or PDF, XSLT and XSL-FO [9] are used, which results in a complete separation of content and view. The decision which style sheet is used for which content file (i.e. for which questionnaire) is administered in an external configuration file. It is even possible to assign a certain style sheet at random or based on certain conditions. In the case of the experiments, technical preconditions fetched from the browser settings [10] determined the style sheet to use. An Oracle database, eXist [11] or even the file system alone (which is currently the preferred option) can be used for data storage.
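For illustration, rendering a stored questionnaire with a chosen style sheet boils down to a standard XSLT transformation; a minimal sketch using the JAXP API (the file names are hypothetical, and the real QSYS code presumably consults the mapping configuration mentioned above instead of hard-coding them):

```java
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Minimal sketch of the XML + XSLT rendering step: one questionnaire document,
// one style sheet, one HTML result. File names are hypothetical.
public class RenderQuestionnaire {
    public static void main(String[] args) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
                factory.newTransformer(new StreamSource(new File("styles/click-vas.xsl")));
        transformer.transform(
                new StreamSource(new File("questionnaires/tourism.xml")),
                new StreamResult(new File("out/tourism.html")));
    }
}
```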

It should also be mentioned that during the development of the whole project, a standalone Struts-XSLT framework was created which meets the demands of both Struts and XML/XSLT

[8] http://struts.apache.org
[9] Extensible Stylesheet Language - Formatting Objects

[10] E.g. whether Javascript was enabled.
[11] See: http://exist.sourceforge.net


(with language independence and other useful features). Some features of the framework were created specifically for the needs of the project, such as selecting an XSLT style sheet for one view by chance.

21.6.1 Technical Preconditions

For installing and running the software, only a Servlet engine like e.g. Tomcat [12] (all tests have been run on this engine) or Jetty [13] is necessary. Data is stored (at least in the default configuration) directly within the file system, which means that not even a database system has to be installed or connected to. Because of these few technical requirements, it is even possible to run the software on a local machine to prepare a questionnaire offline and upload it later to the production system, or to use QSYS for small surveys (like evaluation sheets in the field of education) directly on a local machine or laptop (which then works as a small server).

[12] http://tomcat.apache.org
[13] http://www.mortbay.org/jetty


22 Software Architecture

In this chapter, a brief overview of the software architecture of the whole system and its components is given.

The software is split up into subprojects to ensure reusability of the particular components in other projects. Figure 22.1 shows a component model of the whole QSYS system. All parts are described in the following sections:

• QSYS-core: the core functionality of the whole system, like creating, storing and managing questionnaires.

• QSYS-web: the Web interface for the online version of QSYS with all XSLT style sheets and mapping information.

• QSYS-tools: console based tools for automated processing, implemented on top of QSYS-core.

• struXSLT: a standalone XSLT extension for the Struts framework. Amongst other projects, QSYS-web is based on this framework.

• q-utils: simple utility classes used by all components.


Figure 22.1: Component model of the whole QSYS-system

22.1 QSYS-core

This is the core component of the QSYS system. It was separated from the Web frontend to remain open to a possible branch leading to a standalone (non-Web) application for creating questionnaires. Currently, only a Web version for creating surveys is available.

One of the main tasks this component is responsible for is the storage of questionnaires. A questionnaire consists of a list of questionnaire items. Each questionnaire item has two representations, one as an object and the other as an XML fragment. The mapping between these two manifestations is carried out within each class, which means that each question type has its own


XML (de-)serialization methods. There are tools which could do this in an automated way (like e.g. JAXB [1]), but most of them have drawbacks which in some cases influence the software architecture (e.g. concerning the visibility of member variables). Due to the strong similarity of the questions, inheritance is heavily used. For one simple question type, a UML [2] class diagram is shown in figure 22.2 to demonstrate the hierarchy and organization of the questionnaire item classes.

22.1.1 Questionnaire Items

Subsequently, a short overview of the class hierarchy of questionnaire items [3] is given. For a graphical representation see figure 22.2, which provides examples of concrete question classes: an interval question (giving a rating on a scale between two anchor points) and a simple openended question. For the concrete questions, the isDataComplete method checks whether the question was completely filled out by the respondent (this method is called by canDataBeStored). All other question types are organized in the same manner. For a complete list of supported question types, see section 21.4. For all questions that hold sub questions, an interface exists which manages all the tasks necessary for holding multiple questions.

In both the textual description and the UML class diagram, only the core concepts are illustrated; for a more detailed view refer to the source code, which can be browsed and downloaded at http://www.survey4all.org.

[1] http://jaxb.dev.java.net
[2] Unified Modeling Language
[3] A questionnaire item is the base class for questions and workflow elements.


[Figure 22.2 (below) shows the UML class diagram of the package org.qsys.quest.model.question: the abstract classes QQuestionnaireItem, QBaseQuestion and QQuestion, the workflow items QTextBlock and QPageSeparator, the class QSubQuestion, and the concrete question classes QOpenendedQuestion and QIntervalQuestion, together with their fields and methods (e.g. toXML, clone, equals, findSuccessorQuestion, getProgressValue, getVariables, canDataBeStored, isDataComplete).]

Figure 22.2: Question classes diagram showcase

The QQuestionnaireItem class has two member variables: id, which is unique within the whole questionnaire and used for referencing questions, and dispId, which is the id to be displayed on the questionnaire. The item type of each item is set via annotations [4]. It is possible to get a list of all concrete questions (or questionnaire items respectively) with their class and type. This mechanism is also used to generate an instance of a class knowing only its type (such

[4] Annotations provide data about a program that is not part of the program itself. They have no direct effect on the operation of the code they annotate. (Taken from the documentation pages of http://java.sun.com.)


a type could be e.g. questionMatrix; the corresponding annotation would be @QuestionA(name = "questionMatrix", order = 5), where order is the position of the question type when presenting all question types within a list. Another example is the annotation for the page separator: @QuestionA(name = "pageSeparator", order = 2, isQuestion = false). The class QFactory is responsible for retrieving concrete questionnaire item instances by question type. To achieve this, reflection is heavily used; suitable constructors are invoked dynamically.
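A minimal, self-contained sketch of how such an annotation and a reflective factory lookup could look (simplified assumptions; the real QuestionA annotation and QFactory carry more attributes and scan the registered item classes):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Minimal sketch of the annotation-driven factory described above (simplified).
public class QuestionFactoryDemo {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface QuestionA {
        String name();
        int order();
        boolean isQuestion() default true;
    }

    interface QuestionnaireItem {}

    @QuestionA(name = "openended", order = 1)
    public static class OpenendedQuestion implements QuestionnaireItem {}

    @QuestionA(name = "pageSeparator", order = 2, isQuestion = false)
    public static class PageSeparator implements QuestionnaireItem {}

    // look up the class whose annotation matches the requested type and instantiate it
    static QuestionnaireItem create(String type, Class<?>... candidates) throws Exception {
        for (Class<?> c : candidates) {
            QuestionA a = c.getAnnotation(QuestionA.class);
            if (a != null && a.name().equals(type)) {
                return (QuestionnaireItem) c.getDeclaredConstructor().newInstance();
            }
        }
        throw new IllegalArgumentException("unknown question type: " + type);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(create("openended",
                OpenendedQuestion.class, PageSeparator.class).getClass().getSimpleName());
    }
}
```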

For all question types, methods of class Object (like clone [5] and equals [6]) are overridden. Each question has a default successor question, which is used for branching and will be described in a later section. All questionnaire items implement the Cloneable interface, and all subclasses implement a toXML method used for serialization to XML as well as a constructor taking an XML element, used for generating a concrete questionnaire item based on the content of this element. In addition, each questionnaire item knows best how to determine the successor question according to the answer given [7].
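A minimal sketch of such a toXML / constructor pair for a hypothetical openended question (element and attribute names are assumptions; the real classes serialize more fields):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Minimal sketch of the per-class XML (de-)serialization described above.
// Element and attribute names are illustrative assumptions.
public class OpenendedQuestionXmlDemo {

    private final int id;
    private final String qText;

    OpenendedQuestionXmlDemo(int id, String qText) {
        this.id = id;
        this.qText = qText;
    }

    // constructor taking an XML element, as used when loading a questionnaire
    OpenendedQuestionXmlDemo(Element el) {
        this.id = Integer.parseInt(el.getAttribute("id"));
        this.qText = el.getAttribute("qText");
    }

    // serialization into an XML fragment owned by the given document
    Element toXML(Document doc) {
        Element el = doc.createElement("openendedQuestion");
        el.setAttribute("id", String.valueOf(id));
        el.setAttribute("qText", qText);
        return el;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element el = new OpenendedQuestionXmlDemo(7, "Any further comments?").toXML(doc);
        System.out.println(el.getTagName() + " id=" + el.getAttribute("id"));
        System.out.println(new OpenendedQuestionXmlDemo(el).qText);
    }
}
```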

Text blocks can be integrated at any position of the questionnaire and support all capabilities of HTML. These blocks are also created with a WYSIWYG editor within the Web presentation layer. A page separator has the same functionality, but if manual page separation is enabled, these elements act as page breaks and are displayed as the first entry of a section containing multiple questions. These items have to be strictly separated from questions, because they have no corresponding answers, which affects several central parts of the software.

All question types inherit from QBaseQuestion. Each question must contain a question text (qText), a text after the question (afterText) and a default successor question (succ). Again, all stored texts can contain HTML. When a question consists of multiple sub questions, the class contains a list of QSubQuestion objects.

All standalone questions (which means all but sub questions) inherit from QQuestion. Therein, the information whether a question is obligatory is stored, an additional explanation text can be set (which is usually displayed under the question text itself and should guide the respondent through the filling-out process), and a style attribute can be set per question. The style attribute gives the presentation layer a directive on how the question should be displayed; an example of different styles would be the display of radio buttons instead of a text input field for an interval question. The method getProgressValue calculates an expected duration weight for one question; these values depend e.g. on the number of sub questions and the question type itself, but the strategies can be extended according to one's own needs. The method canDataBeStored determines whether the answer given by the respondent is sufficient when the question is marked as mandatory. The method getVariables returns a list of variables which are generated for one question, primarily used for data export.

The whole questionnaire is held as an instance of QQuestionnaireObj, whereby a list of QQuestionnaireItem objects is stored together with a QHeader instance, which holds administrative data regarding the questionnaire itself (e.g. begin and end date, status, creator, creation date). All modifications of such a questionnaire object are done by QuestionManager. This layer is necessary to retrieve and store the modifications carried out on the questionnaire object on the preferred medium (for further information see section 22.1.4).

22.1.2 Answers

One strategic consideration was to strictly separate the storage of the questionnaire from the answers given, which results in a class hierarchy for answer classes parallel to the questionnaire items hierarchy. To weave question and answer objects together (i.e. to determine which answer class is responsible for storing data for which question type), annotations are used once again. For example, the annotation @QuestionsA(names={"questionMatrix", "questionMatrixMult"}), when placed on QClosedMatrixAnswer, means that this answer type is responsible for processing answers for the question types questionMatrix and questionMatrixMult. There are fewer answer classes than question classes, because some questions generate the same answer data, so some answer classes are reused.
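
A minimal sketch of this weaving step, assuming a runtime-retained @QuestionsA annotation as in the example above (the registry class and its methods are hypothetical):

// Sketch of the question-type-to-answer-class weaving via a runtime annotation.
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.HashMap;
import java.util.Map;

@Retention(RetentionPolicy.RUNTIME)
@interface QuestionsA { String[] names(); }

@QuestionsA(names = { "questionMatrix", "questionMatrixMult" })
class ClosedMatrixAnswerSketch { }

class AnswerRegistrySketch {
    private final Map<String, Class<?>> byQuestionType = new HashMap<>();

    // Register one answer class under every question type named in its annotation.
    void register(Class<?> answerClass) {
        QuestionsA a = answerClass.getAnnotation(QuestionsA.class);
        if (a == null) return;
        for (String type : a.names()) byQuestionType.put(type, answerClass);
    }

    Class<?> answerClassFor(String questionType) { return byQuestionType.get(questionType); }
}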

In figure 22.3, as for the questionnaire items, a UML class diagram together with a short description of the classes and their methods is shown. Again, only a few concrete answer classes are added to the diagram, just to show the basic class structure. All classes inherit directly from QAnswer and simply contain the answered values, which are fetched from QAnswerDictionary. For example, in the case of the openended answer class this is a simple string, in the case of the closed answer a list of possibly selected alternatives and reason values (i.e. additional texts which can be entered next to a selected alternative), and in the case of the interval questions a position on the scale.

[Figure 22.3: Answer class diagram showcase. Package org.qsys.quest.model.answer with the abstract base class QAnswer (id, dateEntered, duration; methods toXML and addVarValues) and the concrete subclasses QOpenendedAnswer, QClosedAnswer and QIntervalAnswer.]

Similar to the questionnaire items, a toXML method exists together with a constructor taking an element used for XML serialization. Additionally, a constructor exists for each concrete QAnswer object with QAnswerDictionary as the parameter for filling the answer objects with content. This class inherits from Hashtable<String, String> and is used to transfer the answers from the frontend to the model in a generic way. Each answer class then knows best how to interpret the content of this Hashtable and reads the appropriate information. In addition, a method addVarValues(Vector<QVariable> var, Vector<String> vals) exists, through which concrete variable values are generated and used for exporting. All values are appended to the variable vals (call by reference) according to the variable settings given in var. The list of variables is the only communication point which tells the concrete answer how and which variable values should be exported. Additionally, for each answer the duration is stored in milliseconds. In the current Web implementation this duration is tracked on the client side, which means measurement starts when the page is completely loaded and ends when the submit button is pressed.
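
As an illustration, a hypothetical addVarValues implementation for an interval answer could look as follows; only the method signature follows the description above, everything else is an assumption:

// Hypothetical addVarValues for an interval answer; only the signature follows the text above.
import java.util.Vector;

class QVariableStub { String name; }                 // stand-in for the real QVariable class

class IntervalAnswerSketch {
    int intervalVal = -1;                            // -1 means: not answered

    // One exported value is appended per variable; vals is filled in place (call by reference).
    void addVarValues(Vector<QVariableStub> vars, Vector<String> vals) {
        for (int i = 0; i < vars.size(); i++) {
            vals.add(intervalVal >= 0 ? String.valueOf(intervalVal) : "");   // empty cell = missing
        }
    }
}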

Class QAnswersObj holds all answers given for a questionnaire together with a header, which holds some metadata about the filling-out process, such as: the time when filling out started, a unique interviewee id, a flag whether the respondent finished filling out or not, which style was assigned (which was essential for the experiments), and paradata (see section 22.1.3) collected, amongst others, directly within the client's browser.

22.1.3 Paradata Tracking

Online surveys give the additional opportunity to track information about the filling-out process and the instrument used by the respondent. The following list shows the paradata tracked by the software and what can possibly be inferred from this information:

• Session identifier: identifies answers filled out immediately one after the other on the same computer.

• Screen resolution (width and height)

• User agent (the Web browser used)

• Operating system

• Cookies enabled

• Java enabled

• Javascript enabled: this and the previous paradata in the list can be used to identify possible side effects or technical problems, e.g. when certain browsers have special configurations (e.g. Javascript disabled or a low screen resolution).

• First referrer: in the case of recruitment from different Web pages, it can be determined from which Web sites the respondents came.

• Remote address (the IP address of the respondent): storing the IP address unmasked is turned off by default because of possible problems concerning the anonymity of the respondent. If those who run the survey are aware of these problems and turn IP tracking on, additional possibilities come up: in some cases it is possible to determine which questionnaires were filled out on the same machine. This statement has to be qualified, because in some cases (in general within big institutions) different computers appear with the same IP address on the internet, which is in most cases the IP address of the institution's proxy server; but in this case at least the institution can be identified. In addition, some providers use dynamic IP addresses, which means that when the respondent's PC is connected to the internet sequentially and multiple times, a different IP address is assigned each time and there is no way to identify individual PCs. In some cases it is possible to determine e.g. whether the respondent filled out the questionnaire at home or at work, if this information is of any interest for the survey.

Technically, tracking is done directly within the client's browser via Javascript. An AJAX call delivers all the parameters to the server in the background of the respondent's browser while the Web page is loading. The connection to the answers given by the respondent is made via the session id. If Javascript is turned off, most of these parameters cannot be accessed. Some parameters like operating system, user agent, session id and first referrer can also be fetched from the HTTP header on the server side.
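
A sketch of this server-side fallback, using only the standard Servlet API (how QSYS actually names and stores these values is not shown here):

// Sketch: some paradata can be read from the HTTP request even when Javascript is disabled.
import javax.servlet.http.HttpServletRequest;
import java.util.HashMap;
import java.util.Map;

class ServerParadataSketch {
    static Map<String, String> collect(HttpServletRequest request, boolean storeIp) {
        Map<String, String> paradata = new HashMap<>();
        paradata.put("sessionId", request.getSession(true).getId());
        paradata.put("userAgent", request.getHeader("User-Agent"));   // browser and OS hints
        paradata.put("firstReferrer", request.getHeader("Referer"));  // recruiting page, if any
        if (storeIp) {                                                // off by default (anonymity)
            paradata.put("remoteAddress", request.getRemoteAddr());
        }
        return paradata;
    }
}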

22.1.4 Storage

Basically, three types of storage are currently supported; information is held...

1. ...directly within the file system as native XML-documents.

2. ...within an Oracle database, where XML documents are stored as XMLType. A free version of an Oracle database (Express Edition, http://www.oracle.com/technology/products/database/xe) is available, which fulfils the requirements of QSYS.

3. ...within the open source native XML database eXist (http://exist.sourceforge.net).

To make QSYS work with one of these options, setting the entry DB within qsys.properties (which is located in the QSYS project's WEB-INF/classes) is sufficient. Possible entries are: FILESYSTEM, ORACLE, EXIST.

The first option is preferred for several reasons: no preconditions concerning storage have to be fulfilled (i.e. no installation of or access to a running database is necessary); it is also the most frequently used mode across all currently installed versions of QSYS, and therefore the most stable alternative; and some features are currently not implemented for the other two alternatives. This mode is also very fast, does not need much space on the server, and backup copies of simple XML files can easily be made; if any urgent modifications have to be made, direct changes to the files are sufficient. For these reasons, this storage mode is described here in more detail; the other two alternatives work very similarly.
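
For illustration, reading and validating this setting could look like the following sketch; the dispatch to the concrete storage implementations is omitted and the class name is hypothetical:

// Illustrative only: read the DB entry from qsys.properties on the class path and validate it.
import java.io.InputStream;
import java.util.Properties;

class StorageModeSketch {
    static String readStorageMode() throws Exception {
        Properties props = new Properties();
        try (InputStream in = StorageModeSketch.class.getResourceAsStream("/qsys.properties")) {
            if (in != null) props.load(in);          // qsys.properties lies in WEB-INF/classes
        }
        String db = props.getProperty("DB", "FILESYSTEM");
        switch (db) {
            case "FILESYSTEM":
            case "ORACLE":
            case "EXIST":
                return db;                           // hand the value to the matching backend
            default:
                throw new IllegalStateException("Unknown DB setting: " + db);
        }
    }
}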

22.1.4.1 File Organization

To set the root folder for XML storage, one should edit the file WEB-INF/classes/fsdb.properties within the QSYS project and set the parameter PATH to the desired storage location. The files are hierarchically organized as follows: within the folder conf, two files are located: groups.xml, which contains basic general and status information for each group, and shortlinks.xml, which contains a mapping of group id and questionnaire id to a tiny link used to shorten the link to a questionnaire. All other information is stored in group folders: each group has a folder (the id of the group is the folder name) with a document questionnaires.xml, which contains basic overview information on each questionnaire, and a subfolder for each questionnaire. The organization of files within a group can be seen in listing 22.1. In this example, all terms within angle brackets symbolize a variable for an arbitrary string. The following files are directly located within the group's root directory:

• basicSettings.xml holds basic settings on how questioning should be conducted (e.g. how many questions should be displayed on one page, whether the IP address should be stored, whether a summary should be displayed at the end and whether a progress bar should be displayed).

• interviewees.xml stores the mode of interviewee recruitment and basic settings for the selected mode.

• questionnaire.xml is the questionnaire itself.

The subfolder answer contains all answers given by the respondents, one document for each. The name of the file consists of the unique interviewee id together with a timestamp of when the interview was made, to ensure uniqueness. When taking a look at the questionnaire during creation, the results of the test runs are stored in groupadmin.xml in order to be able to exclude this document from further analysis.

$ tree <group_1>
<group_1>
|-- questionnaires
|   |-- <quest_1>
|   |   |-- answer
|   |   |   |-- ADUDOIIO_1197533982151.xml
|   |   |   |-- AIQTONZY_1196676852825.xml
|   |   |   |-- ALFQWJMO_1196595314843.xml
|   |   |   |-- AVJRVCGO_1196589478675.xml
|   |   |   `-- groupadmin.xml
|   |   |-- basicSettings.xml
|   |   |-- interviewees.xml
|   |   `-- questionnaire.xml
|   `-- <quest_2>
|       |-- answer
|       |   |-- TPQDDYSJ_1195763591505.xml
|       |   `-- groupadmin.xml
|       |-- basicSettings.xml
|       |-- interviewees.xml
|       `-- questionnaire.xml
`-- questionnaires.xml

Listing 22.1: A simplified example of a tree-view for files stored for one group in QSYS

22.1.4.2 XML Queries within the File System

Even if there is no database available and all answers are stored as simple XML documents, easy queries make it possible to get an impression of the current response using standard UNIX tools such as xmlstarlet (which even supports XPath 2.0 functions like min, max, ...), uniq and sort. As an example, a simplified answer document is given in listing 22.2, which is used to demonstrate the query examples in the following listings. All examples run very quickly, at least not appreciably slower than if a database had been used.


<?xml version="1.0" encoding="ISO-8859-1"?>
<response>
  <header finished="true">
    <tracker remoteaddress="130.82.1.40" />
    <creDate>28/01/2008 03:36:59 PM</creDate>
  </header>
  <answer dateEntered="28/01/2008 03:39:21 PM" duration="135967"
          id="2" type="QClosedMatrixAnswer">
    <subquestion id="1">
      <alternative value="1"/>
    </subquestion>
    <subquestion id="2">
      <alternative value="2"/>
    </subquestion>
  </answer>
  <answer dateEntered="28/01/2008 03:56:09 PM" duration="33251"
          id="46" type="QOpenendedAnswer">
    <answer>1962</answer>
  </answer>
  <answer dateEntered="28/01/2008 03:56:09 PM" duration="35142"
          id="48" type="QClosedAnswer">
    <alternative value="5"/>
  </answer>
</response>

Listing 22.2: A simplified example of an answer document

To perform queries on a folder containing such XML documents, the combination of simple UNIX tools (as mentioned above) is sufficient. This is illustrated with the following commands: to find out how many respondents have currently filled out a questionnaire (with a distribution of how many finished and how many dropped out), run the following command:

$ xmlstarlet sel -t -v "//header/@finished" * | sort | uniq -c
 156 false
  73 true

Listing 22.3: Command for respondents' overview of one questionnaire

Here (and also in the following script example) an XPath query is applied to all documents in the answer folder. All resulting values are sorted and counted by the uniq command. The result of the example shows that 73 completed the questionnaire and 156 dropped out. Another example script provides a distribution of the last filled-out ids (NaN means not a single question was filled out):

$ xmlstarlet sel -t -v "math:max(//answer/@id)" * | sort -n | uniq -c
  93 NaN
   5 2
  31 12
   1 13
   2 47
   1 49
  71 50

Listing 22.4: Command for finding out the distribution of the last filled out question

The same can be done with concrete results of single questions, e.g. for openended questions (see the question with id=46 in listing 22.2):

$ xmlstarlet sel -t -v "//answer[@id=46]/answer/text()" * | sort | uniq -c
  14 1968
  15 1969
  11 1970
   5 1971
   1 1972

Listing 22.5: Command for querying the results of an openended question

The same procedure is valid for closedended questions:

$ xmlstarlet sel -t -v "//answer[@id=48]/alternative/@value" * | sort | uniq -c
  57 1
   1 2
   6 4
   5 5

Listing 22.6: Command for querying the results of a closedended question

These are just small examples. The possibilities these tools present are enormous. If these scripts were combined and more logic and existing information integrated (like the question texts), a whole reporting system could be implemented quite easily.
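
As a sketch of how such reporting logic could also be moved into the Java world instead of shell scripts (this is an illustration, not part of QSYS), the first query above can be reproduced with the standard Java XPath API:

// Illustration: the finished/dropped-out count from listing 22.3, computed over
// all answer documents in a folder with the standard Java XML and XPath APIs.
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

class AnswerFolderQuery {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, Integer> counts = new HashMap<>();
        for (File f : new File(args[0]).listFiles((dir, name) -> name.endsWith(".xml"))) {
            String finished = xpath.evaluate("//header/@finished", dbf.newDocumentBuilder().parse(f));
            counts.merge(finished, 1, Integer::sum);
        }
        System.out.println(counts);                  // e.g. {true=73, false=156}
    }
}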

Naturally, some drawbacks of storing native XML documents in the file system compared to using a database have to be mentioned as well:

• Data integrity cannot be assured the way it could be within a database using constraints.

• Central storage and access control are not as easily possible.

22.1.5 Exporting

When implementing the exporting package, the goal was to create a common framework for writing tabular data to file, because this task is always the same regardless of which content is written. Furthermore, introducing a new export format should be possible without much effort. The strategy was therefore to build a clear class hierarchy, whereby methods in the base classes do all common tasks and assigning different content is achievable within the sub classes. BaseExporter and BaseDataExporter already implement common tasks for writing a header line and multiple content lines, together with the whole exporting process (or define abstract methods to leave the concrete implementations open). Concrete implementations are then located within the two sub classes. It is sufficient to request a concrete data exporter (either CSV or SPSS) from class ExporterFactory to get the desired output format.
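
The following sketch illustrates the template-method idea behind this hierarchy; the class and method names are simplified stand-ins, not the actual QSYS classes:

// Sketch of the template-method design: the base class drives the export,
// subclasses only supply format-specific details such as delimiters.
import java.io.PrintWriter;
import java.util.List;

abstract class DataExporterSketch {
    protected PrintWriter out;

    // Common driver: header line first, then one line per record.
    final void export(List<String> header, List<List<String>> rows, PrintWriter target) {
        out = target;
        writeLine(header);
        for (List<String> row : rows) writeLine(row);
        out.flush();
    }

    void writeLine(List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) writeVarDelimiter();
            writeValue(values.get(i));
        }
        writeLineDelimiter();
    }

    abstract void writeVarDelimiter();
    abstract void writeLineDelimiter();
    abstract void writeValue(String val);
}

class TabExporterSketch extends DataExporterSketch {
    void writeVarDelimiter()  { out.print('\t'); }   // TAB-separated values
    void writeLineDelimiter() { out.println(); }
    void writeValue(String v) { out.print(v == null ? "" : v.replace('\t', ' ')); }
}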

Currently, CSV (character-separated values; TAB is used as the field separator) is used as the exporting format. As a second format, SPSS is supported, but currently not fully implemented. Because the format for SPSS data files (.sav) is relatively complex and only commercial libraries exist (like Java SPSS Writer from pmStation, http://spss.pmstation.com), which are solutions that do not fit into the overall concept, data is written into an SPSS syntax file (which has the same effect: all necessary information can be set, and after running the syntax, the data is written as a common SPSS data file). Exporting can also be done in a separate thread to enable an immediate return when requesting an export from the frontend. ParadataTimeExporter exports either the client-side or the server-side duration per question. An overview of the classes within the export package is given in figure 22.4.

22.2 StruXSLT

Because the Web frontend of QSYS is based on this framework and its concepts, it is described here before QSYS-web is covered.

22.2.1 Basic Functionality

StruXSLT sits on top of Struts, extending its existing functionality to allow Action classes to return XML that will be transformed by technologies like XSLT and XSL-FO. One motivation for using StruXSLT is to remove the need to use JSP (Java Server Pages) and Taglibs (http://jakarta.apache.org/taglibs) for the presentation layer of the Struts framework. However, StruXSLT does not force the XML way exclusively; both technologies will work side by side. The basic idea was taken from an article published at http://www.javaworld.com (Mercay & Bouzeid (2002)); see figure 22.5 to get an idea of this concept. Here no JSP or Taglibs are used for visualizing data, but XSLT documents. Within the control layer, an XML document is created instead of storing multiple variables in the session scope. When using XML/XSLT instead of the conventional Struts approach, the separation between the view and the other layers of the MVC model is stricter. Furthermore, XSLT is standardized at the W3C (http://www.w3.org/TR/xslt) and is vendor and technology independent, so all style sheets generated can be reused in other projects, even in those which use a completely different technology (like e.g. .NET). These and other advantages are also mentioned in Karg & Krebs (2008).

[Figure 22.5: MVC-2.x Model as taken from http://www.javaworld.com]


[Figure 22.4: Export class diagram showcase. Package org.qsys.quest.model.export with the abstract classes BaseExporter and BaseDataExporter, the concrete data exporters TABDataExporter and SPSSDataExporter, and ParadataTimeExporter.]


The framework described here is an essential part of QSYS and of the concept behind it (even though it has become a standalone framework); therefore a brief description of this component is given here.

Several other solutions exist within the open source field (like StrutsCX [24] and stxx [25]), but none of these packages implements random or conditional assignment of style sheets, which was necessary for the experiments of this thesis (how this can be done is described below). For this reason, this new framework was implemented, which is also employed in other projects independent of QSYS. StruXSLT is published as open source and can be downloaded from SourceForge [26].

The basic features of this framework are listed here:

• Language independence: all language tokens are set within external XML files, which are afterwards woven together with the content of the page to be visualized via XSLT. The settings for linking language files to actions are described in section 22.2.4, where the XSLT mapper is described in detail.

• No JSP necessary: the main concept of the framework is to simply use XSLT for rendering content. Nevertheless, usual Actions working with e.g. Struts Taglibs [27] can be used.

• AJAX support: data generation and forwarding can be influenced by an action parameter delivered when actions in the view layer are requested.

• Support for debugging: several functions were implemented, e.g. showing the generated XML used for rendering directly within the browser.

• Predefined methods for preparing information for actions in the view layer, e.g. addAdditionalMenueEntries, which adds the menu entries to be offered on the Web page to the XML to be rendered.

• Feedback functions: from every position within the action classes, message and error codes can be set and transferred to the view layer. Only error and message codes are set; the textual version is automatically read from the language files to assure language independence also for error and warning messages.

• Central configuration: one main configuration file exists which maps all Struts Actions [28], XSLT style sheets and language token XML files.

• Style sheets can be assigned at random. The probability of selecting a certain style can be set within the main configuration file.

• Style sheets can be assigned following certain conditions. A concrete example: it is sometimes useful to assign different style sheets depending on whether Javascript is turned on or off within the client's browser. Another application would be offering different styles for different client types, e.g. an iPhone version of a Web page.

[24] http://it.cappuccinonet.com/strutscx/doc/v08/en/intro/index.html
[25] http://stxx.sourceforge.net
[26] http://struxslt.sourceforge.net
[27] http://struts.apache.org/1.x/struts-taglib/index.html
[28] Which generate XML content in this case.


[Figure 22.6: Base Action classes of the StruXSLT framework. Package org.webq.struxslt with the abstract classes BaseStruXSLAction, BaseStruXSLViewAction (generateXML), BaseStruXSLAjaxAction and BaseStruXSLProcessAction (doProcessData).]

• The way XSLT style sheets are selected can easily be customized and extended. To do so, simply implement a new XsltMapEntry class (see section 22.2.4 below).

• The framework also clearly differentiates between Action classes for processing data and actions for generating content for visualization, which improves the architecture and gives a better overview.

22.2.2 Usage

To use this package, a new base class has to be used instead of Struts' Action class, but it is not necessary to exchange the ActionServlet, which would be the case with other packages such as stxx, so registering Struts within web.xml can be done the customary way. It is simply necessary to copy q-struxsl.jar (which is the archive containing StruXSLT) to the application's WEB-INF/lib folder and extend from the Action classes described below.

22.2.3 Action Classes

StruXSLT strictly separates view actions from process actions, so two main base classes exist within the framework. Figure 22.6 gives an overview of the methods to be overridden from the perspective of the framework user.

22.2.3.1 BaseStruXSLAction

This class serves as the base class for the Action classes within the whole framework. The main purpose of this common base class is to bundle core functionality necessary for all concrete Actions described below. For example, an instance of a session manager is stored here, as well as the basic settings for the user's session, which are common to all sub classes. In general, messages, errors and other information, which can be used both for communication between the concrete Action classes and for the presentation layer (all set messages are also written to the XML used for rendering), can be set from any Action; so the basic managing of all these messages (like adding errors and messages) is implemented here. All messages to be set are language independent, which means that simply an error or message code is set. This code is afterwards substituted by the definite message text taken from the XML language token files.

Furthermore, enabling a debug mode is implemented here, which is simply a flag that is also written to the view-XML. This can be used to allow the output of additional helpful information during development. Additionally, the retrieval of language tokens and logging also resides here. This class works in the background and when using this framework the developer is confronted with the base classes described below.

22.2.3.2 BaseStruXSLViewAction

This class serves as the base class for all Actions generating (XML) data to be used for rendering via XSLT or XSL-FO. All information specific to a certain Action is generated within the generateXML method, so this abstract method must be implemented in the project-specific view class when overriding BaseStruXSLViewAction; the returned document can then be used for rendering. Within this Action, the XML is transformed into the desired output format (e.g. HTML or PDF) according to the assigned stylesheet. Style sheet assignment is done within the mapper classes described below, but storing and managing preconditions (e.g. whether Javascript is enabled) are handled here.

Language tokens are automatically added to the XML document according to the desired language. If debug mode is enabled, the parameter out can be used to directly see the XML document used for rendering by adding out=xml to the request; the same goes for the value xsl, which shows the XSL document used for rendering. To bind a stylesheet to an application-dependent key, which is used to assign the same stylesheet to a certain identifier for the whole session, the method getXsltKey has to be overridden in the concrete sub classes.
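
A minimal sketch of the resulting programming model (the classes below are simplified stand-ins; the real StruXSLT signatures, e.g. the additional Struts parameters of generateXML, are omitted):

// Sketch: a view action builds a DOM document; the framework applies the mapped XSLT afterwards.
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

abstract class ViewActionSketch {                 // stand-in for BaseStruXSLViewAction
    abstract Document generateXML() throws Exception;
    abstract String getXsltKey(String path);      // binds one stylesheet to a session key
}

class BasicSettingsViewSketch extends ViewActionSketch {
    Document generateXML() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("basicSettings");
        root.setAttribute("questionsPerPage", "5");
        root.setAttribute("progressBar", "true");
        doc.appendChild(root);
        return doc;                               // rendered via XSLT, no JSP involved
    }

    String getXsltKey(String path) { return path; }  // one stylesheet per action path
}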

22.2.3.3 BaseStruXSLProcessAction

This is the base class for all actions which process data (which in general means writing information to the database, the file system or the session). The abstract method doProcessData has to be overridden within the concrete subclass to do so. The return value is an ActionForward, which normally forwards to a concrete BaseStruXSLViewAction. Within this method, error and warning messages can be generated (these are stored within the session) and used by the view layer to give the user feedback as to whether data processing was successful or not.

Two parameters can be set to influence the given output:

• ajax: if this parameter is set to true, just errors and feedback messages are delivered to the requesting page. These are used to give the user feedback after performing an AJAX request.

• tfb: if this parameter is set to true, a simple text message is delivered to the requesting page. This is used in the case of automated data processing or when certain functions are called from external systems; e.g. in case everything worked fine, the return value is just OK with mime type text/plain.
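
A sketch of the corresponding process side, again with simplified stand-in classes rather than the real framework signatures:

// Sketch: doProcessData stores the submitted data and reports success or an error code,
// which the view layer later turns into a language-dependent message.
import java.util.Map;

abstract class ProcessActionSketch {              // stand-in for BaseStruXSLProcessAction
    protected String errorCode;                   // picked up by the view layer
    abstract String doProcessData(Map<String, String> formValues);
}

class BasicSettingsProcessSketch extends ProcessActionSketch {
    String doProcessData(Map<String, String> formValues) {
        String perPage = formValues.get("questionsPerPage");
        if (perPage == null || !perPage.matches("\\d+")) {
            errorCode = "basicsettings.invalid_page_count"; // message text comes from language files
            return "error";                                  // forward name, as in Struts
        }
        // ... persist the settings via the model layer here ...
        return "success";
    }
}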


22.2.3.4 BaseStruXSLAjaxAction

This class is integrated for future use and should handle simple AJAX requests when no response is delivered.

22.2.4 Mapper Classes

The central configuration file for the whole framework is xslt-map.xml, where the XML-generating action is woven together with the stylesheet and the language token documents to ensure language independence. This document has to be placed directly into the Web application's WEB-INF folder. Internally, one XML document is generated per request, which is rendered by the XSLT document specified.

The main settings within this document have the following structure:

<xsltEntries debug="false">
  <supportedLangs langs="EN, DE" />
  <messageCodes lang_file="global/msgs.xml"/>
  <global_langs>
    <global_lang lang_file="global/menue.xml"/>
    <global_lang lang_file="global/head.xml"/>
  </global_langs>
</xsltEntries>

Listing 22.7: An example of a simple XSLT map header

With the attribute debug, the debug mode can be turned on by default. All supported languages can be listed within the second element. The mapping from message codes to the language-dependent messages themselves is done within the element messageCodes. Global language token files which are needed by the view layer can be specified within global_lang elements; these and the message-code documents are automatically copied to the XML used for rendering. In the following, the two currently implemented mapping modes are described; it is easy to write an additional mapping.

simple: A simple entry which weaves together an Action, a language token file and an XSLT file can be defined as shown in the following example:

<entry path="/basicsettingsv" type="simple"
       lang_file="admin/basicsettings.xml">
  <xslt path="admin/basicsettings.xsl" />
</entry>

Listing 22.8: An example of a simple XSLT map entry

Each simple entry-element consists of the following attributes and child-elements:

• path: specifies the path of the Struts action which is responsible for generating the XML content used for rendering.

• lang_file: a comma-separated list of language token files. These are dynamically selected according to the language currently set within the running system.


• also the mime type can be set [30], which makes sense e.g. when employing XSL-FO for PDF generation, which needs an application/pdf mime type. No additional tasks have to be done within the subclassed Actions; the correct transforming engine is automatically invoked, depending on whether the mime type is HTML, XML, PDF or plain text.

• xslt: contains an attribute path, where the relative path to the XSLT stylesheet is specified.

Such an entry is necessary for all views of the Web application. These entries have to be placed directly as child elements of the root element xsltEntries.

random: Another, more complex mapping type called random can be configured in the following way:

<entry path="/doqv" type="random" group="univ_uibk" questionnaire="webpage"
       lang_file="do/do_main.xml">
  <xslt path="/to/any.xsl" name="personal_1" prob="34"
        conditions="java, javascript"/>
  <xslt path="/to/any.xsl" name="personal_2" prob="33" conditions="javascript"/>
  <xslt path="/to/any.xsl" name="personal_3" prob="33"/>
</entry>

Listing 22.9: An example of a random XSLT map entry

These settings have the following meanings (those which are equal to the simple example are not explained again here):

• keys: for the attributes group and questionnaire, concrete groups and questionnaires can be set; only if these two values are equal to those given when requesting the map entry will this map entry be selected. This allows distinguishing between different groups and questionnaires and assigning different style sheets to each without any code manipulation. Consequently, it is possible that two or more entries with the same path exist in the configuration file; the one to be chosen for rendering is selected via these two additional attributes. Of course, the terms group and questionnaire are just examples [31] and can be specified within the application; in fact, these two attributes could also be named key1 and key2.

• prob: defines the likelihood of a certain style being selected. The sum of all probabilities must be 100.

• conditions: some preconditions can be set for a style to be selected (e.g. whether Javascript and/or Java is enabled or not). Again, the naming of these conditions is not fixed and can be freely chosen for each application. If a certain xslt entry was selected but its conditions are not satisfied, random selection is repeated until an entry is selected which fulfils all conditions. It is therefore necessary to pay attention when specifying these conditions, because there is otherwise no way of avoiding infinite loops.

• name: the name of the selected style can be used for branching within an XSLT document. In contrast to the example above, the path attribute values can of course differ.

[30] Which is not shown in the example above because the default response mime type is set to text/html.
[31] It was used that way within QSYS when running surveys with different styling in parallel.


22.2.5 Language Independence

All language token files have to be located within the following path (the root for all language files is WEB-INF/lang):

<language_abbreviation>/<path_to_lang_file>, e.g. EN/admin/login.xml.

A normal language token file has the following structure: because these language documents are directly copied to the XML document used for rendering, hierarchies of elements or attributes can also be set and used within the style sheets. One simply adds all child elements of the root element <lang>, as is done e.g. within the QSYS project; the language key is the element name and the language value is stored within the text node.

Introducing a new language can easily be done by following these steps:

1. Add an abbreviation for the language in xslt-map.xml at supportedlangs/langs, e.g. IT for Italian.

2. Copy all language tokens from an existing language (e.g. EN) to the new language folder and translate the text nodes.

22.3 QSYS-Web

QSYS-web is the Web frontend for the QSYS system. It is based on the StruXSLT framework and uses the functionality of QSYS-core, so sections 22.1 and 22.2 should be read before turning to this section. For rendering XML content to other output formats, two open source projects from the Apache group are used, namely Xalan [32] for generating HTML and FOP [33] for PDF output. Both work very well; style sheets simply need to be written and the desired output is generated. The FOP developers plan to improve RTF [34] support, which would then be used as an additional output format for questionnaires with QSYS. The advantage of RTF over PDF is that it can be modified by the user after generation, using e.g. Open Office or Microsoft Word.

22.3.1 Class Hierarchy

The concept of separating Actions responsible for generating content for the view layer on the one hand from Actions for processing requests on the other, as predetermined by StruXSLT, is retained and extended with another differentiation, namely between publicly accessible Actions and those only accessible by the group or system administrators. Because core functionality is already implemented within StruXSLT, it is sufficient to concentrate on project-specific classes and methods without any distraction from basic and workflow functionality.

[32] Xalan-Java is an XSLT processor for transforming XML documents into HTML, text, or other XML document types. It implements XSL Transformations (XSLT) Version 1.0 and XML Path Language (XPath) Version 1.0 and can be used from the command line, in an Applet or a Servlet, or as a module in another program (http://xml.apache.org/xalan-j).
[33] Apache FOP (Formatting Objects Processor) is a print formatter driven by XSL formatting objects (XSL-FO) and an output independent formatter (http://xmlgraphics.apache.org/fop).
[34] Rich Text Format.

In the following, the main structure of these Action classes is shown exemplarily, together with some concrete derivations.

22.3.1.1 View

[Figure 22.7: QSYS-web view class diagram showcase. BaseQsysViewAction (package org.qsys.quest.action.view, with sessq, additionalStatusXmlEntries, preCondition, doTrack and getXsltKey) extends BaseStruXSLViewAction (package org.webq.struxslt.view); below it the abstract classes BaseQSysAdminViewAction and BaseGroupAdminViewAction and the concrete actions AnswerLoginViewAction, AdminViewAction and BasicSettingsViewAction, each overriding generateXML.]

In figure 22.7, a sample showcase of the view-class hierarchy within the QSYS-web component is described. BaseQsysViewAction serves as the base class for all view actions used within QSYS. Here getXsltKey generates a key consisting of the group id, the questionnaire id and the interviewee id to assure proper assignment of the style sheets. Within the overridden method preCondition, necessary environment variables are set. doTrack is responsible for tracking user paradata (gathered either from the client or from the server side).
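
A hypothetical sketch of such a key construction (the real method reads these ids from the session and request, which is omitted here):

// Sketch only: the combined key keeps the randomly chosen stylesheet stable
// across all pages of one respondent's interview.
class XsltKeySketch {
    static String xsltKey(String groupId, String questionnaireId, String intervieweeId) {
        return groupId + "_" + questionnaireId + "_" + intervieweeId;
    }
}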


22.3.1.2 Process

[Figure 22.8: QSYS-web process class diagram showcase. BaseQsysProcessAction (package org.qsys.quest.action.process) extends BaseStruXSLProcessAction (package org.webq.struxslt.process); below it the abstract classes BaseAdminProcessAction and BaseGroupAdminProcessAction and the concrete actions AnswerLoginProcessAction, AdminProcessAction and BasicSettingsProcessAction.]

Figure 22.8 shows a sample showcase of the process-class hierarchy. BaseQsysProcessAction serves as the base class for all process actions.

22.3.2 Configuration and Installation

22.3.2.1 Preconditions

To run QSYS on a server or local machine, a Servlet engine like Tomcat (the software was tested with version 6) with a Java Runtime Environment (version >= 1.5) is sufficient, which is pre-installed in most cases. Not even a database is necessary when running the file system storage mode. Because of this, it is also possible to install QSYS locally (e.g. on a laptop) so that QSYS can also be used offline; questionnaires can then easily be imported into the online system. To install, simply download the .war (Web archive) document from the project's homepage and deploy it, which means either copying the archive to Tomcat's webapps directory or deploying via the Tomcat manager.

22.3.2.2 Property Files

Two files have to be created and located on the class path:

qsys.properties, which stores general settings used within the whole system:

DB=FILESYSTEM                              #mandatory
MAIL_ADDRESS=[mail_address]                #optional
SMTP_SERVER=[full_path_to_smtp_server]     #optional
DATA_EXPORT_DIR=[data_export_dir]          #mandatory
ADMIN_PW=[admin_pw]                        #mandatory
LDAP_URL=[path_to_ldap]                    #optional
LDAP_GROUP_PREFIX=[ldap_group_prefix]      #optional

Listing 22.10: Settings within qsys.properties

fsdb.properties, which defines the location where all information is stored when the file system is the selected storage method:

PATH=[path_to_root_storage_dir]            #mandatory

Listing 22.11: Settings within fsdb.properties

22.3.2.3 Compile from Scratch

To compile the sources, currently a shell script (for Linux and Mac users; qsys-web/build/build.sh) exists which completely builds all projects necessary to create the whole Web archive file. This script calls ANT scripts and copies the created libraries to the WEB-INF/lib directory. Each project contains a build folder, where the appropriate ANT script (which is always named build.xml) is located. The resulting qsys.war file will be located according to the webapp property as set within qsys-web/build/build.xml.

22.3.3 Additional Tools

Several tools were used to ease the development process; only the most important ones are described here:

• ANT [38] (which is an acronym for Another Neat Tool) is a Java-based build tool. In the current version, many extensions are available, enabling the use of ANT for a lot of other tasks (e.g. integrating JUnit tests within the build process). In the QSYS project, ANT is used for automatically building all subprojects, generating a Web archive for deployment on the Servlet engine, generating JavaDoc and running XDoclet statements.

• XDoclet [39]: the generation of struts-config.xml is done via XDoclet tasks, which has several advantages: editing struts-config.xml directly is error-prone, because for bigger projects this file becomes very complex and hard to view as a whole; additionally, it is beneficial when the whole configuration for Struts is written directly next to the classes concerned. See listing 22.12 for a sample of an XDoclet statement for a Struts Action (here for the login process action). The instructions are placed directly above the class declaration, integrated into a JavaDoc comment; this information goes directly into struts-config.xml when the corresponding ANT task is run.

In line 2 of the sample, an ActionForm class and a Web path are assigned to the Action class. In the subsequent lines, the action forwards are defined with a name and a path indicating where the forward should go. If redirect is set to true, the URL within the navigation bar of the browser changes to this new URL.

[38] http://ant.apache.org
[39] XDoclet is an open source code generation engine. It enables attribute-oriented programming for Java. In short, this means that you can add more significance to your code by adding metadata (attributes) to your Java sources. This is done in special JavaDoc tags. http://xdoclet.sourceforge.net/xdoclet


• Eclipse with Lomboz plugins: as IDE (integrated development environment), Eclipse was used with certain plug-ins (the most essential ones are already bundled within the Lomboz distribution). To run the Tomcat Servlet engine within Eclipse, a special plugin (taken from http://www.eclipsetotale.com/tomcatPlugin.html) was used, which even enabled debugging of Servlets.

1 /**
2  * @struts.action name="answerloginform" path="/answerlogin" scope="request"
3  * @struts.action-forward name="success" path="/answerstartv.qsys" redirect="true"
4  * @struts.action-forward name="toquestion" path="/doqv.qsys" redirect="true"
5  * @struts.action-forward name="interviewee_exists" path="/intervieweeexists.qsys"
6  *                        redirect="true"
7  * @struts.action-forward name="error" path="/answerloginv.qsys" redirect="true"
8  */
9 public class AnswerLoginProcessAction extends BaseQsysProcessAction { ... }

Listing 22.12: An example of XDoclet metadata attributes (for the login process)

22.4 Utility Classes

The utility classes package is called q-utils. It simply contains some classes with handy functions which are in constant use. For example, classes are provided to ease XML processing as well as accessing databases efficiently, working with generics and reflection, or connecting to an LDAP server. This package serves as a toolkit for all Java projects to ensure reusability of common functionality, which has several benefits: the functionality grows with every new project, and the stability of these functions increases because they are heavily used, so errors or strange behavior are revealed very early on. Unit tests are implemented (using JUnit, http://www.junit.org) for most of these utility classes to protect against side effects, with regression testing after modifications.

Of course, the source code of this package is also available when downloading the current QSYS or StruXSLT release, but making an extra release for these classes is not worth the effort; better solutions, like e.g. Jakarta Commons (http://commons.apache.org), already exist.

22.5 Additional Notes

22.5.1 Quality Control

To assure that software quality is as good as possible, testing of the software became a major point during development. As the tool for unit and regression testing, JUnit was used.

22.5.2 Software Metrics

Here are just a few figures to describe the complexity of the generated software: QSYS-core consists of 152 classes (12,514 LOC), QSYS-web of 98 classes (4,834 LOC plus all XSLT documents), StruXSLT of 20 classes (1,581 LOC) and the utility package of 86 classes (5,785 LOC) [45].

22.5.3 Schema RFC for Questionnaires (and Answer Documents)

A schema to describe XML documents which hold questionnaires and answers was implemented within the scope of this thesis and will be submitted to the W3C in the form of a request for comments (RFC). Standardization would be one big step forward in enabling the exchange of questionnaire definitions between surveys. Furthermore, archiving of questionnaires and answer documents would be simplified and generalized. The goal is not to create the standard, but rather to put it up for discussion, so that possible further steps can be taken by other survey software developers. The schema definitions can be found at http://www.survey4all.org/qsys-xsd.

[45] Lines of code were determined with http://www.dwheeler.com/sloccount.


23 Additional Tasks to be Implemented

23.0.4 Federated Identity Based Authentication and Authorization

A first step towards integrating QSYS into the infrastructure of companies and institutions is given through the support of LDAP. This support, for example, enables the limitation of respondents for a certain questionnaire to members of one or more LDAP groups. Further steps in this direction could be the integration of technologies like Shibboleth [1] and OpenID [2] to benefit e.g. from single sign-on. Additionally, it is planned to offer most of the functionality of QSYS via Web services.

23.0.5 R Reporting Server

Most commercial and open source online survey tools contain basic (or in some cases advanced) reporting functionality. The strategy for QSYS is that reporting should be implemented outside of the system: the software itself should concentrate on its core competence and further jobs should be excluded (the same is true for e.g. panel management tasks), but communication with these external components should take place over clearly defined interfaces. One idea was to use the R environment for statistical computing [3] and its capabilities for reporting. R has several advantages compared to usual reporting packages:

• The possibilities of statistical computing with R are enormous (also for generating certain graphics and charts). Of course, in the first version, simple descriptive statistics, like frequency distributions, and graphics, like histograms, will be part of the report. However, once the infrastructure for using R as a reporting tool exists, improving and extending the reports can be carried out with less effort.

• R supports several output formats, like PDF, HTML, RTF and even LaTeX.

• R contains a fully object-oriented language, which enables well structured development. Wrappers even exist for all common programming languages.

• R is also open source, which fits well into the license strategy QSYS has.

• R has a big community behind it, which means masses of packages and functionality from different areas exist. Furthermore, a very active mailing list exists, which provides support for concrete questions.

[1] http://shibboleth.internet2.edu
[2] http://openid.net
[3] More information about the huge functionality of R can be found at http://www.r-project.org and in R Development Core Team (2006).


Because R was also used for the statistical analysis of the experiments, some code fragments generated for this purpose can be reused, which is a positive side effect [4].

23.0.6 Observation Features

With the introduction of Web 2.0 technologies such as AJAX, observation of the user is no longer limited to information gathered when the submit button is pressed; it is also possible to track user behavior during the filling-out process (i.e. what is done on the web page). Concrete scientific questions would be: for openended questions, which text was written in the text box before pressing the submit button (possibly a first statement was deleted and substituted by one that better fits social desirability); the same applies to closedended questions: did the respondent select another alternative than the one finally chosen when submitting the form? In the future, concrete experiments regarding which of these features should be implemented could be conducted.

23.0.7 Accessibility

More effort will be invested in designing barrier-free questionnaires. The goal is to support all should criteria (level AA) from http://www.w3.org/TR/WCAG20.

[4] A lot of literature has been published on different levels of R usage, e.g.: introduction to R: Ligges (2007), Everit & Hothorn (2007); graphics with R: Murell (2006); linear modelling with R: Faraway (2005), Wood (2006).


24 Evaluation of the Software

In this chapter, a short evaluation of QSYS against external criteria is given. Kokemüller (2007), for example, already evaluated eight commercial software solutions; the criteria are taken from this article and applied to QSYS. Additionally, some ideas for an academic evaluation of survey software were taken from Pocknee & Robbie (2002). Batinic (2003, p. 10) identifies the following main requirements for Web-based survey tools (the evaluation concerning QSYS is given directly next to these points):

• Progress bar [1] (graphical or simply the current page number and the total number of pages of the questionnaire). Yes.

• Question filters used for branching. Yes, but branching can currently only be done depending on the direct antecessor question.

• Randomized assignment of respondents to different conditions. Yes, this feature is extensively supported and all possible variations are available (but HTML knowledge is necessary for implementation, because XSLT documents must be created).

• Item rotation to avoid ranking effects [2]. No, this will be done in the next release.

• Plausibility checks to automatically identify data inconsistency (ideally while entering data). Yes, both on the client and on the server side.

• Possibility to send invitation letters and reminders. No, these tasks should be kept outside; a tool is planned which has these capabilities and communicates with QSYS.

• Access limitation (e.g. the usage of passwords). Yes, there are several different access modes.

• (Real time) report statistics. No, again this feature should be implemented in an additional tool communicating with QSYS [3].

• Possible integration of multimedia (pictures, films, ...). Yes, everything which can be done with HTML, CSS and Javascript is possible.

• Data export to common statistics software (like SPSS, SAS [4], etc.). Yes, CSV is supported, which can be imported into all common statistics packages.

• Multiple project managers can create questionnaires concurrently. Yes.

• Secure data transmission (SSL encryption). This depends on the server, but it is not a problem to run QSYS e.g. on Tomcat using https.

In addition, Manfreda & Vehovar (2008, p. 281f) give a list of features a professional survey software package should support [5]:

[1] See section 5.10 for actual findings.
[2] As described in section 5.3.4.
[3] See section 23.0.5 for further details.
[4] http://www.sas.com
[5] Again, the QSYS evaluation is given next to the criteria.


• Sample management allowing the researcher to send out prenotifications, initial invitations and follow-ups for nonrespondents. Here again, this functionality [6] should be implemented externally, with interfaces to QSYS.

• User-friendly interface for questionnaire design, with several features, for instance manuals, online help, tutorials, but also question/questionnaire libraries and export from other software packages. Yes, everything mentioned here exists within QSYS (e.g. a detailed documentation of the editor exists in both English and German).

• Flexible questionnaire design regarding layout (e.g. background color/pattern, fonts, multimedia, progress indicator), question forms (e.g. open, closed, grid, semantic differential, yes/no questions), and features of computer-assisted survey information collection (e.g. complex branching, variable stimuli based on previous respondent selections, range controls, missing data and consistency checks). As already mentioned for Batinic's (2003, p. 10) criteria, all these points are implemented, except complex branching and related features.

• Reliable and secure transfer and storage of data. Storage of data has proven itself in the several surveys run with QSYS; secure data transfer can be assured with https if the Servlet container is configured accordingly.

A few additional criteria are given here:

• User, developer and administrator documentation: documentation exists for all of these three roles.

• Printable survey version: it is possible to export the survey as PDF and print out this document. The look of the questionnaire is fully customizable by editing the appropriate XSL-FO document. It is also possible to assign different style sheets which generate PDFs for different questionnaires.

• Sufficient offer of question types: yes [7].

• The software should not only be suitable for market and social research, but also for other scientific fields. Bälter & Bälter (2005) discuss the special needs of epidemiological research, where e.g. the necessity to support Visual Analogue Scales is mentioned. VAS are fully supported in different graphical representations.

[6] Possibly in combination with an open source panel management tool; one candidate could be phpPanelAdmin (http://www.goeritz.net/panelware).
[7] As can be seen in section 21.4.


25 Existing Open Source Online Survey Tools

There is a huge offer of online survey tools on the web, many of which are even available forfree. In this chapter, a short overview of the most important tools will be given. Only opensource tools are taken into consideration (although QSYS would bear comparison with some ofthe commercial providers), but the selection is not limited to those implemented in Java technol-ogy. Only the most striking tools were selected and described shortly in the following sections.Only the key features were presented as it is not worthwhile to give a detailed evaluation whenplanning to evaluate survey tools, a closer look at all tools mentioned below will be necessaryanyway. Assessing online providers for online surveys (commercial and non-commercial) is notsubject of this chapter as the focus is on the software1.

In the course of this thesis several open source portals were scanned for appropriate projects, e.g.http://sourceforge.net as well as Web pages from institutions dealing with online surveys:

• http://www.gesis.org At GESIS, only two active open source projects are currentlylisted, Lime Survey and Open Survey Pilot. Because it is essential for open source projectsto be constantly improved (bug fixes, functional increments), only those with an activedevelopment status are evaluated.

• http://www.websm.org: Many existing commercial and non-commercial software tools for survey research are listed on this website, which is run by Web Survey Methodology.

25.1 LimeSurvey (formerly PHPSurveyor)

LimeSurvey2 seems to be the most widely distributed open source online survey tool. It is written in PHP, has extensive user documentation, and provides basic instructions on how to install the system. Most of the common question types are supported, and some concrete templates, such as yes/no questions or a gender question, are also integrated. It supports multilingual surveys, comes with a WYSIWYG HTML editor, can integrate pictures and movies into a survey, allows the survey to be printed, supports different access control modes (including LDAP), exports data in several formats (such as SPSS), offers basic reporting, and even provides screen reader accessibility. A wiki has been installed for documentation, enabling information for users, developers and administrators to be entered in multiple languages, as well as a forum where users can post their problems.

1 For further information on this see e.g. McCalla (2003, p.60) for a short overview.
2 http://www.limesurvey.org


25.2 FlexSurvey

FlexSurvey3 is a small but powerful tool written in PHP which allows fast and flexible creation of online surveys. It does not provide a GUI for creating surveys, which need not necessarily be a disadvantage: for someone acquainted with the technology and the structure of the software, developing surveys becomes much faster, because only (PHP) files have to be edited and no form fields of a Web editor frontend have to be filled out4. Of course, creation becomes more flexible when code is edited directly, but the necessary knowledge excludes the majority of survey designers, since at least basic knowledge of (X)HTML, PHP, and CSS is needed to move beyond very simple surveys.
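The direct-editing approach described above (and, as footnote 4 notes, also offered by QSYS for its questionnaire XML files) benefits from a quick validity check before a hand-edited file is uploaded. The following minimal Java sketch validates a questionnaire document against an XML schema; the file names qsys-questionnaire.xsd and my-questionnaire.xml are purely hypothetical.

import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class QuestionnaireValidator {
    public static void main(String[] args) throws Exception {
        // Hypothetical file names; the actual schema and questionnaire files may differ.
        File schemaFile = new File("qsys-questionnaire.xsd");
        File questionnaireFile = new File("my-questionnaire.xml");

        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(schemaFile);
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(questionnaireFile));
            System.out.println("Questionnaire is valid against the schema.");
        } catch (SAXException e) {
            // Validation errors point to the offending element before the file is uploaded.
            System.out.println("Questionnaire is invalid: " + e.getMessage());
        }
    }
}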

25.3 Mod_Survey

Mod_Survey5 is a mod_perl6 module for Apache. In the core version there is no editor; XML files have to be edited to create a survey. Documentation exists, but only the tags to be set within the XML file are described. Recently, a graphical editor for creating these questionnaire XML documents was released separately from the core tool. This approach of course has the drawback of not being very end-user friendly, but there are also some advantages: (1) the more possibilities offered to influence code creation, the more flexibility there is in designing the survey; (2) improvements (such as new question types or new features) can be developed much faster because no modifications have to be made to an editor; and (3) how the XML documents are generated is left to the end user, which means these files can be generated automatically from any source, or a custom editor could be written to generate them (see the sketch below).
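As an illustration of point (3), the following minimal Java sketch builds such a survey definition programmatically with the standard DOM API and writes it to a file. The element and attribute names (survey, question, id, type) are purely hypothetical and do not follow the actual Mod_Survey document type definition.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SurveyXmlGenerator {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        // Hypothetical element names, not the actual Mod_Survey document type.
        Element survey = doc.createElement("survey");
        survey.setAttribute("title", "Customer feedback");
        doc.appendChild(survey);

        // A survey definition could just as well be generated from a database or a custom editor.
        String[] items = { "How satisfied are you with our service?", "Would you recommend us?" };
        for (int i = 0; i < items.length; i++) {
            Element question = doc.createElement("question");
            question.setAttribute("id", "q" + (i + 1));
            question.setAttribute("type", "scale");
            question.setTextContent(items[i]);
            survey.appendChild(question);
        }

        // Serialize the DOM tree to a file that a survey engine could pick up.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(new DOMSource(doc), new StreamResult(new File("survey.xml")));
    }
}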

25.4 MySurveyServer

MySurveyServer7 is an online survey tool developed in Java (the Web GUI is based on Struts 1.1 and the business logic uses EJB session beans), currently in alpha state. It was last released in March 2003, which means that there has been no recent development in the past years. Unfortunately, no documentation could be found.

25.5 phpESP

phpESP8 is a collection of PHP scripts to let non-technical users create and administer surveys, gather results and view statistics. All tasks can be managed online after database initialization.

3 http://www.flexsurvey.de
4 QSYS offers a similar approach of directly editing the questionnaire XML files and uploading them when finished. This is only possible if the creator is very familiar with the XML schema used for questionnaires, but if so, creation becomes incomparably fast.

5 http://www.modsurvey.org
6 mod_perl is an optional module for the Apache HTTP server (http://httpd.apache.org). It embeds a Perl interpreter into the Apache server, so that dynamic content produced by Perl scripts can be served in response to incoming requests (http://perl.apache.org).

7 http://mysurveyserver.sourceforge.net
8 http://phpesp.sourceforge.net


There is an online demo that allows the editor to be examined. Input controls such as simple text boxes, radio buttons, dropdown boxes, rating scales, and numeric and date input fields are supported. Although the editor looks rather old-fashioned, creating simple questions is very easy and intuitive. All actions that can be performed with the software are described in a cookbook style.

25.6 Rapid Survey Tool (formerly Rostock Survey Tool)

Rapid Survey Tool9 is an online survey tool written in Perl, which is responsible for creating and displaying survey pages, storing the variables entered by the respondent, exporting data (SPSS export is supported), and generating short reports from the results. Data is not stored in a database but in individual files. A questionnaire is created by writing a text file containing all question definitions in a custom markup language. All markup tags are described, together with an installation guide, on the documentation page of the software. Again, this is a simple way to efficiently create and run small surveys. The questionnaire itself appears relatively old-fashioned, but this can be improved by editing the source files responsible for generating the questionnaire.

25.7 Additional Web Survey Tools

Subsequently, a list of tools that are not described further is given (reasons include lack of documentation, no active development, or too small a range of functionality): PHPSurvey (http://phpsurvey.sourceforge.net), ActionPoll from the Open Source Technology Group (http://sourceforge.net/projects/actionpoll), Open Survey Pilot (http://www.opensurveypilot.org), Web Survey Toolbox (http://www.aaronpowers.com/websurveytoolbox), Socrates Questionnaire Engine (http://socrates-qe.sourceforge.net), SurJey (http://surjey.sourceforge.net), ZClasses Survey/quiz product (http://www.zope.org/Members/jwashin/Survey), Multi-Platform Survey Architect (http://sourceforge.net/projects/mpsa) and Internet Survey Engine (Insuren) (http://insuren.sourceforge.net).

25.7.1 Tools for HTML Form Processing

25.7.1.1 Generic HTML Form Processor

This piece of PHP code is not a complete survey software package, but it can assist students and researchers in quickly setting up surveys that can be administered via the Web. It performs a simple (but effective) mapping of entries from an HTML form to a database, with additional functionality such as input validation, random assignment of participants to experimental conditions, and password protection. The software can be obtained from http://www.goeritz.net/brmic and additional information can be found in Göritz & Birnbaum (2005).
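The underlying idea, storing every submitted form field as a row in a database table, can be sketched in a few lines. The following Java servlet is only an illustration of that idea, not the PHP tool itself; the table responses, its columns and the JDBC connection settings are hypothetical.

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Enumeration;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/**
 * Stores every submitted form field as a (session, name, value) row.
 * Table and connection details are hypothetical, not those of the PHP tool.
 */
public class FormToDatabaseServlet extends HttpServlet {
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        try {
            Connection con = DriverManager.getConnection(
                    "jdbc:mysql://localhost/surveydata", "survey", "secret");
            try {
                PreparedStatement stmt = con.prepareStatement(
                        "INSERT INTO responses (session_id, field_name, field_value) VALUES (?, ?, ?)");
                Enumeration<?> names = request.getParameterNames();
                while (names.hasMoreElements()) {
                    String name = (String) names.nextElement();
                    stmt.setString(1, request.getSession().getId());
                    stmt.setString(2, name);
                    stmt.setString(3, request.getParameter(name));
                    stmt.executeUpdate();
                }
            } finally {
                con.close();
            }
            response.sendRedirect("thanks.html");
        } catch (Exception e) {
            throw new ServletException("Could not store form data", e);
        }
    }
}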

9 http://www.hinner.com/rst


25.7.1.2 SurveyWiz and FactorWiz

These online tools10 assist with the generation of HTML forms, used directly for online experimenting or for any other purpose where HTML forms are required. The functionality is written in JavaScript. The technology is somewhat outdated but can nevertheless be helpful as introductory assistance in HTML form creation. For further information, follow the links or consult Birnbaum (2000).

25.7.2 Experiment Supporting Frameworks

25.7.2.1 WEXTOR

WEXTOR11 is a Web-based tool that lets you quickly design and visualize laboratory experiments and Web experiments in a guided step-by-step process. It dynamically creates the customized Web pages needed for the experimental procedure anytime, anywhere, on any platform, and delivers a print-ready display of the experimental design. WEXTOR can be seen as an attempt to standardize Web experimenting with the support of a framework. For more information on this project, see the website or Reips & Neuhaus (2002).

25.7.3 Tools for Retrieving Paradata

25.7.3.1 Scientific LogAnalyzer

Scientific LogAnalyzer is a platform-independent, interactive Web service for the analysis of log files. It offers several features not available in other log file analysis tools, for example organizational criteria and computational algorithms suited to aid behavioral and social scientists. For more information, see Reips & Stieger (2004). It is, however, hard to identify the additional scientific value of this tool compared to other log analyzing tools, of which there are many in the open source field.

25.8 Conclusion

After this short evaluation, the value of QSYS becomes more apparent: apart from LimeSurvey, there is no other open source tool that offers so many possibilities and has a comparable software architecture.

10 http://psych.fullerton.edu/mbirnbaum/programs/surveyWiz1.htm
11 http://psych-wextor.unizh.ch/wextor/en


Bibliography

Andrews, D., Nonnecke, B. & Preece, J. (2003), ‘Conducting Research on the Internet: Online Survey Design, Development and Implementation Guidelines’, International Journal of Human-Computer Interaction 16(2), 185–210.

Bachleitner, R. & Weichbold, M. (2007), ‘Befindlichkeit - eine Determinante im Antwortverhalten’, Zeitschrift für Soziologie 36(3), 182–196.

Bälter, O. & Bälter, K. A. (2005), ‘Demands on Web Survey Tools for Epidemiological Research’, European Journal of Epidemiology 20, 137–139.

Bandilla, W. & Bosnjak, M. (2000), Online-Surveys als Herausforderung für die Umfrageforschung - Chancen und Probleme, in Peter Ph. Mohler and Paul Luettinger (2000), pp. 71–82.

Bandilla, W. & Bosnjak, M. (2003), ‘Survey Administration Effects? A Comparison of Web-Based and Traditional Written Self-Administered Surveys Using the ISSP Environment Module’, Social Science Computer Review 21(2), 235–243.

Batinic, B. (2003), ‘Internetbasierte Befragungsverfahren’, Österreichische Zeitschrift für Soziologie (4), 6–18.

Batinic, B., Reips, U.-D. & Bosnjak, M., eds (2002), Online Social Sciences.

Bech, M. & Christensen, M. B. (2009), ‘Differential Response Rates in Postal and Web-Based Surveys Among Older Respondents’, Survey Research Methods 3(1), 1–6.

Best, S. J., Krueger, B., Hubbard, C. & Smith, A. (2001), ‘An Assessment of the Generalizability of Internet Surveys’, Social Science Computer Review 19(2), 131–145.

Biemer, P. B. & Lyberg, L. E. (2003), Introduction To Survey Quality, Wiley Interscience, Hoboken, New Jersey.

Birnbaum, M. H. (2000), ‘SurveyWiz and FactorWiz: JavaScript Web Pages that make HTML Forms for Research on the Internet’, Behavior Research Methods, Instruments, & Computers 32(2), 339–346.

Birnholtz, J. P., Horn, D. B., Finholt, T. A. & Bae, S. J. (2004), ‘The Effects of Cash, Electronic, and Paper Gift Certificates as Respondent Incentives for a Web-Based Survey of Technologically Sophisticated Respondents’, Social Science Computer Review 22(3), 355–362.

Bosnjak, M. & Tuten, T. L. (2001), ‘Classifying Response Behaviors in Web-Based Surveys’, Journal of Computer-Mediated Communication 6(3).

Bosnjak, M. & Tuten, T. L. (2003), ‘Prepaid and Promised Incentives in Web Surveys: An Experiment’, Social Science Computer Review 21(2), 208–217.


Box-Steffensmeier, J. M. & Jones, B. S. (2004), Event History Modeling. A Guide for Social Scientists, Cambridge University Press.

Caldwell, B., Cooper, M., Reid, L. G. & Vanderheiden, G. (2008), ‘Web Content Accessibility Guidelines 2.0’. Online; accessed 21-May-2009. URL: http://www.w3.org/TR/WCAG20/

Christian, L. M. (2003), The Influence of Visual Layout on Scalar Questions in Web Surveys, Master’s thesis, Washington State University, Department of Sociology.

Christian, L. M. & Dillman, D. A. (2004), ‘The Influence of Graphical and Symbolic Language Manipulations on Responses to Self-Administered Questions’, Public Opinion Quarterly 68(1), 57–80.

Christian, L. M., Dillman, D. A. & Smyth, J. D. (2007), ‘Helping Respondents Get it Right the First Time: The Influence of Words, Symbols and Graphics in Web Surveys’, Public Opinion Quarterly 71(1), 113–125.

Conrad, F. G., Couper, M. P. & Tourangeau, R. (2003), ‘Interactive Features in Web Surveys’, Joint Meetings of the American Statistical Association, San Francisco, CA.

Conrad, F. G., Couper, M. P., Tourangeau, R. & Peytchev, A. (2005), ‘Impact of Progress Feedback on Task Completion: First Impressions Matter’, Proceedings of SIGCHI 2005: Human Factors in Computing Systems, Portland, OR.

Conrad, F. G., Couper, M. P., Tourangeau, R. & Peytchev, A. (2006), ‘Use and Non-use of Clarification Features in Web Surveys’, Journal of Official Statistics 22(2), 245–269.

Conrad, F. G., Schober, M. F. & Coiner, T. (2007), ‘Bringing Features of Human Dialogue to Web Surveys’, Applied Cognitive Psychology 21, 165–187.

Cook, C., Heath, F., Thompson, R. L. & Thompson, B. (2001), ‘Score Reliability in Web- or Internet-Based Surveys: Unnumbered Graphic Rating Scales versus Likert-Type Scales’, Educational and Psychological Measurement 61(4), 697–706.

Couper, M. P. (2000), ‘Web Surveys. A Review of Issues and Approaches’, Public Opinion Quarterly 64, 464–494.

Couper, M. P. (2001), ‘Web Survey Research: Challenges and Opportunities’, Proceedings of the Annual Meeting of the American Statistical Association.

Couper, M. P. (2005), ‘Technology Trends in Survey Data Collection’, Social Science Computer Review 23(4), 486–501.

Couper, M. P. & Coutts, E. (2004), ‘Probleme und Chancen verschiedener Arten von Online-Erhebungen’, Kölner Zeitschrift für Soziologie und Sozialpsychologie, Sonderheft 44, 217–243.

Couper, M. P., Conrad, F. G. & Tourangeau, R. (2007), ‘Visual Context Effects in Web Surveys’, Public Opinion Quarterly 71(4), 623–634.

Couper, M. P., Kapteyn, A., Schonlau, M. & Winter, J. (2007), ‘Noncoverage and Nonresponse in an Internet Survey’, Social Science Research 36, 131–148.


Couper, M. P. & Miller, P. V. (2008), ‘Web Survey Methods’, Public Opinion Quarterly 72(5), 831–835.

Couper, M. P., Tourangeau, R., Conrad, F. G. & Singer, E. (2006), ‘Evaluating the Effectiveness of Visual Analog Scales: A Web Experiment’, Social Science Computer Review 24(2), 227–245.

Couper, M. P., Tourangeau, R. & Kenyon, K. (2004), ‘Picture This! Exploring Visual Effects in Web Surveys’, Public Opinion Quarterly 68, 255–266.

Couper, M. P., Tourangeau, R., Conrad, F. G. & Crawford, S. D. (2004), ‘What They See Is What We Get. Response Options for Web Surveys’, Social Science Computer Review 22, 111–127.

Couper, M. P., Traugott, M. W. & Lamias, M. J. (2001), ‘Web Survey Design and Administration’, Public Opinion Quarterly 65, 230–253.

Couper, M. P., Traugott, M. W. & Lamias, M. J. (2004), Web Survey Design and Administration, in ‘Questionnaires’, SAGE Publications, London, pp. 362–381.

Crawford, S. D., Couper, M. P. & Lamias, M. J. (2001), ‘Web Surveys: Perceptions of Burden’, Social Science Computer Review 19(2), 146–162.

Crawford, S., McCabe, S. E. & Pope, D. (2005), ‘Applying Web-Based Survey Design Standards’, Journal of Prevention and Intervention in the Community 29(1/2), 43–66.

Czaja, R. & Blair, J. (1996), Designing Surveys. A Guide To Decisions and Procedures, SAGE Publications Ltd., Thousand Oaks, California.

de Leeuw, E. D. (2005), ‘To Mix or Not to Mix Data Collection Modes in Surveys’, Journal of Official Statistics 21(2), 233–255.

de Leeuw, E. D. (2008a), Choosing the Method of Data Collection, in de Leeuw et al. (2008), pp. 113–135.

de Leeuw, E. D. (2008b), Self-administered Questionnaires: Mail Surveys and other Applications, in de Leeuw et al. (2008), pp. 239–263.

de Leeuw, E. D. & Hox, J. J. (2008), Mixed-mode Surveys: When and Why, in de Leeuw et al. (2008), pp. 299–316.

de Leeuw, E. D., Hox, J. J. & Dillman, D. A., eds (2008), International Handbook of Survey Methodology, European Association of Methodology.

Denscombe, M. (2006), ‘Web-Based Questionnaires and the Mode Effect: An Evaluation Based on Completion Rates and Data Contents of Near-Identical Questionnaires Delivered in Different Modes’, Social Science Computer Review 24(2), 246–254.

Derouvray, C. & Couper, M. P. (2002), ‘Designing a Strategy for Reducing ‘No Opinion’ Responses in Web-Based Surveys’, Social Science Computer Review 20(1), 3–9.

Deutskens, E., de Ruyter, K., Wetzels, M. & Oosterveld, P. (2004), ‘Response Rate and Response Quality of Internet-Based Surveys: An Experimental Study’, Marketing Letters 15(1), 21–36.


DeVellis, R. F. (1991), Scale Development. Theory and Applications, Applied Social Research Methods Series, Volume 26, SAGE Publications Inc., Newbury Park, California.

Dever, J. A., Rafferty, A. & Valliant, R. (2008), ‘Internet Surveys: Can Statistical Adjustments Eliminate Coverage Bias?’, Survey Research Methods 2(2), 47–60.

Diekmann, A. (1999), Empirische Sozialforschung. Grundlagen, Methoden, Anwendungen, fifth edn, Rowohlts Enzyklopädie, Reinbek bei Hamburg.

Dillman, D. A. (2007), Mail and Internet Surveys. The Tailored Design Method, second edn, John Wiley and Sons, Inc., Hoboken, New Jersey.

Dillman, D. A. & Bowker, D. K. (2001), The Web Questionnaire Challenge to Survey Methodologists, in Reips & Bosnjak (2001).

Dillman, D. A. & Christian, L. M. (2005a), ‘Survey Mode as a Source of Instability in Responses across Surveys’, Field Methods 17(1), 30–52.

Dillman, D. A. & Christian, L. M. (2005b), ‘Survey Mode as a Source of Instability in Responses across Surveys’, Paper presented at the Workshop on Stability of Methods for Collecting, Analyzing and Managing Panel Data, American Academy of Arts and Sciences.

Dillman, D. A., Phelps, G., Tortora, R., Swift, K., Kohrell, J., Berck, J. & Messer, B. L. (2008), ‘Response Rate and Measurement Differences in Mixed Mode Surveys Using Mail, Telephone, Interactive Voice Response (IVR) and the Internet’, Social Science Research, forthcoming.

Dillman, D. A., Tortora, R. & Bowker, D. (1998), Principles for Constructing Web Surveys, Technical report, SESRC.

Duffy, B., Smith, K., Terhanian, G. & Bremer, J. (2005), ‘Comparing Data from Online and Face-to-Face Surveys’, International Journal of Market Research 47(6), 615–639.

Ehling, M. (2003), Online-Erhebungen - Einführung in das Thema, in Statistisches Bundesamt (2003), pp. 11–20.

Ekman, A., Klint, A., Dickman, P. W., Adami, H.-O. & Litton, J. E. (2007), ‘Optimizing the Design of Web-Based Questionnaires - Experience from a Population Based Study among 50,000 Women’, European Journal of Epidemiology 22(5), 293–300.

Everitt, B. S. & Hothorn, T. (2007), A Handbook of Statistical Analyses Using R, Chapman & Hall/CRC, Boca Raton.

Faas, T. & Schoen, H. (2006), ‘Putting a Questionnaire on the Web is not Enough - A Comparison of Online and Offline Surveys Conducted in the Context of the German Federal Election 2002’, Journal of Official Statistics 22(2), 177–190.

Faraway, J. J. (2005), Linear Models with R, Chapman & Hall/CRC Texts in Statistical Science Series, Boca Raton, Florida.

Flynn, D., van Schaik, P. & van Wersch, A. (2004), ‘A Comparison of Multi-Item Likert and Visual Analogue Scales for the Assessment of Transactionally Defined Coping Function’, European Journal of Psychological Assessment 20(1), 49–58.


Fricker, S., Galesic, M., Tourangeau, R. & Yan, T. (2005), ‘An Experimental Comparison of Web and Telephone Surveys’, Public Opinion Quarterly 69(3), 370–392.

Fuchs, M. (2003), ‘Kognitive Prozesse und Antwortverhalten in einer Internet-Befragung’, Österreichische Zeitschrift für Soziologie (4), 6–18.

Fuchs, M. (2008), ‘Die Video-unterstützte Online-Befragung. Auswirkungen auf den Frage-Antwort-Prozess und die Datenqualität’, Presentation at “Grenzen und Herausforderungen der Umfrageforschung”, Salzburg.

Fuchs, M. & Funke, F. (2007), Multimedia Web Surveys: Results from a Field Experiment on the use of Audio and Video Clips in Web Surveys, in ‘The Challenges of a Changing World. Proceedings of the Fifth International Conference of the Association for Survey Computing’, M. Trotman et al.

Funke, F. (2003), Vergleich Visueller Analogskalen mit Kategorialskalen in Offline- und Onlinedesign, Master’s thesis, Institut für Soziologie, Justus-Liebig-Universität Gießen.

Funke, F. (2004), Online- und Offlinevergleich Visueller Analogskalen mit 4- und 8-stufig skalierten Likert-Skalen bei einem Fragebogen zum Verhalten in sozialen Gruppen, in ‘Soziale Ungleichheit, Kulturelle Unterschiede - Verhandlungen des 32. Kongresses der Deutschen Gesellschaft für Soziologie in München’, Campus, Frankfurt am Main, pp. 4826–4838.

Funke, F. (2005), ‘Visual Analogue Scales in Online Surveys’, Poster Presentation at the 7th General Online Research (GOR) Conference, Zürich, Switzerland.

Funke, F. & Reips, U.-D. (2005), ‘Stichprobenverzerrung durch Browserbedingten Dropout’, Vortrag bei der Tagung Methodensektion der Deutschen Gesellschaft für Soziologie in Mannheim.

Funke, F. & Reips, U.-D. (2006), ‘Visual Analogue Scales in Online Surveys: Non-Linear Data Categorization by Transformation with Reduced Extremes’, Poster Presentation at the 8th General Online Research (GOR) Conference, Bielefeld, Germany.

Funke, F. & Reips, U.-D. (2007a), Datenerhebung im Netz: Messmethoden und Skalen, in Welker & Wenzel (2007).

Funke, F. & Reips, U.-D. (2007b), ‘Dynamic Forms: Online Surveys 2.0’, Paper Presentation at the 9th General Online Research (GOR) Conference, Leipzig, Germany.

Funke, F. & Reips, U.-D. (2007c), ‘Improving Data Quality in Web Surveys with Visual Analogue Scales’, Paper presented at the second Conference of the European Research Association, Prague (CZ).

Funke, F. & Reips, U.-D. (2008a), ‘Differences and Correspondences Between Visual Analogue Scales, Slider Scales and Radio Button Scales in Web Surveys’, Poster Presentation at the 10th annual General Online Research (GOR) Conference, Hamburg, Germany.

Funke, F. & Reips, U.-D. (2008b), ‘Visual Analogue Scales versus Categorical Scales: Respondent Burden, Cognitive Depth, and Data Quality’, Paper presented at the 10th annual General Online Research (GOR) Conference, Hamburg, Germany.

Galesic, M. (2006), ‘Dropouts on the Web: Effects of Interest and Burden Experienced During an Online Survey’, Journal of Official Statistics 22(2), 313–328.


Galesic, M., Tourangeau, R., Couper, M. P. & Conrad, F. G. (2008), ‘Eye-Tracking Data. New Insights on Response Order Effects and other Cognitive Shortcuts in Survey Responding’, Public Opinion Quarterly 72(5), 892–913.

Ganassali, S. (2008), ‘The Influence of the Design of Web Survey Questionnaires on the Quality of Responses’, Survey Research Methods 2(1), 21–32.

Gerich, J. (2007), ‘Visual Analogue Scales for Mode-Independent Measurement in Self-Administered Questionnaires’, Behavior Research Methods 39(4), 985–992.

Gerich, J. (2008), ‘Multimediale Elemente in der Computerbasierten Datenerhebung’, Presentation at “Grenzen und Herausforderungen der Umfrageforschung”, Salzburg.

Gnambs, T. (2008), ‘Response Effects of Colour Cues in Online Surveys: Exploratory Findings’, Poster Presentation at the 10th annual General Online Research (GOR) Conference, Hamburg, Germany.

Göritz, A. S. (2006a), ‘Cash Lotteries as Incentives in Online Panels’, Social Science Computer Review 24(4), 445–459.

Göritz, A. S. (2006b), ‘Incentives in Web Studies: Methodological Issues and a Review’, International Journal of Internet Science 1(1), 58–70.

Göritz, A. S. & Birnbaum, M. H. (2005), ‘Generic HTML Form Processor: A Versatile PHP Script to Save Web-Collected Data into a MySQL Database’, Behavior Research Methods 37(4), 703–710.

Göritz, A. S. & Stieger, S. (2008), ‘The High-Hurdle Technique put to the Test: Failure to Find Evidence that Increasing Loading Times Enhances Data Quality in Web-Based Studies’, Behavior Research Methods 40(1), 322–327.

Granello, D. H. & Wheaton, J. E. (2004), ‘Online Data Collection: Strategies for Research’, Journal of Counseling and Development 82(4), 387–393.

Groves, R. M., Dillman, D. A., Eltinge, J. L. & Little, R. J. A., eds (2002), Survey Nonresponse, John Wiley & Sons, New York.

Groves, R. M. & Peytcheva, E. (2008), ‘The Impact of Nonresponse Rates on Nonresponse Bias. A Meta-Analysis’, Public Opinion Quarterly 72(2), 167–189.

Hamilton, M. B. (2004), Attrition Patterns in Online Surveys. Analysis and Guidance for Industry, White paper, Tercent Inc. Online; accessed 21-May-2009. URL: http://www.supersurvey.com/papers/supersurvey_white_paper_attrition.htm

Hassenzahl, M. & Peissner, M., eds (2005), Usability Professionals 2005, German Chapter of the Usability Professionals Association e.V.

Hasson, D. & Arnetz, B. B. (2005), ‘Validation and Findings Comparing VAS vs. Likert Scales for Psychosocial Measurements’, International Electronic Journal of Health Education (8), 178–192.

Healey, B. (2007), ‘Drop Downs and Scroll Mice. The Effect of Response Option Format and Input Mechanism Employed on Data Quality in Web Surveys’, Social Science Computer Review 25(1), 111–128.


Healey, B., Macpherson, T. & Kuijten, B. (2005), ‘An Empirical Evaluation of Three Web Survey Design Principles’, Marketing Bulletin 16(Research Note 2), 1–9.

Hedlin, D., Lindkvist, H., Bäckström, H. & Erikson, J. (2008), ‘An Experiment on Perceived Survey Response Burden Among Businesses’, Journal of Official Statistics 24(2), 301–318.

Heerwegh, D. (2002), ‘Describing Response Behavior in Websurveys Using Client Side Paradata’, Paper presented at the International Workshop on Websurveys held by ZUMA, 17-19 October 2002, Mannheim, Germany.

Heerwegh, D. (2003), ‘Explaining Response Latencies and Changing Answers Using Client-Side Paradata from a Web Survey’, Social Science Computer Review 21(3), 360–373.

Heerwegh, D. (2004a), ‘Uses of Client Side Paradata in Web Surveys’, Paper presented at the International Symposium in Honour of Paul Lazarsfeld, Brussels, Belgium.

Heerwegh, D. (2004b), ‘Using Progress Indicators in Web Surveys’, Paper presented at the 59th AAPOR Conference, Phoenix, Arizona.

Heerwegh, D. & Loosveldt, G. (2002a), ‘An Evaluation of the Effect of Response Formats on Data Quality in Web Surveys’, Social Science Computer Review 20(4), 471–484.

Heerwegh, D. & Loosveldt, G. (2002b), ‘An Evaluation of the Effect of Response Formats on Data Quality in Web Surveys’, Paper presented at the International Workshop on Household Survey Nonresponse, Copenhagen, Denmark.

Heerwegh, D. & Loosveldt, G. (2002c), ‘Web Surveys: The Effect of Controlling Survey Access Using PIN Numbers’, Social Science Computer Review 20(1), 10–21.

Heerwegh, D. & Loosveldt, G. (2006a), ‘An Experimental Study on the Effects of Personalization, Survey Length Statements, Progress Indicators, and Survey Sponsor Logos in Web Surveys’, Journal of Official Statistics 22(2), 191–210.

Heerwegh, D. & Loosveldt, G. (2006b), ‘Personalizing e-Mail Contacts: Its Influence on Web Survey Response Rate and Social Desirability Response Bias’, International Journal of Public Opinion Research 19(2), 258–268.

Heerwegh, D. & Loosveldt, G. (2008), ‘Face-to-Face versus Web Surveying in a High-Internet-Coverage Population. Differences in Response Quality’, Public Opinion Quarterly 72(5), 836–846.

Heerwegh, D., Vanhove, T., Loosveldt, G. & Matthijs, K. (2004), ‘Effects of Personalization on Web Survey Response Rates and Data Quality’, Paper presented at the Sixth International Conference on Logic and Methodology (RC-33).

Hofmans, J., Theuns, P., Baekelandt, S., Mairesse, O., Schillewaert, N. & Cools, W. (2007), ‘Bias and Changes in Perceived Intensity of Verbal Qualifiers Effected by Scale Orientation’, Survey Research Methods 1(2), 97–108.

Holm, K. (1975a), Die Frage, in Holm (1975b), pp. 32–90.

Holm, K., ed. (1975b), Die Befragung, Francke Verlag GmbH.


Holtgrewe, U. & Brand, A. (2007), ‘Die Projektpolis bei der Arbeit. Open-Source Softwareentwicklung und der “Neue Geist des Kapitalismus”’, Österreichische Zeitschrift für Soziologie (3), 25–45.

Jackob, N. & Zerback, T. (2006), ‘Improving Quality by Lowering Non-Response - A Guideline for Online Surveys’, Paper presented at the WAPOR-Seminar ‘Quality Criteria in Survey Research VI’, Cadenabbia, Italy.

Joinson, A., McKenna, K., Postmes, T. & Reips, U.-D., eds (2007), Oxford Handbook of Internet Psychology.

Kaczmirek, L. (2005), Web Surveys. A Brief Guide on Usability and Implementation Issues, in Hassenzahl & Peissner (2005), pp. 102–105.

Kaczmirek, L. & Schulze, N. (2005), ‘Standards in Online Surveys. Sources for Professional Codes of Conduct, Ethical Guidelines and Quality of Online Surveys. A Guide of the Web Survey Methodology Site’. Online; accessed 21-May-2009. URL: http://www.websm.org/2009/05/Home/Community/Guides/

Karg, M. & Krebs, S. (2008), ‘Der saubere Weg. Herstellerunabhängiges Reporting mit XSL und Co.’, iX. Magazin für professionelle Informationstechnik (6), 106–109.

Kokemüller, J. (2007), ‘Online-Umfragen: Acht Softwarelösungen’, iX. Magazin für professionelle Informationstechnik (11), 110–115.

Kreuter, F., Presser, S. & Tourangeau, R. (2008), ‘Social Desirability Bias in CATI, IVR, and Web Surveys. The Effects of Mode and Question Sensitivity’, Public Opinion Quarterly 72(5), 847–865.

Krysan, M. & Couper, M. P. (2006), ‘Race of Interviewer Effects: What Happens on the Web?’, International Journal of Internet Science 1(1), 17–28.

Ligges, U. (2007), Programmieren mit R, second edn, Springer Verlag, Berlin, Heidelberg, New York.

Litwin, M. S. (1995), How To Measure Survey Reliability and Validity, SAGE Publications, Inc., Thousand Oaks, California.

Lohr, S. L. (2008), Coverage and Sampling, in de Leeuw et al. (2008), pp. 97–112.

Loosveldt, G. & Sonck, N. (2008), ‘An Evaluation of the Weighting Procedures for an Online Access Panel Survey’, Survey Research Methods 2(2), 93–105.

Lumsden, J. & Morgan, W. (2005), ‘Online-Questionnaire Design: Establishing Guidelines and Evaluating Existing Support’, Presentation at the 16th Annual International Conference of the Information Resources Management Association.

Lütters, H., Westphal, D. & Heublein, F. (2007), ‘SniperScale: Graphical Scaling in Data Collection and its Effect on the Response Behaviour of Participants in Online Studies’, Paper Presentation at the 9th General Online Research (GOR) Conference, Leipzig, Germany.

Lynn, P. (2008), The Problem of Nonresponse, in de Leeuw et al. (2008), pp. 35–55.


Malhotra, N. (2008), ‘Completion Time and Response Order Effects in Web Surveys’, Public Opinion Quarterly 72(5), 914–934.

Manfreda, K. L. (2001), Web Survey Errors, PhD thesis, University of Ljubljana.

Manfreda, K. L. & Vehovar, V. (2008), Internet Surveys, in de Leeuw et al. (2008), pp. 264–284.

Mathieson, K. & Doane, D. P. (2003), ‘Using Fine-Grained Likert Scales in Web Surveys’, Alliance Journal of Business Research 1(1), 27–34.

Mayntz, R., Holm, K. & Hübner, P. (1978), Einführung in die Methoden der empirischen Soziologie, fifth edn, Westdeutscher Verlag, Opladen.

McCalla, R. A. (2003), ‘Getting Results from Online Surveys - Reflections on a Personal Journey’, Electronic Journal of Business Research Methods 2(1), 55–62.

McDonald, H. & Adam, S. (2003), ‘A Comparison of Online and Postal Data Collection Methods in Marketing Research’, Marketing Intelligence & Planning 21(2), 85–95.

Meckel, M., Walters, D. & Baugh, P. (2005), ‘Mixed-Mode Surveys Using Mail and Web Questionnaires’, Electronic Journal of Business Research Methods 3(1), 69–80.

Mercay, J. & Bouzeid, G. (2002), ‘Boost Struts with XSLT and XML’. Online; accessed 21-May-2009. URL: http://www.javaworld.com/javaworld/jw-02-2002/jw-0201-strutsxslt.html

Mummendey, H. D. (2003), Die Fragebogen-Methode, fourth edn, Hogrefe-Verlag GmbH, Göttingen.

Murrell, P. (2006), R Graphics, Chapman & Hall/CRC, Boca Raton.

Noelle-Neumann, E. & Petersen, T. (2000), Alle, Nicht Jeder, third edn, Springer Verlag, Berlin.

Peter Ph. Mohler and Paul Luettinger, ed. (2000), Querschnitt - Festschrift für Max Kaase, ZUMA, Mannheim.

Peytchev, A., Couper, M. P. & McCabe, S. E. (2006), ‘Web Survey Design. Paging Versus Scrolling’, Public Opinion Quarterly 70(4), 596–607.

Peytchev, A. & Crawford, S. (2005), ‘A Typology of Real-Time Validations in Web-Based Surveys’, Social Science Computer Review 23(2), 235–249.

Pocknee, C. & Robbie, D. (2002), ‘Surveyor: A Case Study of a Web-Based Survey Tool for Academics’, Paper Presentation at the ASCILITE 2002, December 8-11, 2002, Auckland, New Zealand.

Potaka, L. (2008), ‘Comparability and Usability: Key Issues in the Design of Internet Forms for New Zealand’s 2006 Census of Populations and Dwellings’, Survey Research Methods 2(1), 1–10.

Quintano, C., Castellano, R. & D’Agostino, A. (2006), ‘The Transition from University to Work: Web Survey Process Quality’, Metodološki zvezki 3(2), 335–354.


R Development Core Team (2006), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. Online; accessed 21-May-2009. URL: http://www.R-project.org

Rager, M. (2001), Sozialforschung im Internet. Fragebogenuntersuchungen im World Wide Web, Master’s thesis, Institut für Kultursoziologie, Universität Salzburg.

Reips, U.-D. (2002a), Context Effects in Web Surveys, in Batinic et al. (2002).

Reips, U.-D. (2002b), ‘Internet-Based Psychological Experimenting. Five Dos and Five Don’ts’, Social Science Computer Review 20(3), 241–249.

Reips, U.-D. (2002c), ‘Standards for Internet-Based Experimenting’, Experimental Psychology 49(4), 243–256.

Reips, U.-D. & Bosnjak, M., eds (2001), Dimensions of Internet Science, Pabst Science Publishers, Lengerich, Germany.

Reips, U.-D. & Funke, F. (2007), ‘VAS Generator - A Web-Based Tool for Creating Visual Analogue Scales’, Paper Presentation at the 9th General Online Research (GOR) Conference, Leipzig, Germany.

Reips, U.-D. & Funke, F. (2008), ‘Interval-Level Measurement with Visual Analogue Scales in Internet-Based Research: VAS Generator’, Behavior Research Methods 40(3), 699–704.

Reips, U.-D. & Neuhaus, C. (2002), ‘WEXTOR: A Web-based Tool for Generating and Visualizing Experimental Designs and Procedures’, Behavior Research Methods, Instruments, & Computers 34(2), 234–240.

Reips, U.-D. & Stieger, S. (2004), ‘Scientific LogAnalyzer: A Web-based Tool for Analyses of Server Log Files in Psychological Research’, Behavior Research Methods, Instruments, & Computers 36(2), 304–311.

Rousseeuw, P. J. & Leroy, A. M. (2003), Robust Regression and Outlier Detection, Wiley & Sons, Hoboken, New Jersey.

Scheffler, H. (2003), Online-Erhebungen in der Marktforschung, in Statistisches Bundesamt (2003), pp. 31–41.

Schönemann, H. J., Griffith, L., Jaeschke, R., Goldstein, R., Stubbing, D. & Guyatt, G. H. (2003), ‘Evaluation of the Minimal Important Difference for the Feeling Thermometer and the St. George’s Respiratory Questionnaire in Patients with Chronic Airflow Obstruction’, Journal of Clinical Epidemiology 56, 1170–1176.

Schwarz, N., Knauper, B., Hippler, H.-J., Noelle-Neumann, E. & Clark, L. (1991), ‘Rating Scales: Numeric Values May Change the Meaning of Scale Labels’, Public Opinion Quarterly 55(4), 570–582.

Schwarz, N., Knäuper, B., Oyserman, D. & Stich, C. (2008), The Psychology of Asking Questions, in de Leeuw et al. (2008), pp. 18–34.

Schwarz, S. & Reips, U.-D. (2001), CGI versus Javascript: A Web Experiment on the Reversed Hindsight Bias, in Reips & Bosnjak (2001), pp. 75–90.


Shih, T.-H. & Fan, X. (2007), ‘Response Rates and Mode Preferences in Web-Mail Mixed-Mode Surveys: A Meta-Analysis’, International Journal of Internet Science 2(1), 59–82.

Sikkel, D. & Hoogendoorn, A. (2008), Panel Surveys, in de Leeuw et al. (2008), pp. 479–499.

Sills, S. J. & Song, C. (2002), ‘Innovations in Survey Research. An Application of Web-Based Surveys’, Social Science Computer Review 20(1), 22–30.

Smyth, J. D., Christian, L. M. & Dillman, D. A. (2008), ‘Does “yes or no” on the Telephone Mean the Same as “check-all-that-apply” on the Web?’, Public Opinion Quarterly 72(1), 103–113.

Smyth, J. D., Dillman, D. A. & Christian, L. M. (2007), Context Effects in Internet Surveys: New Issues and Evidence, in Joinson et al. (2007).

Smyth, J. D., Dillman, D. A., Christian, L. M. & Stern, M. J. (2004), ‘How Visual Grouping Influences Answers to Internet Surveys’, Revision of paper presented at the 2004 Annual Meeting of the American Association for Public Opinion Research, Phoenix, AZ.

Smyth, J. D., Dillman, D. A., Christian, L. M. & Stern, M. J. (2006a), ‘Comparing Check-All and Forced-Choice Question Formats in Web Surveys’, Public Opinion Quarterly 70(1), 66–77.

Smyth, J. D., Dillman, D. A., Christian, L. M. & Stern, M. J. (2006b), ‘Effect of Using Visual Design Principles to Group Response Options in Web Surveys’, International Journal of Internet Science 1(1), 6–16.

Spector, P. E. (1992), Summated Rating Scale Construction, Quantitative Applications in the Social Sciences, SAGE Publications Inc., Newbury Park, California.

Statistisches Bundesamt, ed. (2003), Online-Erhebungen. 5. Wissenschaftliche Tagung der ADM, Bonn, Vol. 7, ADM, Informationszentrum Sozialwissenschaften.

Steiger, D. M. & Conroy, B. (2008), IVR: Interactive Voice Response, in de Leeuw et al. (2008), pp. 285–298.

Stern, M. J., Dillman, D. A. & Smyth, J. D. (2007), ‘Visual Design, Order Effects and Respondent Characteristics in a Self-Administered Survey’, Survey Research Methods 1(3), 121–138.

St. Laurent, A. M. (2004), Understanding Open Source and Free Software Licensing, O’Reilly, Gravenstein Highway North, Sebastopol.

Svensson, E. (2000), ‘Comparison of the Quality of Assessments Using Continuous and Discrete Ordinal Rating Scales’, Biometrical Journal (4), 417–434.

Thomas M. Archer (2007), ‘Characteristics Associated with Increasing the Response Rates of Web-Based Surveys’, Practical Assessment, Research & Evaluation 12(12), 1–9. Online; accessed 21-May-2009. URL: http://pareonline.net/getvn.asp?v=12&n=12

Thomas, R. K. & Couper, M. P. (2007), ‘A Comparison of Visual Analog and Graphic Rating Scales’, Paper Presentation at the 9th General Online Research (GOR) Conference, Leipzig, Germany.


Thomas, R. K., Klein, J. D., Benhnke, S. & Terhanian, G. (2007), ‘The Best of Intentions: Response Format Effects on Measures of Behavioral Intention’, Paper Presentation at the 9th General Online Research (GOR) Conference, Leipzig, Germany.

Toepoel, V., Das, M. & Soest, A. V. (2008), ‘Effects of Design in Web Surveys. Comparing Trained and Fresh Respondents’, Public Opinion Quarterly 72(5), 985–1007.

Tourangeau, R., Couper, M. P. & Conrad, F. (2004), ‘Spacing, Position, and Order. Interpretive Heuristics for Visual Features of Survey Questions’, Public Opinion Quarterly 68(3), 368–393.

Tourangeau, R., Couper, M. P. & Conrad, F. (2007), ‘Color, Labels, and Interpretive Heuristics for Response Scales’, Public Opinion Quarterly 71(1), 91–112.

Tourangeau, R., Couper, M. P. & Steiger, D. M. (2003), ‘Humanizing Self-Administered Surveys: Experiments on Social Presence in Web and IVR Surveys’, Computers in Human Behavior 19, 1–24.

Tourangeau, R., Couper, M. P. & Conrad, F. (2003), ‘The Impact of the Visible: Images, Spacing, and Other Visual Cues in Web Surveys’, Paper presented at the WSS/FCSM Seminar on the Funding Opportunity in Survey Methodology.

Tourangeau, R., Rips, L. J. & Rasinski, K. (2000), The Psychology of Survey Response, first edn, Cambridge University Press, New York.

Truell, A. D. (2003), ‘Use of Internet Tools for Survey Research’, Information Technology, Learning, and Performance Journal 21(1), 31–37.

van Schaik, P. & Ling, J. (2007), ‘Design Parameters of Rating Scales for Web Sites’, ACM Transactions on Computer-Human Interaction 14(1), 1–35.

van Selm, M. & Jankowski, N. W. (2006), ‘Conducting Online Surveys’, Quality and Quantity (40), 435–456.

Vehovar, V., Batagelj, Z., Manfreda, K. L. & Zaletel, M. (2002), Nonresponse in Web Surveys, in Groves et al. (2002), pp. 229–242.

von Kirschhofer-Bozenhardt, A. & Kaplitza, G. (1975), Der Fragebogen, in K. Holm, ed., ‘Die Befragung’, Francke Verlag GmbH, München, pp. 92–126.

Voogt, R. J. & Saris, W. E. (2005), ‘Mixed Mode Designs: Finding the Balance Between Nonresponse Bias and Mode Effects’, Journal of Official Statistics 21(3), 367–387.

Walston, J. T., Lissitz, R. W. & Rudner, L. M. (2006), ‘The Influence of Web-based Questionnaire Presentation Variations on Survey Cooperation and Perceptions of Survey Quality’, Journal of Official Statistics 22(2), 271–291.

Weichbold, M. (2003), ‘Befragtenverhalten bei Touchscreen-Befragungen’, Österreichische Zeitschrift für Soziologie 28(4), 71–92.

Weichbold, M. (2005), Touchscreen-Befragungen. Neue Wege in der empirischen Sozialforschung, Peter Lang Verlag, Frankfurt am Main.

Weisberg, H. F. (2005), The Total Survey Error Approach. A Guide to the New Science of Survey Research, University of Chicago Press, Chicago.


Welker, M. & Wenzel, O., eds (2007), Online-Forschung 2007. Grundlagen und Fallstudien, Halem Verlag, Köln.

Welker, M., Werner, A. & Scholz, J. (2005), Online-Research. Markt- und Sozialforschung mit dem Internet, dpunkt Verlag GmbH, Heidelberg.

Whitcomb, M. E. & Porter, S. R. (2004), ‘E-Mail Contacts: A Test of Complex Graphical Designs in Survey Research’, Social Science Computer Review 22(3), 370–376.

Witmer, D. F., Colman, R. W. & Katzman, S. L. (1999), From Paper-and-Pencil to Screen-and-Keyboard, in S. Jones, ed., ‘Doing Internet Research. Critical Issues and Methods for Examining the Net’, SAGE Publications, Thousand Oaks, London, New Delhi, part 7, pp. 145–161.

Witte, J. C., Pargas, R. P., Mobley, C. & Hawdon, J. (2004), ‘Instrument Effects of Images in Web Surveys’, Social Science Computer Review 22(3), 363–369.

Wolfgang Bandilla (2003), Die Internet-Gemeinde als Grundgesamtheit, in Statistisches Bundesamt (2003), pp. 71–82.

Wood, S. N. (2006), Generalized Additive Models. An Introduction with R, Chapman & Hall/CRC Texts in Statistical Science Series, Boca Raton, Florida.

Yan, T. (2005), Gricean Effects on Self-Administered Surveys, PhD thesis, Faculty of the Graduate School of the University of Maryland, College Park.

Zunicá, R. R. & Clemente, V. A. (2007), ‘Research on Internet Use by Spanish-Speaking Users with Blindness and Partial Sight’, Universal Access in the Information Society (1), 103–110.


List of Figures

4.1 Transformation with reduced extremes
10.1 User patterns in Web surveys
11.1 Screenshot of a sample of a radio question
11.2 Screenshot of a sample of a button question
11.3 Screenshot of a sample of a click-VAS question
11.4 Screenshot of a sample of a slider-VAS question
11.5 Screenshot of a sample of a text question
11.6 Screenshot of a sample of a dropdown question
15.1 Age distribution of respondents - tourism survey
15.2 Semester distribution of respondents - tourism survey
15.3 Smileys used for indicating mood of interviewees - tourism survey
15.4 Age distribution of respondents - snowboard survey
16.1 Comparison of results of feedback question 1 (boring=1 vs. interesting=10) - webpage survey
16.2 Comparison of results of feedback question 1 (boring=1 vs. interesting=10) - tourism survey
16.3 Line diagram comparing results of feedback question 1 (boring=1 vs. interesting=10) for all three surveys
16.4 Comparison of results of feedback question 2 (sufficient=1 vs. insufficient=10 number of scale points) - webpage survey
16.5 Comparison of results of feedback question 2 (sufficient=1 vs. insufficient=10 number of scale points) - tourism survey
16.6 Line diagram comparing results of feedback question 2 (sufficient=1 vs. insufficient=10 number of scale points) for all three surveys
16.7 Comparison of results of feedback question 3 (easy=1 to use vs. complicated=10) - webpage survey
16.8 Comparison of results of feedback question 3 (easy=1 to use vs. complicated=10) - tourism survey
16.9 Line diagram comparing results of feedback question 3 (easy=1 to use vs. complicated=10) for all three surveys
17.1 Sample density plot of question number 17 with outliers - tourism survey
17.2 Density plot with weights of question number 17 - tourism survey
17.3 Boxplots comparing the response times across input controls for question 17 of the tourism survey
17.4 Percentage of time needed for each control per question - mean values - tourism survey


17.5 Percentage of time needed for each control per question - mean values - webpage survey
18.1 Survival times for the tourism survey comparing input controls
18.2 Survival times for the webpage survey comparing input controls
18.3 Survival times for the snowboard survey comparing input controls
19.1 Significant differences between input controls for each scale item - tourism survey
19.2 Significant differences between input controls for each scale item - webpage survey
20.1 Comparison of the categories slider-VAS, click-VAS and others - tourism survey
20.2 Compare calculated categorization with linear categorization - tourism survey
20.3 Compare linear categorization points of simple-VAS with 10-scale controls - tourism survey
20.4 Compare Boxplots for all cut-points - tourism survey
21.1 A sample view on the questionnaire editor
22.1 Component model of the whole QSYS-system
22.2 Question classes diagram showcase
22.3 Answer class diagram showcase
22.5 MVC-2.x Model as taken from http://www.javaworld.com
22.4 Export class diagram showcase
22.6 Base Action classes of the StruXSLT-framework
22.7 QSYS-web view class diagram showcase
22.8 QSYS-web Process class diagram showcase


List of Tables

11.1 Different properties of all input controls
11.2 Technical preconditions and interaction mechanisms for all input controls
11.3 Experimental variables - tourism
11.4 Experimental variables - webpage
11.5 Experimental variables - snowboard
13.1 Overall response for all 3 surveys
13.2 Input control distribution for the tourism survey
13.3 Input control distribution for the webpage survey
13.4 Input control distribution for the snowboard survey
14.1 Use of operating systems for all three surveys
14.2 Use of browser agents for all three surveys
14.3 Screen resolutions as used by the respondents for all three surveys
14.4 Distribution of browser settings for all three surveys (c. = completed; n.c. = not completed)
15.1 Overall portion of mood changes for all controls - tourism survey
15.2 Respondent’s relation to the university (multiple answers possible) - webpage survey
15.3 Gender distribution - webpage survey
15.4 Age distribution - webpage survey
17.1 Basic parameters concerning duration - tourism survey (in seconds)
17.2 Basic parameters concerning duration - webpage survey (in seconds)
17.3 Basic parameters concerning duration - snowboard survey (in seconds)
17.4 Basic parameters concerning duration for question 17 of the tourism survey (in seconds)
17.5 Overview of duration comparisons for all three surveys
18.1 Dropout questions - tourism survey (in percent for each control)
18.2 Dropout questions - webpage survey
18.3 Dropout questions - snowboard survey
18.4 Overview of dropout for all three surveys - paired comparisons
18.5 Overview of dropout for all three surveys - dropout rates
19.1 Deviations from mean per sub-question (unit: normalized (0-1) scale point)
19.2 Example of a 2x2 table for questionnaire tourism, question 9, sub question 11, comparison of dropdown and slider-VAS, category 5
19.3 Median values of ratio of use of 5 over 6 for all three surveys, compared by input controls
20.1 Ratios of the adjacent categories for the click-VAS control


20.2 Number of significant differences when using different categorization strategies of the slider-VAS control


Listings

22.1 A simplified example of a tree-view for files stored for one group in QSYS
22.2 A simplified example of an answer document
22.3 Command for respondent’s overview of one questionnaire
22.4 Command for finding out the distribution of the last filled out question
22.5 Command for querying the results of an open-ended question
22.6 Command for querying the results of a closed-ended question
22.7 An example of a simple XSLT map header
22.8 An example of a simple XSLT map entry
22.9 An example of a random XSLT map entry
22.10 Settings within qsys.properties
22.11 Settings within fsdb.properties
22.12 An example of XDoclet metadata attributes (for the login-process)
