on heterogeneity in mobile sensing applications aiming at representative...

On Heterogeneity in MobileSensing Applications Aiming atRepresentative Data Collection

Henrik BlunckAarhus UniversityAabogade 34Aarhus, 8200 [email protected]

Mikkel Baun KjærgaardAarhus UniversityAabogade 34Aarhus, 8200 [email protected]

Niels Olof BouvinAarhus UniversityAabogade 34Aarhus, 8200 [email protected]

Paul LukowiczGerman Research Center forArtificial IntelligenceTrippstadter Strasse 122Kaiserslautern, 67663 [email protected]

Tobias FrankeGerman Research Center forArtificial IntelligenceTrippstadter Strasse 122Kaiserslautern, 67663 [email protected]

Markus WustenbergAarhus UniversityAabogade 34Aarhus, 8200 [email protected]

Kaj GrœnbækAarhus UniversityAabogade 34Aarhus, 8200 [email protected]

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies are notmade or distributed for profit or commercial advantage and that copies bearthis notice and the full citation on the first page. Copyrights for componentsof this work owned by others than ACM must be honored. Abstracting withcredit is permitted. To copy otherwise, or republish, to post on servers or toredistribute to lists, requires prior specific permission and/or a fee. Requestpermissions from [email protected]’13 Adjunct, September 8–12, 2013, Zurich, Switzerland.Copyright c© 2013 ACM 978-1-4503-2215-7/13/09...$15.00.

http://dx.doi.org/10.1145/2494091.2499576

AbstractGathering representative data using mobile sensing toanswer research questions is becoming increasinglypopular, driven by growing ubiquity and sensingcapabilities of mobile devices. However, there are pitfallsalong this path, which introduce heterogeneity in thegathered data, and which are rooted in the diversity of theinvolved device platforms, hardware, software versions andparticipants. Thus, we, as a research community, need toestablish good practices and methodologies for addressingthis issue in order to help ensure that, e.g., scientificresults and policy changes based on collective, mobilesensed data are valid. In this paper, we aim to informresearchers and developers about mobile sensing dataheterogeneity and ways to combat it. We do so viadistilling a vocabulary of underlying causes, and viadescribing their effects on mobile sensing—building onexperiences from three projects within citizen science,crowd awareness and trajectory tracking.

Author KeywordsMobile Sensing, Heterogeneity, Data Collection,Representativeness

ACM Classification KeywordsH.5.m [Information interfaces and presentation (e.g.,HCI)]: Miscellaneous.

Session: PUCAA: 1st International Workshop on Pervasive Urban Crowdsensing Architecture and Applications

UbiComp’13, September 8–12, 2013, Zurich, Switzerland

1087

IntroductionThe advent of rich sensors in common smartphones hascreated new possibilities for using mobile sensing [17] ormobiscopes [4] to gather data about the world, e.g.,reflecting users’ actual rather than reported behavior.This has found use in areas as diverse as traffic monitoring[12], leisure activities [9] and air pollution control [22], aswell as in a rich and growing set of social networkingapplications [18]. Concomitantly, the rise of “Big Data”has kindled the hope that we may have the tools toanalyse large and rich data sets. However, the quality ofany analysis rests on the quality of the data beinganalysed, and here a number of compounding factors mayadversely affect validity.

A data collection campaign using mobile sensing is basedon questions selected by involved stakeholders. This couldinclude questions such as: When and where is trafficcongested?, or What is the level of crowdedness inindividual areas during a major event?. Most questionswill refer to a target group thereby pinpointing the peoplethat could provide the most informative data about thequestions, e.g., car drivers or event participants. Tofacilitate the collection via a mobile sensing app, campaignstakeholders have to make a number of decisions with thetarget group in mind with regards to promotion, userbenefits, interface design, technical choices anddevelopment process. The choices made will affect thequality of data in the end. For instance, promoting amobile sensing app as the official app of a public eventmay increase uptake, designing an app in a style thatappeals to the target group may positively impact usage,enabling the app on the most popular device platformsmay boost downloads or a participatory design approachmay align the app better with users’ preferences andwishes. Cramer et al. [8] have previously motivated some

of these issues from a perspective of doing ubicompresearch where as we in this paper in a more structuredmanner focus on the quality of the resulting data.

An important concern of data collection is data quality.Collected empirical data is often incomplete, sometimesinaccurate, and generally prone to biases, e.g., due to thecollection methods or strategies. For empirical datainvolving humans, further biases may arise fromsituational parameters, and thus by choices of collectionprocedure, which prescribe in which situations data iscollected. E.g., when using interviews or questionnaires,the collected data may be biased by the type, amount,and focus of the interview questions as well as by anybiases and lapses of the participants’ memory. For humansurveying, biases arise from the scheduling of the(time-limited) observations, as well as by the focus of theobserver. Mobile sensing as a collection method reducesthe influence of the latter by allowing continuouscollection of a broad spectrum of information. Still, alsomobile sensing is prone to biases: E.g., existing work hashighlighted the problems of heterogeneity in sensor qualityacross smartphone types [16] and variations in availabilityof data [4]; additionally, the biases mentioned above fortraditional surveying have to be taken into account, incase of participatory sensing, i.e. when also explicit userinput data is to be gathered; proposals to address thelatter issue include usage of the experience samplingmethod within mobile apps[7].

In this paper, we discuss the above issues as sources ofdata heterogeneity, and introduce a taxonomy ofheterogeneity types grouped into the main categories ofdevice, user and project heterogeneities identifiedthroughout three case studies of community-based mobilesensing apps. Examples of device heterogeneity include



1088

missed opportunities for participatory sensing due tosluggish UIs or failing apps due to OS updates, examplesof user heterogeneity include difficulties in getting arepresentative demographic distribution of users andexamples of project-internal heterogeneity include evolvingapp functionality, and extended scope in terms ofsupported device and deployment types. We furthermoreargue for that the identified types of heterogeneitiesincrease the complexity of the analysis and potentiallyreduce validity and overall quality of data and analysisresults. Therefore, a general goal in collective sensingmight be to take design and development decisions thatconsciously assess and effectively reduce dataheterogeneity to acceptable levels.

The rising ubiquity of smartphones does not solve thementioned problems or does it make it easier to makedevelopment decisions—more data may be collected, butthe heterogeneity and quality of it will remain a concern.Furthermore, the rise and fall of manufactures andoperating systems also adds to the problem. Thus,although we in this paper describe current issues in acontemporary context and draw in this paper on concreteexperiences made within recent years, we believe that theidentified underlying problems, as formulated herein, willpersist and that they therefore have to be addressed—byand within the mobile sensing research community, sincethis community–other than the benefiting sciences andprojects, utilizing collectively sensed data—is at theforefront of developing sensing platforms. If we, as aresearch community, can establish good practices andmethodologies for mobile sensing, we can help to ensurethat, e.g., scientific results and policy changes based oncollective, mobile sensed data are valid. With this paperwe would like to provide the basis for and promote

discussions of heterogeneity challenges in the researchcommunity.

The structure of this paper is as follows: The consideredcase studies are presented in Section 2. Afterwards ourcategorisation of types and causes of heterogeneity arediscussed in Section 3 with outset in examples from thecases. Finally, conclusions are given and future work isdiscussed in Section 5.

Case StudiesTo gather knowledge about data heterogeneity, we choseto analyse a number of cases where the authors have beendirectly involved in the development and deployment.

We consider three different cases: Socionical AppFramework (SocioAF): an event guide with crowdawareness deployed at several major events (see Figure 1)[23]; F.Oskar : a project within citizen science (see, e.g.[6, 19]), utilizing an app deployed to capture data foridentifying transportation patterns (see, e.g., [5, 11, 24])within an expanding industrial neighborhood with the goalof quantifying users’ environmental impact throughmodels of transportation behavior (see Figure 2); andEnTracked : an energy efficient tracking app deployed tocapture position and trajectory traces [15, 13]. Tohighlight differences among the cases Table 1 listsproperties of the cases including application area, deviceplatforms, development process and tools, and users andphone models in deployments.



1089

Table 1: Details for cases

Description Platforms DevelopmentProcess

Development Tool Users PhoneModels

SocioAF Crowd Awareness Android / iPhone Sequential Native 1000+ Many

F.Oskar Citizen Science Android Individual Native 50-100 Many

EnTracked Position and Trajec-tory Traces

Symbian / Android Sequential Native 10 Few

Figure 1: SocioAF app interface

Figure 2: F.Oskar app interface

Heterogeneity in Mobile SensingCollecting data from several sources is fundamental inmobile sensing. Analysis and interpretation of collecteddata is challenged by the heterogeneity of the gathereddata across sources and over time. For mobile sensing,such sources resemble usually individual users and theirindividual devices via which data is gathered. We arguefor that the research community has to build up a morecomprehensive vocabulary of types and causes ofheterogeneity to be able to document and adjust for theresulting biases and heterogeneities. The process of beingaware of, documenting and adjusting is worthwhile notonly for the mobile sensing project at hand, but also thecommunity at large to learn from experiences to improvepractices for follow-up or unrelated future projects.

In the following, we will attempt to answer the question:which kinds of heterogeneity impact the heterogeneity ofdata collected via mobile sensing apps?. We willdistinguish by types and causes of heterogeneity in mobilesensing drawing from the three cases introduced earlier toexemplify and discuss resulting effects. We will introducethree top-level heterogeneity types: i) user heterogeneitydenoting the heterogeneities in demographics andpractices of the users participating in the mobile datacollection, ii) device heterogeneity, covering theheterogeneity in properties and abilities of the actual



1090

devices used during data collection and iii) projectheterogeneities covering the heterogenities originatingover time from within mobile sensing projects themselves.

User HeterogenitiesUser heterogeneities refers to heterogeneity causes rootedin the demographics for different phone models and inconcrete practices of the users participating in the mobilecollection. We will discuss below the followingheterogeneity causes individually: i) across userdemographics ii) across app usage patterns and iii) acrossdevice usage patterns.

Device Type DemographicsWhenever data is collected with human participants in theloop, the identity of the respective users will have aninfluence on the produced data. This applies to mobilesensing, where data is gathered via the user’s mobiledevice, but also when gathering data, e.g., viaquestionnaires: A correct and complete interpretation ofthe data gathered by a user would always have to beindividualized, taken into account the individual users’characteristics, such as background, preferences, routines,attitudes, etc. While thus the ultimate goal of correctinterpretation and complete understanding of suchgathered data is not achievable, it can be approximatedvia reasoning about such characteristics, and account forresulting biases in the respective individual user’s data.

Examples from the cases examined herein provide severalangles on this issue. Within collective mobile sensing,users can be divided in groups according to the type ofmobile device they use in their daily life. Survey datasuggests, that user characteristics, such as technicalaffinity, age, preferences for leisure activities and places tovisit differ significantly across, e.g. mobile phoneplatforms or models [1]. Thus, for collective mobile

sensing projects, limiting the range of user devices impliesa smaller and less representative user group—which theneffects the collected data and may introduce usergroup-specific biases. Generally, it often applies that therange of supported devices should be as wide as possible,and at least reflect the targeted user groups in arepresentative manner: In projects, such as the SocioAFcase, users of different demographics may be attracted todifferent exhibitions or events within, e.g., a fair or largefestival. For the mentioned project, this applies to majoruser groups such as grandparents with their grandchildrenversus users without children versus younger peoplewithout children; for a correct assessing of crowdinghazards or bench-marking logistics it becomes crucial tosupport mobile devices as commonly used by all oftargeted user groups.

App UsageGetting an app onto a user’s device does not guaranteethat the user will use the app or that the user will use theapp as intended by the developers. Especially participatorysensing scenarios are prone to this type of heterogeneity.For instance, when users provide observations, e.g. asadvocated by the experience sampling method [7], someof them may report events just as intended, whereasothers may have misunderstood what types of eventsshould be reported, or have difficulties matching theperceived event into a categorization provided by the appinterface. Furthermore, app usage, and frequency thereof,may depend significantly on the individual user’s affinityand motivation towards using the app and fluctuate, e.g.,with his level of attention. In F.Oskar we experienced,that a feature designed to enable people to pause sensingand intended to allow users to protect their privacy, wasused instead to limit the sensing for the sake of saveingbattery power.



1091

Device UsageThere is a great variability in how devices are used byusers; e.g do they carry their device with them all the timeand when so in trouser pockets or in a bag [19]; do theyrecharge their phone frequently or do they let it run out ofpower from time to time [21, 10]. Users may alsocustomize their devices in manners that impact sensing,this includes enabling and disabling location orcommunication features, such as, Bluetooth or WiFi, sleepsettings, lock settings or access rights for apps. In F.Oskarwe experienced users providing negative comments anduninstalling the app because the app as a side-effectimpacted their Bluetooth settings and holes in data wasalso experienced due to phones running out of power. InEnTracked extra effort had to be made to enable robustmotion recognition across different transportation modeoptions.

Device HeterogeneitiesThe device heterogeneity type covers heterogeneity ofwhich the cause lie in the heterogeneous properties andabilities of the actual devices used during data collection.We will discuss below the following kinds of heterogeneityin turn: i) across software platforms, ii) across devicehardware, and iii) across OS, firm- and hardware evolution.

PlatformMobile user devices, such as smartphones, are nearlyalways tied to a specific platform and a respectiveoperating system, such as iOS, Android or WindowsPhone. The differences between these platforms can havesignificant effects on data collection and on the resultingdata. Technical differences relevant in mobile sensingprojects include sensor APIs, scheduling options, and datarepresentation. Furthermore, platforms differ not only inmere technical aspects, but also on basis of fundamentally

different long-term platform policies. These include: i)policies affecting app functionality and development(user-protective versus open-market policies) and ii)distribution channel policies.

Distribution policies: The difference and policies in i) andii) can be observed, e.g. by comparing Apple’s iOS, wherethe protection of the user (e.g. from battery draining orprivacy-violating apps) is emphasized, with Google’sAndroid, which instead emphasizes an open-market policy.Additionally, Apple’s concept enforces lengthy reviewprocesses. While on stores such as Google Play submittedapps are available virtually instantly, on Apple’s store ittakes on average about a week, but longer review periodshave to be considered, see [2] for related statistics.

Especially for mobile sensing projects these restrictionscan be severe: Often, the apps to distribute are developediteratively, or for separate data collection events, resultingin a large number of app and app update submissions; thiswas experienced e.g. in the case of SocioAF , where theconcept was to provide app versions tailored to specificevents. Furthermore, for participatory sensing projectswhere user actively provide feedback about the system, ahighly iterative development process creates a need forseveral releases and updates. In all such cases, if severalplatforms are supported, the stall times for awaitingreviews are determined by the platform with the longestreview process—the alternative being to accept versionheterogeneity across platforms.

Policies Restricting Functionality: Commonplace amongthe restrictions inflicted by user-protecting app storepolicies such as Apple’s are those concerning (potentiallybattery draining) background services for sensing and datacommunication—services thus, which from a mobilesensing perspective are crucial. For many mobile sensing



1092

projects, including SocioAF , Apple reviewers could not beconvinced to accept background GPS sensing and datauploading, respectively. Specifically, in the SocioAF casethe reviewers persisted that no ’tangible benefits to theuser’ were provided by background uploads of userposition data—despite the developers explanations thatusers benefits were achieved in the form of increasedawareness and user safety in emergency situations duringthe targeted public events, and although this claim wasbacked up by providing proofs of official approval from theinvolved ’common good’ stakeholders, i.e. charitableorganizations or public authorities such as the City ofLondon Police. To comply with the restrictive app storepolicy, the SocioAF app was altered to send positionupdates were sent from user devices, only if ’significant’,i.e., in case of position changes of at least 100 meters. Asa result for mobile sensing projects, homogeneity ofcollective sensing app functionality across platformsincluding iOS can often only be achieved by limitingfunctionality to what the most restrictive of the targetedplatforms allows.

Sensing API: Further heterogeneities between platforms,which are handicapping mobile sensing, result fromdifferences in APIs, foremost those for user interactionand for sensing. For sensing, differences across platforms,but also across OS versions, exist e.g. in regards tolocalization options: Most modern platforms haveintroduced by now API calls which allow to specify i)desired levels accuracy and frequency of localizationand/or ii) desired trade-offs with power consumption. Asthere are different and differently effective means forachieving such levels and trade-offs, also the APIs and theunderlying implementations differ [17].

The EnTracked system, specifically designed for this,initially for Symbian, uses adaptive duty cycling, i.e. thetime of localization events is chosen adaptive to thespecified levels or trade-offs. To port EnTracked ’sbehavior completely homogeneously to, e.g., Android oriOS proved not possible; Android offers a method forperiodic duty cycling, and as alternative a method forrequesting a one time GPS fix; the latter call, though, bydefault contacts an A-GPS server over the cellularnetwork–thus causing additional energy costs forcommunication, which are usually unnecessary whenrunning EnTracked, since the last one-time fix request,and thus the last update from the A-GPS server is usuallyonly a few minutes, or even less, old. The API in currentiOS (6.1.3. at time of writing), on the other hand, doesnot allow to specify a period for static duty cycling, butonly offers to specify desired accuracy—and it is notrevealed how, and how energy-efficiently, the accuracy isachieved. This limits the options for mobile sensingprojects, and potentially the resulting data quality andamount, severely, as experienced also, e.g., in the SocioAFproject. Similar issues exist across platforms withduty-cycling other sensors such as the accelerometer.Such duty-cycling, and using an accelerometer, e.g., as anenergy-efficient wake-up sensor during periods of userinactivity, proved doable on Symbian within EnTracked .On Android, though, such accelerometer schedulingproved inefficient in practice or alternativelyineffective—both due to the sleep scheduling behaviorinherent in all investigated Android versions, see also [20].For enabling background accelerometer sensing on iOS,app store restrictions as detailed above apply.Furthermore, the iOS API only allows accelerometersensing in the background only as a byproduct ofrequesting a localization—which in itself consumes



1093

significantly more energy costs than the mereaccelerometer sensing, see, e.g. [15].

HardwareA user’s mobile device and its capabilities and behavior ina mobile sensing context is not fully described by thesoftware platform it is tied to; there are still crucialdifferences across manufacturers, models, and hardwarerevisions. From these, we can infer for mobile sensingrelevant information such as hardware available for sensingand processing data, as well as screen sizes for adequateuser interaction. While many chipsets are more or lessstandardized across different makers of mobile phone,CPUs, network chips, antennas, and sensors arecontinually being developed for better performance,improved features, and cheaper manufacture. This makesit difficult without prior rigorous calibration to directlycompare sensing data from one phone to another, even ifthe phones are of the same model, as they may well be ofdifferent hardware or firmware revisions. This problemquickly becomes intractable when many manufacturersproduce many different models within the same platform,as is the case with Android, Windows and the upcomingTizen [3] phones; the problem is then further enhanced assome manufacturers provide with their phones ownvariants of a platform’s OS and UI for the sake of productdifferentiation.

Performance: Mobile phones have in recent years seen animpressive rise in processing power, as dual or even quadcore mobile phones are now commonplace. The ability topreprocess collected data on the mobile devices have thussimilarly increased. However, this goes only for the latestgeneration of phones—older devices are still in use andshould thus be included in the testing for a mobile sensingproject, as they may not be able to sufficiently match the

desired performance. In the two user-interactive casescovered F.Oskar and SocioAF ,respective experiences ofvarying performance were made with a variety of UIelements, prominently with those involving map views.

Sensing capabilities: Also in regards to sensingcapabilities, the evolution of user devices is rapid. Moreand more sensor types become common-place in, e.g.,smartphones and thus offer new types of mobile sensingdata. Secondly, advances in more energy-efficient sensinghave been made—not only in hardware design, but also inutilization of the hardware within the devices’ OS’es, seealso the following subsection for more details. Thirdly,these new kinds of data change how higher-level contextcan be inferred; e.g. in cases such as the F.Oskar app,code for inferring transportation mode automatically andmovement type for EnTracked had to be writtenindividually and custom-tailored for phone types withdifferent sensor type sets—or even for individual phonemodels, if measures such as inference accuracy andbattery life are to be optimized. Fourthly, it is noteworthy,that due to the pressure to reduce cost, form factor andweight, newer devices may have actually worsesub-devices, e.g. GPS antennas, and in effect worselocalization accuracy than their predecessor models whichwas experienced for EnTracked . Overall, to supporthardware of differing sensing properties and capabilitiesresults often in having to deal with heterogeneity inregards to accuracy, sensing data types, and in thealgorithm design and implementation for inferringhigh-level context data types.

OS and Device VersioningWithin a specific platform, there is still heterogeneity,most pronounced on the Android platform. Somemanufacturers and telcos support long term



1094

upgrade-ability of their devices, whereas others do not (oronly support it for a subset of their models). This meansthat a service may not be available on all phones, or thata service may behave differently across different operatingsystem revisions. Even if both manufacturer and telcosupport upgrading the operating system on a particularphone, it is by no means given that the owner is inclined,aware, or capable of performing an upgrade. Updating ofapps also differs between platforms. On e.g. iOS, usershave to manually initiate an update, and likewise on olderversions of Android (or rather, the Google Play app, whichhandles app updates through their app store). Newerversions of Android give the user an option toautomatically keep apps updated, which relieves the userof the burden to do so. These update policies have to bekept in mind when deploying, as an arbitrary combinationof app versions may be “in the wild” at any given time.Similar applies to OS version updates, which during datacollection periods may result in data heterogeneity,prominently in the form of data outages, if the updateseither contain bugs, or if the mobile sensing project’scode, running on the updated OS, leads to unexpectedbehavior. This was experience for the SocioAF app wherean iOS update changed the behavior of an iOS modulemaking the app starting fail.

For mobile sensing projects, crucial differences acrossversions are, as for across platforms, within sensing anduser interaction. For user interaction, metaphors such asdrag’n’drop functionality, multi-touch gestures andrespective functionality such as panning, zooming, etc.are only incrementally added to the OS versions ofplatforms. Specifically, for the case F.Oskar drag’n’dropwas considered very helpful, but was not supported viaAPI calls in Android versions earlier than 3.0—and thusnot by a large share of the user devices in operation at

time of the writing. For sensing APIs, and their actualimplementation, the statements about heterogeneityacross platforms apply also across versions: Significantly,the quest for longer battery life implied thatincrementally—and additional to hardware improvementsmentioned above—more API options and lesspower-consuming implementations of these are providedon most platforms—specifically in regards to sleepbehavior, customized accuracy levels and trade-offs withpower consumption, c.f., e.g. [13, 14]. Note also, thatupdates of a phone’s firmware may change its behavior,especially in regards to sensor and communicationscheduling, significantly, although this may not berevealed in any release notes [14].

Project-internal HeterogeneitiesUnder project heterogeneities we subsume causes forheterogeneity that are rooted in the mobile sensing projectsetup itself. We will discuss below the followingheterogeneity causes individually: i) across the softwarelife cycle and ii) within and across deployments.

Software Life CycleAs generally within software projects, also mobile sensingsoftware evolves over time—as developers learn, and newrequirements are discovered or refined. It is overlyoptimistic to assume that one software version will sufficeduring even a single data collection campaign. Bugs andinadequacies will be discovered, and may be important tofix. Updates such as fixes lead to heterogeneity of thecollected data over time. This heterogeneity is evenhigher, since a heterogeneous penetration of pushedupgrades and bug-fix releases during or between datacollection phases is to be taken into consideration: Pushesmay be preceded by app store review processes of a prioriunknown, but significant (and across platforms varying)



1095

length; furthermore, users may install software updateseither not all, automatically, or manually at a user-chosenpoint in time. As data is collected over time, and sincedevelopers may refine collection methods, make use ofhitherto unused sensors, user input elements, or changedata representations to better suit the task, This requiresattention, so that old data is not abandoned, but kept,possibly separately, along with later revisions, and that formerging data from different revisions adjustments mayhave to be undertaken. For illustration the range offrequency of changes to and updates of mobile sensingsoftware updates, we provide records from the threeinvestigated cases. For SocioAF , flawless operation wasonly achieved after two problematic public events andafter significant bug fixes; these fix issues prominentlywith data communication and with slow and ill-displayedUI elements on a variety of devices from the diverse andhighly fragmented Android platform. Both similar UIissues on specific phones, and data communication issuesled to updates in the F.Oskar case, fixing battery drainingissues: one fixing cases of unintended Bluetooth use, andone fixing cases of unintended repetitive data uploads thatalso may lead to high data plan costs for the user; thelatter fix also introduces an efficient compression of dataon the phone before uploading it.

Updating an app during its life cycle may also becomeadvisable, when developers e.g. refine collection methods,make use of hitherto unused sensors, user input elements,or change data representations to better suit the task. Forinstance, in the EnTracked case, a number of updateswere issued, not to fix bugs, but to extend and improvefunctionality, specifically to reflect and make use of novelproperties of each next smartphone generation, such asnew types of sensors, additional sensing API elements orsmart battery interfaces [13, 14].

Additionally, in the SocioAF case, the range of supportedplatforms and devices was increased after experiences withthe app on just one platform in initial major sensingevents had been gathered. As has been illustrated here,the support of additional platforms may suggest or evenimply updates of the software on the already supportedplatforms, e.g. in order to mitigate device heterogeneity.

Development PracticesDevelopment for different platforms brings with it the riskof heterogeneity of software code, as well as of thedevelopers’ testing and debugging practices. Suchhetereogeneity is to a large extent rooted in thedifferences of platform versions in regards to programminglanguage, APIs used and potentially also inplatform-specific implemented functionality. Nonetheless,also the evolution of the mobile sensing project willcontribute to heterogeneity, since milestones, featureadditions and especially testing campaigns may diverge forthe platform-specific software variants—and will thuscontribute to the challenge of fighting the resulting dataheterogeneity across platform variants. The issues hasbeen observed and hindered project development andsensing campaigning in the SocioAF case.

ConclusionsIn this paper, we discussed data heterogeneity withinmobile sensing, and introduced a taxonomy ofheterogeneity types grouped into the main categories ofdevice, user and project heterogeneities as listed in Table2 identified throughout three case studies ofcommunity-based mobile sensing apps. If we, as aresearch community, can establish good practices andmethodologies for handling heterogeneities within mobilesensing, we can help to ensure that, e.g., scientific resultsand policy changes based on collective, mobile sensed



1096

data are valid. With this paper we aimed at providing abasis for and promote discussions regarding heterogeneitychallenges for mobile sensing in the research community.

As a natural next step for future work in this direction weenvison a discussion and accessment of the choices andoptions for developers within mobile sensing, and howsuch choices may impact and mitigate different types ofdata heterogeneity. Prominent among these choices arewhich platforms to support, and whether to as far aspossible adhere and exploit what is provided on individualplatforms, i.e. UI design, sensing APIs, and app storedistribution channels, or whether to instead aim for amaximum of app homogeneity across platforms, in termsof, e.g., appearance, user interaction, and distribution andsensing procedures. To this choices are related alsodecisions about whether to aim for a heavy or instead thinclient in terms of data processing; whether to develop asmobile websites instead of native apps; or whether todevelop app solutions sequentially for individual platformto be supported, or in parallel, either natively or usingcross-platform development tools.

Table 2: Identified categorization of heterogeneity types

User Heterogeneities- Device Type Demographics- App Usage- Device Usage

Device Heterogeneities- Platform (Incl. Distribution policies, Policies Restricting

Functionality, Sensing API)- Hardware (Incl. Performance and Sensing Capabilities)- OS and Device Versioning

Project-internal Heterogeneities- Software Life Cycle- Development Processes

References[1] http://blog.hunch.com/?p=51781, Mar. 2013.[2] http://reviewtimes.shinydevelopment.com/, Mar.

2013.[3] http://www.tizen.org/, Mar. 2013.[4] T. F. Abdelzaher, Y. Anokwa, P. Boda, J. Burke,

D. Estrin, L. J. Guibas, A. Kansal, S. Madden, andJ. Reich. Mobiscopes for human spaces. IEEEPervasive Computing, 6(2):20–29, 2007.

[5] H. Blunck, N. O. Bouvin, J. Mose Entwistle,K. Grønbæk, M. B. Kjærgaard, M. Nielsen,M. Graves Petersen, M. K. Rasmussen, andM. Wustenberg. Computational environmentalethnography: combining collective sensing andethnographic inquiries to advance means for reducingenvironmental footprints. In Proc. fourthinternational conference on Future energy systems(e-Energy 2013), pages 87–98. ACM, 2013.

[6] J. P. Cohn. Citizen science: Can volunteers do realresearch? BioScience, 58(3):192–197, Mar. 2008.

[7] S. Consolvo and M. Walker. Using the experiencesampling method to evaluate ubicomp applications.Pervasive Computing, IEEE, 2(2):24–31, 2003.

[8] H. S. M. Cramer, M. Rost, N. Belloni, F. Bentley,and D. Chincholle. Research in the large. using appstores, markets, and other wide distribution channelsin ubicomp research. In UbiComp (Adjunct Papers),pages 511–514, 2010.

[9] S. B. Eisenman, E. Miluzzo, N. D. Lane, R. A.Peterson, G.-S. Ahn, and A. T. Campbell. Thebikenet mobile sensing system for cyclist experiencemapping. In In SenSys, pages 87–101. ACM, 2007.

[10] D. Ferreira, A. Dey, and V. Kostakos. Understandinghuman-smartphone concerns: a study of battery life.Pervasive Computing, pages 19–33, 2011.



1097

[11] J. Froehlich, T. Dillahunt, P. Klasnja, J. Mankoff,S. Consolvo, B. Harrison, and J. A. Landay.Ubigreen: investigating a mobile tool for trackingand supporting green transportation habits. In Proc.27th international conference on Human factors incomputing systems, pages 1043–1052. ACM, 2009.

[12] R. K. Ganti, N. Pham, H. Ahmadi, S. Nangia, andT. F. Abdelzaher. Greengps: a participatory sensingfuel-efficient maps application. In In MobiSys, pages151–164, 2010.

[13] M. B. Kjærgaard, S. B. Bhattacharya, H. Blunck,and P. Nurmi. Energy-efficient trajectory tracking formobile devices. In Proc. 9th International Conferenceon Mobile Systems, Applications, and Services(MobiSys2011), pages 307–320. ACM Press, 2011.

[14] M. B. Kjærgaard and H. Blunck. Unsupervised powerprofiling for mobile devices. In Proc. 8thInternational ICST Conferene on Mobile andUbiquitous Systems (Mobiquitous ’11), 2011.

[15] M. B. Kjærgaard, J. Langdal, T. Godsk, andT. Toftkjær. Entracked: energy-efficient robustposition tracking for mobile devices. In Proc. 7thinternational conference on Mobile systems,applications, and services, page 221–234, 2009.

[16] N. D. Lane, H. Lu, S. B. Eisenman, and A. T.Campbell. Cooperative techniques supportingsensor-based people-centric inferencing. In InPervasive, pages 75–92, 2008.

[17] N. D. Lane, E. Miluzzo, H. Lu, D. Peebles,T. Choudhury, and A. T. Campbell. A survey ofmobile phone sensing. IEEE CommunicationsMagazine, 48(9):140–150, 2010.

[18] E. Miluzzo, N. D. Lane, K. Fodor, R. A. Peterson,H. Lu, M. Musolesi, S. B. Eisenman, X. Zheng, andA. T. Campbell. Sensing meets mobile socialnetworks: the design, implementation and evaluation

of the cenceme application. In The ACM Conferenceon Embedded Networked Sensor Systems, pages337–350. ACM, 2008.

[19] S. Patel, J. Kientz, G. Hayes, S. Bhat, andG. Abowd. Farther than you may think: An empiricalinvestigation of the proximity of users to their mobilephones. UbiComp 2006: Ubiquitous Computing,pages 123–140, 2006.

[20] A. Pathak, A. Jindal, Y. C. Hu, and S. P. Midkiff.What is keeping my phone awake?: characterizingand detecting no-sleep energy bugs in smartphoneapps. In Proc. 10th international conference onMobile systems, applications, and services, page267–280, 2012.

[21] A. Rahmati, A. Qian, and L. Zhong. Understandinghuman-battery interaction on mobile phones. InProc. 9th international conference on Humancomputer interaction with mobile devices andservices, pages 265–272. ACM, 2007.

[22] W. Willett, P. Aoki, N. Kumar, S. Subramanian, andA. Woodruff. Common sense community: scaffoldingmobile sensing and analysis for novice users. In InPervasive, pages 301–318. Springer-Verlag, 2010.

[23] M. Wirz, T. Franke, D. Roggen, E. Mitleton-Kelly,P. Lukowicz, and G. Troster. Inferring crowdconditions from pedestrians’ location traces forreal-time crowd monitoring during city-scale massgatherings. In Enabling Technologies: Infrastructurefor Collaborative Enterprises (WETICE), 2012 IEEE21st International Workshop on, pages 367–372.IEEE, 2012.

[24] Y. Zheng, Y. Chen, Q. Li, X. Xie, and W.-Y. Ma.Understanding transportation modes based on gpsdata for web applications. ACM Transactions on theWeb (TWEB), 4(1):1, 2010.



1098

on heterogeneity in mobile sensing applications aiming at representative...

Documents