
Development of a smartphone application for the objective detection of attentional deficits in delirium

Authors

Dr. Zoë Tieges1,2*, Antaine Stíobhairt1,3*, Katie Scott3, Klaudia Suchorab3, Alexander Weir4, Dr Stuart Parks4, Dr Susan Shenkin1,2, Professor Alasdair MacLullich1,2.

Affiliations

1Edinburgh Delirium Research Group, University of Edinburgh

2Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh

3Department of Psychology, University of Edinburgh

4Medical Devices Unit, NHS Greater Glasgow and Clyde

*Z. Tieges and A. Stíobhairt contributed equally to this paper.

Corresponding author

Dr. Zoë Tieges

Edinburgh Delirium Research Group

University of Edinburgh, Room S1642

Royal Infirmary of Edinburgh, 51 Little France Crescent

Edinburgh, EH16 4SA, UK

Tel. +44 (0) 131 242 6482, Fax. +44 (0)131 242 6370, E-Mail [email protected]

ABSTRACT

Background

Delirium is an acute, severe deterioration in mental functioning. Inattention is the core feature, yet there are few objective methods for assessing attentional deficits in delirium. We previously developed a novel, graded test for objectively detecting inattention in delirium, implemented on a computerised device (Edinburgh Delirium Test Box (EDTB)). Although the EDTB is effective, tests on universally available devices have potential for greater impact. Here we assessed feasibility and validity of the DelApp, a smartphone application based on the EDTB.

Methods

This was a preliminary case-control study in hospital inpatients (aged 60-96) with delirium (N=50), dementia (N=52) or no cognitive impairment (N=54) who performed the DelApp assessment, which comprises an arousal assessment followed by counting of lights presented serially. Delirium was assessed using the Confusion Assessment Method and Delirium Rating Scale-Revised-98 (DRS-R98), and cognition with conventional tests of attention (e.g. digit span) and the Short Orientation-Memory-Concentration test (OMCT).

Results

DelApp scores (maximum score=10) were lower in delirium (scores(median(IQR)): 6(4-7)) compared to dementia (10(9-10)) and control groups (10(10-10), p-values<0.001). Receiver Operating Characteristic (ROC) analyses revealed excellent accuracy of the DelApp for discriminating delirium from dementia (AUC=0.93), and delirium from controls (AUC=0.99, p-values<0.001). DelApp and DRS-R98 severity scores were moderately well correlated (Kendall's tau= -.60, p<0.001). OMCT scores did not differ between delirium and dementia.

Conclusions

The DelApp test showed good performance, supporting the utility of objectively measuring attention in delirium assessment. This study provides evidence of the feasibility of using a smartphone test for attentional assessment in hospital inpatients with possible delirium, with potential applications in research and clinical practice.

Key words

Delirium; Attention; Objective; Measurement; Neuropsychological; Smartphone; Dementia; Cognition

Running head

Detecting attentional deficits in delirium

INTRODUCTION

Delirium is an acute, severe neuropsychiatric syndrome characterized by fluctuating disturbances in attention, arousal and cognition. It is highly prevalent in older hospitalized patients and is associated with adverse outcomes including functional decline, new institutionalization, persistent cognitive impairments including dementia, and higher mortality (Siddiqi et al., 2006; Witlox et al., 2010).

The core diagnostic feature of delirium is 'inattention', defined as a 'reduced ability to direct, focus, sustain and shift attention' in the DSM-5 criteria for delirium (American Psychiatric Association, 2013). The evidence suggests that several aspects of attention are likely to be affected in delirium, including the basic orienting response, focusing, sustaining and dividing attention (Tieges et al., 2014). The extent to which each of these aspects of attention is affected in delirium is poorly understood, though deficits in sustained attention (i.e. the ability to maintain attention to stimuli over time) have been implicated (Brown et al., 2011; O'Keeffe and Gosney, 1997; Tieges et al., 2014).

A range of methods is used in research and in clinical practice to ascertain inattention. These include subjective assessments based on interview and clinical observation, and objective assessments using brief neuropsychological tests (Hall et al., 2012). Objective tests likely offer some advantages over subjective methods, including standardization of instructions, and greater reliability and reproducibility. In addition, objective tests are less likely to be operator dependent and reliant upon clinical experience than subjective assessments. Several neuropsychological tests are currently used to assess inattention in delirium. These include digit span, spatial span, months of the year and days of the week backwards, vigilance 'A' and serial 7s (O'Regan et al., 2014; Tieges et al., 2014). Generally these tests perform well in detecting delirium. However, the available data show that most existing tests discriminate poorly between delirium and other mental disorders including dementia (Morandi et al., 2012). Indeed, many studies show that patients with dementia show significant deficits in tests often used to assess inattention in delirium, such as spatial span (Meagher et al., 2010), serial 7s (Bronnick et al., 2007) and months of the year backwards (Katzman et al., 1983). Further, existing tests of attention have not been validated for severity grading in delirium, which is useful in monitoring delirium progress over time and providing a more fine-grained measure of delirium than dichotomous scoring methods.

To help address the lack of robust, objective assessments for inattention in delirium, we previously developed a novel attentional test implemented on a custom-built computerized device entitled the Edinburgh Delirium Test Box (EDTB) (Brown et al., 2011; Tieges et al., 2013). The EDTB tasks require participants to count and verbally report how many times either one or two slowly presented lights on the device illuminate. The lights are illuminated for 1 sec, and the inter-trial intervals are 1-4 sec. The design of this test was motivated by the need for a task that was cognitively simple whilst placing demands on attentional functioning. Prior research had found that the ability to sustain attention in a simple counting task may be relatively spared in Alzheimer’s disease (Lines et al., 1991; Morandi et al., 2012; Perry et al., 2000). The EDTB differs from these prior tasks in that the stimuli are visual instead of auditory, and distracting stimuli are presented in some of the trials in order to increase task difficulty. In our work with the EDTB we found that patients with delirium showed marked deficits on these tasks, whereas patients with dementia and cognitively unimpaired controls performed at or near ceiling level. The receiver operating characteristic areas under the curves were between 0.80 and 1.00, indicating good to excellent accuracy of the EDTB in discriminating delirium from dementia and control groups (Brown et al., 2011). We concluded therefore that these new objective attentional tests offered potential utility in detecting delirium and differentiating it from dementia. More broadly, validated, objective neuropsychological assessments potentially offer a more robust approach to the measurement of inattention in delirium than presently available methods.

We recently developed a software application for detecting deficits of visual sustained attention in delirium ('DelApp'), which is based on our EDTB sustained attention tasks. The rationale is that although the EDTB showed good performance, tests on devices that are universally available, such as smartphones, are more readily applicable in research and clinical settings. Smartphone-based applications are increasingly being used as a method for assessing cognitive function in older individuals (Brouillette et al., 2013).

In this preliminary work, we first conducted a feasibility study to assess the acceptability of the DelApp for the assessment of attention in older hospitalised patients. We then proceeded with a pilot case-control study to compare DelApp performance in a selected sample of older hospitalised patients with dementia, delirium or no cognitive impairment.

METHODS

Study 1

Design

This study was designed to assess the feasibility and acceptability of the DelApp in hospitalised older patients, and to compare the EDTB and DelApp. Formal diagnostic assessment of each patient was not performed as this was not required to meet the aims of Study 1. A within-subjects design was used. Patients were studied on a single occasion and underwent EDTB and DelApp tests. Semi-structured interviews with patients were conducted to assess their opinions of these tests. Study 1 and Study 2 were approved by the Scotland A Research Ethics Committee.

Participants

Patients aged 60 and older who were able to communicate in English were eligible. All patients had to demonstrate capacity to provide written, informed consent; this was because the ethics committee ruled that for Study 1 it was unnecessary to involve patients requiring proxy consent. Exclusion criteria were: severe sensory impairments or severe acute illness that would impede testing and interview, and where clinical staff considered that participation would adversely affect the patient’s care.

The researchers (AS, KSc and KSu) recruited 20 patients from Medicine of the Elderly and orthopedic wards at the Royal Infirmary of Edinburgh, Scotland. Patients meeting eligibility criteria were identified through consultation with the clinical care team. Forty-three patients were identified, and of these nine patients declined to participate, three patients were not available for testing because they were discharged or moved by the time the researcher approached them, and eleven were deemed unsuitable by the researcher due to lack of capacity (cognitive impairment, reduced arousal) or severity of illness. This recruitment process yielded the final target sample size of 20 patients.

Measurement and Procedures

Participants first undertook a brief visual acuity test to ensure that they could perceive the test stimuli. The visual acuity test comprised six short trials in which participants were asked to (1) identify a change in the colour of a white circle (5cm diameter) presented on the smartphone screen (white-to-grey and grey-to-white); (2) identify a change in the shape of the stimulus (circle-to-star and star-to-circle); and (3) name two letters presented on the screen one at a time (B and E) (Figure 1A). Testing proceeded only when the participants could reliably perceive the stimuli.

Patients then completed visual sustained attention counting tasks on the EDTB and DelApp (www.edinburghdelirium.ed.ac.uk) in counterbalanced order (see figure S1 published as supplementary material online attached to the electronic version of this paper at http://journals.cambridge.org/ipg). The EDTB Mark 2 is a purpose-built cuboidal, grey plastic box (13 x 21.5 x 7.5 cm) with two protruding circular illuminable buttons (5 cm diameter), presented to participants in landscape orientation. Each button contains concealed (when not illuminated) light-emitting diodes (LEDs) and is surrounded by four LEDs concealed under a translucent white cover. The box further contains a concealed central 7 x 7 matrix of LEDs to display distracting stimuli such as checkered patterns (Tieges et al., 2013).

The DelApp task was presented on the 9.2 x 5.9 cm display of a Samsung Galaxy S2 running the Android 3.2 operating system. The brightness was set automatically based on the ambient light. The target stimulus was a single large white circular illumination (5 cm diameter) presented in the centre of the screen against a black background. On some trials, this was surrounded by pseudo-random subsets of eight small downward-pointing triangles (0.3 x 0.3 x 0.4 cm), each presented for 500 ms, to provide distraction (Figure 1B). The smartphone was held in portrait orientation at a distance of approximately 50 cm from participants.

All EDTB tasks involved presentation of one light only, instead of the two lights used in the original EDTB studies, so as to more closely resemble the DelApp and thus provide a more direct comparison. Participants were asked to count and then verbally report at the conclusion of the sequence (as indicated by the researcher) the number of times they saw the illuminations (i.e. illuminations of the button on the EDTB or circles presented on the smartphone screen).

Both the EDTB and DelApp counting tasks comprised seven trials of increasing difficulty, presented in a fixed order. This served to minimise floor and ceiling effects. The illuminations lasted 1000 ms, and distracting stimuli lasted 500 ms each. The number of illuminations increased from 2 to 8 across trials. The inter-stimulus interval of Level 1 (trials 1-3) ranged between 1000 ms and 2000 ms. Distracting stimuli were added for Level 2 (trials 4-5) and approximately twice as many distracting stimuli were shown in Level 3 (trials 6-7). Trial one was a practice trial, which was repeated a number of times if necessary. Trials were scored as correct or incorrect. If no answer was given, this was scored as incorrect. Thus, we did not distinguish between incorrect responses and omissions. Likewise, patients who could attempt only a subset of the trials still received a score (i.e. total number of correct completed trials) and were included in the statistical analyses.
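The trial structure described above can be summarised as a small data structure. The following Python sketch is illustrative only: the assignment of one illumination count per trial (2 for trial 1, rising to 8 for trial 7) is an assumption based on the statement that the number of illuminations increased from 2 to 8 across seven trials, and the names `Trial`, `level_for` and `build_schedule` are hypothetical rather than taken from the DelApp software.

```python
from dataclasses import dataclass
from typing import List

ILLUMINATION_MS = 1000  # duration of each target illumination (from the text)
DISTRACTOR_MS = 500     # duration of each distracting stimulus (from the text)


@dataclass(frozen=True)
class Trial:
    number: int         # 1-7, presented in a fixed order
    level: int          # 1 = no distractors, 2 = distractors, 3 = about twice as many
    illuminations: int  # assumed: trial number + 1, i.e. 2 up to 8
    practice: bool      # trial 1 is the practice trial


def level_for(trial_number: int) -> int:
    """Map a trial number to the difficulty level described in the text."""
    if 1 <= trial_number <= 3:
        return 1
    if 4 <= trial_number <= 5:
        return 2
    if 6 <= trial_number <= 7:
        return 3
    raise ValueError("trial number must be between 1 and 7")


def build_schedule() -> List[Trial]:
    """Return the seven trials in their fixed presentation order."""
    return [
        Trial(number=n, level=level_for(n), illuminations=n + 1, practice=(n == 1))
        for n in range(1, 8)
    ]
```

Calling `build_schedule()` yields seven `Trial` records whose levels follow the 3-2-2 split given in the text.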

The end of each trial in the DelApp task was signalled to the researcher by a short vibration generated by the phone.

The DelApp counting task was preceded by a brief assessment of level of arousal (LoA). This was included because some patients with abnormal LoA cannot engage with cognitive testing, and these patients are considered as showing severe inattention (see the guidance notes in the 2013 DSM-5 diagnostic criteria for delirium (American Psychiatric Association, 2013)). To allow for some grading of the DelApp test scores in patients unable to perform the attention task, the arousal assessment comprised the following three items: (1) Can the patient keep his/her eyes open for 10 sec? (2) Can the patient say his/her name? (3) Can the patient track an object (e.g. badge, phone) for 5 sec? The arousal assessment was combined with the seven-item counting task to provide an overall score out of 10. The full DelApp assessment generally took less than five minutes to complete.
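The combined scoring rule can be sketched as follows. This is a minimal illustration, assuming one point per arousal item and one point per correct counting trial (which is what a total out of 10 from three plus seven items implies); the function name `delapp_score` and the use of `None` to mark a missing answer are hypothetical conventions, not part of the DelApp itself.

```python
from typing import Optional, Sequence


def delapp_score(arousal_items: Sequence[bool],
                 counting_correct: Sequence[Optional[bool]]) -> int:
    """Total score out of 10: three arousal items plus seven counting trials.

    None marks a trial with no answer or a trial that was not attempted.
    It is scored as incorrect, so omissions and errors are not distinguished.
    """
    if len(arousal_items) != 3:
        raise ValueError("expected three arousal items")
    if len(counting_correct) != 7:
        raise ValueError("expected seven counting trials")
    arousal = sum(1 for passed in arousal_items if passed)
    counting = sum(1 for result in counting_correct if result is True)
    return arousal + counting
```

For example, a patient who passes two arousal items, answers the first two counting trials correctly and gives no answer on the remainder would receive a score of 4 under these assumptions.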

Following the EDTB and DelApp tests, participants underwent a semi-structured interview in which they were asked a series of scripted open-ended questions relating to their experience of the DelApp in comparison to the EDTB. Interviews were recorded and transcribed. The whole testing and interview assessment typically lasted around 15 minutes.

Statistical Analysis

Comparisons between test scores on the DelApp and EDTB tasks were analysed using a Wilcoxon signed-rank test. The threshold for statistical significance was p ≤ .05. A qualitative thematic analysis was conducted on verbatim transcriptions of recorded interviews (Braun and Clarke, 2006). All quantitative analyses were carried out in R version 3.0.1 (R Core Team, 2014).

Study 2

Design

This study employed a between-subjects design, comparing performance on the DelApp test in three groups: patients with delirium, dementia, and no severe cognitive impairment. This was a case-control study and so we deliberately aimed to have groups that were clear-cut clinically.

Participants

Inclusion and exclusion criteria were as for Study 1. Proxy consent was sought for patients lacking capacity to provide consent for themselves. Patients were recruited from Medicine of the Elderly and orthopedic wards at the Royal Infirmary of Edinburgh, and acute and rehabilitation wards in Liberton Hospital, Edinburgh.

A total of 269 patients were initially identified by the clinical staff and researchers as potentially appropriate for the study. Of these, 27 patients declined, two patients were not available for testing because they had been discharged or moved, and two patients were participating in a similar study and were therefore excluded. Thirty-six patients were not suitable, due to severe cognitive impairments, severely reduced level of arousal such that no engagement was possible, severity of illness or distress. Twenty-five patients who required proxy consent or asked researchers to come again at a different time could not be approached again due to researcher unavailability. Proxy consent was sought but could not be obtained for one patient. A sample of 181 patients underwent assessment in the study. However, 25 patients were excluded after assessment because they could not be allocated to any of the pre-defined study groups. The final study sample size was 156 (delirium: N = 50; dementia: N = 52; control: N = 54).

Measurement and Procedures

Cognition and delirium status were assessed on one occasion. All cognitive and delirium assessments were conducted by graduate psychologists (AS, KSc and KSu) who had been fully trained by a geriatrician (AM) and a postdoctoral psychologist (ZT). This training process included: regular supervisor meetings; ward round observations; role play and mock assessments; a supervisor-led teaching session on extracting information from case notes to inform categorisation; and a certified course on Good Clinical Practice. Students followed detailed Standard Operating Procedures for the delirium and cognitive assessments and they were closely monitored throughout the recruitment and data acquisition stage by their supervisors (AM and ZT). All tests were administered at the patient’s bedside, with the curtain surrounding their bed closed.

The presence of delirium was assessed using the Confusion Assessment Method (CAM; Inouye et al., 1990). Patients who met CAM criteria for delirium were included in the delirium group irrespective of their current or prior level of cognition. Severity was measured using the Delirium Rating Scale-Revised-98 (DRS-R98; Trzepacz et al., 2001). Scoring of the DRS-R98 was based on the preceding 24 hours. Higher scores indicate greater severity, up to a maximum of 39 on the severity subscale. Discussions with the clinical staff and scrutiny of the case notes provided additional diagnostic and severity information.

Patients in the dementia group either had a prior formal diagnosis, or clearly met DSM-IV criteria for dementia (using information from case notes and informants) as determined by a consultant geriatrician. Patients with dementia did not have delirium at the time of assessment. Patients were included in the control group only if they demonstrated normal cognition on a general cognitive measure described below, and had no documented history of chronic cognitive or functional impairment.

The two letter trials of the visual acuity assessment were discontinued as they were considered redundant and related to a separate task involving letter sequences. The remaining four trials and DelApp procedures were the same as for Study 1. These were accompanied by additional measures of attention, arousal and cognition, as follows.

The short Orientation Memory Concentration Test (OMCT (Katzman et al., 1983)) was used to assess the overall level of cognition. This test is a validated measure of cognitive impairment (Katzman et al., 1983). To facilitate analyses we scored the OMCT such that higher scores indicate better performance. The suggested scores for categorisation are: 24-28 = normal cognition; 19-23 = questionable impairment; ≤18 = dementia (Morris et al., 1989). A set of brief attention tests (Marcantonio, 2008) established as a method of assessing inattention in delirium, here termed the Brief Attention Tests (BAT), was used to provide a standardised measure of inattention. The BAT comprised: digit span forward (3-5 digits), digit span backward (3-4 digits), and days of the week and months of the year in reverse order. Each correct item equated to one point and scores of 5 (out of 7) or below were considered indicative of inattention. The BAT informed scoring of the inattention item on the CAM and served as a comparison for the DelApp.
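The OMCT categories and the BAT threshold described above can be written as two small functions. This Python sketch is offered only as an illustration: it assumes the reversed OMCT scoring used here (0-28, higher is better) and treats each BAT item as a single pass or fail, and the function names are hypothetical.

```python
from typing import Sequence


def omct_category(score: int) -> str:
    """Categorise a reversed OMCT score (0-28, higher = better performance)."""
    if not 0 <= score <= 28:
        raise ValueError("OMCT score must be between 0 and 28")
    if score >= 24:
        return "normal cognition"
    if score >= 19:
        return "questionable impairment"
    return "dementia"


def bat_indicates_inattention(items_correct: Sequence[bool]) -> bool:
    """Return True when the BAT total (out of 7) is 5 or below.

    Assumed item order: digit span forward (3, 4, 5 digits), digit span
    backward (3, 4 digits), days of the week backwards, months backwards.
    """
    if len(items_correct) != 7:
        raise ValueError("expected seven BAT items")
    total = sum(1 for correct in items_correct if correct)
    return total <= 5
```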

The Observational Scale of Level of Arousal (OSLA; www.edinburghdelirium.ed.ac.uk) (Tieges et al., 2013) was included to provide a quantitative measure of LoA and to aid assessment of delirium. It was developed in-house and designed specifically to characterize abnormalities in LoA in patients with delirium. The OSLA comprises four graded items: eye opening, eye contact, posture, and movement. Higher scores indicate greater abnormality in LoA (maximum score is 15).

The order of attention assessments (DelApp, BAT) and cognitive assessment (OMCT) was systematically counterbalanced across patients using a Latin squares design. The DRS-R98, OSLA and CAM were completed following the assessments. All assessments including DelApp and reference diagnosis for delirium were performed by single raters.
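Counterbalancing three assessments with a Latin square can be sketched as below. The paper does not state which square was used, so the cyclic construction and the assignment of rows to patients by recruitment index are assumptions, and the function names are hypothetical.

```python
from typing import List, Sequence


def latin_square(conditions: Sequence[str]) -> List[List[str]]:
    """Build a cyclic Latin square: each condition appears once per row and column."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def order_for_patient(index: int, conditions: Sequence[str]) -> List[str]:
    """Assign a test order by cycling through the rows of the square."""
    square = latin_square(conditions)
    return square[index % len(conditions)]
```

With the three assessments `["DelApp", "BAT", "OMCT"]`, successive patients would cycle through three orders, so that each assessment occupies each position equally often.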

Following the assessment of delirium and cognition (including the DelApp), some patients showed patterns of results on the cognitive tests that suggested the initial group assignment (based on case notes and discussions with ward staff) was incorrect. For example, patients who were initially deemed as potential controls by ward staff sometimes had cognitive impairment as determined by the more detailed cognitive testing performed as part of the present study. For such patients the most appropriate grouping was decided blind to DelApp results by a consultant geriatrician (AM) based on all the other available information (i.e. cognitive test scores, information taken from medical notes and conversations with ward staff). Some patients could not be allocated to any group, because their test scores deviated considerably from group medians and/or they did not present a symptom profile that was clearly characteristic of any one group. These patients were excluded, again without knowledge of the DelApp scores, since for the purposes of this study we aimed to select patients who could be classified as being part of one of the predefined clinical groups.

Statistical Analysis

Non-parametric statistical tests were used as the majority of the data were non-normally distributed and there was heterogeneity of variance across groups. Kruskal-Wallis and Mann-Whitney U tests were used for between-group analyses. Continuity corrections were applied to Mann-Whitney U tests. Holm corrections were applied to multiple comparisons. For these comparisons the 95% Confidence Interval (CI) limits are expressed as differences in mean rank between each pair of groups. Kendall’s Tau was used for correlations as there were many ties in the data (Field et al., 2012). Pearson’s chi-squared tests were used for proportional data. Linear regressions were carried out using a conservative threshold for significance of p ≤ .01, due to non-normality. Receiver operating characteristic (ROC) curves were compared using DeLong's method (DeLong et al., 1988). Chi-square tests were used to explore differences in performance between subsequent levels of task difficulty. Odds ratios are also reported.
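As an illustration of the Holm correction mentioned above, the following Python sketch computes step-down adjusted p-values for a set of pairwise comparisons. The analyses in this paper were run in R, so this is not the code that was used; the function name `holm_adjust` and the example p-values are hypothetical.

```python
from typing import List, Sequence


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

For three hypothetical comparisons with raw p-values of 0.01, 0.04 and 0.03, the adjusted values are 0.03, 0.06 and 0.06 respectively.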

Medium to large effect sizes were found in a similar study (Brown et al., 2011). Therefore, taking a relatively conservative estimate of effect size of 0.35, with a significance level of 0.05 and N = 141, all statistical tests used in the present analysis would have >95% power. This calculation defined the minimum acceptable sample size.

RESULTS

Study 1

Twenty patients, aged 66-93, were recruited. No significant difference was found between scores on the DelApp (median = 10, range 8-10) and EDTB (median = 10, range = 8-10; p = .41, r = -.19). Qualitative thematic analysis indicated that responses were centred around four themes: Physical presentation of the DelApp; ease of DelApp task; further development of the application; and device preference (EDTB or DelApp). All patients stated that they had no difficulty perceiving the stimuli, or completing the tests on the DelApp. Three participants mentioned that the peripheral distractors of the DelApp were not as distracting as those on the EDTB. When asked which device they preferred, nine out of 20 patients had no preference, seven preferred the DelApp and four preferred the EDTB.

Study 2

Patient characteristics

Age differed significantly among groups (p < .001; Table 1). Delirium and dementia groups were overall older than controls (delirium vs control: p < .001, 95% CI = [13.0, 52.27]; dementia vs. control: p < .001, 95% CI = [23.93, 62.81]). There was no difference in age between dementia and delirium groups.

General cognition, as measured with the OMCT, differed between groups (Table 1). Scores in the delirium and dementia groups were significantly lower compared to controls (delirium vs. control: p < .001, 95% CI = [64.78, 91.04]; dementia vs. control: p < .001, 95% CI = [58.59, 84.58]). There was no difference between dementia and delirium groups.

Between-group differences in attention

DelApp: Of the 156 patients included in the study, ten participants (eight from the delirium group and two from the dementia group) were unable to provide answers on some or all trials on the counting task. Where this occurred trials were scored as incorrect, as this was considered the result of severe inattention, and the overall scores for these patients were included in the statistical analyses. DelApp scores differed among groups (p < .001; Table 1 and Figure 2). Patients with delirium scored significantly lower than patients with dementia (p < .001, 95% CI = [50.58, 74.38]) and controls (p < .001, 95% CI = [69.25, 92.84]). The dementia group scored significantly lower than controls (p < .001, 95% CI = [6.90, 30.25]). Age was not a significant predictor of DelApp score at the conservative threshold of p ≤ .01 (β = -0.05, t (154) = -2.33, p = .021, R2 = .034, 95% CI = [8.88, 15.86]).

BAT: BAT scores also differed between groups (p < .001; Table 1). The delirium and dementia groups scored lower on the BAT than controls (delirium vs. control: p < .001, 95% CI = [51.43, 83.86]; dementia vs. control: p < .001, 95% CI = [20.85, 52.96]). Scores were also lower in delirium compared to dementia (p < .001, 95% CI = [14.37, 47.0]).

DelApp and abnormal LoA

OSLA scores reflecting LoA differed between groups (p < .001; Table 1). As expected, the delirium group scored higher than patients with dementia (p < .001, 95% CI = [60.2, 78.4]) and controls (p < .001, 95% CI = [66.67, 84.7]), indicating greater abnormality in LoA in patients with delirium, while there was no difference between dementia and control groups. There was a moderate negative correlation between OSLA and DelApp scores (τ = -.61, p < .001), indicating that a more abnormal LoA was associated with worse inattention. A total of seven patients, all from the delirium group, scored less than 3 on the DelApp arousal sub-scale.
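Because both the OSLA and the DelApp take a small number of integer values, many pairs of patients are tied on one or both measures. The sketch below shows one way a tie-adjusted Kendall correlation can be computed. It assumes the tau-b variant, which the paper does not specify, and the function name is hypothetical.

```python
from math import sqrt
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b: rank correlation with an adjustment for ties."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: contributes to neither term
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denominator
```

A negative value, as reported here, means that higher scores on one measure tend to go with lower scores on the other.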

ROC analysis

ROC analysis was performed on the DelApp with delirium diagnosis as a reference. The area under the curve (AUC) was 0.96 (p < .001, 95% CI = [0.93, 0.995]), with 98% sensitivity and 93% specificity for detecting delirium in the study sample as a whole using a cut-off of 8 (out of a maximum possible score of 10). The seven CAM-negative participants who scored at or below this cut-off were all from the dementia group and had severe cognitive impairments (median OMCT score = 5, range = 0-9). Analysis based on the delirium and control groups alone revealed an AUC of 0.99 (p < .001, CI = [0.97, 1]), with 98% sensitivity and 100% specificity for detecting delirium. Analysis based on the delirium and dementia groups alone returned an AUC of 0.93 (p < .001, CI = [0.88, 0.99]), with 98% sensitivity and 87% specificity using the same cut-off of 8.
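The sensitivity and specificity figures follow directly from the group sizes. In the sketch below, the seven false positives and the group sizes are taken from the text, whereas the count of 49 of 50 delirium patients scoring at or below the cut-off is inferred from the reported 98% sensitivity rather than stated. The function names are hypothetical.

```python
def screens_positive(score: int, cutoff: int = 8) -> bool:
    """A DelApp score at or below the cut-off is treated as a positive screen."""
    return score <= cutoff


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of patients with delirium who screen positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Proportion of patients without delirium who screen negative."""
    return true_negatives / (true_negatives + false_positives)


# Whole sample: 50 delirium, 52 dementia and 54 control patients.
whole_sample_sensitivity = sensitivity(49, 1)     # 49 is inferred from 98% of 50
whole_sample_specificity = specificity(99, 7)     # 106 non-delirium, 7 false positives
dementia_only_specificity = specificity(45, 7)    # all 7 false positives had dementia
control_only_specificity = specificity(54, 0)     # no controls at or below the cut-off
```

Rounded to whole percentages, these reproduce the reported 98% sensitivity and the 93%, 87% and 100% specificities.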

By comparison, the AUC of the BAT for detecting delirium in the entire study sample using a cut-off of 5 (out of 7) was 0.82 (p < .001, 95% CI = [0.75, 0.89]), with 76% sensitivity and 72% specificity. Direct comparison of the discriminative ability of these tests found that the AUC was significantly higher for the DelApp compared to the BAT (z = 3.8, p < .001, r = .37).
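An AUC can be read as the probability that a randomly chosen patient with delirium scores lower than a randomly chosen patient without delirium, counting ties as one half. The sketch below computes the AUC in this pairwise form. It does not implement DeLong's comparison of two curves, the function name is hypothetical, and the scores in the usage note are invented for illustration.

```python
from typing import Sequence


def auc_lower_scores_in_cases(case_scores: Sequence[float],
                              noncase_scores: Sequence[float]) -> float:
    """AUC for a test on which cases are expected to score lower than non-cases."""
    if not case_scores or not noncase_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case in case_scores:
        for noncase in noncase_scores:
            if case < noncase:
                wins += 1.0   # pair ordered as expected
            elif case == noncase:
                wins += 0.5   # tie counts as half
    return wins / (len(case_scores) * len(noncase_scores))
```

With invented case scores of 4, 6 and 7 and non-case scores of 10, 9 and 7, eight pairs are ordered as expected and one is tied, giving an AUC of 8.5/9.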

The DelApp as a measure of severity of attentional deficits

There was a moderate negative correlation between DelApp and DRS-R98 severity scores (τ = -.60, p < .001). A 1-point decrease in DelApp score reflecting more inattention was associated with a 4.8-point increase in DRS-R98 severity score (β = -0.208, t (154) = -15.23, p < .001, R2 = .62, CI = [9.66, 10.3]; Figure 3).

The proportion of correct responses on the DelApp test given by patients in the delirium group decreased as the level of difficulty increased. Based on odds ratios, the odds of giving the correct response were 2.2 times higher for level 1 compared to level 2 (p = .003, 95% CI = [1.26, 3.86]). Similarly, the odds of giving the correct response were 1.33 times higher for level 2 compared to level 3, though this was not statistically significant. No effects of task level were found for the dementia group or controls. Frequencies of responses are presented in Table 2.
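An odds ratio of this kind is computed from a 2 x 2 table of correct and incorrect responses at two difficulty levels. The sketch below uses the standard log-based (Woolf) confidence interval, which is an assumption because the paper does not state how its intervals were derived. It also treats responses as independent, which is a simplification for repeated trials within patients. The function name is hypothetical and the counts in the usage note are invented, not those of Table 2.

```python
from math import exp, log, sqrt
from typing import Tuple


def odds_ratio_ci(correct_easy: int, incorrect_easy: int,
                  correct_hard: int, incorrect_hard: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio (easier vs harder level) with an approximate 95% CI."""
    cells = (correct_easy, incorrect_easy, correct_hard, incorrect_hard)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (correct_easy * incorrect_hard) / (incorrect_easy * correct_hard)
    # Standard error of the log odds ratio.
    se_log = sqrt(sum(1.0 / count for count in cells))
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

For invented counts of 20 correct and 10 incorrect responses at the easier level against 10 correct and 20 incorrect at the harder level, the odds ratio is 4.0, with an interval that lies entirely above 1.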

DISCUSSION

This preliminary study showed initial feasibility and validity of a new smartphone-based test of attention for use in delirium assessment. The DelApp was effective in objectively measuring inattention associated with delirium in a selected sample of older hospitalised patients. The DelApp was 98% sensitive and 93% specific to delirium in the sample as a whole, which is similar to the diagnostic performance of the original EDTB tasks (Brown et al., 2011). Furthermore, patients with delirium performed poorly on this test compared to cognitively normal patients and patients with dementia, even though delirium and dementia groups did not differ with respect to overall cognitive impairment as assessed by a standard test. This is important because symptom profiles in delirium and dementia commonly overlap, yet these disorders have radically different etiologies and treatments.

Patients with delirium performed more poorly on both the DelApp and BAT, which includes digit span, than patients with dementia. However, there was substantial overlap in BAT scores between delirium and dementia groups. Direct comparison of the discriminative ability of DelApp and BAT tests indicated that the DelApp was more effective in detecting delirium and discriminating delirium from dementia. This is not surprising because digit span is not only employed as a test of attention in delirium, but also as a measure of IQ, working memory, and executive functioning (Buchanan et al., 2010; Greneche et al., 2011; Iverson, 2001). Moreover, studies have shown that patients with dementia perform poorly on the digit span backward (Meagher et al., 2010) and that the digit span forward does not distinguish between delirium and dementia (O'Keeffe and Gosney, 1997). In contrast, the DelApp attention assessment appears to provide a purer measure of attention, placing low demands on other aspects of cognition.

The broad range of DelApp scores in the delirium group supports the notion that this test potentially provides a graded measure of the severity of inattention in delirium. Further, scores on the DelApp and the DRS-R98 were moderately well correlated. Thus, the DelApp may help with severity grading of delirium. However, linear regression revealed large residuals in the delirium group, which may be explained by the range of symptoms and different time frames covered by each test. Specifically, the DelApp assesses current level of arousal and inattention, whereas the DRS-R98 accounts for a broad range of delirium symptoms occurring in the preceding 24 hours.

The DelApp test appears to have a number of strengths including its simplicity, ease-of-use, objectivity, and portability. Controlled presentation of stimuli and automated scoring make the DelApp less prone to experimenter bias or error, and it also requires little training and does not strongly rely on clinical judgment. As such, the DelApp may be of particular value when used by non-experts (Kean and Ryan, 2008). Our preliminary results suggest that the DelApp could provide a valid measure of delirium severity and as such may be useful in tracking change in delirium presence and severity over time.

Although abnormal LoA as measured with the OSLA was found in delirium, in line with previous findings (Tieges et al., 2013), only seven patients (all with delirium) had abnormal scores on the arousal assessment of the DelApp. In the present study, this subscale therefore made a relatively small contribution to the overall diagnostic performance of the DelApp. This finding may indicate that the DelApp arousal assessment was insensitive to subtle abnormalities in LoA. However, spectrum bias may have played a role here, because ward staff generally did not recommend patients who were asleep or very drowsy, or extremely agitated, and so these patients were mostly not approached for recruitment. Importantly, in clinical practice 10% of all patients may have abnormalities in LoA as detected by routine assessments (Prytherch et al., 2010). Therefore, the utility of the DelApp arousal subscale needs to be addressed in future studies with consecutive, unselected patients.

This study has several limitations that must be acknowledged. The DelApp assessment and reference diagnosis of delirium were both carried out by single raters. Although the raters did not consider the DelApp assessment when completing the CAM and DRS-R98 (which always followed the DelApp assessment), we cannot rule out that DelApp scores influenced the delirium diagnosis to some extent, which may have resulted in incorporation bias. It should be noted, however, that a diagnosis of delirium and/or dementia was ascertained through communication with the clinical team responsible for the care of a patient and by examination of the case notes. Further, the clinical profile of most patients, including information from case notes and neuropsychological tests (blind to the DelApp), was discussed with an experienced geriatrician to validate the diagnosis clinically.

Another study limitation is the use of a selected patient sample, which may affect the generalizability of the present findings. The DelApp requires further validation by confirming diagnostic test accuracy and potential for clinical applicability in a representative population of consecutive, unselected patients, with blinded raters who perform the reference standard and index tests independently.

Assessment of global cognition was done using the OMCT (Katzman et al., 1983). Whilst this measure is validated and has the advantages of brevity and ease-of-use, it does not provide a fine-grained measure of cognitive impairment. Future studies should consider using a more comprehensive assessment of cognitive impairment and also include a retrospective informant questionnaire such as the IQCODE (Jorm and Jacomb, 1989) in order to obtain more accurate information about the possible presence of cognitive impairment prior to admission.

Finally, grouping of delirium and co-morbid delirium-dementia precluded an investigation of the DelApp's ability to diagnose delirium superimposed on dementia. Our reason for grouping these patients was that the majority of older patients with delirium had some degree of underlying cognitive impairment or (often undiagnosed) dementia, and it was therefore hard to recruit a "pure" delirium group with the means available to us at the time. Nonetheless, it would be interesting in future studies to assess differences in DelApp performance between delirium groups with and without dementia.

Feedback from patients indicated that the distractor lights on the hardware box were experienced as more distracting than the distractor triangles in the DelApp task. Future versions of the DelApp should therefore use distracting stimuli with a modified appearance (larger size, brighter colour, and/or more frequent presentation) to make them stand out more. A related issue is that, though patients with delirium performed worse on the third level of the attention task compared to previous levels, this performance difference did not reach significance. Thus, the attentional load of task level 3 could be increased in future versions of the DelApp. In sum, further development of the DelApp is required, including optimisation of test parameters, user interface and data acquisition and display. Also, in order for the DelApp to be readily applied as a screening tool in routine clinical practice, the duration of the DelApp assessment needs to be further shortened. This could be done by reducing the number of trials, the length of counting sequences and/or the inter-stimulus delays, and also by adopting a scoring system whereby the test is terminated following a specified number of repeated errors. Further, a more detailed characterization of type and severity of dementia is required to determine if our method can distinguish moderate to severe forms of dementia from milder forms of delirium. Finally, performance on the DelApp should be compared across different subtypes of dementia, including dementia with Lewy bodies which is associated with prominent attentional deficits (Metzler-Baddeley, 2007).
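The early-termination idea mentioned above could be prototyped along the following lines. This is only a sketch of one possible rule, not the DelApp's scoring system. It assumes that "repeated errors" means consecutive incorrect trials, that each correct trial earns one point, and that the threshold is 2; the function name and parameters are hypothetical.

```python
def score_with_stop_rule(responses, max_consecutive_errors=2):
    """Score trials in order, stopping after a run of consecutive errors.

    responses: iterable of booleans (True = correct count reported).
    Returns (score, trials_administered).
    All scoring details here are illustrative assumptions.
    """
    score = 0
    consecutive_errors = 0
    administered = 0
    for correct in responses:
        administered += 1
        if correct:
            score += 1
            consecutive_errors = 0
        else:
            consecutive_errors += 1
            if consecutive_errors >= max_consecutive_errors:
                break  # terminate the test early
    return score, administered


# Two errors in a row end the test after the third trial.
print(score_with_stop_rule([True, False, False, True, True]))  # (1, 3)
# Isolated errors do not trigger the stop rule.
print(score_with_stop_rule([True, False, True, False, True]))  # (3, 5)
```

Whether a rule of this kind preserves diagnostic accuracy would need to be tested empirically, since terminating early discards information from the later, more demanding trials.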

Conclusions

The current study provides the first evidence for the feasibility and validity of a smartphone-based assessment of attention in older hospitalised patients with delirium, including those who are frail or acutely unwell. The findings also support and extend previous observations that patients with delirium have specific impairments of sustained attention. The DelApp shows promise as a sensitive, specific and valid tool to assist identification and severity grading of delirium in research and clinical practice. In this regard, the DelApp could potentially be integrated into current delirium assessment tools by providing a robust and validated assessment of inattention. Further development of the DelApp is now required to make it suitable as a screening tool for delirium in routine clinical practice. Finally, studies are needed to assess validity in representative, unselected patients using independent raters.

Conflicts of interest

AMacL holds patents on computerized devices and tests for measuring attention in delirium.

This study was funded by a Medical Research Council Centenary Early Career Award to Z. Tieges. Funding from the Biotechnology and Biological Sciences Research Council, the Engineering and Physical Sciences Research Council, the Economic and Social Research Council, and the Medical Research Council is gratefully acknowledged. The authors also thank the patients and staff from the Medicine of the Elderly and acute orthopedic wards of the Royal Infirmary of Edinburgh and the acute and rehabilitation wards of Liberton Hospital in Edinburgh.

Description of authors’ roles

Alasdair MacLullich (AM) and Zoë Tieges (ZT) designed the study and provided supervision. Antaine Stíobhairt (AS), Klaudia Suchorab (KSu) and Katie Scott (KSc) carried out patient recruitment, data collection and data analysis (with the help of ZT), and also contributed to study design. The manuscript was drafted by AS and ZT; AM, KSc and KSu assisted with writing the manuscript. The DelApp software was developed by Alexander Weir and Stuart Parks in collaboration with ZT and AM. Susan Shenkin helped with patient recruitment and provided input into the project plan and manuscript.

REFERENCES

Field, A. P., Miles, J. N. V. and Field, Z. C. (2012). Discovering Statistics Using R. London: SAGE Publications.

American Psychiatric Association (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Arlington, VA: American Psychiatric Publishing.

Braun, V. and Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3, 77-101.

Bronnick, K., Emre, M., Lane, R., Tekin, S. and Aarsland, D. (2007). Profile of cognitive impairment in dementia associated with Parkinson's disease compared with Alzheimer's disease. Journal of Neurology, Neurosurgery and Psychiatry, 78, 1064-1068.

Brouillette, R. M., et al. (2013). Feasibility, reliability, and validity of a smartphone based application for the assessment of cognitive function in the elderly. PloS One, 8, e65925.

Brown, L. J., Fordyce, C., Zaghdani, H., Starr, J. M. and MacLullich, A. M. (2011). Detecting deficits of sustained visual attention in delirium. Journal of Neurology, Neurosurgery and Psychiatry, 82, 1334-1340.

Buchanan, T., Heffernan, T. M., Parrott, A. C., Ling, J., Rodgers, J. and Scholey, A. B. (2010). A short self-report measure of problems with executive function suitable for administration via the Internet. Behavior Research Methods, 42, 709-714.

DeLong, E. R., DeLong, D. M. and Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44, 837-845.

Greneche, J., Krieger, J., Bertrand, F., Erhardt, C., Maumy, M. and Tassi, P. (2011). Short-term memory performances during sustained wakefulness in patients with obstructive sleep apnea-hypopnea syndrome. Brain and Cognition, 75, 39-50.

Hall, R. J., Meagher, D. J. and MacLullich, A. M. (2012). Delirium detection and monitoring outside the ICU. Best Practice & Research: Clinical Anaesthesiology, 26, 367-383.

Inouye, S. K., van Dyck, C. H., Alessi, C. A., Balkin, S., Siegal, A. P. and Horwitz, R. I. (1990). Clarifying confusion: the confusion assessment method. A new method for detection of delirium. Annals of Internal Medicine, 113, 941-948.

Iverson, G. L. (2001). Interpreting change on the WAIS-III/WMS-III in clinical samples. Archives of Clinical Neuropsychology, 16, 183-191.

Jorm, A. F. and Jacomb, P. A. (1989). The Informant Questionnaire on Cognitive Decline in the Elderly (IQCODE): socio-demographic correlates, reliability, validity and some norms. Psychological Medicine, 19, 1015-1022.

Katzman, R., Brown, T., Fuld, P., Peck, A., Schechter, R. and Schimmel, H. (1983). Validation of a short Orientation-Memory-Concentration Test of cognitive impairment. American Journal of Psychiatry, 140, 734-739.

Kean, J. and Ryan, K. (2008). Delirium detection in clinical practice and research: critique of current tools and suggestions for future development. Journal of Psychosomatic Research, 65, 255-259.

Lines, C. R., Dawson, C., Preston, G. C., Reich, S., Foster, C. and Traub, M. (1991). Memory and attention in patients with senile dementia of the Alzheimer type and in normal elderly subjects. Journal of Clinical and Experimental Neuropsychology, 13, 691-702.

Marcantonio, E. R. (2008). Clinical management and prevention of delirium. Psychiatry, 7, 42-48.

Meagher, D. J., Leonard, M., Donnelly, S., Conroy, M., Saunders, J. and Trzepacz, P. T. (2010). A comparison of neuropsychiatric and cognitive profiles in delirium, dementia, comorbid delirium-dementia and cognitively intact controls. Journal of Neurology, Neurosurgery and Psychiatry, 81, 876-881.

Metzler-Baddeley, C. (2007). A review of cognitive impairments in dementia with Lewy bodies relative to Alzheimer's disease and Parkinson's disease with dementia. Cortex, 43, 583-600.

Morandi, A., et al. (2012). Tools to detect delirium superimposed on dementia: a systematic review. Journal of the American Geriatrics Society, 60, 2005-2013.

Morris, J. C., et al. (1989). The Consortium to Establish a Registry for Alzheimer's Disease (CERAD). Part I. Clinical and neuropsychological assessment of Alzheimer's disease. Neurology, 39, 1159-1165.

O'Keeffe, S. T. and Gosney, M. A. (1997). Assessing attentiveness in older hospital patients: global assessment versus tests of attention. Journal of the American Geriatrics Society, 45, 470-473.

O'Regan, N. A., et al. (2014). Attention! A good bedside test for delirium? Journal of Neurology, Neurosurgery and Psychiatry.

Perry, R. J., Watson, P. and Hodges, J. R. (2000). The nature and staging of attention dysfunction in early (minimal and mild) Alzheimer's disease: relationship to episodic and semantic memory impairment. Neuropsychologia, 38, 252-271.

Prytherch, D. R., Smith, G. B., Schmidt, P. E. and Featherstone, P. I. (2010). ViEWS--Towards a national early warning score for detecting adult inpatient deterioration. Resuscitation, 81, 932-937.

Siddiqi, N., House, A. O. and Holmes, J. D. (2006). Occurrence and outcome of delirium in medical in-patients: a systematic literature review. Age and Ageing, 35, 350-364.

R Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria.

Tieges, Z., Brown, L. J. and MacLullich, A. M. (2014). Objective assessment of attention in delirium: a narrative review. International Journal of Geriatric Psychiatry.

Tieges, Z., McGrath, A., Hall, R. J. and MacLullich, A. M. (2013). Abnormal level of arousal as a predictor of delirium and inattention: an exploratory study. American Journal of Geriatric Psychiatry, 21, 1244-1253.

Trzepacz, P. T., Mittal, D., Torres, R., Kanary, K., Norton, J. and Jimerson, N. (2001). Validation of the Delirium Rating Scale-revised-98: comparison with the delirium rating scale and the cognitive test for delirium. Journal of Neuropsychiatry and Clinical Neurosciences, 13, 229-242.

Witlox, J., Eurelings, L. S., de Jonghe, J. F., Kalisvaart, K. J., Eikelenboom, P. and van Gool, W. A. (2010). Delirium in elderly patients and the risk of postdischarge mortality, institutionalization, and dementia: a meta-analysis. JAMA, 304, 443-451.

Table legends

Table 1. Demographics and pairwise comparisons between groups. Demographics and test scores for each group are presented as median (inter-quartile range) unless otherwise specified. Statistical comparisons between groups were performed using Kruskal-Wallis tests. Pairwise comparisons were carried out using Mann-Whitney U tests. Holm corrections were applied to control for type I error. The 95% Confidence Interval (CI) limits for these group comparisons represent the differences in mean rank between each pair of groups.

Note. Calculations represent percentages of each group in which diagnoses are present. Exhaustive comorbidity data were not recorded for all participants; calculations are therefore subject to the availability of data. COPD = Chronic Obstructive Pulmonary Disease; CON = controls; DEM = dementia; DEL = delirium; CI = confidence interval; OSLA = Observational Scale of Level of Arousal; OMCT = Orientation-Memory-Concentration Test.
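The Holm correction named in the Table 1 legend is a step-down procedure. The smallest of m p-values is multiplied by m, the next by m - 1, and so on, with the adjusted values forced to be non-decreasing and capped at 1. A minimal sketch follows, in Python for illustration only. The raw p-values in the usage example are hypothetical and are not taken from the study.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply by the number of hypotheses not yet processed.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity across the ordered p-values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for three pairwise comparisons
# (e.g. DEL vs. CON, DEL vs. DEM, DEM vs. CON).
raw = [0.01, 0.04, 0.03]
print([round(p, 2) for p in holm_adjust(raw)])  # [0.03, 0.06, 0.06]
```

With three pairwise comparisons per measure, as in Table 1, the smallest p-value is effectively tested against α/3, the next against α/2, and the largest against α.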

Table 2. Distribution of responses for each group on the DelApp across three levels of difficulty.

Figure legends

Figure 1. (A) The DelApp visual acuity test. The stimulus in trials 1-4 changed 5 sec after trial onset and total trial duration was 10 sec. Letters (trials 5 and 6) were presented on screen for 8 sec. The outer edges of these stimuli represent the outer edges of the smartphone’s screen. (B) The DelApp attention task comprised seven trials of increasing difficulty presented in a fixed order. Participants were instructed to count centrally presented circles with a duration of 1 sec each, sometimes in the presence of distracting triangles (500 ms each; Levels 2 and 3). The number of circles to be counted ranged from 2 to 8. The inter-stimulus interval between circles increased from 1-2 sec in Levels 1 and 2 to 3-4 sec in Level 3. Patients were asked to verbally report the total number of circles counted in each trial.

Figure 2. Boxplots illustrating the DelApp results of each group. The median is represented by the thick horizontal bar. The interquartile range is represented by the height of the inner boxes. The position of vertical bars represents the value of the most distant scores that are not considered to be outliers. Outliers are represented by open circles.

Figure 3. Scatterplot demonstrating the relationship between DelApp and DRS-R98 severity scores. The diagonal black line is the overall regression line. Each group is represented separately using the shapes listed in the legend above. Data points have been jittered. There was a moderate negative correlation between DelApp and DRS-R98 severity scores (τ = -.60, p < .001).

Supplementary material

Supplementary Figure 1. The Edinburgh Delirium Test Box (EDTB) Mark 2 instrument and the smartphone which was used to administer the DelApp task.

| Measure | Delirium (N=50) | Dementia (N=52) | Control (N=54) | Statistical test results | Pairwise comparisons |
|---|---|---|---|---|---|
| Age, years | 85 (79.25-90) | 87 (80-91) | 75 (70-86) | H(2) = 26.62, p < .001 | DEL vs. CON: U = 750.5, p < .001, 95% CI = [13, 52.27]; DEL vs. DEM: U = 1086.5, p = .19, 95% CI = [-9.09, 30.55]; DEM vs. CON: U = 658, p < .001, 95% CI = [23.93, 62.81] |
| Comorbidities, % | | | | | |
| Hypertension | 48 | 37 | 44 | χ²(2) = 1.44, p = .52 | |
| COPD | 14 | 8 | 9 | χ²(2) = 1.19, p = .57 | |
| Ischaemic heart disease | 14 | 10 | 11 | χ²(2) = 0.5, p = .78 | |
| Diabetes mellitus | 18 | 19 | 30 | χ²(2) = 2.48, p = .32 | |
| Chronic kidney disease | 10 | 15 | 56 | χ²(2) = 2.79, p = .25 | |
| Test scores | | | | | |
| DelApp (max. score = 10) | 6 (4-7) | 10 (9-10) | 10 (10-10) | H(2) = 102.67, p < .001 | DEL vs. CON: U = 31.5, p < .001, 95% CI = [69.25, 92.84]; DEL vs. DEM: U = 174.5, p < .001, 95% CI = [50.58, 74.38]; DEM vs. CON: U = 985.5, p < .001, 95% CI = [6.90, 30.25] |
| DRS-R-98 (max. score = 39) | 25 (18.25-29.5) | 6 (4-8.25) | 0.5 (0-1) | H(2) = 132.41, p < .001 | DEL vs. CON: U = 0, p < .001, 95% CI = [21, 26]; DEL vs. DEM: U = 123.5, p < .001, 95% CI = [15, 20]; DEM vs. CON: U = 80, p < .001, 95% CI = [4, 6] |
| OMCT (max. score = 28) | 8 (3-14.75) | 12 (5.75-15.25) | 26 (24-28) | H(2) = 97.47, p < .001 | DEL vs. CON: U = 95.5, p < .001, 95% CI = [64.78, 91.04]; DEL vs. DEM: U = 1100.5, p = .25, 95% CI = [-6.91, 19.57]; DEM vs. CON: U = 21.5, p < .001, 95% CI = [58.59, 84.58] |
| OSLA (max. score = 15) | 3 (2-5.75) | 0 (0-0) | 0 (0-0) | H(2) = 118.65, p < .001 | DEL vs. CON: U = 64, p < .001, 95% CI = [66.67, 84.70]; DEL vs. DEM: U = 121, p < .001, 95% CI = [60.20, 78.40]; DEM vs. CON: U = 1265, p = .09, 95% CI = [-15.32, 2.54] |
| BAT (max. score = 7) | 4 (2.25-5) | 6 (4.75-6.26) | 7 (6-7) | H(2) = 62.19, p < .001 | DEL vs. CON: U = 262.5, p < .001, 95% CI = [51.43, 83.86]; DEL vs. DEM: U = 704.5, p < .001, 95% CI = [14.37, 47.10]; DEM vs. CON: U = 656.5, p < .001, 95% CI = [20.85, 52.96] |

Table 1.

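| Group | Level | n (%) Correct | n (%) Incorrect |
|---|---|---|---|
| Delirium | 1 | 78 (52) | 72 (48) |
| Delirium | 2 | 33 (33) | 67 (67) |
| Delirium | 3 | 27 (27) | 73 (73) |
| Dementia | 1 | 143 (91.7) | 13 (8.3) |
| Dementia | 2 | 89 (85.6) | 15 (14.4) |
| Dementia | 3 | 89 (85.6) | 15 (14.4) |
| Control | 1 | 161 (99.4) | 1 (0.62) |
| Control | 2 | 103 (95.3) | 5 (4.6) |
| Control | 3 | 105 (97.2) | 3 (2.8) |

Table 2.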

Figure 1

Figure 2

Figure 3.

Supplementary figure