
    ORIGINAL ARTICLE

    Vertebral fracture risk (VFR) score for fracture prediction

    in postmenopausal women

    M. Lillholm & A. Ghosh & P. C. Pettersen &

    M. de Bruijne & E. B. Dam & M. A. Karsdal &

    C. Christiansen & H. K. Genant & M. Nielsen

    Received: 25 January 2010 / Accepted: 2 September 2010 / Published online: 11 November 2010 © International Osteoporosis Foundation and National Osteoporosis Foundation 2010

    Abstract

    Summary Early prognosis of osteoporosis risk is not only

    important to individual patients but is also a key factor

    when screening for osteoporosis drug trial populations. We

    present an osteoporosis fracture risk score based on

    vertebral heights. The score separated individuals who

    sustained fractures (by follow-up after 6.3 years) from

    healthy controls at baseline.

    Introduction This case-control study was designed to

    assess the ability of three novel fracture risk scoring

    methods to predict first incident lumbar vertebral fractures

    in postmenopausal women matched for classical risk factors

    such as BMD, BMI, and age.

    Methods This was a case-control study of 126 postmenopausal

    women, 25 of whom sustained at least one incident

    lumbar fracture and 101 controls that maintained skeletal

    integrity over a 6.3-year period. Three methods for fracture

    risk assessment were developed and tested. They are based

    on anterior, middle, and posterior vertebral heights mea-

    sured from vertebrae T12-L5 in lumbar radiographs at

    baseline. Each score's fracture prediction potential was

    investigated in two variants using (1) measurements from

    the single most deformed vertebra or (2) average measure-

    ments across vertebrae T12-L5. Emphasis was given to the

    vertebral fracture risk (VFR) score.

    Results All scoring methods demonstrated significant sepa-

    ration of cases from controls at baseline. Specifically, for the

    VFR score, cases and controls were significantly different

    (0.67 ± 0.04 vs. 0.35 ± 0.03, p


    Introduction

    Postmenopausal osteoporosis remains a serious condition

    affecting millions of individuals worldwide. Current epide-

    miological evidence suggests that in industrialized

    countries, approximately 40% of postmenopausal women

    at the age of 60 and as many as 70% of women at the age of

    80 suffer from osteoporosis [1]. Postmenopausal osteoporosis is characterized by a reduction in bone mass due to

    increased bone resorption and a simultaneous but less

    pronounced increase in bone formation, resulting in

    negative net calcium balance. This ongoing process fueled

    by chronic estrogen deficiency may eventually lead to micro-

    architectural osteoporosis, possible fractures, and substantial

    deterioration in the quality of life. The cardinal feature of

    osteoporosis is the occurrence of fragility fractures, typically

    in the spine, but also in the forearm and hips. Whereas limb

    fractures are easy to diagnose, the case is different for the

    spine region, where mild vertebral fractures are often

    asymptomatic. Though the mortality rate from osteoporotic fractures is the highest for those of the hip, vertebral fractures

    are the most common type of fragility fractures with an

    estimated occurrence of 750,000 cases per year in the USA

    [2]. Osteoporotic vertebral fractures typically occur earlier

    and are an established risk factor for hip fractures [3].

    Presence of severe vertebral fractures has been associated

    with acute and chronic pain, impaired quality of life,

    increased risk of osteoporotic limb fractures, and shortened

    life expectancy [4]. There is, therefore, a continuing interest

    in identifying independent predictors of vertebral fractures

    that could facilitate the detection of high-risk patients, who

    would benefit the most from early prevention.

    Vertebral fractures are often diagnosed and graded by

    experienced radiologists using qualitative [5] or semi-

    quantitative methods such as those described by Genant et

    al. [6] and others [7–11]. The methods were shown to be

    robust to intra-observer variations but may be difficult to

    apply uniformly across different clinical centers. More

    importantly, it is yet to be decided which one of these semi-

    quantitative methods should be used as a gold standard

    [12]. In order to overcome some of these problems, fully

    quantitative methods have been developed [13–23]. One of

    the shortcomings of the discrete nature (due to the use of

    thresholds) of most of these methods is the inability to

    quantify subtle changes in the vertebral shape. Hence, a

    more robust and detailed study of the vertebral/spine shape

    abnormalities should produce (1) an objective quantifiable

    measure for detection and severity-grading of fractures and

    (2) details of pre-fracture vertebral-shape changes that lead

    to better prediction of osteoporotic fractures.

    The present study investigated whether computer-based

    measures of fracture risk, calculated using vertebral pre-

    fracture shape variations, could differentiate healthy sub-

    jects who later sustain a vertebral fracture from those who

    maintain vertebral integrity when matched for an array of

    traditional risk factors, including bone mineral density

    (BMD). The rationale behind such an investigation is that

    the detection of pre-fracture conditions and successful

    prediction of vertebral fragility fractures will help the study

    of osteoporosis in the following important ways: (1) early

    diagnosis and treatment for patients, (2) more precise assessment of efficacy of fracture prevention drugs by the

    identification of subjects with a high likelihood of

    sustaining an osteoporotic fracture, and (3) decrease in the

    required sample size for clinical studies by inclusion of

    high-risk subjects. In this paper, we pursue parts of the

    second and third objectives and present quantitative

    analyses of pre-fracture vertebral-shape changes that yield

    subject scores indicating first-incidence lumbar vertebral

    fracture risk. It is demonstrated how this score can be used

    to rank a screening population and drive the selection of a

    net study population with increased average fracture risk.

    Materials and methods

    A case-control study was designed such that case-group

    patients had no prevalent lumbar vertebra fractures and by

    follow-up 6.3 years later had sustained at least one fracture

    in the lumbar spine only. The control group maintained

    skeletal integrity throughout and was matched with respect

    to an array of traditional risk factors.

    The study population was chosen from the PERF cohort

    [24] which consisted of 4,062 community-recruited post-

    menopausal Danish women first screened between 1992 and

    1995 and subsequently reviewed between 2000 and 2001.

    PERF contained 662 patients with at least one new vertebral

    fracture at follow-up; 88 of the 662 had no prevalent

    fractures at baseline and of those, 25 suffered incident

    fractures in the lumbar region (T12 to L5) only and were

    selected as case patients. The case group was matched with a

    fracture-free control group of 101 subjects, also from the PERF

    cohort with comparable risk factors such as age, body mass

    index (BMI), family history of osteoporosis, alcohol and

    milk consumption, history of hormone replacement therapy,

    spine BMD, smoking habits, and self-reported physical

    exercise. Any patients with non-osteoporotic vertebral

    deformities or non-osteoporotic fractures were excluded

    before case and control groups were selected.

    At baseline, none of the 126 subjects displayed any sign of

    disorders of calcium metabolism or bone disease, or took any

    medication known to affect bone metabolism. All subjects

    were interviewed to obtain information on their medical

    history, use of medication, and other life style factors. Subjects

    underwent a complete physical examination; weight and

    height were determined without shoes and with the subjects

    2120 Osteoporos Int (2011) 22:2119–2128


    wearing light indoor clothes, and the BMI was calculated.

    BMD of the spine was determined by bone densitometry

    using a Lunar Prodigy scanner and lateral radiographs of the

    lumbar spine were taken of the patients using a standard

    technique [24]. Written consent was obtained from each

    participant according to the Helsinki Declaration II. The

    study was approved by the local ethics committee.

    Spine radiographs acquisition and fracture assessment

    Spinal examinations were performed according to pre-

    approved protocols. Radiographs of the lumbar region were

    taken for each of the subjects at baseline and follow-up. In the

    lateral position, pillows were used to ensure good alignment

    of the vertebral bodies. The distance between the focal plane

    and the film was kept constant at 1.2 m and the central beam

    was directed to L2. Patients were asked to hold in their

    expiration for the duration of the radiograph acquisition. The

    same group of staff examined each of the subjects.

    Anterior-posterior (AP) radiographs were used for a general view and assessment of vertebral deformities (i.e., scoliosis). Obvious

    vertebral fractures in AP radiographs were noted but the

    primary fracture assessment was performed on lateral radio-

    graphs. Fracture assessment and classification from the

    original PERF study [24] was re-evaluated and confirmed

    using Genant's semi-quantitative method [6] by PP who was

    trained in and had several years of experience using the

    method. Baseline and follow-up radiographs were viewed

    simultaneously to avoid confusing fracture incidence with

    undetected or borderline prevalence. The primary utility of

    the fracture assessment was to establish baseline and follow-

    up fracture presence. These readings were used to identify

    the case (n=36) and control (n=108) groups in the paper by

    Pettersen [25] in which the fracture risk predictive power of

    a computer-based measure of curvature irregularity was

    investigated. The case and control groups in this study were

    specifically selected as subsets of Pettersen's case and

    control group as described below.

    Digitization of radiographs and six-point annotations

    All lateral radiographs were digitized at 45 μm (570

    DPI). For further analysis of the images, six points

    (called the height points) were placed at the corners and

    at the middle points of the vertebral endplates, by the

    same radiologist using a computer program; see Fig. 1a.

    Using these measured heights, all vertebrae were evaluated

    by a computer algorithm using a modification of

    Genant's methodology with a strict measured threshold of

    0.2 as the fracture absence/presence indicator; that is, a

    vertebra was considered fractured if any of the ratios

    between the anterior, middle, and posterior heights

    was 0.8 or less.
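
    The threshold rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation (the study used Matlab); the function names and the example heights are hypothetical, and heights may be given in any consistent unit.

```python
def min_height_ratio(h_ant, h_mid, h_post):
    """Smallest-to-largest ratio among the three vertebral heights.

    Any pairwise ratio (smaller over larger) is at least min/max, so
    checking min/max covers all pairs at once.
    """
    heights = (h_ant, h_mid, h_post)
    return min(heights) / max(heights)


def is_fractured(h_ant, h_mid, h_post, threshold=0.8):
    """Hypothetical helper: True if the minimal height ratio is 0.8 or less."""
    return min_height_ratio(h_ant, h_mid, h_post) <= threshold


# Illustrative heights only (not study data).
print(is_fractured(20.0, 25.0, 26.0))  # ratio 20/26, below the threshold
print(is_fractured(24.0, 25.0, 26.0))  # ratio 24/26, above the threshold
```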

    Subpopulation and borderline fractures

    Pettersen's [25] population was reduced by selecting only

    subjects where the quantitative fracture classification was in

    agreement with the expert's fracture assessment. This

    procedure reduced the case and control groups described

    above to 25 cases and 101 controls, respectively. The

    additional step deliberately filtered out subjects where there

    was borderline disagreement between the SQ- and QM-

    based fracture assessments. This ruled out the possibility that the

    fracture prediction results reported in this paper were

    influenced by such disagreements.

    Fig. 1 a An example of a lateral lumbar radiograph with six-point annotations in red and vertebra labels (T12 to L5) in black. The midpoints are always marked on the lower of the endplate contours. b The shape of vertebrae for various values of Hmax, Hmin compared to Genant's height ratio. For illustration purposes, the smallest height Hmin is placed to the left and the largest height Hmax is placed to the right in each vertebra. Each vertebra has the corresponding height ratio noted. The blue area indicates the high-risk patients according to VFR. The green area indicates the high-risk patients according to Genant's height ratio and, as depicted, different patients may be indicated as high-risk patients


    Computer-based prediction of vertebral fractures

    Three quantitative scores based on vertebral height mor-

    phometry were developed to predict first incident lumbar

    fractures. The scores were named the vertebral fracture risk

    (VFR), the most deformed height ratio (MDHR), and the

    most deformed height anterior height posterior ratio

    (MDHaHp). Each of the three scores was tested in two versions: (1) only the most deformed vertebra determined

    the score and (2) the average over vertebrae T12-L5

    determined the score.

    In the following, we initially describe the VFR score on

    the most deformed vertebra in detail. The remaining two

    scores and the average versions of all scores use the same

    overall methodology as the VFR score and only deviations

    from this are presented.

    Computation of the VFR score consisted of two steps:

    (1) selection of the most deformed vertebra and (2) scoring

    of this vertebra to form a single score for the patient:

    Step 1

    Selection of the most deformed vertebra: For a given

    patient, the vertebral height ratios of the smallest to the

    largest anterior, middle, or posterior heights were computed

    for all vertebrae. The vertebra among T12 to L5 with

    minimal height ratio was denoted the most deformed and

    chosen to represent each patient. Due to the patient

    selection described above, all ratios were above 0.8.
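
    Step 1 amounts to a minimum search over per-vertebra height ratios. The sketch below assumes a hypothetical data layout (a dictionary from vertebra label to its anterior, middle, and posterior heights) and uses made-up heights purely for illustration.

```python
def most_deformed(vertebrae):
    """Return (label, ratio) of the vertebra with the minimal height ratio.

    vertebrae: dict mapping a label such as "L2" to a tuple
    (h_ant, h_mid, h_post). The layout is an assumption for this sketch.
    """
    ratios = {label: min(h) / max(h) for label, h in vertebrae.items()}
    label = min(ratios, key=ratios.get)
    return label, ratios[label]


# Illustrative heights only (not study data).
patient = {
    "T12": (24.0, 23.5, 25.0),
    "L1": (25.0, 24.0, 26.0),
    "L2": (23.0, 24.5, 27.0),
    "L3": (27.0, 26.0, 27.5),
    "L4": (27.0, 26.5, 27.0),
    "L5": (28.0, 26.0, 26.5),
}
label, ratio = most_deformed(patient)
print(label, round(ratio, 3))
```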

    Step 2

    VFR scoring: Each selected vertebra was represented by the

    maximal Hmax and minimal Hmin of its three vertebral

    heights. The selected heights were normalized by the mean

    of the vertebral heights in question:

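    $$H_{\max} = \frac{\max\{H_{\mathrm{ant}}, H_{\mathrm{med}}, H_{\mathrm{post}}\}}{\operatorname{mean}\{H_{\mathrm{ant}}, H_{\mathrm{med}}, H_{\mathrm{post}}\}}, \qquad H_{\min} = \frac{\min\{H_{\mathrm{ant}}, H_{\mathrm{med}}, H_{\mathrm{post}}\}}{\operatorname{mean}\{H_{\mathrm{ant}}, H_{\mathrm{med}}, H_{\mathrm{post}}\}}$$

    Consequently, Hmax ≥ 1.0 and Hmin ≤ 1.0 for any vertebra.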

    The relationship between Hmax, Hmin, and height ratios as

    used in, for example, Genant's method is illustrated in

    Fig. 1b. Each patient in the case and control groups was

    represented by their normalized height pair. Vertebral height

    pairs in each group were assumed to be normally

    distributed, that is following a bivariate normal distribution

    in (Hmax, Hmin). The empirical mean and covariance

    matrix were estimated as standard maximum likelihood

    estimates for each of the case and control groups and the

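    likelihood P of belonging to each group expressed as:

    $$P_{\mathrm{case}}(H_{\max}, H_{\min}) = \mathcal{N}(\mu_{\mathrm{case}}, \Sigma_{\mathrm{case}}), \qquad P_{\mathrm{control}}(H_{\max}, H_{\min}) = \mathcal{N}(\mu_{\mathrm{control}}, \Sigma_{\mathrm{control}})$$

    For new vertebrae, the relative likelihood:

    $$\frac{P_{\mathrm{case}}(H_{\max}, H_{\min})}{P_{\mathrm{case}}(H_{\max}, H_{\min}) + P_{\mathrm{control}}(H_{\max}, H_{\min})}$$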

    of belonging to the case group was computed from the

    estimated normal distributions. This relative likelihood ratio

    was defined to be the VFR score. It is, as constructed, a

    number between 0 and 1 representing the probability of

    sustaining a fracture. It should be expected that a patient with a clear prevalent fracture will have a VFR score close

    to 1; namely that the chance of sustaining a future fracture

    is trivially very high unless the already fractured vertebra is

    excluded from the score calculation. The VFR for all

    patients was computed in a leave-one-out procedure to

    avoid bias and underestimation of the variance. The explicit

    modeling of shape variations through bivariate normal

    distributions as opposed to single thresholds on, say, height

    ratios allowed for a fuller representation of both normal and

    fracture-prone shape variation.
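
    The scoring step can be illustrated with a small pure-Python sketch: normalize the heights, fit one bivariate normal per group with maximum likelihood estimates, and take the relative likelihood of the case group. This is a sketch under stated assumptions, not the authors' Matlab code; all function names and the toy height pairs are hypothetical, and no group priors are applied because the relative likelihood described above uses none.

```python
import math


def normalized_pair(h_ant, h_med, h_post):
    """Return (Hmax, Hmin): largest and smallest height over the mean height."""
    heights = (h_ant, h_med, h_post)
    mean_h = sum(heights) / 3.0
    return max(heights) / mean_h, min(heights) / mean_h


def fit_gaussian(pairs):
    """Maximum likelihood mean and covariance (divide by n) of 2-D points."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxx = sum((p[0] - mx) ** 2 for p in pairs) / n
    syy = sum((p[1] - my) ** 2 for p in pairs) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs) / n
    return (mx, my), (sxx, sxy, syy)


def gaussian_pdf(point, mean, cov):
    """Bivariate normal density with covariance [[sxx, sxy], [sxy, syy]]."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * maha) / (2.0 * math.pi * math.sqrt(det))


def vfr_score(point, case_pairs, control_pairs):
    """Relative likelihood of the case group: P_case / (P_case + P_control)."""
    p_case = gaussian_pdf(point, *fit_gaussian(case_pairs))
    p_control = gaussian_pdf(point, *fit_gaussian(control_pairs))
    return p_case / (p_case + p_control)


def leave_one_out_scores(case_pairs, control_pairs):
    """Score every subject with its own pair removed from the training data."""
    scores = []
    for i, p in enumerate(case_pairs):
        scores.append(vfr_score(p, case_pairs[:i] + case_pairs[i + 1:], control_pairs))
    for i, p in enumerate(control_pairs):
        scores.append(vfr_score(p, case_pairs, control_pairs[:i] + control_pairs[i + 1:]))
    return scores


# Toy (Hmax, Hmin) pairs for illustration only; not study data.
case = [(1.08, 0.90), (1.10, 0.88), (1.07, 0.92), (1.09, 0.91), (1.12, 0.89)]
control = [(1.02, 0.98), (1.03, 0.97), (1.01, 0.99), (1.02, 0.96), (1.04, 0.98)]
print(vfr_score((1.09, 0.90), case, control))
print(leave_one_out_scores(case, control))
```

    A practical implementation would also need to guard against singular covariance estimates and against both densities underflowing to zero for points far from either group; the sketch omits these checks for brevity.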

    The remaining two scores are calculated using the same

    overall methodology as the VFR score with the following exceptions: For MDHR, a vertebra was represented by the

    minimal height ratio and the most deformed representative

    selected as described above. For the MDHaHp ratio, the

    minimal height ratio was calculated based on the anterior and

    posterior heights only but was otherwise identical to MDHR.

    This means that both the MDHR and MDHaHp scores have

    one-dimensional representations (ratios) and the fitted normal

    distributions were thus univariate. Furthermore, the explicit

    height normalization step from the VFR score was not needed

    due to the implicit normalization through ratios.

    All three scores were also tested in mean versions

    (MVFR, MHR, and MHaHp) where a patient was repre-

    sented as the mean of height pairs or ratios over vertebrae

    T12-L5 instead of the single most deformed vertebra as

    described above.

    We compared these morphometric prognostic markers

    based on analysis of the individual vertebrae with the

    irregularity measure based on spine curvature suggested for

    fracture prediction by Pettersen [25]. Any comparisons with

    Pettersen's method are reported on the reduced dataset

    presented in this paper for both our method and Pettersen's method.

    Method for high-risk clinical study screening

    The high-risk population selection-mechanism outlined

    below was aimed at maximizing the number of subjects

    most likely to sustain first incident lumbar vertebral

    fractures in a trial population selected from a larger

    screening population. Population selection is only described

    for the VFR score but could also be realized using any of

    the other five proposed scores.

    The subpopulation selection-mechanism consisted of

    three steps: (1) scoring all subjects from the gross


    population using VFR, (2) sorting them in descending VFR

    order, and (3) selecting the required number of subjects;

    here we selected the 50% with the highest VFR score.

    The selection was evaluated on the same study popula-

    tion as the scoring methods outlined above.
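
    The three selection steps map directly onto a sort and a slice. The following sketch uses hypothetical subject identifiers and scores; the helper names are illustrative, and the second function corresponds to reporting the share of cases that end up in the selected top half.

```python
def select_high_risk(scores, fraction=0.5):
    """Rank subjects by descending VFR score and keep the top fraction.

    scores: dict mapping a subject id to its VFR score (assumed layout).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = int(len(ranked) * fraction)
    return ranked[:n_keep]


def case_share_in_selection(selected, case_ids):
    """Fraction of all cases that fall inside the selected subpopulation."""
    return sum(1 for s in selected if s in case_ids) / len(case_ids)


# Hypothetical scores for four subjects; "a" and "b" are the cases.
scores = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.4}
selected = select_high_risk(scores)
print(selected)
print(case_share_in_selection(selected, {"a", "b"}))
```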

    Statistical analysis

    Results are presented as mean ± SEM unless otherwise

    specified. The scores of different groups of subjects were

    compared using the non-parametric Mann–Whitney U test.

    Differences were considered statistically significant if p

    values were less than 0.05.

    The ability to separate cases from controls was further

    characterized through the area under the ROC curve

    (AUC). Significance of differences between AUC was

    tested with DeLong's method [26]. Odds ratios (ORs) and

    95% confidence intervals between highest and lowest

    tertiles are reported for each method and the significance

    of differences between ORs was tested using Tarone's variant of the Breslow–Day test [27].
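
    The AUC is closely tied to the Mann–Whitney U statistic: it equals the fraction of case-control pairs in which the case has the higher score, counting ties as one half. The sketch below computes it by direct pair counting; the function name and scores are hypothetical, and it does not implement DeLong's test or the tertile odds ratios.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve by pairwise comparison (U / (n1 * n2))."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical scores for illustration only.
print(auc([0.6, 0.3], [0.5, 0.2]))
```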

    As a test of the BMD influence on the predictive value

    of the VFR score, logistic regression with the VFR score

    and BMD as independent variables was used.

    The subpopulation selection result is reported as the

    relative number of cases above the median of the VFR

    scores across both the case and control datasets.

    To assess the inter-annotator stability of the suggested

    VFR score, the 126 baseline radiographs were six-point

    annotated (T12-L5) an additional two times by two

    experienced X-ray technicians. We report mean ± SEM,

    AUC, and ORs including significance levels for the two

    repeat annotations where the VFR score was trained on

    the original annotation. To assess the inter-observer

    stability of the suggested high-risk population selection

    methodology, we further report the percentage of cases in

    the top half of the VFR ordered dataset for each repeat

    annotation.

    We report an overall simulated estimate of expected

    performance for repeat annotations through an estimate of

    the annotation scatter observed across the three annotators.

    The repeat annotations yielded a total of 126 × 6 × 6 ≈ 4,500 vertebrae points annotated by three trained observers. These data were used to estimate the mean and standard

    deviation for inter-observer annotation variability for x and

    y annotation coordinates separately. The original full dataset

    was subsequently perturbed with normally distributed

    variations in the x and y directions according to the

    estimated means and standard deviations. The main

    experiment described above was repeated using the

    perturbed dataset. This random perturbation and subsequent

    experimentation was repeated 50 times. We report the

    median AUC and 95% confidence intervals and the median

    percentage of cases in the top half over the 50 perturbation

    trials.
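
    The perturbation experiment can be outlined as follows. This is a sketch under assumptions: annotation points are represented as (x, y) tuples, the noise parameters are placeholders rather than the estimated study values, and `experiment` stands for any hypothetical callable that maps a perturbed dataset to a single number such as an AUC.

```python
import random
import statistics


def perturb(points, mean_xy, sd_xy, rng):
    """Add independent normal noise to the x and y coordinate of each point."""
    return [
        (x + rng.gauss(mean_xy[0], sd_xy[0]), y + rng.gauss(mean_xy[1], sd_xy[1]))
        for x, y in points
    ]


def simulate(points, mean_xy, sd_xy, experiment, n_trials=50, seed=0):
    """Repeat the experiment on freshly perturbed copies; return median and all results."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    results = [experiment(perturb(points, mean_xy, sd_xy, rng)) for _ in range(n_trials)]
    return statistics.median(results), results


# Placeholder points and noise parameters; the experiment here is a stand-in.
points = [(1.0, 2.0), (3.0, 4.0)]
median_result, results = simulate(points, (0.0, 0.0), (0.5, 0.5), experiment=len)
print(median_result, len(results))
```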

    The percentage of cases where the most deformed vertebra

    at baseline is one of the fractured vertebrae at follow-up is

    reported. This number is supplemented by the percentage of

    cases where this would be observed by chance.

    Finally, we report results of the main experiment on the

    full Pettersen [25] dataset and compare to results achieved on the reduced dataset (see page 7) used throughout.

    All data were analyzed using Matlab (Mathworks, USA).

    Results

    Study population

    The skeletal and demographic characteristics of the case

    and control groups are presented in Table 1 where the main

    statistic reported for each group is the median value;

    mean ± SD is given in parentheses for completeness. Based on BMD measurements both the case and control groups

    contained approximately half normal (non-osteoporotic)

    and half osteopenic patients at baseline; furthermore both

    groups contained two osteoporotic patients at baseline.

    Fracture prediction

    The three suggested morphometric fracture prediction scores

    based on the single most deformed vertebra (VFR, MDHR,

    and MDHaHp) showed significant differences between case

    and control patients at baseline: VFR (0.67 ± 0.04 vs. 0.35 ±

    0.03; p


    Neither MDHR nor MDHaHp was significantly better

    than any of the mean variants.

    The highest versus lowest tertile ORs with 95%

    confidence intervals for the three suggested methods (both

    most deformed and mean variants), Pettersen's irregularity

    measure, and BMD are given in Fig. 4. By OR, the VFR score was

    significantly more predictive than irregularity or BMD

    alone (p=0.03 and p=0.004, respectively).

    Additional results are given for the VFR score only:

    Figure 5 is a box and whisker plot of the VFR scores for the

    case and control group at baseline. There was a significant

    difference between baseline and follow-up VFR scores for

    both the case and the control groups: case (0.67 ± 0.04 vs.

    0.99 ± 0.01; p


    Across the 50 datasets perturbed by typical inter-annotator

    variation, the median AUC with 95% confidence

    intervals was 0.73 (0.61–0.82) and the median

    percentage of cases in the top half of the total population

    was 76%.

    The most deformed vertebra at baseline was one of the

    fractured vertebrae at follow-up for 32% of the case

    subjects. With an average of 1.2 fractures (Table 1) per

    case subject, this would happen for 21% of cases by

    chance.

    The AUC and ORs for the main experiment repeated on

    the full Pettersen dataset [25] were: AUC, 0.84 (0.77–0.90);

    p


    forces. In the lumbar spine, the convexity is produced

    partly by differences in growth of the anterior and posterior

    parts of the vertebrae [28]. This leads to slight differences

    in the heights usually confined to an acceptable range.

    Subsequent changes in vertebral heights can lead to

    changes in the spinal curvature and to a redistribution of

    forces upon the vertebrae endplates. A vertebra is expected

    to fracture when the loads imposed are similar to or greater than its strength [29]. From this we expect that an unfractured

    lumbar vertebra that presents an abnormal change

    of one or more of three vertebral heights, due to, for

    example, osteoporosis, is more likely to fracture or cause

    fractures than the one which keeps within the normal range

    of shape variations.

    Inspired by this, three morphometric computer-based

    scores, each in two variants, were trained to predict lumbar

    fractures in postmenopausal women through modeled

    variability of measured anterior, middle, and posterior

    vertebral heights in lumbar vertebrae. They were applied

    in a case-control study matched for BMD and the other major risk factors for osteoporosis. All three methods based

    on the most deformed vertebra were able to significantly

    differentiate subjects who would sustain at least one lumbar

    vertebral fracture from those who maintained vertebral

    integrity over a 6.3-year period. Furthermore, the results

    suggest that the variants based on the most deformed

    vertebra produced better fracture predictions than the

    variants based on means across vertebrae T12-L5. Specif-

    ically, the most deformed VFR variant yields significantly

    better prediction than the MVFR and the MHR and a

    Delong p value of 0.1 for MHaHp. Conversely, neither of

    the other two scores based on the most deformed vertebra

    was significantly better than VFR or any of the mean

    variants. This suggests that the VFR score, which is based

    on two normalized height measurements and thus 2 df,

    measures more diverse shape variations (Fig. 1b) than any

    of the suggested 1 df (ratio) scores. Furthermore,

    morphometry-based lumbar fracture prediction using the

    single most deformed vertebra is stronger than prediction

    based on an average or summary across the lumbar spine.

    Finally, VFR delivers significantly better fracture prediction

    than the curvature-based irregularity measure suggested by

    Pettersen [25]. Although L5 was included in this analysis

    to make a direct comparison with Pettersen's work possible,

    the suggested methods (including VFR) are directly

    applicable to datasets where only T12-L4 are annotated in

    lumbar radiographs. Experiments (not included) on the

    current dataset indicated a slight but not statistically

    significant performance drop if only T12-L4 were included.

    This is supported by the relatively larger morphological and

    annotation induced variation of L5; in the context of this

    work most likely too large to consistently add significant

    information.

    The VFR score showed an increased risk for sustaining

    a second fracture in the case-group through a significant

    difference between cases at baseline and follow-up. This is

    in accordance with the literature [30, 31] and our expect-

    ations in designing the VFR score. There was also a

    significant difference for control-subjects measured at

    baseline and follow-up pointing to an overall increase in

    risk for an incident fracture. Although the control group remained fracture free, this increase in risk is not

    unexpected over a 6.3-year observation period of

    postmenopausal women where half were osteopenic at

    baseline.

    Our findings on selection of high-risk subpopulations

    were that a clear majority of the case-patients was found in

    the top half of the combined cases and controls group

    sorted by VFR. This suggests VFR as a viable supplement

    to BMD and standard risk factors to select fracture-free but

    likely-to-fracture subjects from a general screening

    population.

    The discriminative performances of the two repeat annotations were not as high as the original annotation

    but still significant and well within the confidence intervals

    of simulated performance using estimated annotator scatter.

    The same was the case for high-risk subpopulation

    selection performance for both the individual repeat

    annotations and the simulation. These performance drops

    are not unexpected as the repeat sets were scored using the

    original set as training/reference data and not in a leave-

    one-out fashion and are in this sense less biased. Further-

    more, the two repeat sets were annotated by trained X-ray

    technicians and not radiologists as was the case for the

    original set. Indirectly, this is reflected by the observed

    larger SEMs of VFR scores within the case group for the

    repeat annotators compared with the original annotation.

    That fracture prediction of the suggested kind is more

    sensitive to annotation quality and variation than, say,

    standard SQ or QM-based fracture classification is, as

    mentioned, not surprising. The discriminative factors of

    VFR are subtle pre-fracture shape changes that are smaller

    than standard implicit or explicit SQ and QM thresholds

    [6]. Based on this, we emphasize that high quality,

    consistent expert annotations are important in achieving

    good separation. This is especially the case if the methods

    were applied to less matched populations as in, say, clinical

    trial screening. Here, one would need to train the algorithm

    on a similar (same inclusion criteria) but independent

    reference population prior to screening of novel subjects.

    In that scenario, we would not expect discriminative

    performance significantly above the reported median

    simulated performance.

    The vertebrae that fractured were also the most deformed

    vertebrae for 32% of the cases which is somewhat higher

    than was expected by chance alone (21%). This suggests


    that a high VFR score indicates an overall lumbar fracture

    risk in terms of, e.g., an uneven biomechanical load

    distribution or overall systemic effects indirectly signaled

    by the most deformed vertebra.

    Finally, the validation of the experiment on the full

    Pettersen population [25] confirmed that the performed

    population reduction to avoid predictive performance based

    on borderline disagreement between SQ and QM did not lead to improved results, as expected.

    Limitations of the study

    The subjects used in this study were all community-

    recruited postmenopausal women roughly equally split

    between normal and osteopenic. The case group was

    pre-selected from a gross population of 4,062 as fracture

    free at baseline and with at least one lumbar-only

    fracture at follow-up. The control group was matched

    with respect to traditional osteoporosis risk factors and

    maintained skeletal integrity throughout. Any patients

    with non-osteoporotic deformities and/or fractures were

    excluded whether these were real or caused by projection

    errors. The main results of this paper, although

    promising, must be validated in future studies to

    establish applicability on populations with fewer restrictions

    than described above.

    This study focused on fracture risk in the lumbar region

    (including T12). This was primarily done to facilitate easy

    comparison with the related results in Pettersen's paper

    [25] but also in recognition of the fact that pre-fracture

    shape changes are more pronounced in the relatively larger

    lumbar vertebrae. A natural generalization in future studies

    will be to test if pre-fracture shape changes in thoracic

    vertebrae also have fracture-predictive value. Furthermore,

    thoracic and lumbar vertebrae are very likely to give a

    better combined prediction of both lumbar and overall

    fracture risk. It is our expectation that a single linear model

    as presented in this paper would be insufficient to model

    the range of combined lumbar and thoracic morphological

    variance. We believe, however, that two such models,

    trained on thoracic and lumbar vertebrae respectively, would

    in combination be sufficient.

    For the purposes of the pre-selection and the study in

    general, fracture absence and presence were established

    using Genant's semi-quantitative method [6], first by a

    radiologist and subsequently by a computer algorithm using

    manually measured vertebral heights with a strict fracture

    threshold of 0.2. In future studies, it would be relevant to

    establish how robust the proposed fracture risk scores are

    with respect to changing the ground truth fracture

    assessment methodology to other established methods as

    suggested by, e.g., Eastell, Melton, Minne, and McCloskey

    [13,18,19,23].
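
    As an illustration of the computer-based assessment described above, the following minimal Python sketch flags a vertebra as fractured when its largest relative height reduction exceeds 0.2. It is a sketch under stated assumptions and not the algorithm used in this study: the names Heights, max_relative_reduction, and is_fractured are hypothetical, the reduction is assumed to be measured against the largest of the three heights of the same vertebra, and "strict" is assumed to mean a strict inequality.

```python
from typing import NamedTuple


class Heights(NamedTuple):
    """Manually measured heights of one vertebra (any consistent unit)."""
    anterior: float
    middle: float
    posterior: float


def max_relative_reduction(h: Heights) -> float:
    # Assumption: the reference is the largest of the three heights
    # of the same vertebra; the source does not state the reference.
    reference = max(h)
    return 1.0 - min(h) / reference


def is_fractured(h: Heights, threshold: float = 0.2) -> bool:
    # Assumption: "strict" threshold is read as a strict inequality.
    return max_relative_reduction(h) > threshold


normal = Heights(anterior=25.0, middle=24.0, posterior=26.0)
wedged = Heights(anterior=19.0, middle=24.0, posterior=26.0)
print(is_fractured(normal), is_fractured(wedged))
```

    Because the comparison is made within a single vertebra, this sketch would miss a crush deformity in which all three heights are reduced together; detecting that would require a reference height from neighbouring vertebrae or from a normative population.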

    Conclusion

    We have presented three morphometric scores, each in

    two variants. In a case–control study based on

    community-recruited postmenopausal women with the

    limitations stated above, all six variants were able to

    predict first incident lumbar fractures. In general,

    variants based on only measuring the single most

    deformed vertebra showed more promise than the mean

    variants. More specifically, the 2 df VFR score is

    significantly better than two other mean-based variants

    but not significantly better than the other two suggested

    single vertebra morphometric methods. Subject to the

    limitations stated above and the availability of high-

    quality radiologist six-point annotations, we conclude that

    relative vertebral heights or height ratios measured on the

    single most deformed vertebra used in combination with

    machine learning techniques appear to be a promising approach

    for (1) first incident lumbar fracture prediction

    and (2) selection of fracture-prone populations for clinical

    trials investigating treatment and prevention of osteoporosis

    and osteoporotic fractures.
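
    The difference between the single most deformed vertebra variants and the mean variants can be sketched as follows. This is a hedged illustration only: it assumes two height ratios per vertebra (anterior/posterior and middle/posterior) as features and takes the "most deformed" vertebra to be the one with the smallest ratio. The function names are hypothetical, and the classifier that would consume these features is not shown.

```python
from statistics import mean


def height_ratios(anterior, middle, posterior):
    # Assumed feature pair: heights relative to the posterior height.
    return (anterior / posterior, middle / posterior)


def most_deformed_features(vertebrae):
    """Feature pair of the single vertebra with the smallest ratio."""
    per_vertebra = [height_ratios(*v) for v in vertebrae]
    # key=min ranks each vertebra by its smallest height ratio.
    return min(per_vertebra, key=min)


def mean_features(vertebrae):
    """Feature pair averaged over all measured vertebrae."""
    per_vertebra = [height_ratios(*v) for v in vertebrae]
    return tuple(mean(r[i] for r in per_vertebra) for i in range(2))


# Hypothetical (anterior, middle, posterior) heights for three vertebrae.
spine = [(26.0, 25.0, 26.0), (20.0, 24.0, 25.0), (27.0, 26.0, 27.0)]
print(most_deformed_features(spine))
print(mean_features(spine))
```

    In this toy input the mild anterior wedging of the second vertebra is preserved by the single-vertebra features but diluted by averaging, which is one plausible reading of why the single-vertebra variants performed better.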

    It is, however, also clear that VFR-based fracture

    prediction is highly operator/annotator dependent; this and

    a generalization to both lumbar and thoracic fracture

    prediction will be the focus of future studies.

    Acknowledgements The authors gratefully acknowledge the fund-

    ing from the Danish Research Foundation (Den Danske Forsknings-

    fond) supporting this work. The authors thank Jane Petersen and

    Annette Olesen for the repeat annotations.

    Conflicts of interest The VFR-methodology is part of a pending

    patent. Martin Lillholm is an employee of Synarc Imaging Technologies/Nordic

    Bioscience Imaging (SIT/NBI). Anarta Ghosh is a former

    employee of SIT/NBI. Paola C Pettersen is an employee of Center for

    Clinical and Basic Research (CCBR). Erik B Dam is an employee of

    SIT/NBI. Morten A Karsdal is an employee and shareholder of Nordic

    Bioscience (NB). Claus Christiansen is an employee and shareholder of

    NB and CCBR. Harry K Genant is an employee and shareholder of

    Synarc. Mads Nielsen is partly funded by SIT/NBI. Marleen de Bruijne

    was previously funded by Nordic Bioscience.

    References

    1. Rodan GA, Martin TJ (2000) Therapeutic approaches to bone

    diseases. Science 289:1508

    2. Watts NB (2001) Osteoporotic vertebral fractures. Neurosurg

    Focus E12:10

    3. Kanis JA, Borgstrom F, De Laet C, Johansson H, Johnell O,

    Jonsson B, Oden A, Zethraeus N, Pfleger B, Khaltaev N (2005)

    Assessment of fracture risk. Osteoporos Int 16:581–589

    4. Truumees E (2003) Medical consequences of osteoporotic

    vertebral compression fractures. Instr Course Lect 52:551–558

    5. Jiang G, Eastell R, Barrington NA, Ferrar L (2004) Comparison of

    methods for the visual identification of prevalent vertebral fracture

    in osteoporosis. Osteoporos Int 17(11):887–896

    6. Genant HK, Wu CY, van Kuijk C, Nevitt MC (1993) Vertebral

    fracture assessment using a semiquantitative technique. J Bone

    Miner Res 8:1137–1148

    7. Ferrar L, Jiang G, Adams J, Eastell R (2005) Identification of

    vertebral fractures: an update. Osteoporos Int 16:717–728

    8. Grados F, Roux C, de Vernejoul MC, Utard G, Sebert JL,

    Fardellone P (2001) Comparison of four morphometric definitions

    and a semiquantitative consensus reading for assessing prevalent

    vertebral fractures. Osteoporos Int 12:716–722

    9. O'Neill TW, Felsenberg D, Varlow J (2004) Diagnosis of

    osteoporotic vertebral fractures: importance of recognition and

    description by radiologists. Am J Roentgenol 183:949–958

    10. Stone J, Gurrin LC, Byrnes GB, Schroen CJ, Treloar SA, Padilla

    EJ, Dite GS, Southey MC, Hayes VM, Hopper JL (2007)

    Mammographic density and candidate gene variants: a twins and

    sisters study. Cancer Epidemiol Biomark Prev 16:1479–1484

    11. Nielsen VAH, Pødenphant J, Martens S, Gotfredsen A, Riis BJ

    (1991) Precision in assessment of osteoporosis from spine

    radiographs. Eur J Radiol 13:11–14

    12. Delmas PD, Langerijt L, Watts NB, Eastell R, Genant H, Grauer

    A, Cahall DL (2005) Underdiagnosis of vertebral fractures is a

    worldwide problem: the IMPACT study. J Bone Miner Res

    20:557–563

    13. Eastell R, Cedel SL, Wahner HW, Riggs BL, Melton LJ (1991)

    Classification of vertebral fractures. J Bone Miner Res (Print)

    6:207–215

    14. Black DM, Palermo L, Nevitt MC, Genant HK, Epstein R, San

    Valentin R, Cummings SR (1995) Comparison of methods for

    defining prevalent vertebral deformities: the Study of Osteoporotic

    Fractures. J Bone Miner Res 10:890–902

    15. Davies KM, Recker RR, Heaney RP (1989) Normal vertebral

    dimensions and normal variation in serial measurements of

    vertebrae. J Bone Miner Res 4:341–349

    16. Jensen KK, Tougaard L (1981) A simple X-ray method for

    monitoring progress of osteoporosis. Lancet 2:19–20

    17. Kleerekoper M, Parfitt AM, Ellis BI (1984) Measurement of

    vertebral fracture rates in osteoporosis. Copenhagen Int Symp

    Osteoporos 1:103–108

    18. Melton LJ, Kan SH, Frye MA, Wahner HW, O'Fallon WM, Riggs

    BL (2005) Epidemiology of vertebral fractures in women. Am J

    Epidemiol 129:1000–1011

    19. Minne HW, Leidig G, Wuster C, Siromachkostov L, Baldauf G,

    Bickel R, Sauer P, Lojen M, Ziegler R (1988) A newly developed

    spine deformity index (SDI) to quantitate vertebral crush fractures

    in patients with osteoporosis. Bone Miner 3:335–349

    20. Reshef A, Schwartz A, Ben Menachem Y, Menczel J, Guggenheim K

    (1971) Radiological osteoporosis: correlation with dietary and

    biochemical findings. J Am Geriatr Soc 19:391–402

    21. Ross PD, Yhee YK, He YF, Davis JW, Kamimoto C, Epstein RS,

    Wasnich RD (1993) A new method for vertebral fracture

    diagnosis. J Bone Miner Res 8:167–174

    22. Smith-Bindman R, Steiger P, Cummings SR, Genant HK (1991) The

    index of radiographic area (IRA): a new approach to estimating the

    severity of vertebral deformity. Bone Miner 15:137–149

    23. McCloskey EV, Spector TD, Eyres KS, Fern ED, O'Rourke N,

    Vasikaran S, Kanis JA (1993) The assessment of vertebral

    deformity: a method for use in population studies and clinical

    trials. Osteoporos Int 3(3):138–147

    24. Bagger YZ, Tanko LB, Alexandersen P, Hansen HB, Qin G,

    Christiansen C (2006) The long-term predictive value of bone

    mineral density measurements for fracture risk is independent of

    the site of measurement and the age at diagnosis: results from the

    prospective epidemiological risk factors study. Osteoporos Int

    17:471–477

    25. Pettersen PC, de Bruijne M, Chen J, He Q, Christiansen C, Tanko

    LB (2007) A computer-based measure of irregularity in vertebral

    alignment is a BMD-independent predictor of fracture risk in

    postmenopausal women. Osteoporos Int 18(11):1525–1530

    26. DeLong ER, DeLong DM, Clarke-Pearson DL (1988) Comparing

    the areas under two or more correlated receiver operating

    characteristic curves: a nonparametric approach. Biometrics 44:837–845

    27. Tarone RE (1985) On heterogeneity tests based on efficient

    scores. Biometrika 72(1):91–95

    28. Zebaze RMD, Maalouf G, Maalouf N, Seeman E (2004) Loss of

    regularity in the curvature of the thoracolumbar spine: a measure

    of structural failure. J Bone Miner Res 19:1099–1104

    29. Duan Y, Seeman E, Turner CH (2001) The biomechanical basis of

    vertebral body fragility in men and women. J Bone Miner Res

    16:2276–2283

    30. Hasserius R, Karlsson MK, Nilsson BE, Johnell O (2003)

    Prevalent vertebral deformities predict increased mortality and

    increased fracture rate in both men and women: a 10-year

    population-based study of 598 individuals from the Swedish

    cohort in the European Vertebral Osteoporosis Study. Osteoporos

    Int 14:61–68

    31. Lunt M, O'Neill TW, Felsenberg D, Reeve J, Kanis JA, Cooper C,

    Silman AJ (2003) Characteristics of a prevalent vertebral

    deformity predict subsequent vertebral fracture: results from the

    European Prospective Osteoporosis Study (EPOS). Bone 33:505–513

    Osteoporos Int (2011) 22:2119–2128

    Copyright of Osteoporosis International is the property of Springer Science & Business Media B.V. and its

    content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's

    express written permission. However, users may print, download, or email articles for individual use.