How do you evaluate clinical image quality with statistical methods?
SK course: Radiation physics, technology and radiation protection
9 Dec 2015
Örjan Smedby
Skolan för teknik och hälsa (STH)
Kungliga Tekniska Högskolan (KTH)
Örjan Smedby, STH, KTH
Context
Examination → Diagnosis → Choice of treatment → Treatment effect → Health effect

Efficacy of Diagnostic Methods
• Level 1: Technical efficacy. Image quality, resolution, noise...
• Level 2: Diagnostic accuracy efficacy. How often is the diagnosis correct?
• Level 3: Diagnostic thinking efficacy. How is the referring physician's diagnostic thinking affected?
• Level 4: Therapeutic efficacy. How is the choice of treatment affected?
• Level 5: Patient outcome efficacy. How is the patient's health affected?
• Level 6: Societal efficacy. Benefits and costs to society.
(Fryback DG, Thornbury JR. Med Decis Making 1991)

Image quality vs. diagnostic accuracy
Evaluate entire diagnostic process
• Reliable ground truth
Evaluate physical quality parameters
• Physical measuring tools
• Classical statistical tools?

A diagnostic test
              Pos test   Neg test   Total
Diseased         25          5        30
Healthy          15        105       120
Total            40        110       150
How likely is a diseased patient to be classified correctly? Sensitivity: 25/30 = 83%
How likely is a healthy patient to be classified correctly? Specificity: 105/120 = 88%
How likely is a patient with a positive test to actually be diseased? Positive predictive value: 25/40 = 63%
How likely is a patient with a negative test to actually be healthy? Negative predictive value: 105/110 = 95%

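The four measures follow directly from the counts in the 2x2 table. A minimal sketch in Python; the variable names are illustrative and not from the lecture:

```python
# Counts from the 2x2 table on the slide (variable names are illustrative).
tp, fn = 25, 5      # diseased patients: positive test, negative test
fp, tn = 15, 105    # healthy patients: positive test, negative test

sensitivity = tp / (tp + fn)   # 25/30
specificity = tn / (tn + fp)   # 105/120
ppv = tp / (tp + fp)           # 25/40, positive predictive value
npv = tn / (tn + fn)           # 105/110, negative predictive value

for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("PPV", ppv), ("NPV", npv)]:
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are computed along the rows (per true state), while the predictive values are computed along the columns (per test result), so the predictive values change with the proportion of diseased patients in the sample.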
Threshold level
Higher threshold for pathology:
- sensitivity decreases
- specificity increases
Lower threshold for pathology:
- sensitivity increases
- specificity decreases
[Figure: sensitivity and specificity]

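The trade-off can be illustrated by moving a threshold across a small data set. The measurement values below are hypothetical and chosen only for illustration; a case is called positive when its value is at or above the threshold.

```python
# Hypothetical measurement values (higher = more suspicious); not from the lecture.
diseased = [4.1, 5.0, 5.6, 6.2, 7.3, 8.0]
healthy = [1.2, 2.5, 3.1, 3.8, 4.6, 5.2]

def sens_spec(threshold):
    """Sensitivity and specificity when values >= threshold count as positive."""
    true_pos = sum(v >= threshold for v in diseased)
    true_neg = sum(v < threshold for v in healthy)
    return true_pos / len(diseased), true_neg / len(healthy)

results = [(t, *sens_spec(t)) for t in (3.0, 4.5, 6.0)]
for threshold, sens, spec in results:
    print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Raising the threshold from 3.0 to 6.0 lowers the sensitivity and raises the specificity in this toy data set, as stated on the slide.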
ROC curve
[Figure: ROC curve, sensitivity (0 to 1) plotted against 1 – specificity (0 to 1)]
Area under ROC curve (AUROC):
1: perfect
0.5: worthless

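As a sketch of how an empirical ROC curve and its area can be obtained, the following Python code sweeps every observed value as a threshold and integrates with the trapezoidal rule. The function names and the data are hypothetical; dedicated ROC software typically also provides confidence intervals, which this sketch does not.

```python
def roc_points(diseased, healthy):
    """Empirical ROC points (1 - specificity, sensitivity), from strict to lenient threshold."""
    thresholds = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        sensitivity = sum(v >= t for v in diseased) / len(diseased)
        false_pos_rate = sum(v >= t for v in healthy) / len(healthy)
        points.append((false_pos_rate, sensitivity))
    return points

def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical measurement values, for illustration only.
diseased = [4.1, 5.0, 5.6, 6.2, 7.3, 8.0]
healthy = [1.2, 2.5, 3.1, 3.8, 4.6, 5.2]
print(f"AUROC = {auroc(roc_points(diseased, healthy)):.3f}")
```

With perfectly separated groups the area is 1, and with identical groups it is 0.5, matching the two reference values on the slide.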
Receiver operating characteristics
Generalisation of sensitivity and specificity
How do sensitivity and specificity change when the threshold changes?

Probabilities
[Figure: probability scale from 0% to 100% showing the pre-test probability (before the examination), the post-test probability after a positive finding and after a negative finding, and the exclusion and action thresholds]
How much a positive or a negative finding changes the probability can be calculated with likelihood ratios (LR+ and LR–), which depend on sensitivity and specificity. (See the Wikipedia article "Likelihood ratios in diagnostic testing".)

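A short sketch of the likelihood-ratio calculation, using the standard definitions LR+ = sensitivity / (1 – specificity) and LR– = (1 – sensitivity) / specificity. The numbers are taken from the 2x2 table of the diagnostic test example; using the sample prevalence (30/150) as the pre-test probability is an assumption made here for illustration.

```python
sensitivity = 25 / 30
specificity = 105 / 120

lr_pos = sensitivity / (1 - specificity)    # likelihood ratio of a positive finding
lr_neg = (1 - sensitivity) / specificity    # likelihood ratio of a negative finding

def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, multiply by the likelihood ratio, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

pre_test = 30 / 150   # assumption: prevalence in the example table
after_positive = post_test_probability(pre_test, lr_pos)
after_negative = post_test_probability(pre_test, lr_neg)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"after positive finding: {after_positive:.3f}")
print(f"after negative finding: {after_negative:.3f}")
```

With this pre-test probability the post-test probability after a positive finding equals the positive predictive value (25/40), and after a negative finding it equals one minus the negative predictive value (5/110).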
Thresholds
The thresholds for treatment or for watchful waiting depend on the consequences of each decision.

Receiver operating characteristics
Generalisation of sensitivity and specificity
How do sensitivity and specificity change when the threshold changes?
Requires a reference standard (gold standard)
Requires a large study sample
Much work, high costs

Image quality vs. diagnostic accuracy
Evaluate entire diagnostic process
• Reliable ground truth
• ROC study
Evaluate physical quality parameters
• Physical measuring tools
• Classical statistical tools

Image quality vs. diagnostic accuracy
Evaluate entire diagnostic process
• Reliable ground truth
• ROC study
Visual image quality concept
• Visual grading experiment
• ?
Evaluate physical quality parameters
• Physical measuring tools
• Classical statistical tools

Study types
Single images: "Rate image A on a scale from 1 to 5"
Image pairs: "Rate the difference between image A and B on a scale from –2 to +2"

Criteria & rating scale
European Commission: European Guidelines on Quality Criteria for Diagnostic Radiographic Images (EUR 16260 EN)

Criteria & rating scale
"European guidelines on quality criteria"
Typical: visibility of an anatomical structure
"Visually sharp reproduction of the intervertebral joints"
5. Criterion is fulfilled
4. Criterion is probably fulfilled
3. Indecisive
2. Criterion is probably not fulfilled
1. Criterion is not fulfilled

Criteria & rating scale
"European guidelines on quality criteria"
Typical: visibility of an anatomical structure
"Visually sharp reproduction of the intervertebral joints"
+2: Criterion is better fulfilled in right image
+1: Criterion is probably better fulfilled in right image
 0: Indecisive
–1: Criterion is probably better fulfilled in left image
–2: Criterion is better fulfilled in left image

Situation
Patient: P1, P2, P3, P4, ...
Imaging: Im1, Im2
Post-processing: PP1, PP2, PP3
Observer: O1, O2, O3, ...
Result: score

Types of data
Interval (I): numerical, continuous. Example: measurement (10, 20, 30, 40)
Nominal (N): individual categories, no order. Example: persons (A, B, C, D)
Ordinal (O): ordered categories. Example: rating score (1, 2, 3, 4, 5)

Visual grading characteristics (VGC)
(Båth & Månsson, BJR 2007)
For each quality level: what proportion fulfils the requirement with method A and with method B?
Scores counted   Method A   Method B
(none)             0.00       0.00
5                  0.05       0.20
4–5                0.20       0.50
3–5                0.50       0.80
2–5                0.80       0.95
1–5                1.00       1.00
[Figure: VGC curve, proportion for method B plotted against proportion for method A]

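In VGC analysis the two columns of cumulative proportions are plotted against each other, and the area under the resulting curve summarises the comparison (0.5 means no difference between the methods). A minimal sketch using the proportions from the slide and a plain trapezoidal rule; this ignores the curve fitting and uncertainty estimates that a full analysis would include.

```python
# Cumulative proportions from the slide, from the strictest to the most lenient level.
method_a = [0.00, 0.05, 0.20, 0.50, 0.80, 1.00]   # plotted on the x axis
method_b = [0.00, 0.20, 0.50, 0.80, 0.95, 1.00]   # plotted on the y axis

def area_under_curve(xs, ys):
    """Trapezoidal area under the piecewise-linear curve through (xs, ys)."""
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2
    return area

auc_vgc = area_under_curve(method_a, method_b)
print(f"Area under the VGC curve: {auc_vgc:.2f}")
```

An area above 0.5 indicates that method B tends to receive higher ratings than method A on this scale.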
Statistical model
Explanatory variables:
• Patient (N)
• Observer (N)
• Post-processing (N)
• Imaging system (N): Im1, Im2, Im3
• Settings (I): 10, 20, 30, 40
Outcome: score (O): 1, 2, 3, 4, 5
Model: ?

Statistical model
Explanatory variables:
• Patient (N)
• Observer (N)
• Post-processing (N)
• Imaging system (N)
• Settings (I)
Outcome: score (O)
Model: ordinal logistic regression

Logistic regression
Logit function: logit(p) = log(p/(1–p))
Regression equation: logit(p) = –ax + b
Solved for p: p = 1/(1 + exp(ax – b))

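The logit function and its inverse can be checked numerically. In this sketch `eta` stands for the right-hand side of the regression equation (the linear predictor); the names are illustrative.

```python
import math

def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(eta):
    """Probability corresponding to a value eta of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-eta))

for eta in (-2.0, 0.0, 2.0):
    p = inverse_logit(eta)
    print(f"eta = {eta:+.1f} -> p = {p:.3f}, logit(p) = {logit(p):+.3f}")
```

Applying `logit` to the result of `inverse_logit` returns the original value, and a linear predictor of 0 corresponds to a probability of 0.5.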
Ordinal logistic regression
Logit function: logit(p) = log(p/(1–p))
Regression equation: logit(p) = ax + b
Solved for p: p = 1/(1 + exp(–(ax + b)))
VGR model: logit(P(y ≤ n)) = a1·Im1 + a2·Im2 + b1·PP1 + b2·PP2 + b3·PP3 + D_P + E_O – C_n
(Smedby & Fredrikson, British Journal of Radiology 2010)

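To show how a cumulative-logit model of this form turns into probabilities for each score, the sketch below takes the sum of all effect terms as a single number `eta` and applies logit(P(y ≤ n)) = eta – C_n. The cut-point values and the two `eta` values are hypothetical, chosen only for illustration; with this sign convention the C_n must decrease with n so that P(y ≤ n) grows with n.

```python
import math

def inverse_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def score_probabilities(eta, cutpoints):
    """P(y = 1), ..., P(y = K) for logit(P(y <= n)) = eta - C_n, n = 1..K-1."""
    cumulative = [inverse_logit(eta - c) for c in cutpoints] + [1.0]
    probabilities = [cumulative[0]]
    for lower, upper in zip(cumulative, cumulative[1:]):
        probabilities.append(upper - lower)
    return probabilities

# Hypothetical values, for illustration only.
cutpoints = [2.0, 0.5, -0.5, -2.0]   # C_1..C_4 for a five-level scale, decreasing
probs_low = score_probabilities(0.0, cutpoints)    # smaller sum of effects
probs_high = score_probabilities(0.8, cutpoints)   # larger sum of effects

for label, probs in (("eta = 0.0", probs_low), ("eta = 0.8", probs_high)):
    print(label, [round(p, 3) for p in probs])
```

A larger `eta` raises every cumulative probability P(y ≤ n), which shifts the distribution towards lower scores. Fitting the coefficients to rating data requires statistical software and is not shown here.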
Statistical model
Explanatory variables:
• Patient (N)
• Observer (N)
• Post-processing (N): PP1, PP2
• Imaging system (N): Im1, Im2, Im3
• Settings (I)
Outcome: score (O)
Model: ordinal logistic regression (VGR)

Statistical model
Explanatory variables:
• Patient (N): random effect
• Observer (N): random effect
• Post-processing (N): PP1, PP2; fixed effect
• Imaging system (N): Im1, Im2, Im3; fixed effect
• Settings (I)
Outcome: score (O)
Model: ordinal logistic regression (VGR)

Empirical data (De Geer 2011)
Coronary CTA
24 patients (P1–P24)
Standard (310 mAs Ref) and reduced dose (62 mAs Ref)
Reduced-dose images post-processed with 2D adaptive filter (Sharpview)
Filtered and unfiltered reduced-dose images viewed by 9 radiologists (R1–R9)

Criteria
Criterion 1: Visually sharp reproduction of the thoracic aorta.
Criterion 2: Visually sharp reproduction of the wall of the thoracic aorta.
Criterion 3: Visually sharp reproduction of the heart.
Criterion 4: Visually sharp reproduction of the left main coronary artery (LMA).
Criterion 5: The image noise in relevant regions is sufficiently low for diagnosis.

Rating scale
1. Criterion is fulfilled
2. Criterion is probably fulfilled
3. Indecisive
4. Criterion is probably not fulfilled
5. Criterion is not fulfilled

Statistical model
Explanatory variables:
• Patient (N)
• Observer (N)
• Post-processing (N): unfiltered, filtered
Outcome: score (O)
Model: ordinal logistic regression

Results: filter effect
Ordinal logistic regression
Criterion                                                Regression coefficient   p value
1: Visually sharp reproduction of the thoracic aorta            –0.53             0.0036
2: Visually sharp reproduction of the aortic wall               –0.90             <0.000001
3: Visually sharp reproduction of the heart                     –0.81             0.00005
4: Visually sharp reproduction of the LMA                       –0.78             0.000004
5: Noise sufficiently low for diagnosis                         –0.96             <0.000001

Including mAs effect
Both standard-dose and reduced-dose images were viewed; the reduced-dose images were viewed with and without filtering.

Statistical model with mAs
Explanatory variables:
• Patient (N)
• Observer (N)
• Post-processing (N): unfiltered, filtered
• log mAs setting (I): 62, 310
Outcome: score (O)
Model: ordinal logistic regression

Statistical model with mAs etc.
Explanatory variables:
• Patient (N)
• Observer (N)
• Post-processing (N): unfiltered, filtered
• log mAs setting (I): 62, 310
• Education (O)
• Weight (I)
Outcome: score (O)
Model: ordinal logistic regression

Dose reduction
Criterion 3
[Figure: probability of a score of 1 or 2 (0.0 to 1.0) plotted against mAs Ref setting (0 to 300, with 86 and 115 marked), for unfiltered and filtered images]

Results with mAs
Regression coefficients
Criterion                                                log (mAs)   adaptive filter
1: Visually sharp reproduction of the thoracic aorta       –2.52         –0.45
2: Visually sharp reproduction of the aortic wall          –2.53         –0.75
3: Visually sharp reproduction of the heart                –2.54         –0.74
4: Visually sharp reproduction of the LMA                  –2.52         –0.61
5: Noise sufficiently low for diagnosis                    –2.74         –0.77

Results with mAs
Regression coefficients and estimated mAs reduction
Criterion                                                log (mAs)   adaptive filter   Estimated mAs reduction
1: Visually sharp reproduction of the thoracic aorta       –2.52         –0.45                 16%
2: Visually sharp reproduction of the aortic wall          –2.53         –0.75                 26%
3: Visually sharp reproduction of the heart                –2.54         –0.74                 25%
4: Visually sharp reproduction of the LMA                  –2.52         –0.61                 21%
5: Noise sufficiently low for diagnosis                    –2.74         –0.77                 24%

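One way to read the "Estimated mAs reduction" column, stated here as an assumption rather than as the authors' documented procedure: if the model is additive in log(mAs) and the filter term, and the logarithm is natural, then filtering is worth a shift of b_filter / b_log_mAs in log(mAs), and the mAs could be lowered by the factor exp(–b_filter / b_log_mAs) at unchanged image quality. A sketch with the rounded coefficients from the table:

```python
import math

# (log(mAs) coefficient, adaptive-filter coefficient) per criterion, from the table.
coefficients = {
    1: (-2.52, -0.45),
    2: (-2.53, -0.75),
    3: (-2.54, -0.74),
    4: (-2.52, -0.61),
    5: (-2.74, -0.77),
}

def mas_reduction(b_log_mas, b_filter):
    """Fractional mAs reduction offset by the filter.

    Assumes natural logarithms and an additive model, so the filter effect
    corresponds to a shift of b_filter / b_log_mas in log(mAs).
    """
    return 1.0 - math.exp(-b_filter / b_log_mas)

reductions = {c: mas_reduction(*b) for c, b in coefficients.items()}
for criterion, reduction in reductions.items():
    print(f"criterion {criterion}: {reduction:.1%}")
```

Under these assumptions the computed values agree with the percentages in the table to within rounding of the published coefficients.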
Study II
Abdominal CT (Philips Mx8000IDT)
Four image types:
• Normal dose: standard dose (180 mAs; CTDIvol = 12 mGy)
• Low dose: reduced dose (90 mAs; CTDIvol = 6 mGy)
• Low dose, 2D filtered
• Low dose, 3D filtered
12 patients, 6 observers
Image-pair viewing
5 image quality criteria, judged on a 5-level scale (–2…+2)

Visual grading scores
Criterion 1: Delineation of pancreas
[Figure: stacked bar chart (0% to 100%) of the scores 2, 1, 0, –1 and –2 for normal dose, low dose, 2D filter and 3D filter images, in early phase and late phase]
Frequency of favourable (+) vs. unfavourable (–) scores for each image type

Potential for dose reduction
Image quality criterion                          Equivalent mAs,   Equivalent mAs,   Dose reduction,   Dose reduction,
                                                 2D filter         3D filter         2D filter         3D filter
Criterion 1: Delineation of pancreas                  112               103               38%               43%
Criterion 2: Delineation of veins in liver            120               102               33%               43%
Criterion 3: Delineation of common bile duct          114               102               37%               43%
Criterion 4: Image noise                              106                88               41%               51%
Criterion 5: Overall diagnostic acceptability         117               102               35%               43%
Predicted mAs settings that after filtering would yield image quality equivalent to normal dose (180 mAs).

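The dose-reduction columns are consistent with comparing each equivalent mAs setting with the normal-dose setting of 180 mAs, that is, reduction = 1 – equivalent mAs / 180. A small sketch of that arithmetic with the values from the table; the variable names are illustrative.

```python
NORMAL_MAS = 180  # standard-dose setting in Study II

# Equivalent mAs per criterion as (2D filter, 3D filter), copied from the table.
equivalent_mas = {
    1: (112, 103),
    2: (120, 102),
    3: (114, 102),
    4: (106, 88),
    5: (117, 102),
}

def dose_reduction(equivalent, normal=NORMAL_MAS):
    """Fraction of the normal-dose mAs saved at the equivalent setting."""
    return 1.0 - equivalent / normal

for criterion, (mas_2d, mas_3d) in equivalent_mas.items():
    print(f"criterion {criterion}: 2D filter {dose_reduction(mas_2d):.1%}, "
          f"3D filter {dose_reduction(mas_3d):.1%}")
```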
Conclusion
For analyzing diagnostic accuracy, ROC studies are superior, but costly and cumbersome.
Visual grading experiments describe visual image quality.
Simple comparisons can be made with VGC.
Ordinal logistic regression (VGR) makes it possible to obtain direct numeric estimates of the potential for dose reduction.
Particularly useful when testing and optimising acquisition/post-processing protocols.