NTDR, 2012
Basic principles of NMR-based metabolomics
Nils Nyberg
NPR, Department of Drug Design and Pharmacology
Outline
NMR as universal detector
Top-down approach in science
Metabolomics and metabonomics
Case story: starved ewes and fat lambs
Step-by-step procedure in data handling
• Processing
• Export/import
• Calibration
• Baseline adjustment
• Projection of data to a common axis
• Integration and merging of buckets
• PCA
Basic principles of NMR-based metabolomics
Metabolome
• The complete set of small-molecule metabolites in a biological system
• Metabolomics, proteomics, transcriptomics, …
Metabolic profile, metabolic fingerprint
• A quantitative determination of the metabolome in a sample or an individual
Metabonomics
• Quantitative measurement of metabolic response to stimuli or genetic modification…
• Toxicology
• Disease diagnostics
• Functional genomics (determination of phenotypes)
• Nutrigenomics (human diet, drugs and microflora)
Metabonomics ~ metabolomics ~ metabolic profiles
The universal detector
NMR: the most universal detector for small metabolites
• No physical separation of analytes!
• Robust => reproducible results
• Directly quantitative
• Simple sample preparation
• Information rich
• Not as sensitive as mass spectrometry
• Expensive
NMR is good for a top-down approach
• Study the whole system first, before breaking it into smaller pieces
Starved ewes and fat lambs
Undernutrition during fetal development is associated with increased risk of metabolic diseases later in life.
• Dutch winter famine 1944
• Obesity at the age of 50 y in men and women exposed to famine prenatally, Am J Clin Nutr 1999;70:811–16.
• Coronary heart disease, hypertension, and type 2 diabetes.
• Consequences of “programming,” whereby a stimulus or insult at a critical, sensitive period of early life has permanent effects on structure, physiology, and metabolism.
Metabolic programming
• Phenotypic alterations by fetal adaptation
• Higher risk of obesity and diabetes if mismatched diet (programmed to cope with famine, exposed to hypernutrition)
Starved ewes and fat lambs
Hypothesis:
• Metabolic programming by feed restriction leads to changed metabolic pathways
• The changes can be studied by acquiring NMR spectra of urine
Sheep as animal model system
• Before birth: Ewes well fed or starved (50% of energy)
• After birth: Normal diet or “High fat, high carbohydrate” diet
Starved ewes and fat lambs: results
164 NMR spectra
• Repeats at 2 and 6 months
Starved ewes and fat lambs: results
Principal Component Analysis (PCA)
• Data reduction, keep the variance
• Display the relationships between samples
Starved ewes and fat lambs: results
Age: 2 months, adapting to ruminant digestion
• Separation depending on diet, /
• Some samples ahead, separated from
Data handling: procedures and terms
From FIDs to one table
Data handling: procedures and terms
Keep track of your samples and data!
• Enter title or label for each sample
FIDs to spectra:
• Window function, Fourier transform, phasing, baseline adjustment
Make spectra comparable
• Calibration of ppm-scale
• Project data on a common axis
• Normalize
Compress data/simplify spectra
• Integrate (binning, buckets)
Simplify calculations/interpretation of models
• Mean center
• Scaling
FIDs to spectra
Use the same processing parameters for all spectra!
• Window function with parameters
• Exponential multiplication with a line broadening factor of 1 Hz
• Number of data points in the final spectrum
• 32768 data points / (20 ppm × 600 Hz/ppm) = 2.7 data points/Hz
• Make sure the peaks are properly defined.
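The processing steps above can be sketched with NumPy. This is a minimal illustration, not vendor processing software: the function name `process_fid`, the synthetic FID and its 500 Hz resonance are assumptions made for the example, and the sign convention of the frequency axis may differ between instruments.

```python
import numpy as np

def process_fid(fid, dwell_time, lb=1.0, n_points=32768):
    """Exponential multiplication, zero-filling and Fourier transform of one FID."""
    t = np.arange(fid.size) * dwell_time        # acquisition time axis in seconds
    window = np.exp(-np.pi * lb * t)            # adds lb Hz of line broadening
    # fft(..., n=n_points) zero-fills the FID up to the requested size
    return np.fft.fftshift(np.fft.fft(fid * window, n=n_points))

# Digital resolution from the slide: 32768 points over 20 ppm at 600 MHz
sw_hz = 20 * 600                                # 20 ppm * 600 Hz/ppm = 12000 Hz
points_per_hz = 32768 / sw_hz                   # about 2.7 data points per Hz

# Synthetic FID (hypothetical): a single decaying resonance at 500 Hz
dwell = 1.0 / sw_hz
t = np.arange(16384) * dwell
fid = np.exp(2j * np.pi * 500.0 * t) * np.exp(-t / 0.5)

spec = process_fid(fid, dwell, lb=1.0)
freqs = np.fft.fftshift(np.fft.fftfreq(32768, d=dwell))
peak_hz = freqs[np.argmax(spec.real)]           # position of the line maximum
```

Using one function with fixed `lb` and `n_points` for every FID is a simple way to guarantee identical processing parameters across the whole data set.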
FIDs to spectra
Adjust each spectrum individually
• Phasing: Adjust only zeroth-order phase constant if possible
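A zeroth-order phase correction is a rotation of the complex spectrum by one constant angle. The sketch below assumes a hypothetical Lorentzian line with a 30 degree phase error; the helper name `phase_zero_order` is made up for the illustration.

```python
import numpy as np

def phase_zero_order(spectrum, phi0_deg):
    """Rotate a complex spectrum by a constant (zeroth-order) phase angle."""
    return spectrum * np.exp(1j * np.deg2rad(phi0_deg))

# Hypothetical line: real part is absorptive, imaginary part is dispersive
freq = np.linspace(-50, 50, 1001)
lorentz = 1.0 / (1.0 + 1j * freq)

# Simulate a 30 degree phase error, then undo it
misphased = lorentz * np.exp(-1j * np.deg2rad(30.0))
corrected = phase_zero_order(misphased, 30.0)
```

Because only one parameter is adjusted, the correction cannot distort the baseline in a frequency-dependent way, which is one reason to avoid first-order phasing when it is not needed.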
FIDs to spectra
Baseline adjustment
• Make sure the baseline is represented in the spectrum (large SW)
• Use a simple function (2nd or 3rd order polynomial)
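A low-order polynomial baseline can be fitted to regions that contain no signals and then subtracted from the whole spectrum. In this sketch the signal-free regions, the synthetic baseline and the function name `subtract_polynomial_baseline` are assumptions chosen for the example.

```python
import numpy as np

def subtract_polynomial_baseline(ppm, intensity, baseline_mask, order=2):
    """Fit a polynomial to signal-free points and subtract it everywhere."""
    coeffs = np.polyfit(ppm[baseline_mask], intensity[baseline_mask], order)
    return intensity - np.polyval(coeffs, ppm)

# Synthetic spectrum (hypothetical): one peak at 3 ppm on a curved baseline
ppm = np.linspace(12, -2, 2000)
true_baseline = 0.002 * ppm**2 - 0.01 * ppm + 0.05
peak = 1.0 / (1.0 + ((ppm - 3.0) / 0.01) ** 2)
spectrum = peak + true_baseline

# Regions assumed to be free of signals; a large SW makes such regions available
mask = (ppm > 10) | (ppm < 0) | ((ppm > 5) & (ppm < 8))
flat = subtract_polynomial_baseline(ppm, spectrum, mask, order=2)
```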
Calibration of ppm-scale
Select a reference peak
• In all spectra
• TMS, DSS or residual solvent signal
• Sharp, well resolved
• Global shift (error in lock position)
• Local shift (day to day variation in lock)
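Calibration can be sketched as locating the reference peak in a search window and shifting the whole ppm axis so that the peak lands on its nominal value. The window limits, the 0.013 ppm offset and the function name `calibrate_ppm` are assumptions for the illustration.

```python
import numpy as np

def calibrate_ppm(ppm, intensity, window=(-0.2, 0.2), reference_ppm=0.0):
    """Shift the ppm axis so the tallest peak in the window sits at reference_ppm."""
    in_window = (ppm >= window[0]) & (ppm <= window[1])
    observed = ppm[in_window][np.argmax(intensity[in_window])]
    return ppm - (observed - reference_ppm)

# Synthetic spectrum (hypothetical): reference signal recorded at 0.013 ppm
ppm = np.linspace(10, -1, 11001)
intensity = 1.0 / (1.0 + ((ppm - 0.013) / 0.002) ** 2)

calibrated = calibrate_ppm(ppm, intensity)
applied_shift = ppm[0] - calibrated[0]      # the correction that was applied
```

The same function is applied to every spectrum, so both the global and the day-to-day shifts are removed relative to the chosen reference.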
Project data on a common axis
Discrete data points in different spectra are not necessarily aligned
• Normally a very small effect
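One way to put all spectra on the same discrete axis is linear interpolation onto a shared ppm grid. The sketch assumes two hypothetical spectra of the same signal whose grids are offset by a fraction of a data point; other interpolation schemes would work as well.

```python
import numpy as np

def project_to_common_axis(ppm, intensity, common_ppm):
    """Linearly interpolate one spectrum onto a shared ppm axis."""
    order = np.argsort(ppm)                 # np.interp needs increasing x values
    return np.interp(common_ppm, ppm[order], intensity[order])

def lorentz(x, x0=4.64, width=0.005):
    return 1.0 / (1.0 + ((x - x0) / width) ** 2)

# Two calibrated spectra with the same spacing but slightly different grids
ppm_a = np.linspace(10.0, 0.0, 20001)
ppm_b = ppm_a - 0.0002

common_ppm = np.linspace(9.5, 0.5, 18001)
table = np.vstack([
    project_to_common_axis(ppm_a, lorentz(ppm_a), common_ppm),
    project_to_common_axis(ppm_b, lorentz(ppm_b), common_ppm),
])                                          # one row per sample, one column per point
```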
[Figure: serum CPMG spectra, β-Glc H-1 signal around 4.64 ppm after calibration; x-axis: 1H [ppm], 4.62 to 4.66]
Normalization
Make data directly comparable with each other
• by removing known variation
• by reducing unknown variation
• row-wise operation (for each sample)
Variation caused by
• different amounts/concentrations/volumes
• instrument settings (tuning/matching, gain)
Variation expressed as
• additive effects (baseline)
• multiplicative effects
Context dependent processing!
• urine, serum, juice, …
• depending on the type of samples, sampling schemes and sample preprocessing
Some Normalization schemes
Normalize to
• constant sum
• constant squared sum
• highest signal
Find a common constant feature in the spectra
• internal standard
• invariant metabolite
• e.g. urinary creatinine/body weight
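The schemes listed above are all row-wise divisions by a sample-specific factor. A minimal sketch, assuming a data table with one row per sample and one column per data point or bucket; the function names and the toy table are made up for the example.

```python
import numpy as np

def normalize_rows(X, method="sum"):
    """Divide each row (sample) by a factor derived from that row."""
    X = np.asarray(X, dtype=float)
    if method == "sum":                     # constant sum
        factor = X.sum(axis=1, keepdims=True)
    elif method == "squared_sum":           # constant squared sum (unit length)
        factor = np.sqrt((X ** 2).sum(axis=1, keepdims=True))
    elif method == "max":                   # highest signal set to 1
        factor = X.max(axis=1, keepdims=True)
    else:
        raise ValueError(f"unknown method: {method}")
    return X / factor

def normalize_to_reference(X, ref_column):
    """Divide each row by one column, e.g. an internal standard or invariant metabolite."""
    X = np.asarray(X, dtype=float)
    return X / X[:, [ref_column]]

# Two samples with the same composition, the second twice as concentrated
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
by_sum = normalize_rows(X, "sum")           # both rows become identical
```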
Normalization
[Figure: two panels, before normalization to constant sum and after normalization to constant sum]
Be pragmatic – if it works, it’s probably ok!
• But make sure the sampling and analysis parameters are kept constant
Some normalization schemes will introduce new correlations
• Normalize to constant sum = if one signal increases, others are decreased
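The induced correlation can be seen in a toy example with made-up numbers: two samples differ in a single signal, yet after normalization to constant sum every other signal appears to have decreased.

```python
import numpy as np

a = np.array([1.0, 1.0, 1.0, 1.0])
b = np.array([1.0, 1.0, 1.0, 5.0])   # only the last signal has increased

a_norm = a / a.sum()                 # every signal becomes 0.25
b_norm = b / b.sum()                 # first three drop to 0.125, last rises to 0.625
```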
Binning
Binning = Bucketing = Integration of spectral ranges
Reduce data set
• typical spectra: 65536 data points (64k)
• binned data ~200 data points
Remove variability of chemical shifts
• temperature
• pH
• concentration
• overall composition of samples (salt, proteins,…)
Reduce effects of differences in shimming
• the area of a peak is a more robust measure than the intensity value of each point
Binning
Integration into smaller ranges
• Bucketing or binning
• Start with equidistant ranges, ~0.01-0.05 ppm
• Combine vicinal buckets with a high degree of co-variation
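Equidistant bucketing can be sketched as summing all data points that fall inside each ppm range. The 0.04 ppm width, the 0 to 10 ppm limits and the synthetic peak are assumptions for the example; merging of co-varying buckets is not shown.

```python
import numpy as np

def bin_spectrum(ppm, intensity, width=0.04, ppm_min=0.0, ppm_max=10.0):
    """Sum intensities inside equidistant ppm buckets; returns centers and areas."""
    n_bins = int(round((ppm_max - ppm_min) / width))
    edges = np.linspace(ppm_min, ppm_max, n_bins + 1)
    idx = np.digitize(ppm, edges) - 1            # bucket index of every data point
    inside = (idx >= 0) & (idx < n_bins)         # drop points outside the range
    buckets = np.bincount(idx[inside], weights=intensity[inside], minlength=n_bins)
    centers = edges[:-1] + width / 2
    return centers, buckets

# Synthetic 64k spectrum (hypothetical) with a single peak at 3.02 ppm
ppm = np.linspace(10.0, 0.0, 65536)
intensity = 1.0 / (1.0 + ((ppm - 3.02) / 0.002) ** 2)

centers, buckets = bin_spectrum(ppm, intensity)  # 65536 points reduced to 250 buckets
```

A small shift of the peak within its bucket leaves the bucket area almost unchanged, which is the robustness argument made for binning.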
Mean/median centering
Removes (subtracts) the mean/median value of each variable
Operates on the columns of the data matrix (for each variable/bucket)
Centering of the data gives more stable numerical solutions for the PCA (and other transformations).
• If not used – the first PC will be the mean spectrum…
Use median centering for a more robust centering
• less sensitive to outliers
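Both variants are column-wise subtractions. The small table below is made up for the illustration, with one outlying sample in the first bucket to show why the median is the more robust choice.

```python
import numpy as np

# Rows are samples, columns are buckets (hypothetical values)
X = np.array([[0.010, 0.050],
              [0.012, 0.052],
              [0.011, 0.051],
              [0.030, 0.049]])   # last sample is an outlier in the first bucket

mean_centered = X - X.mean(axis=0)            # column means become zero
median_centered = X - np.median(X, axis=0)    # column medians become zero
```

In the first bucket the mean (0.01575) is pulled towards the outlier, while the median (0.0115) stays with the bulk of the samples.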
Centering
[Figure: bucket values at 1.16, 1.12, 1.08, 1.04, 1.01, 0.98 and 0.94 ppm. Panels: raw data before centering (values 0 to 0.06); mean centered (values about -8 to 10 × 10^-3); median centered (values about -8 to 12 × 10^-3)]
Scaling
Scaling sets the weighting (importance) of each variable in the models
For NMR-spectroscopic data
• the largest signals have the highest variance
• small signals have low variance
• noise has the lowest variance
[Figure: standard deviation versus 1H chemical shift [ppm] for 28 serum CPMG spectra (AFB)]
Auto scaling
Auto scaling (variables divided by standard deviation, variance set to 1).
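A minimal sketch of auto scaling, assuming rows are samples and columns are buckets. The sample standard deviation (`ddof=1`) and the toy table are choices made for the example.

```python
import numpy as np

def auto_scale(X):
    """Mean center each column and divide it by its standard deviation."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    return centered / X.std(axis=0, ddof=1)

# Hypothetical table: one small and one large variable
X = np.array([[0.010, 0.50],
              [0.012, 0.70],
              [0.014, 0.30],
              [0.016, 0.90]])
scaled = auto_scale(X)           # both columns now have unit variance
```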
[Figure: auto-scaled values compared with raw values for the buckets at 1.16 to 0.94 ppm]
Pareto scaling
Pareto scaling (variables divided by the square root of the standard deviation).
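A minimal sketch of Pareto scaling under the same assumptions (rows are samples, columns are buckets, sample standard deviation with `ddof=1`, made-up numbers).

```python
import numpy as np

def pareto_scale(X):
    """Mean center each column and divide it by the square root of its standard deviation."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    return centered / np.sqrt(X.std(axis=0, ddof=1))

# Hypothetical table: one small and one large variable
X = np.array([[0.010, 0.50],
              [0.012, 0.70],
              [0.014, 0.30],
              [0.016, 0.90]])
scaled = pareto_scale(X)

# Ratio between the spread of the large and the small variable
ratio_before = X.std(axis=0, ddof=1)[1] / X.std(axis=0, ddof=1)[0]
ratio_after = scaled.std(axis=0, ddof=1)[1] / scaled.std(axis=0, ddof=1)[0]
```

After scaling, each column has a standard deviation equal to the square root of its original one, so `ratio_after` is the square root of `ratio_before`: large variables still dominate, but less so.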
[Figure: Pareto-scaled values compared with raw values for the buckets at 1.16 to 0.94 ppm]
Log transform
The data range is “compressed” by calculation of the logarithmic values before centering.
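A small sketch of the transform, assuming strictly positive values (real bucket tables may need an offset or another treatment of zeros and negative values, which is not shown). The base-10 logarithm and the numbers are choices made for the example.

```python
import numpy as np

# Hypothetical table: both variables double from sample to sample,
# but the second one is 500 times larger than the first
X = np.array([[0.001, 0.5],
              [0.002, 1.0],
              [0.004, 2.0]])

logged = np.log10(X)                     # compress the data range
centered = logged - logged.mean(axis=0)  # then center each variable
```

After the transform the two columns are identical, because equal relative changes give equal differences on a logarithmic scale.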
[Figure: log-transformed, centered values compared with raw values for the buckets at 1.16 to 0.94 ppm]
Scaling
Centering
• Removes the offset in the data
• Highlights the differences within each variable
Auto scaling
• Sets the variance of each variable to unity.
• Inflates the noise.
• All signals equally important.
Pareto scaling
• Reduce relative importance of large values.
• Scaling effect between no scaling (only centering) and auto scaling.
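Centering: $\tilde{x}_{ij} = x_{ij} - \bar{x}_i$
Auto scaling: $\tilde{x}_{ij} = \dfrac{x_{ij} - \bar{x}_i}{s_i}$
Pareto scaling: $\tilde{x}_{ij} = \dfrac{x_{ij} - \bar{x}_i}{\sqrt{s_i}}$
Here $i$ indexes the variable (bucket) and $j$ the sample; $\bar{x}_i$ and $s_i$ are the mean and standard deviation of variable $i$.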
PCA
Principal Component Analysis (PCA)
• Calculate scores and loadings
• Data reduction (from 65000 data points to two…)
• Keep the variance, don’t show the noise
• Display the relationships between samples
[Figure: the data matrix X (systematic + random variation) decomposed into scores (S) times loadings (systematic variation) plus a residual matrix E (random variation)]
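The decomposition into scores, loadings and residuals can be sketched with a singular value decomposition of the mean-centered table. This is one common way to compute a PCA, not necessarily the software used for the sheep study; the simulated two-group data set and all names are assumptions for the example.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA by SVD: returns scores, loadings, explained variance ratio and residuals."""
    Xc = X - X.mean(axis=0)                          # mean center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = Vt[:n_components]                     # variable contributions
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    residual = Xc - scores @ loadings                # E, the part left out of the model
    return scores, loadings, explained, residual

# Simulated table (hypothetical): 20 samples, 200 buckets, two groups of 10
rng = np.random.default_rng(0)
pattern = rng.normal(size=200)                       # spectral difference between groups
group = np.repeat([0.0, 1.0], 10)
X = np.outer(group, pattern) + 0.05 * rng.normal(size=(20, 200))

scores, loadings, explained, E = pca(X, n_components=2)
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the familiar scores plot, where the two simulated groups separate along the first component, and the rows of `loadings` show which buckets drive that separation.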