10% human and machine learning
TRANSCRIPT
![Page 1: 10% Human and Machine Learning](https://reader035.vdocuments.pub/reader035/viewer/2022081520/58e7b39c1a28abbb4e8b5071/html5/thumbnails/1.jpg)
Ulcerative colitis, irritable bowel syndrome and microbiota
Can machine learning help with forecasting from poop?
Bojic Svetlana, R user, MD, University of Belgrade
![Page 2: 10% Human and Machine Learning](https://reader035.vdocuments.pub/reader035/viewer/2022081520/58e7b39c1a28abbb4e8b5071/html5/thumbnails/2.jpg)
Human microbiota
• Are we only humans (Homo sapiens)?
1 human cell for every 10 microbial cells
1 human gene for every 100-1000 microbial genes
…if there was democracy in our bodies…
![Page 3: 10% Human and Machine Learning](https://reader035.vdocuments.pub/reader035/viewer/2022081520/58e7b39c1a28abbb4e8b5071/html5/thumbnails/3.jpg)
Gut microbiota: the "forgotten organ"
• Synthesize and excrete vitamins, transform non-digestible carbohydrates, produce short-chain fatty acids (SCFA) that feed colonocytes
• Prevent colonization by pathogens ("defending the territory")
• May antagonize other bacteria
• Educate the immune system
![Page 4: 10% Human and Machine Learning](https://reader035.vdocuments.pub/reader035/viewer/2022081520/58e7b39c1a28abbb4e8b5071/html5/thumbnails/4.jpg)
How can we tell who is in there?
• Each bacterium needs to produce proteins, hence it has ribosomes (and ribosomal DNA encoded in its genome)
• While large parts of the 16S ribosomal DNA are "conserved", the V1 and V6 regions are highly specific for a particular bacterial strain
• V1 and V6 regions (24-72 bp) of all known intestinal bacteria were selected as the basis for the probe design
• 3699 unique probes were printed on the microarray slide
• We have a culture-independent tool to assess and quantify the presence of ~1000 bacterial strains simultaneously
![Page 5: 10% Human and Machine Learning](https://reader035.vdocuments.pub/reader035/viewer/2022081520/58e7b39c1a28abbb4e8b5071/html5/thumbnails/5.jpg)
HITChip based experiment
[Figure: microbial community; nucleic acids (DNA, RNA); extraction & labelling; data analysis (profiling, identification, quantification)]
• …You end up with a data frame: the signal intensity of each probe, per sample
• One could combine those with a phylogenetic map to get the abundance of particular bacteria at genus or phylum level
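Combining the probe-level table with a phylogenetic map can be sketched roughly as follows. This is a minimal Python illustration, not the HITChip pipeline itself: the function name `aggregate_by_taxon`, the probe IDs, the genus assignments and all signal values are invented for the example, and summing probe signals per taxon is only one assumed way of aggregating.

```python
from collections import defaultdict


def aggregate_by_taxon(intensities, probe_to_taxon):
    """Sum probe-level signal per taxon for every sample.

    intensities:    {sample_id: {probe_id: signal}}
    probe_to_taxon: {probe_id: taxon_name}
    Probes that are missing from the map are skipped.
    """
    result = {}
    for sample_id, probes in intensities.items():
        totals = defaultdict(float)
        for probe_id, signal in probes.items():
            taxon = probe_to_taxon.get(probe_id)
            if taxon is not None:
                totals[taxon] += signal
        result[sample_id] = dict(totals)
    return result


# Toy values, invented for illustration only.
intensities = {
    "sample_A": {"p1": 2.0, "p2": 3.0, "p3": 1.5},
    "sample_B": {"p1": 0.5, "p2": 0.5, "p3": 4.0},
}
# Hypothetical probe-to-genus map standing in for the phylogenetic map.
probe_to_genus = {"p1": "Bacteroides", "p2": "Bacteroides", "p3": "Clostridium"}

genus_table = aggregate_by_taxon(intensities, probe_to_genus)
```

Swapping the genus map for a probe-to-phylum map would give the phylum-level table in the same way.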
![Page 6: 10% Human and Machine Learning](https://reader035.vdocuments.pub/reader035/viewer/2022081520/58e7b39c1a28abbb4e8b5071/html5/thumbnails/6.jpg)
Our problem
• Ulcerative colitis (UC) is a chronic, or long-lasting, disease that causes inflammation (irritation or swelling) and ulcers on the inner lining of the large intestine
• Irritable bowel syndrome (IBS) is a group of symptoms, including pain or discomfort in your abdomen and changes in your bowel movement patterns, that occur together.
• IBS is a functional gastrointestinal disorder.
Endoscopies of the large intestine are the most accurate methods for diagnosing ulcerative colitis
We need a less invasive alternative!
![Page 7: 10% Human and Machine Learning](https://reader035.vdocuments.pub/reader035/viewer/2022081520/58e7b39c1a28abbb4e8b5071/html5/thumbnails/7.jpg)
Calprotectin
• Marker of neutrophilic intestinal inflammation
• One large (n=2499) meta-analysis gave pooled estimates of sensitivity and specificity for calprotectin (0.88 and 0.73) for assessment of endoscopically defined disease activity in UC (Mosli et al., 2015)
• On the other hand, Clostridium sphenoides and Haemophilus strongly correlate with calprotectin levels!
Kolho et al., 2015
![Page 8: 10% Human and Machine Learning](https://reader035.vdocuments.pub/reader035/viewer/2022081520/58e7b39c1a28abbb4e8b5071/html5/thumbnails/8.jpg)
Our data: 150 patients × 3699 features (probes)
• A preliminary constrained RDA on phylum level confirmed that health status has a significant effect on microbial composition (p<0.01)
• Health status alone could explain as much as 10.2% of the variability, even when the influence of ProjectID and gender was partialled out.
![Page 9: 10% Human and Machine Learning](https://reader035.vdocuments.pub/reader035/viewer/2022081520/58e7b39c1a28abbb4e8b5071/html5/thumbnails/9.jpg)
Supervised machine learning procedures
• The most successful model, from the elastic net family (sensitivity=0.5, specificity=0.98 for the UC class), used only 89 unique probes, and all of them made biological sense!
• The most common error was mistaking the diarrhea-predominant IBS subtype for UC and vice versa
• Why we love R: the package ‘caret’ has a number of models implemented; we had a preference for feature-selection algorithms
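For readers less used to per-class metrics: sensitivity and specificity "for the UC class" treat UC as the positive class and everything else as negative. The sketch below shows that one-vs-rest calculation in plain Python rather than R/caret. The function name `sensitivity_specificity`, the "control" label and the toy label vectors are assumptions made up for the example; they are not the study's predictions.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive case correctly detected
            else:
                fn += 1  # positive case missed
        else:
            if pred == positive:
                fp += 1  # other class mistaken for the positive class
            else:
                tn += 1  # other class correctly not called positive
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels, invented for illustration only.
y_true = ["UC", "UC", "UC", "UC", "IBS", "IBS", "control", "control"]
y_pred = ["UC", "UC", "IBS", "IBS", "IBS", "UC", "control", "control"]

sens, spec = sensitivity_specificity(y_true, y_pred, positive="UC")
```

In this toy case, half of the true UC samples are found and three of the four non-UC samples are correctly not called UC. A model with low sensitivity but high specificity, like the one reported above, rarely calls UC wrongly but misses part of the true UC cases.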
![Page 10: 10% Human and Machine Learning](https://reader035.vdocuments.pub/reader035/viewer/2022081520/58e7b39c1a28abbb4e8b5071/html5/thumbnails/10.jpg)
Yet the situation gets more complicated…
…which means that one out of 10 labels in our training set could be wrong!
![Page 11: 10% Human and Machine Learning](https://reader035.vdocuments.pub/reader035/viewer/2022081520/58e7b39c1a28abbb4e8b5071/html5/thumbnails/11.jpg)
And that calls for semi-supervised methods…
• These are the methods that make use of both labeled and unlabeled data to train the model
• We are currently experimenting with the upclass package, and are in correspondence with MDs
• …and implementing learning algorithms of our own
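To make the idea of learning from labeled and unlabeled data together concrete, here is a minimal self-training sketch in Python. It is not the upclass algorithm and not the authors' own method: it swaps in a simple nearest-centroid classifier, and the names `fit_centroids` and `self_train`, the `min_margin` threshold and the two-dimensional toy points are all assumptions for illustration.

```python
import math


def fit_centroids(labeled):
    """Mean feature vector per class from a list of (features, label) pairs."""
    by_class = {}
    for x, y in labeled:
        by_class.setdefault(y, []).append(x)
    return {
        y: [sum(col) / len(xs) for col in zip(*xs)]
        for y, xs in by_class.items()
    }


def self_train(labeled, unlabeled, min_margin=1.0, max_rounds=10):
    """Self-training: repeatedly pseudo-label the confident unlabeled points.

    A point counts as confident when its distance to the second-nearest
    centroid exceeds its distance to the nearest one by at least min_margin.
    Returns (centroids, labeled_including_pseudo_labels, still_unlabeled).
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(labeled)
        remaining, added = [], 0
        for x in pool:
            dists = sorted((math.dist(x, c), y) for y, c in centroids.items())
            margin = dists[1][0] - dists[0][0] if len(dists) > 1 else math.inf
            if margin >= min_margin:
                labeled.append((x, dists[0][1]))  # accept the pseudo-label
                added += 1
            else:
                remaining.append(x)  # too ambiguous, keep it unlabeled
        pool = remaining
        if added == 0 or not pool:
            break
    return fit_centroids(labeled), labeled, pool


# Toy 2-D points, invented for illustration only.
labeled = [([0.0, 0.0], "IBS"), ([10.0, 10.0], "UC")]
unlabeled = [[1.0, 1.0], [9.0, 9.0], [5.0, 5.2]]

centroids, final_labeled, still_unlabeled = self_train(labeled, unlabeled)
```

The ambiguous point near the class boundary stays unlabeled instead of being forced into a class, which is the kind of behaviour one would want when some training labels may be wrong.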