In-Memory Data Management for Systems Medicine
Dr. Matthieu-P. Schapranow
e:Med Focus Workshop “Data Management in Systems Medicine”, Berlin
June 10, 2016
[Figure: heart failure and related factors: sleeping disorder, fibrosis, blood pressure, blood volume, gene expression, hypertrophy, calcium metabolism, energy metabolism, iron deficiency, vitamin D deficiency, gender, epigenetics]
■ Integrated systems medicine based on real-time analysis of healthcare data
■ Initial funding period: Mar ‘15 – Feb ‘18
■ Funded consortium partners:
App Example: Systems Medicine Model of Heart Failure (SMART)
Schapranow, e:Med Workshop, Jun 10, 2016
In-Memory Data Management for Systems Medicine
■ Patients
□ Individual anamnesis, family history, and background
□ Require fast access to individualized therapy
■ Clinicians
□ Identify root and extent of disease using laboratory tests
□ Evaluate therapy alternatives, adapt existing therapy
■ Researchers
□ Conduct laboratory work, e.g. analyze patient samples
□ Create new research findings and come up with treatment alternatives
Actors in Systems Medicine
IT Challenges: Distributed Heterogeneous Data Sources
■ Human genome/biological data: 600 GB per full genome, 15 PB+ in databases of leading institutes
■ Prescription data: 1.5B records from 10,000 doctors and 10M patients (100 GB)
■ Clinical trials: currently more than 30k recruiting on ClinicalTrials.gov
■ Human proteome: 160M data points (2.4 GB) per sample, >3 TB raw proteome data in ProteomicsDB
■ PubMed database: >23M articles
■ Hospital information systems: often more than 50 GB
■ Medical sensor data: scan of a single organ in 1 s creates 10 GB of raw data
■ Cancer patient records: >160k records at NCT
Our Methodology: Design Thinking
■ Joint process definition
■ Identification of long running steps
■ Aims
□ Improved communication
□ Sharing of data
□ Reproducible data processing
Requirements Engineering for Systems Medicine: Computer-aided Systems Medicine Process
[Figure: process diagram "20160407_eCardiohealth_Whole_Process"]
■ Lanes: Heart Center Study Assessor, Radiologist, Cardiologist, Surgeon, IT platform, Wet Lab, Bioinformatician, Proteomics Lab Proteome Analyzer, Cardiomyocyte Modeler, Multi-scale Modeller
■ Tasks: Study Assessment, MRI, Hemodynamic Evaluation, Surgery, Data Processing, Wet Lab Experiments, Validation, RNA Sequencing, Proteome Experiments, Cardiomyocyte Modeling, Multi-Scale Modeling
■ Data objects: MR Images; Patient Metadata, Hemodynamic Parameters, and Clinical Data; SMART Data Storage; Wet Lab Results, e.g. Expression Data; FASTQ Files; Protein Expressions; Protein Expression Levels; Cardiomyocyte Electro-mechanical Model; Model Output
■ Events and conditions: Eligible Patient Available; Surgery Performed?; Update Notification; Message: Biopsy Sample; Condition: 20 Biopsy Samples for batch processing; Message: Post-surgery visit completed with data entry
Data Processing Pipelines: From Model to Execution
1. Design time (researcher, process expert)
□ Definition of parameterized process model
□ Uses graphical editor and jobs from repository
2. Configuration time (researcher, lab assistant)
□ Select model and specify parameters, e.g. aln opts
□ Results in model instance stored in repository
3. Execution time (researcher)
□ Select model instance
□ Specify execution parameters, e.g. input files
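The three phases above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's actual API: the class names, the `Repository` stand-in, the job names (`align`, `sort`, `call_variants`), and the value `-t 4` for `aln_opts` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessModel:
    """Design time: an ordered list of jobs plus the parameters they expose."""
    name: str
    jobs: tuple
    parameters: tuple


@dataclass(frozen=True)
class ModelInstance:
    """Configuration time: a model with concrete values for all its parameters."""
    model: ProcessModel
    settings: dict


class Repository:
    """Hypothetical in-memory stand-in for the model instance repository."""

    def __init__(self):
        self.instances = {}

    def configure(self, instance_id, model, **settings):
        # Refuse to store an instance that leaves model parameters unset.
        missing = set(model.parameters) - set(settings)
        if missing:
            raise ValueError(f"unset parameters: {sorted(missing)}")
        self.instances[instance_id] = ModelInstance(model, dict(settings))
        return self.instances[instance_id]


def execute(instance, input_files):
    """Execution time: bind input files and return the planned job invocations."""
    return [
        {"job": job, "inputs": list(input_files), "settings": instance.settings}
        for job in instance.model.jobs
    ]


# 1. Design time: define a parameterized process model.
model = ProcessModel(
    "alignment", jobs=("align", "sort", "call_variants"), parameters=("aln_opts",)
)
# 2. Configuration time: select the model and specify its parameters.
repo = Repository()
instance = repo.configure("alignment-default", model, aln_opts="-t 4")
# 3. Execution time: select the stored instance and specify input files.
plan = execute(repo.instances["alignment-default"], ["sample_1.fastq"])
```

Separating the stored instance from the execution parameters means the same configured pipeline can be rerun on new input files, which supports the aim of reproducible data processing.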
■ Requirements
□ Real-time data analysis
□ Maintained software
■ Restrictions
□ Data privacy
□ Data locality
□ Volume of “big medical data”
■ Solution?
□ Federated In-Memory Database System vs. Cloud Computing
Software Requirements in Systems Medicine
Where Are All Those Clouds Going?
Gartner's 2014 Hype Cycle for Emerging Technologies
Multiple Cloud Service Providers
[Figure: Site A and Site B each operate a local system with local storage and a local synchronization service. Each site uses its own cloud provider (Cloud Provider Site A, Cloud Provider Site B), which runs a cloud synchronization service with shared cloud storage.]
A Single Service Provider
[Figure: Site A and Site B both send requests to the cloud system of a single cloud provider, which consists of a cloud synchronization service and shared cloud storage.]
Multiple Sites Forming the Federated In-Memory Database System
[Figure: The Federated In-Memory Database System spans Site A, Site B, and the cloud provider. The cloud IMDB instance holds master data and shared algorithms. Each site runs a local IMDB instance that holds sensitive data, e.g. patient data.]
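One possible reading of the federated setup is sketched below in Python: shared algorithms are registered at the cloud instance, executed at each site's local instance, and only aggregated results are returned. The slide does not spell out this execution model, so treat it as an assumption; `CloudInstance`, `LocalInstance`, `federated_run`, and the sample rows are hypothetical.

```python
class CloudInstance:
    """Hypothetical cloud IMDB instance: holds shared algorithms, no patient data."""

    def __init__(self):
        self.shared_algorithms = {}

    def register(self, name, algorithm):
        self.shared_algorithms[name] = algorithm


class LocalInstance:
    """Hypothetical local IMDB instance: sensitive rows stay at the site."""

    def __init__(self, site, patient_rows):
        self.site = site
        self._patient_rows = patient_rows

    def run(self, algorithm):
        # Only the algorithm's aggregated result leaves the site.
        return algorithm(self._patient_rows)


def count_by_diagnosis(rows):
    """Example shared algorithm: aggregate counts without patient identifiers."""
    counts = {}
    for row in rows:
        counts[row["diagnosis"]] = counts.get(row["diagnosis"], 0) + 1
    return counts


def federated_run(cloud, sites, name):
    """Run a shared algorithm at every site and merge the per-site aggregates."""
    algorithm = cloud.shared_algorithms[name]
    per_site = {site.site: site.run(algorithm) for site in sites}
    total = {}
    for counts in per_site.values():
        for diagnosis, n in counts.items():
            total[diagnosis] = total.get(diagnosis, 0) + n
    return per_site, total


cloud = CloudInstance()
cloud.register("count_by_diagnosis", count_by_diagnosis)
site_a = LocalInstance("Site A", [
    {"patient": "a1", "diagnosis": "Colon"},
    {"patient": "a2", "diagnosis": "Rectum"},
])
site_b = LocalInstance("Site B", [{"patient": "b1", "diagnosis": "Colon"}])
per_site, total = federated_run(cloud, [site_a, site_b], "count_by_diagnosis")
```

In this sketch the patient-level rows are never copied out of a `LocalInstance`, which is one way to respect the data privacy and data locality restrictions named earlier.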
we.analyzegenomes.com Real-time Analysis of Big Medical Data
In-Memory Database
Extensions for Life Sciences
Data Exchange, App Store
Access Control, Data Protection
Fair Use
Statistical Tools
Real-time Analysis
App-spanning User Profiles
Combined and Linked Data
Genome Data
Cellular Pathways
Genome Metadata
Research Publications
Pipeline and Analysis Models
Drugs and Interactions
Drug Response Analysis
Pathway Topology Analysis
Medical Knowledge Cockpit
Oncolyzer
Clinical Trial Recruitment
Cohort Analysis
...
Indexed Sources
Combined column and row store
Map/Reduce
Single and multi-tenancy
Lightweight compression
Insert only for time travel
Real-time replication
Working on integers
SQL interface on columns and rows
Active/passive data store
Minimal projections
Group key
Reduction of software layers
Dynamic multi-threading
Bulk load of data
Object-relational mapping
Text retrieval and extraction engine
No aggregate tables
Data partitioning
Any attribute as index
No disk
On-the-fly extensibility
Analytics on historical data
Multi-core/parallelization
Our Technology: In-Memory Database Technology
■ Traditional databases allow four data operations:
□ INSERT, SELECT and
□ DELETE, UPDATE
■ Insert-only requires only INSERT, SELECT to maintain a complete history (bookkeeping systems)
■ Insert-only enables time travelling, e.g. to
□ Trace changes and reconstruct decisions
□ Document complete history of changes, therapies, etc.
□ Enable statistical observations
Insert-Only / Append-Only
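The insert-only idea can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the storage layout of any particular database: `InsertOnlyTable`, the logical clock, and the use of `None` as a deletion marker are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    key: str
    value: Optional[str]  # None marks a logical deletion
    valid_from: int       # logical timestamp of the INSERT


class InsertOnlyTable:
    """UPDATE and DELETE are expressed as further INSERTs; nothing is overwritten."""

    def __init__(self):
        self._log = []    # versions are only ever appended, in timestamp order
        self._clock = 0

    def insert(self, key, value):
        self._clock += 1
        self._log.append(Version(key, value, self._clock))
        return self._clock

    def delete(self, key):
        # A deletion is just another appended version.
        return self.insert(key, None)

    def select(self, key, as_of=None):
        """Return the value of key at timestamp as_of (latest state if None)."""
        result = None
        for version in self._log:
            if version.key == key and (as_of is None or version.valid_from <= as_of):
                result = version.value
        return result

    def history(self, key):
        """Return the complete change history of key."""
        return [(v.valid_from, v.value) for v in self._log if v.key == key]


table = InsertOnlyTable()
t1 = table.insert("patient-1/therapy", "Therapy A")
t2 = table.insert("patient-1/therapy", "Therapy B")  # acts as an update
t3 = table.delete("patient-1/therapy")               # acts as a delete

current = table.select("patient-1/therapy")             # state after the deletion
earlier = table.select("patient-1/therapy", as_of=t1)   # time travel to t1
```

Because every change is appended, `history` can reconstruct which therapy was recorded at any point, which is what makes tracing decisions possible.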
■ Main memory access is the new bottleneck
■ Lightweight compression can reduce this bottleneck, i.e.
□ Lossless
□ Improved usage of data bus capacity
□ Work directly on compressed data
Lightweight Compression
Attribute Vector
RecId  ValueId
1      C18.0
2      C32.0
3      C00.9
4      C18.0
5      C20.0
6      C20.0
7      C50.9
8      C18.0

Inverted Index
ValueId  RecIdList
1        2
2        3
3        5,6
4        1,4,8
5        7

Data Dictionary
ValueId  Value
1        Larynx
2        Lip
3        Rectum
4        Colon
5        Mamma

Table
RecId  …      …       …
1      C18.0  Colon   091487
2      C32.0  Larynx  357982
3      C00.9  Lip     123489
4      C18.0  Colon   998711
5      C20.0  Rectum  215678
6      C20.0  Rectum  647912
7      C50.9  Mamma   167898
8      C18.0  Colon   646470
…

■ Typical compression factor of 10:1 for enterprise software
■ In financial applications up to 50:1
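The dictionary encoding in this example can be reproduced with a short Python sketch. It encodes the diagnosis column with the value IDs from the data dictionary above and derives the inverted index. The function names are hypothetical, and the spelling "Mamma" is used for the C50.9 entry.

```python
def dictionary_encode(values, dictionary):
    """Encode values as integer value IDs using a fixed dictionary {value_id: value}."""
    value_to_id = {value: value_id for value_id, value in dictionary.items()}
    attribute_vector = [value_to_id[value] for value in values]
    inverted_index = {}
    for rec_id, value_id in enumerate(attribute_vector, start=1):
        inverted_index.setdefault(value_id, []).append(rec_id)
    return attribute_vector, inverted_index


def scan_equals(attribute_vector, dictionary, value):
    """Find matching record IDs by comparing integers, not strings."""
    value_to_id = {v: k for k, v in dictionary.items()}
    target = value_to_id[value]  # one dictionary lookup, then integer comparisons
    return [
        rec_id
        for rec_id, value_id in enumerate(attribute_vector, start=1)
        if value_id == target
    ]


# Data dictionary and diagnosis column (RecId 1..8) from the example table.
dictionary = {1: "Larynx", 2: "Lip", 3: "Rectum", 4: "Colon", 5: "Mamma"}
diagnoses = ["Colon", "Larynx", "Lip", "Colon", "Rectum", "Rectum", "Mamma", "Colon"]

attribute_vector, inverted_index = dictionary_encode(diagnoses, dictionary)
colon_records = scan_equals(attribute_vector, dictionary, "Colon")
```

The scan never decompresses the column: it translates the search value once and then works on the integer attribute vector, which is the sense in which lightweight compression lets queries work directly on compressed data.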
■ Horizontal Partitioning
□ Cut long tables into shorter segments
□ E.g. to group samples with same relevance
■ Vertical Partitioning
□ Split off columns to individual resources
□ E.g. to separate personalized data from experiment data
■ Partitioning is the basis for
□ Parallel execution of database queries
□ Implementation of data aging and data retention management
Data Partitioning
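Both partitioning schemes can be sketched on a list of row dictionaries in Python. This is an illustration only: the column names (`sample_id`, `patient_name`, `relevance`, `expression`) and the sample values are hypothetical.

```python
def horizontal_partition(rows, key):
    """Cut a table into shorter segments, grouping rows by key(row)."""
    partitions = {}
    for row in rows:
        partitions.setdefault(key(row), []).append(row)
    return partitions


def vertical_partition(rows, id_column, column_groups):
    """Split columns into separate tables that share id_column for re-joining."""
    return {
        name: [
            {id_column: row[id_column], **{column: row[column] for column in columns}}
            for row in rows
        ]
        for name, columns in column_groups.items()
    }


rows = [
    {"sample_id": 1, "patient_name": "Patient A", "relevance": "high", "expression": 0.8},
    {"sample_id": 2, "patient_name": "Patient B", "relevance": "low", "expression": 0.1},
    {"sample_id": 3, "patient_name": "Patient C", "relevance": "high", "expression": 0.6},
]

# Horizontal: group samples with the same relevance.
by_relevance = horizontal_partition(rows, key=lambda row: row["relevance"])

# Vertical: separate personalized data from experiment data.
split_tables = vertical_partition(
    rows,
    id_column="sample_id",
    column_groups={"personal": ["patient_name"], "experiment": ["relevance", "expression"]},
)
```

Each resulting segment can be scanned by a separate worker or moved to a different storage tier, which is why partitioning underlies both parallel query execution and data aging.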
■ Modern server systems consist of x CPUs, e.g.
■ Each CPU consists of y CPU cores, e.g. 12
■ Consider each of the x*y CPU cores as an individual worker, e.g. 6x12 = 72 workers
■ Each worker performs one task at a time; all workers run in parallel
■ A full table scan of a database table with 1M entries ideally takes 1/(x*y) of the sequential search time when traversed in parallel
□ Reduced response time
□ No need for pre-aggregated totals and redundant data
□ Improved usage of hardware
□ Instant analysis of data
Multi-core and Parallelization
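The work split behind a parallel table scan can be sketched as follows, using the 6x12 = 72 workers and the 1M-entry table from this slide. The function names and the synthetic column are hypothetical. Note that CPython threads share a global interpreter lock, so this sketch shows how the scan is divided and merged rather than the real speedup of a multi-core database engine.

```python
from concurrent.futures import ThreadPoolExecutor


def split(n_rows, n_workers):
    """Return (start, stop) bounds of n_workers near-equal contiguous chunks."""
    base, extra = divmod(n_rows, n_workers)
    bounds, start = [], 0
    for i in range(n_workers):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def count_matches(column, start, stop, value):
    """Scan one chunk of the column and count entries equal to value."""
    return sum(1 for i in range(start, stop) if column[i] == value)


def parallel_count(column, value, n_workers):
    """Scan all chunks concurrently and merge the partial counts."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(count_matches, column, start, stop, value)
            for start, stop in split(len(column), n_workers)
        ]
        return sum(future.result() for future in futures)


workers = 6 * 12                              # x CPUs times y cores per CPU
column = [i % 5 for i in range(1_000_000)]    # synthetic table column with 1M entries
chunks = split(len(column), workers)
largest_chunk = max(stop - start for start, stop in chunks)
total = parallel_count(column, 3, workers)
```

With 72 workers, no worker scans more than 13,889 of the 1,000,000 entries, which is the ideal 1/(x*y) share rounded up.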
■ Online: Visit we.analyzegenomes.com for latest research results, slides, videos, tools, and publications
■ Offline: Read more about it, e.g. High-Performance In-Memory Genome Data Analysis: How In-Memory Database Technology Accelerates Personalized Medicine, In-Memory Data Management Research, Springer, ISBN: 978-3-319-03034-0, 2014
■ In Person: Join us for the Symposium “Diagnostics in the Era of Big Data and Systems Medicine” Oct 5-6, 2016 in Potsdam
Where to find additional information?
Keep in contact with us!
Dr. Matthieu-P. Schapranow
Program Manager E-Health & Life Sciences
Hasso Plattner Institute
August-Bebel-Str. 88, 14482 Potsdam, Germany
http://we.analyzegenomes.com/