In-Memory Data Management for Systems Medicine. Dr. Matthieu-P. Schapranow, e:Med Focus Workshop "Data Management in Systems Medicine", Berlin, June 10, 2016


Page 1: In-Memory Data Management for Systems Medicine

In-Memory Data Management for Systems Medicine

Dr. Matthieu-P. Schapranow e:Med Focus Workshop Data Management in Systems Medicine, Berlin

June 10, 2016

Page 2: In-Memory Data Management for Systems Medicine

App Example: Systems Medicine Model of Heart Failure (SMART)

Figure: labels arranged around "Heart Failure": sleeping disorder, fibrosis, blood pressure, blood volume, gene expression, hypertrophy, calcium metabolism, energy metabolism, iron deficiency, vitamin D deficiency, gender, epigenetics

■  Integrated systems medicine based on real-time analysis of healthcare data
■  Initial funding period: Mar 2015 – Feb 2018
■  Funded consortium partners:

Schapranow, e:Med Workshop, Jun 10, 2016

Page 3: In-Memory Data Management for Systems Medicine

Actors in Systems Medicine

■  Patients
□  Individual anamnesis, family history, and background
□  Require fast access to individualized therapy
■  Clinicians
□  Identify root and extent of disease using laboratory tests
□  Evaluate therapy alternatives, adapt existing therapy
■  Researchers
□  Conduct laboratory work, e.g. analyze patient samples
□  Create new research findings and come up with treatment alternatives

Page 4: In-Memory Data Management for Systems Medicine


Page 5: In-Memory Data Management for Systems Medicine


Page 6: In-Memory Data Management for Systems Medicine

IT Challenges: Distributed Heterogeneous Data Sources

■  Human genome/biological data: 600 GB per full genome; 15 PB+ in databases of leading institutes
■  Prescription data: 1.5B records from 10,000 doctors and 10M patients (100 GB)
■  Clinical trials: currently more than 30k recruiting on ClinicalTrials.gov
■  Human proteome: 160M data points (2.4 GB) per sample; >3 TB raw proteome data in ProteomicsDB
■  PubMed database: >23M articles
■  Hospital information systems: often more than 50 GB
■  Medical sensor data: scan of a single organ in 1 s creates 10 GB of raw data
■  Cancer patient records: >160k records at NCT

Page 7: In-Memory Data Management for Systems Medicine

Our Methodology: Design Thinking

Page 8: In-Memory Data Management for Systems Medicine

Requirements Engineering for Systems Medicine: Computer-aided Systems Medicine Process

■  Joint process definition
■  Identification of long-running steps
■  Aims
□  Improved communication
□  Sharing of data
□  Reproducible data processing

Figure: process diagram "20160407_eCardiohealth_Whole_Process"

■  Participants: Heart Center (Study Assessor, Radiologist, Cardiologist, Surgeon), IT platform, Wet Lab, Bioinformatician, Proteomics Lab (Proteome Analyzer), Modeling (Cardiomyocyte Modeler, Multi-scale Modeller)
■  Activities: Study Assessment, MRI, Hemodynamic Evaluation, Surgery, Update Notification, Data Processing, Wet Lab Experiments, Validation, RNA Sequencing, Proteome Experiments, Cardiomyocyte Modeling, Multi-Scale Modeling
■  Data: MR Images; Patient Metadata, Hemodynamic Parameters, and Clinical Data; SMART Data Storage; Wet Lab Results, e.g. Expression Data; FASTQ Files; Protein Expressions; Cardiomyocyte Electromechanical Model; Model Output; Hemodynamic Parameters; Protein Expression Levels
■  Events and conditions: Eligible Patient Available; Surgery Performed?; Message: Biopsy Sample; Condition: 20 Biopsy Samples for batch processing; Message: Post-surgery visit completed with data entry

Page 9: In-Memory Data Management for Systems Medicine

Data Processing Pipelines: From Model to Execution

1.  Design time (researcher, process expert)
□  Definition of parameterized process model
□  Uses graphical editor and jobs from repository
2.  Configuration time (researcher, lab assistant)
□  Select model and specify parameters, e.g. aln opts
□  Results in model instance stored in repository
3.  Execution time (researcher)
□  Select model instance
□  Specify execution parameters, e.g. input files
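
The three phases above can be sketched in a few lines of Python. This is only an illustration of the separation between model, model instance, and execution; the class names, the job names (`align`, `sort`, `call_variants`), and the file name are hypothetical and not taken from the platform itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessModel:
    """Design time: a parameterized process model built from repository jobs."""
    name: str
    jobs: tuple        # ordered job names (hypothetical) from the job repository
    parameters: tuple  # names of the parameters that must be configured later


@dataclass
class ModelInstance:
    """Configuration time: a process model plus concrete parameter values."""
    model: ProcessModel
    config: dict


def configure(model, **config):
    """Create a model instance; fail if a declared parameter is left unset."""
    missing = set(model.parameters) - set(config)
    if missing:
        raise ValueError(f"unconfigured parameters: {sorted(missing)}")
    return ModelInstance(model, dict(config))


def execute(instance, input_files):
    """Execution time: bind input files and return the planned job invocations."""
    return [
        {"job": job, "config": instance.config, "inputs": list(input_files)}
        for job in instance.model.jobs
    ]


# Design time: define the model once.
model = ProcessModel("alignment", ("align", "sort", "call_variants"), ("aln_opts",))
# Configuration time: fix the parameters, e.g. the alignment options.
instance = configure(model, aln_opts="-t 8")
# Execution time: choose the instance and supply the input files.
plan = execute(instance, ["sample_1.fastq"])
```

Keeping the instance separate from the model means the same configured pipeline can be re-run on new input files, which supports the reproducibility aim named earlier.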

Page 10: In-Memory Data Management for Systems Medicine

Software Requirements in Systems Medicine

■  Requirements
□  Real-time data analysis
□  Maintained software
■  Restrictions
□  Data privacy
□  Data locality
□  Volume of “big medical data”
■  Solution?
□  Federated In-Memory Database System vs. Cloud Computing

Page 11: In-Memory Data Management for Systems Medicine

Where Do All Those Clouds Go?

Figure: Gartner's 2014 Hype Cycle for Emerging Technologies

Page 12: In-Memory Data Management for Systems Medicine

Multiple Cloud Service Providers

Figure: block diagram with two sites and two cloud providers
■  Site A / Site B: Local System, Local Storage, Local Synchronization Service
■  Cloud Provider Site A / Cloud Provider Site B: Cloud Synchronization Service, Shared Cloud Storage

Page 13: In-Memory Data Management for Systems Medicine

A Single Service Provider

Figure: block diagram with Site A, Site B, and one Cloud Provider
■  Cloud Provider: Cloud System, Cloud Synchronization Service, Shared Cloud Storage

Page 14: In-Memory Data Management for Systems Medicine

Multiple Sites Forming the Federated In-Memory Database System

Figure: Federated In-Memory Database System spanning Site A, Site B, and a Cloud Provider
■  Cloud Provider: Cloud IMDB Instance with Master Data and Shared Algorithms
■  Site A / Site B: Local IMDB Instance with Sensitive Data, e.g. Patient Data
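
One way to read the federated setup is that shared algorithms travel to the sites while sensitive data stays where it is. The sketch below illustrates that reading only. It assumes that each site returns aggregated results rather than patient rows; that is an assumption of this example, not something the slide states. All class and function names are hypothetical.

```python
class LocalInstance:
    """Site-local database instance; patient rows stay inside this object."""

    def __init__(self, site, patient_rows):
        self.site = site
        self._patient_rows = patient_rows

    def run(self, algorithm):
        # The shared algorithm is executed where the data lives.
        return algorithm(self._patient_rows)


def count_by_diagnosis(rows):
    """Shared algorithm (hypothetical): aggregate counts per diagnosis code."""
    counts = {}
    for row in rows:
        counts[row["diagnosis"]] = counts.get(row["diagnosis"], 0) + 1
    return counts


def federated_query(instances, algorithm):
    """Run the algorithm at every site and merge only the aggregated results."""
    merged = {}
    for instance in instances:
        for key, n in instance.run(algorithm).items():
            merged[key] = merged.get(key, 0) + n
    return merged


site_a = LocalInstance("Site A", [
    {"patient": "a1", "diagnosis": "C18.0"},
    {"patient": "a2", "diagnosis": "C20.0"},
])
site_b = LocalInstance("Site B", [
    {"patient": "b1", "diagnosis": "C18.0"},
])

totals = federated_query([site_a, site_b], count_by_diagnosis)
```

Only the per-site counts cross site boundaries here, which matches the data privacy and data locality restrictions listed earlier.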

Page 15: In-Memory Data Management for Systems Medicine

we.analyzegenomes.com: Real-time Analysis of Big Medical Data

Figure: platform overview, labels as shown
■  Apps: Drug Response Analysis, Pathway Topology Analysis, Medical Knowledge Cockpit, Oncolyzer, Clinical Trial Recruitment, Cohort Analysis, ...
■  In-Memory Database: Extensions for Life Sciences; Data Exchange, App Store; Access Control, Data Protection; Fair Use; Statistical Tools; Real-time Analysis; App-spanning User Profiles; Combined and Linked Data
■  Indexed Sources: Genome Data, Cellular Pathways, Genome Metadata, Research Publications, Pipeline and Analysis Models, Drugs and Interactions

Page 16: In-Memory Data Management for Systems Medicine

Our Technology: In-Memory Database Technology

Combined column and row store
Map/Reduce
Single and multi-tenancy
Lightweight compression
Insert only for time travel
Real-time replication
Working on integers
SQL interface on columns and rows
Active/passive data store
Minimal projections
Group key
Reduction of software layers
Dynamic multi-threading
Bulk load of data
Object-relational mapping
Text retrieval and extraction engine
No aggregate tables
Data partitioning
Any attribute as index
No disk
On-the-fly extensibility
Analytics on historical data
Multi-core/parallelization

Page 17: In-Memory Data Management for Systems Medicine

Insert-Only / Append-Only

■  Traditional databases allow four data operations:
□  INSERT, SELECT and
□  DELETE, UPDATE
■  Insert-only requires only INSERT and SELECT to maintain a complete history (as in bookkeeping systems)
■  Insert-only enables time travel, e.g. to
□  Trace changes and reconstruct decisions
□  Document complete history of changes, therapies, etc.
□  Enable statistical observations
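
The insert-only idea can be illustrated with a small append-only table. This is a minimal sketch, not the database's actual mechanism: it uses a logical version counter instead of timestamps, and the key and values are made up for the example.

```python
import itertools


class InsertOnlyTable:
    """Append-only table: an update is a new insert, nothing is overwritten."""

    def __init__(self):
        self._rows = []                   # list of (version, key, value)
        self._clock = itertools.count(1)  # logical version counter (assumption)

    def insert(self, key, value):
        version = next(self._clock)
        self._rows.append((version, key, value))
        return version

    def select(self, key, as_of=None):
        """Return the value valid at version `as_of` (the latest if None)."""
        result = None
        for version, row_key, value in self._rows:
            if as_of is not None and version > as_of:
                break  # rows are stored in version order
            if row_key == key:
                result = value
        return result

    def history(self, key):
        """Return every recorded (version, value) pair for the key."""
        return [(v, value) for v, k, value in self._rows if k == key]


therapies = InsertOnlyTable()
v1 = therapies.insert("patient-1", "therapy A")
v2 = therapies.insert("patient-1", "therapy B")  # a change, recorded as an insert

current = therapies.select("patient-1")             # latest state
earlier = therapies.select("patient-1", as_of=v1)   # time travel to version 1
```

Because old versions are never deleted, `history` can reconstruct the full sequence of therapy changes.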

Page 18: In-Memory Data Management for Systems Medicine

Lightweight Compression

■  Main memory access is the new bottleneck
■  Lightweight compression can reduce this bottleneck, i.e.
□  Lossless
□  Improved usage of data bus capacity
□  Work directly on compressed data

Table

RecId 1   C18.0   Colon    091487
RecId 2   C32.0   Larynx   357982
RecId 3   C00.9   Lip      123489
RecId 4   C18.0   Colon    998711
RecId 5   C20.0   Rectum   215678
RecId 6   C20.0   Rectum   647912
RecId 7   C50.9   Mamma    167898
RecId 8   C18.0   Colon    646470
…         …       …        …

Attribute Vector

RecId   ValueId
1       C18.0
2       C32.0
3       C00.9
4       C18.0
5       C20.0
6       C20.0
7       C50.9
8       C18.0

Data Dictionary

ValueId   Value
1         Larynx
2         Lip
3         Rectum
4         Colon
5         Mamma

Inverted Index

ValueId   RecIdList
1         2
2         3
3         5,6
4         1,4,8
5         7

•  Typical compression factor of 10:1 for enterprise software
•  In financial applications up to 50:1
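
The structures on this slide can be reproduced with a short dictionary-encoding sketch. It uses the eight diagnosis codes from the attribute vector above as the column. Value ids are assigned in sorted order starting at 0, which is a choice made for this example and differs from the numbering on the slide.

```python
from collections import defaultdict


def dictionary_encode(column):
    """Return (dictionary, attribute_vector, inverted_index) for a column."""
    dictionary = sorted(set(column))  # value id = position in this list
    value_id = {value: i for i, value in enumerate(dictionary)}
    attribute_vector = [value_id[value] for value in column]
    inverted_index = defaultdict(list)
    for rec_id, vid in enumerate(attribute_vector, start=1):  # RecIds start at 1
        inverted_index[vid].append(rec_id)
    return dictionary, attribute_vector, dict(inverted_index)


def select_equals(value, dictionary, inverted_index):
    """Find matching RecIds by integer id, without decoding the column."""
    try:
        vid = dictionary.index(value)
    except ValueError:
        return []
    return inverted_index.get(vid, [])


column = ["C18.0", "C32.0", "C00.9", "C18.0", "C20.0", "C20.0", "C50.9", "C18.0"]
dictionary, attribute_vector, inverted_index = dictionary_encode(column)

colon_records = select_equals("C18.0", dictionary, inverted_index)
decoded = [dictionary[vid] for vid in attribute_vector]  # lossless round trip
```

Each distinct value is stored once, and the column itself shrinks to small integers, which is why lookups and scans can work directly on the compressed form.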

Page 19: In-Memory Data Management for Systems Medicine

Data Partitioning

■  Horizontal Partitioning
□  Cut long tables into shorter segments
□  E.g. to group samples with same relevance
■  Vertical Partitioning
□  Split off columns to individual resources
□  E.g. to separate personalized data from experiment data
■  Partitioning is the basis for
□  Parallel execution of database queries
□  Implementation of data aging and data retention management
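
Both partitioning schemes can be shown on a tiny in-memory table. The sketch below is illustrative only; the column names (`sample_id`, `patient_name`, `relevance`, `expression`) and the rows are invented for the example.

```python
def horizontal_partition(rows, key):
    """Cut a table into shorter segments, grouping rows by key(row)."""
    segments = {}
    for row in rows:
        segments.setdefault(key(row), []).append(row)
    return segments


def vertical_partition(rows, id_column, columns):
    """Split off `columns`; both parts keep `id_column` so they can be re-joined."""
    split_off, remainder = [], []
    for row in rows:
        split_off.append({c: row[c] for c in (id_column, *columns)})
        remainder.append({c: v for c, v in row.items() if c not in columns})
    return split_off, remainder


samples = [
    {"sample_id": 1, "patient_name": "P. One", "relevance": "high", "expression": 0.8},
    {"sample_id": 2, "patient_name": "P. Two", "relevance": "low", "expression": 0.1},
    {"sample_id": 3, "patient_name": "P. Three", "relevance": "high", "expression": 0.7},
]

# Horizontal: group samples with the same relevance.
by_relevance = horizontal_partition(samples, key=lambda row: row["relevance"])

# Vertical: separate personalized data from experiment data.
personal, experiment = vertical_partition(samples, "sample_id", ["patient_name"])
```

The horizontal segments can be scanned independently, and the vertical split lets the personalized part be stored under stricter access rules than the experiment data.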

Page 20: In-Memory Data Management for Systems Medicine

Multi-core and Parallelization

■  Modern server systems consist of x CPUs, e.g. 6
■  Each CPU consists of y CPU cores, e.g. 12
■  Consider each of the x*y CPU cores as an individual worker, e.g. 6x12 = 72 workers
■  Each worker performs one task at a time, in parallel with the other workers
■  A full table scan of a database table with 1M entries takes, in the ideal case, 1/(x*y) of the sequential search time when traversed in parallel
□  Reduced response time
□  No need for pre-aggregated totals and redundant data
□  Improved usage of hardware
□  Instant analysis of data
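
The parallel full table scan can be sketched by cutting a column into one chunk per worker and merging the partial results. This is an illustration of the work split only: it uses Python threads, which in CPython do not give a true x*y speedup for CPU-bound scans, and the column contents are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n_rows, n_workers):
    """Split [0, n_rows) into n_workers contiguous, near-equal ranges."""
    size, extra = divmod(n_rows, n_workers)
    bounds, start = [], 0
    for i in range(n_workers):
        end = start + size + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def parallel_scan(column, predicate, n_workers):
    """Full scan: every worker checks its own chunk, results are concatenated."""
    def scan(bounds):
        start, end = bounds
        return [i for i in range(start, end) if predicate(column[i])]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(scan, chunk_bounds(len(column), n_workers)))
    return [i for part in parts for i in part]


x, y = 6, 12                                   # CPUs and cores per CPU
column = [i % 1000 for i in range(1_000_000)]  # 1M entries (synthetic)
hits = parallel_scan(column, lambda value: value == 42, x * y)
```

With 72 workers each chunk holds roughly 13,900 entries, which is where the ideal 1/(x*y) scan time comes from; scheduling and merge overhead reduce the gain in practice.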

Page 21: In-Memory Data Management for Systems Medicine

Where to find additional information?

■  Online: Visit we.analyzegenomes.com for latest research results, slides, videos, tools, and publications
■  Offline: Read more about it, e.g. "High-Performance In-Memory Genome Data Analysis: How In-Memory Database Technology Accelerates Personalized Medicine", In-Memory Data Management Research, Springer, 2014, ISBN: 978-3-319-03034-0
■  In Person: Join us for the Symposium “Diagnostics in the Era of Big Data and Systems Medicine”, Oct 5-6, 2016 in Potsdam

Page 22: In-Memory Data Management for Systems Medicine

Keep in contact with us!

Dr. Matthieu-P. Schapranow
Program Manager E-Health & Life Sciences
Hasso Plattner Institute
August-Bebel-Str. 88
14482 Potsdam, Germany

[email protected]
http://we.analyzegenomes.com/