cafe variome: connecting diagnostic networks, disease consortia and diverse third parties - raymond...
DESCRIPTION
The Cafe Variome approach changes the nature of the problem by converting it into the challenge of enabling fully open and comprehensive “data discovery” (i.e., making the existence rather than the substance of the data openly accessible), for example between networks of diagnostic laboratories or disease consortia that know and trust each other and share an interest in certain causative genes or diseases. This provides a mechanism for discovering rare sequence variants or patients with rare disease phenotypes. Cafe Variome is not a database; it aims to be a “shop window” for openly searching and discovering what data exist. The system allows users to openly search the full content of the data, in sophisticated ways, and thereby determine whether a record of interest exists in an information resource. Users can subsequently access the hit data (according to pre-set permissions) under one of the following conditions:
• Open Access: the user may access variant and patient records directly and freely
• Linked Access: the user can view summary data and is provided with a link to the source database to access the full record
• Restricted Access: the user may access the full variant and patient records if they belong to a pre-approved group; otherwise they must request access from the data owner
Cafe Variome offers a complete data sharing software solution (either hosted or “in-a-box”), all controlled by an intuitive administrator dashboard that gives owners complete control over their data and installation. Dashboard configuration options include a content management system for adding and editing custom pages and menus; full control over site appearance (logo, colours, backgrounds, themes); easy import of source data via templates; control over how searches are performed and results are displayed (ordering and specifying of fields, and which fields can be searched); and a comprehensive access-control system for users and groups.
A sophisticated “Google-like” query interface allows users to form complex queries to interrogate and discover data across an installation. Additionally, multiple installations can be connected to form federated networks, allowing controlled queries across nodes within the network. Each variant in the system can be annotated with any number of terms from any of the NCBO BioPortal phenotype ontologies. This flexibility allows a variant to be associated with a single disease term or with a complex combination of phenotype descriptions. An admin tool generates an up-to-date searchable term tree for all ontologies used in the annotations. This functionality uses the BioPortal API to ensure that the latest versions of all ontologies, and their associated terms, are available to the user.
TRANSCRIPT
Openly share the ‘existence’ rather than the ‘substance’ of the data…thereafter variably manage data access
Connecting Diagnostic Networks
• Need to enable disease consortia to identify patients with similar phenotypes or to identify patients harbouring the same variant(s)
• Currently not possible due to difficulties of data sharing between labs, or with central repositories
• Cafe Variome can solve this: it is simple to install and can be deployed either
– on a server at one or more of a network of labs
– or hosted by the Cafe Variome team
The Cafe Variome Solution
• Allows 'open discovery' of the existence (rather than actual substance) of relevant data
• Thereby enables networks of labs to easily query for the existence of patients or variants, without necessarily revealing additional underlying data, thus overcoming issues of patient confidentiality & data ‘ownership’
• Currently being extended to support more sophisticated omics/NGS data handling and deep phenotype data
Cafe Variome Features
• Cafe Variome is not a database but is a searchable 'menu'
• The platform enables data owners/submitters to specify and update lists of who can search for records of interest (using various search parameters)
• Results can be returned to users:
– as open data
– as links to data at source
– by computationally facilitating data-access requests
• Allows users to check whether the same variant(s)/patients (with related phenotypes) have previously been seen by other laboratories
[Figure: networks of labs exchanging data; labels: optional wider discovery, Clinical Community, Research Community, CENTRAL (optional)]
• Supports multiple installs & federated searches (data remains at source)
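The federated idea (each installation answers for itself and only a count travels) can be illustrated with a small sketch. This is not Cafe Variome's implementation: the `Node` class, the `federated_search` function, the lab names and the variant labels are all hypothetical, chosen only to show that records can stay at source while their existence is reported.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One hypothetical installation; its records never leave this object."""
    name: str
    variants: set = field(default_factory=set)

    def count_hits(self, variant: str) -> int:
        # Only a number is returned to the caller, not the record itself.
        return 1 if variant in self.variants else 0


def federated_search(nodes, variant):
    """Query every node in the network and collect per-node hit counts."""
    return {node.name: node.count_hits(variant) for node in nodes}


# Made-up variant labels, purely for illustration.
network = [
    Node("lab_a", {"GENE1:c.100A>G", "GENE2:c.5del"}),
    Node("lab_b", {"GENE3:c.77C>T"}),
    Node("lab_c", {"GENE1:c.100A>G"}),
]

hits = federated_search(network, "GENE1:c.100A>G")
print(hits)
```

In this toy network a lab learns which other labs have seen the variant, and would then follow whatever access route each data owner has chosen.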
Data Sharing Models (facilitated & controlled access)
Open Access
Core info for each record is shown & made available for download
Restricted Access
Core or full record details are provided per record, if:
• User is pre-approved by group-access permissions set by data owner
• Access is approved after facilitated email request to the data owner
Open Discovery – Reporting Existence of Patients/Variants in Sources
Linked Access
No data, only link to the data source is reported
Source DB resource
Access control is then managed by the source DB
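As a rough illustration of the three sharing models, the sketch below decides what a user gets back for a single hit. The `Policy` enum, the `resolve_hit` function, the response keys and the example URL are assumptions made for this example, not part of the Cafe Variome software.

```python
from enum import Enum


class Policy(Enum):
    OPEN = "open"
    LINKED = "linked"
    RESTRICTED = "restricted"


def resolve_hit(policy, record, source_url, user_groups, approved_groups):
    """Return what the user may see for one discovered record."""
    if policy is Policy.OPEN:
        # Open Access: core info is shown directly.
        return {"status": "record", "data": record}
    if policy is Policy.LINKED:
        # Linked Access: no data, only a pointer to the source database.
        return {"status": "link", "url": source_url}
    # Restricted Access: pre-approved groups see the record,
    # everyone else has to ask the data owner.
    if set(user_groups) & set(approved_groups):
        return {"status": "record", "data": record}
    return {"status": "request_required"}


record = {"gene": "GENE1", "hgvs": "c.100A>G"}  # made-up record
url = "https://example.org/source-db/record/1"  # hypothetical link

print(resolve_hit(Policy.LINKED, record, url, {"lab_a"}, {"consortium_x"}))
print(resolve_hit(Policy.RESTRICTED, record, url, {"lab_a"}, {"consortium_x"}))
```

The last call falls through to `request_required`, which corresponds to the facilitated email request to the data owner.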
Record Discovery “Menu”
Google-like search queries: AND/OR, fuzzy, boosting, etc.
A count of hits in each data source is returned and grouped by the sharing policy
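A minimal sketch of that result shape, counts per data source grouped by sharing policy, is given below. A plain case-insensitive substring match stands in for the real query engine, and the record layout (`source`, `policy`, `text`) and all values are invented for the example.

```python
from collections import Counter, defaultdict


def count_hits(records, term):
    """Count matching records per source, grouped by sharing policy."""
    counts = defaultdict(Counter)
    for rec in records:
        # Substring matching is a stand-in for AND/OR, fuzzy, boosting, etc.
        if term.lower() in rec["text"].lower():
            counts[rec["source"]][rec["policy"]] += 1
    return {source: dict(by_policy) for source, by_policy in counts.items()}


# Made-up records for illustration only.
records = [
    {"source": "lab_a", "policy": "open", "text": "GENE1 c.100A>G cardiomyopathy"},
    {"source": "lab_a", "policy": "restricted", "text": "GENE1 c.200C>T"},
    {"source": "lab_b", "policy": "linked", "text": "GENE1 c.100A>G"},
    {"source": "lab_b", "policy": "open", "text": "GENE2 c.5del"},
]

summary = count_hits(records, "gene1")
print(summary)
```

The user sees only that hits exist and under which policy, which is enough to decide where to follow up.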
Cafe Variome Variant Report
Data Sharing Granularity
Data owners can control access to variants from individual record level to entire data sets
Administrator’s Interface
Create Custom Groups of Labs
Assign Groups to Variant Sources
Users belonging to groups have pre-approved access to particular variant and patient data
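One way to picture group assignment together with record-level granularity is a source-wide default that individual records can override. The sketch below assumes exactly that precedence rule; the function names, group names and identifiers are hypothetical and do not come from the Cafe Variome administrator interface.

```python
def allowed_groups(source, record_id, source_groups, record_groups):
    """Groups pre-approved for a record; a record-level rule wins if present."""
    key = (source, record_id)
    if key in record_groups:
        return record_groups[key]
    return source_groups.get(source, set())


def is_pre_approved(user_groups, source, record_id, source_groups, record_groups):
    """True when the user shares at least one group with the record's rule."""
    permitted = allowed_groups(source, record_id, source_groups, record_groups)
    return bool(set(user_groups) & permitted)


# Whole-source assignment: two groups may see everything in "lab_a_variants".
source_groups = {"lab_a_variants": {"consortium_x", "lab_b_team"}}
# Record-level override: record 42 is limited to a single group.
record_groups = {("lab_a_variants", 42): {"consortium_x"}}

print(is_pre_approved({"lab_b_team"}, "lab_a_variants", 7, source_groups, record_groups))
print(is_pre_approved({"lab_b_team"}, "lab_a_variants", 42, source_groups, record_groups))
```

Users who are not pre-approved would fall back to requesting access from the data owner.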
Bulk Data Import Templates
• Make data import as flexible as possible
• Allows users to generate import templates
– Excel or tab-delimited
– Specify which data fields
– Populate with their data
– Import into CV
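The template workflow (choose fields, generate a template, fill it in, import it) can be sketched for the tab-delimited case with Python's standard `csv` module. The field names and the two helper functions are assumptions for illustration; the actual template format used by Cafe Variome is not described in the slides.

```python
import csv
import io


def make_template(fields):
    """Build a tab-delimited template: a header row with the chosen fields."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(fields)
    return buf.getvalue()


def read_filled_template(text, fields):
    """Parse a filled-in template and check that its header matches."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if reader.fieldnames != list(fields):
        raise ValueError("header does not match the template fields")
    return [dict(row) for row in reader]


FIELDS = ["gene", "hgvs", "phenotype"]  # hypothetical field selection
template = make_template(FIELDS)

# A submitter populates the template with their own (here made-up) data.
filled = template + "GENE1\tc.100A>G\tcardiomyopathy\n"
rows = read_filled_template(filled, FIELDS)
print(rows)
```

Checking the header on import catches files that were edited away from the template before any rows are loaded.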
Phenotype Developments
• Allow the phenotypic consequences of genetic variants to be described using public ontologies
– Many terms from many ontologies can be associated with one variant or patient
• Also, allow the phenotypic consequences of genetic variants to be described using a local vocabulary or list
Enable hierarchical viewing and querying of the phenotype ontology data
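Hierarchical querying usually means that a search on a broad term also finds records annotated with any narrower term beneath it. The sketch below shows that idea on a toy term tree; the terms, the tree and the variant labels are made up, and a real installation would draw its terms from the ontologies rather than from a hand-written dictionary.

```python
def descendants(tree, term):
    """Return the term plus every term below it in the hierarchy."""
    found = {term}
    stack = [term]
    while stack:
        for child in tree.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def query_by_term(annotations, tree, term):
    """Find variants annotated with the term or any of its descendants."""
    wanted = descendants(tree, term)
    return sorted(v for v, terms in annotations.items() if wanted & set(terms))


# Toy parent -> children tree (not taken from any real ontology).
tree = {
    "heart disease": ["cardiomyopathy", "arrhythmia"],
    "cardiomyopathy": ["dilated cardiomyopathy"],
}
# One variant can carry many terms.
annotations = {
    "variant_1": ["dilated cardiomyopathy"],
    "variant_2": ["arrhythmia", "deafness"],
    "variant_3": ["deafness"],
}

print(query_by_term(annotations, tree, "heart disease"))
print(query_by_term(annotations, tree, "cardiomyopathy"))
```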
Built on standards
• Cafe Variome is based on open-source software
• HVP Recommended System Status (RSS):
– HGVS nomenclature (RSS001)
– Mutalyzer (RSS002)
– LOVD (RSS003)
– VarioML (RSS004: under review)
– Locus Reference Genomic (RSS005)
– VariO (RSS006: under review)
• Submitted to HVP for RSS review: May 2014
Summary
• CV is very flexible in terms of the content that it can hold
– gross disease/phenotype name or single variant
– or, detailed phenotype and thousands of variants
– (whole exome/genome scan, in next release)
• Each data source decides what data fields are included
– which of these are made discoverable & by whom
– which fields are shared if discovery searches hit a record
– deeper data sharing may be permitted to particular users
• The API (computer-computer interface) is straightforward, and so other data systems can easily be modified to 'talk to' Cafe Variome installations
• We can host a Cafe Variome for you, or you can run it locally:
– one Cafe Variome for the whole project
– one per site and federate these to act as a private network
– in all cases any number of different users can be given tailored access rights for discovery and data sharing
• It is simple to populate the system
– from various starting formats (we can help you with this)
– this can be done automatically and at your preferred interval, if you have data in other databases
• Key point — it is flexible, and designed to let the data find the data, without compromising patient privacy or researcher/clinician control and ownership of the data
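The per-field choices listed in the summary (which fields are discoverable, and which are shared when a search hits a record) can be pictured as a small configuration per data source. The sketch below is only an assumed representation: the `discoverable` and `shared` flags, the field names and the `search_and_share` function are hypothetical.

```python
def search_and_share(records, field_config, field, value):
    """Search one field if it is discoverable; return only the shared fields."""
    if not field_config.get(field, {}).get("discoverable", False):
        raise PermissionError(f"field {field!r} is not discoverable")
    shared = [name for name, cfg in field_config.items() if cfg.get("shared")]
    return [
        {name: rec[name] for name in shared if name in rec}
        for rec in records
        if rec.get(field) == value
    ]


# Hypothetical choices made by one data source.
field_config = {
    "gene": {"discoverable": True, "shared": True},
    "hgvs": {"discoverable": True, "shared": True},
    "patient_id": {"discoverable": False, "shared": False},
}
# Made-up records.
records = [
    {"gene": "GENE1", "hgvs": "c.100A>G", "patient_id": "P001"},
    {"gene": "GENE2", "hgvs": "c.5del", "patient_id": "P002"},
]

result = search_and_share(records, field_config, "gene", "GENE1")
print(result)
```

Here the patient identifier is neither searchable nor returned, while the gene and variant description are both.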
Acknowledgements
• Anthony J Brookes
• Owen Lancaster ([email protected])
• Tim Beck
• Raymond Dalgleish
• The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 (the GEN2PHEN project)