ClinVar: A Central Repository for Clinically Relevant Variants - Melissa J. Landrum
DESCRIPTION
Advances in sequencing technologies are identifying thousands of new variants. However, much of the data is stored in separate and sometimes private databases, so it can be difficult to use when evaluating the clinical significance of variants, especially rare ones. To improve access to this type of data, ClinVar maintains a freely available, public archive of human variation and its relationship to disease. The data can be used interactively on the web; a monthly full release in XML format and weekly summary files of genes and variants are also available for incorporation into analysis pipelines. Submissions include variants identified by direct testing in clinical or research labs, as well as reviewed variant-phenotype relationships from expert groups, such as InSiGHT and CFTR2, and professional societies, such as ACMG. In addition to the variant and phenotype, individual submissions may also provide a clinical assertion and evidence for that interpretation. The data model is flexible for many data elements: a variant may be defined by sequence or cytogenetic nomenclature; the phenotype may be a diagnostic term or features of a disease; and evidence for the interpretation may be structured as counts or provided as free text. For submitters who maintain their own website for variants, such as locus-specific databases (LSDBs), ClinVar links to the submitter’s site for each submitted variant, so that users who start at ClinVar become aware of the LSDB’s curated variants and can reach any additional information on the variant available there. Each individual submission is accessioned and versioned, in the format SCV000000000.1, so that submitters can update their record as the interpretation of the variant is re-evaluated over time. ClinVar uses standard terminologies, such as those for variant nomenclature, phenotypes, and pathogenicity, to avoid data ambiguity and to promote comparison of information from multiple sources.
ClinVar also adds related variant data, such as allele frequencies and HGVS expressions mapped across molecule types. While ClinVar staff members provide some curation of variants and phenotypes represented in ClinVar, clinical significance values are provided by submitters. As part of the submission process, ClinVar provides feedback to submitters. This feedback flags invalid HGVS expressions, as well as submissions that conflict in clinical significance with an existing record for the same variant and phenotype and may therefore warrant further curation. Submissions for the same variant-phenotype pair from different submitters are aggregated into a record that is accessioned and versioned in the format RCV000000000.1. Aggregation allows ClinVar to indicate when multiple submitters agree or conflict in the clinical interpretation of the variant, which can help clinical labs and curation groups to identify high-confidence interpretations as well as those that should be prioritized for curation efforts.
TRANSCRIPT
ClinVar: A Central Repository for
Interpretations of Clinically Relevant Variants
Melissa Landrum
HVP 2014
May 21, 2014
ClinVar: www.ncbi.nlm.nih.gov/clinvar/
ClinVar stats: www.ncbi.nlm.nih.gov/clinvar/submitters/
ClinVar integrates four domains of information
• Variation
• Phenotype
• Interpretation
• Evidence
dbSNP, dbVar, Gene, MedGen (HPO, OMIM), PubMed, ACMG, Sequence Ontology, GTR
ClinVar – Standardized data
607008.0001; 985A>G; 985A>G (K304E); 985A>G (K329E); A985G; ACADM, LYS304GLU; K304E; K304E (985 A->G); K304E (K329E); K304E only; K329E; K329E(985A>G); LYS304GLU; Mutation c.985A>G (p.K304E); c.985A>G; c.985A>G (p.K304E); c.985A>G (p.Lys304Glu; c985A>G; includes: K304E (985A>G); p.K304E; p.Lys329Glu; previously known as p.Lys329Glu; Analysis of ACADM 985A>G mutation
NC_000001.10:g.76226846A>G
NG_007045.1:g.41804A>G
NM_000016.4:c.985A>G
ACADM:c.985A>G
NP_000007.1:p.Lys329Glu
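The standardized expressions above all share one shape: a reference (sequence accession or gene symbol), a colon, a one-letter molecule type, and the change. The following is a minimal sketch of splitting them apart, assuming only that shape; `split_hgvs` and `HgvsParts` are hypothetical names, and this is not a full HGVS validator or anything ClinVar itself is known to run.

```python
import re
from typing import NamedTuple


class HgvsParts(NamedTuple):
    reference: str  # sequence accession.version or gene symbol
    kind: str       # "g" genomic, "c" coding, "p" protein
    change: str     # description of the change, e.g. "985A>G"


# Assumed shape: <reference>:<g|c|p>.<change>
_HGVS = re.compile(r"^(?P<reference>[^:\s]+):(?P<kind>[gcp])\.(?P<change>\S.*)$")


def split_hgvs(expression: str) -> HgvsParts:
    """Split an HGVS-style expression into reference, molecule type, and change."""
    match = _HGVS.match(expression.strip())
    if match is None:
        raise ValueError(f"not a recognised HGVS-style expression: {expression!r}")
    return HgvsParts(match["reference"], match["kind"], match["change"])


if __name__ == "__main__":
    for text in (
        "NC_000001.10:g.76226846A>G",
        "NG_007045.1:g.41804A>G",
        "NM_000016.4:c.985A>G",
        "ACADM:c.985A>G",
        "NP_000007.1:p.Lys329Glu",
    ):
        print(split_hgvs(text))
```

Free-text names such as "K304E only" are rejected by this check, which is the kind of input the standardized expressions are meant to replace.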
ClinVar aggregates by variant and phenotype
SCV – submitted ClinVar record
Variant          Phenotype        Submitter  Accession
FBN1:c.4786C>T   Marfan syndrome  Lab A      SCV000000010
FBN1:c.4786C>T   Marfan syndrome  Lab B      SCV000000020
RCV – reference ClinVar record
Variant          Phenotype        Accession
FBN1:c.4786C>T   Marfan syndrome  RCV000000050
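The aggregation of submitted (SCV) records into a reference (RCV) record can be sketched as a grouping on the variant-phenotype pair, followed by a check on whether the submitters agree. This is an illustration only: the `Submission` class, the `aggregate` and `agreement` helpers, and the significance values are hypothetical and are not taken from the slides or from ClinVar's own implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    scv: str           # submitted record accession
    variant: str
    phenotype: str
    submitter: str
    significance: str  # hypothetical values, for illustration only


def aggregate(submissions):
    """Group submissions by (variant, phenotype), one group per RCV-style record."""
    groups = defaultdict(list)
    for sub in submissions:
        groups[(sub.variant, sub.phenotype)].append(sub)
    return dict(groups)


def agreement(group):
    """Summarise whether the submitters in one group agree."""
    if len({sub.significance.lower() for sub in group}) > 1:
        return "conflict"
    if len({sub.submitter for sub in group}) > 1:
        return "multiple submitters agree"
    return "single submitter"


if __name__ == "__main__":
    subs = [
        Submission("SCV000000010", "FBN1:c.4786C>T", "Marfan syndrome", "Lab A", "Pathogenic"),
        Submission("SCV000000020", "FBN1:c.4786C>T", "Marfan syndrome", "Lab B", "Pathogenic"),
    ]
    for key, group in aggregate(subs).items():
        print(key, [s.scv for s in group], agreement(group))
```

Keying on the pair rather than on the variant alone keeps interpretations for different conditions apart, which matches the variant-phenotype aggregation described in the talk.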
Allele summary
• Gene
• Variant type
• Genomic location
• HGVS expressions*
• Molecular consequence*
• Links*
• Frequency*
Phenotype summary
• Names
• Links*
• Age of onset*
• Prevalence*
Interpretation
• Significance
• Review status*
• Accession.version*
* May be provided by NCBI
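The accession.version field follows the formats given in the description, SCV000000000.1 and RCV000000000.1. A minimal sketch of parsing that format and comparing versions of the same record is shown here; it assumes a three-letter prefix, nine digits, and an integer version, and `parse_accession` and `is_newer` are hypothetical helper names.

```python
import re
from typing import NamedTuple


class Accession(NamedTuple):
    kind: str     # "SCV" (submitted) or "RCV" (reference)
    number: int   # numeric part of the accession
    version: int  # increases as the record is updated


# Assumed format: SCV or RCV, nine digits, a dot, then the version number.
_ACCESSION = re.compile(r"^(SCV|RCV)(\d{9})\.(\d+)$")


def parse_accession(text: str) -> Accession:
    """Parse an accession.version string such as 'RCV000000050.1'."""
    match = _ACCESSION.match(text.strip())
    if match is None:
        raise ValueError(f"unexpected accession format: {text!r}")
    return Accession(match.group(1), int(match.group(2)), int(match.group(3)))


def is_newer(a: Accession, b: Accession) -> bool:
    """True when a and b are the same record and a has the higher version."""
    return (a.kind, a.number) == (b.kind, b.number) and a.version > b.version


if __name__ == "__main__":
    old = parse_accession("SCV000000010.1")
    new = parse_accession("SCV000000010.2")
    print(old, new, is_newer(new, old))
```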
ClinVar web display
ClinVar Review Status
• classified by single submitter
• classified by multiple submitters
• conflicting data from submitters
• reviewed by expert panel
• reviewed by professional society
Expert panels – both medical and research experts with published criteria and process for evaluating variant pathogenicity
• CFTR2, InSiGHT
Professional society – groups that provide practice guidelines
• American College of Medical Genetics (ACMG)
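For pipelines that read these labels, the five review status values can be held in an enumeration so that records reviewed by an expert panel or professional society are easy to pick out. This is a sketch under assumptions: the `ReviewStatus` enum and `is_reviewed` helper are hypothetical, the label strings are copied from the slide, and no ranking among the other statuses is implied.

```python
from enum import Enum


class ReviewStatus(Enum):
    SINGLE_SUBMITTER = "classified by single submitter"
    MULTIPLE_SUBMITTERS = "classified by multiple submitters"
    CONFLICTING = "conflicting data from submitters"
    EXPERT_PANEL = "reviewed by expert panel"
    PROFESSIONAL_SOCIETY = "reviewed by professional society"


# The two statuses that the slide describes as "reviewed by" a group.
REVIEWED = frozenset({ReviewStatus.EXPERT_PANEL, ReviewStatus.PROFESSIONAL_SOCIETY})


def is_reviewed(label: str) -> bool:
    """True when the label names an expert-panel or professional-society review."""
    # Enum lookup by value raises ValueError for labels not on the slide.
    return ReviewStatus(label.strip().lower()) in REVIEWED


if __name__ == "__main__":
    for status in ReviewStatus:
        print(f"{status.value}: reviewed={is_reviewed(status.value)}")
```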
ClinVar aggregates by variant
Variant           Phenotype        Submitter  Accession
PTPN11:c.205G>C   Noonan syndrome  Lab A      SCV000000010
PTPN11:c.205G>C   Noonan syndrome  Lab B      SCV000000020
PTPN11:c.205G>C   Rasopathy        Lab C      SCV000000030
Variant           Phenotype        Accession
PTPN11:c.205G>C   Noonan syndrome  RCV000000050
PTPN11:c.205G>C   Rasopathy        RCV000000050
Variant
PTPN11:c.205G>C
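Aggregation by variant adds one more level on top of the variant-phenotype grouping: every submission for a variant is collected under that variant, whatever phenotype it was reported for. A minimal sketch with the example rows from this slide follows; the `by_variant` helper and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict

# (variant, phenotype, submitter, SCV accession), taken from the slide example.
ROWS = [
    ("PTPN11:c.205G>C", "Noonan syndrome", "Lab A", "SCV000000010"),
    ("PTPN11:c.205G>C", "Noonan syndrome", "Lab B", "SCV000000020"),
    ("PTPN11:c.205G>C", "Rasopathy", "Lab C", "SCV000000030"),
]


def by_variant(rows):
    """Nest SCV accessions as variant -> phenotype -> list of accessions."""
    tree = defaultdict(lambda: defaultdict(list))
    for variant, phenotype, _submitter, scv in rows:
        tree[variant][phenotype].append(scv)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {variant: dict(phenotypes) for variant, phenotypes in tree.items()}


if __name__ == "__main__":
    for variant, phenotypes in by_variant(ROWS).items():
        print(variant)
        for phenotype, scvs in phenotypes.items():
            print("  ", phenotype, scvs)
```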
ClinVar – new web display
Accessing ClinVar data
• Interactively on the web, updated weekly
• Monthly full releases
– Comprehensive XML extraction
– VCF files
– Tab-delimited summary files for genes, variants
• E-utilities as web service or via command line
• Annotation on graphic sequence displays
• Variation Viewer www.ncbi.nlm.nih.gov/variation/view/
• Variation Reporter www.ncbi.nlm.nih.gov/variation/tools/reporter
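For the E-utilities route, a search against the ClinVar database can be expressed as an ESearch request URL. The sketch below only builds the URL and does not contact NCBI; the `esearch_url` helper, the example term `ACADM[gene]`, and the `retmax` default are assumptions chosen for illustration, so check the E-utilities documentation for current parameters and usage limits.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that queries the ClinVar database for a term."""
    params = {
        "db": "clinvar",    # Entrez database name for ClinVar
        "term": term,       # search expression, URL-encoded by urlencode
        "retmode": "json",  # ask for a JSON response
        "retmax": retmax,   # maximum number of record IDs to return
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Example term is an assumption: all ClinVar records for the ACADM gene.
    print(esearch_url("ACADM[gene]"))
```

The resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen`, and the returned IDs passed on to other E-utilities calls.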
Submitting data to ClinVar
• Minimal or data-rich submissions are accepted
• Multiple submission formats
– Excel spreadsheet templates
– tsv, csv files
– XML
• Online documentation http://www.ncbi.nlm.nih.gov/clinvar/docs/submit/
And contact us with questions - [email protected]
Acknowledgements
ClinVar/GTR/RefSeqGene/Gene/MedGen staff
Alex Astashyn, Chao Chen, Shanmuga Chitipiralla, Baoshan Gu, Douglas Hoffman, Wonhee Jang, Brandi Kattman, Ken Katz, Jennifer Lee, Donna Maglott, Adriana Malheiro, Michael Ovetsky, George Riley, Wendy Rubinstein, Amanjeev Sethi, Ray Tully, Ricardo Villamarin
dbSNP/dbVar/dbGaP
Michael Feolo, John Garner, Tim Hefferon, Brad Holmes, John Lopez, Rama Maiti, Jose Mena, Lon Phan, David Shao, Ming Ward
All of NCBI
Jim Ostell, Steve Sherry