Pediatric Genomic Medicine Software Tools

Rapid Understanding of Nucleotide variant Effect Software (RUNES)


Overview

The Children's Mercy variant characterization pipeline (RUNES: Rapid Understanding of Nucleotide variant Effect Software) is a multi-stage analysis pipeline for annotating and classifying human nucleotide variation detected through short read alignment. The long-term plan is to continually add characterizations and variant types as new tools/capabilities appear.

Characterizations of all variants detected through CPGM sequencing projects are stored in the CMH Variant Warehouse relational database. Variant characterizations are viewable and queryable through a lightweight web application.

Characterization stages

Characterization is divided into multiple independent stages, each of which records zero or more annotations for each variant according to the type of characterization that stage performs. At the end of characterization, the annotations for each variant are aggregated and submitted to a variant classifier, which assigns an American College of Medical Genetics (ACMG) category to each variant based on the accumulated annotation evidence. When the evidence supports more than one category, the most damaging category is the final categorization.
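The staged design described above can be illustrated with a minimal sketch. It assumes each stage is a function that takes a variant and returns a list of annotations, possibly empty. The variant key format, the stage names, and the toy stage functions are all hypothetical and stand in for the real tools; this is not the RUNES implementation.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical variant key of the form "chrom:pos:ref>alt".
Variant = str
Annotation = str
Stage = Callable[[Variant], List[Annotation]]


def run_stages(variants: Iterable[Variant],
               stages: Dict[str, Stage]) -> Dict[Variant, List[Annotation]]:
    """Run every stage on every variant and pool the annotations per variant."""
    pooled: Dict[Variant, List[Annotation]] = {}
    for variant in variants:
        # A variant with zero annotations still gets an (empty) entry.
        annotations = pooled.setdefault(variant, [])
        for stage_name, stage in stages.items():
            # Stages are independent: each one sees only the variant itself.
            for annotation in stage(variant):
                annotations.append(f"{stage_name}:{annotation}")
    return pooled


# Toy stages (assumptions for illustration only).
def toy_effect_stage(variant: Variant) -> List[Annotation]:
    return ["premature stop codon"] if variant.endswith(">T") else []


def toy_dbsnp_stage(variant: Variant) -> List[Annotation]:
    return []  # pretend nothing is found in dbSNP


stages = {"effect": toy_effect_stage, "dbsnp": toy_dbsnp_stage}
pooled = run_stages(["1:100:C>T", "2:200:A>G"], stages)
```

Because the stages do not depend on each other, a new characterization can be added as one more entry in `stages` without touching the existing ones.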

Characterization stages use a variety of software and data from both internal and external sources. The stages include: 

  • ENSEMBL Variant Effect Predictor (VEP)
  • Comparison with dbSNP
  • CMH splice impact evaluator
  • CMH transcript context characterizer
  • Comparison with Human Gene Mutation Database (HGMD/GenomeTrax)

ACMG Classification

Variant classification is the final stage of variant characterization and consists of assigning an interpretive category representing clinical significance to each variant. Every variant will receive a classification. RUNES uses categories recommended by the American College of Medical Genetics [2]. These are listed along with the criteria used for including a variant in each category:

Category 1: Previously reported, recognized cause of the disorder
  • HGMD variant type of 'Disease Mutant'
  • dbSNP Snp Clinical Significance of 'pathogenic'

Category 2: Novel, of a type expected to cause the disorder
  • loss of initiation
  • premature stop codon
  • disruption of stop codon
  • whole transcript deletion
  • frameshifting in/del
  • disruption of splicing through deletion causing CDS/intron fusion
  • overlap with splice donor or acceptor sites

Category 3: Novel, may or may not be causal
  • non-synonymous substitution
  • in-frame in/del
  • disruption of polypyrimidine tract
  • overlap with 5' exonic, 5' flank or 3' exonic splice contexts

Category 4: Novel, probably not causal of disease
  • all variants not in categories 1 - 3
  • synonymous AA changes
  • overlap with 5' intronic or 3' flank splice contexts
  • pyrimidine substitutions in polypyrimidine tract
  • other intronic variants
  • dbSNP GMAF of greater than 0.02

Category 5: Known neutral variant
  • not used

Category 6: Not known or expected to be a cause of disease, but associated with a clinical presentation
  • not used
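The criteria above, combined with the rule that the most damaging category wins, can be sketched as a lookup followed by a minimum. The annotation labels below are hypothetical paraphrases of the listed criteria, not identifiers used by RUNES, and the sketch assumes that a lower category number means a more damaging category.

```python
from typing import Iterable

# Hypothetical annotation labels paraphrasing the criteria for categories 1-3.
CATEGORY_BY_ANNOTATION = {
    "hgmd_disease_mutant": 1,
    "dbsnp_clinically_pathogenic": 1,
    "loss_of_initiation": 2,
    "premature_stop_codon": 2,
    "stop_codon_disruption": 2,
    "whole_transcript_deletion": 2,
    "frameshift_indel": 2,
    "cds_intron_fusion": 2,
    "splice_donor_or_acceptor_overlap": 2,
    "non_synonymous_substitution": 3,
    "in_frame_indel": 3,
    "polypyrimidine_tract_disruption": 3,
    "splice_context_5p_exonic_5p_flank_or_3p_exonic": 3,
}

# Everything not matching categories 1-3 falls into category 4;
# categories 5 and 6 are not assigned.
DEFAULT_CATEGORY = 4


def classify(annotations: Iterable[str]) -> int:
    """Return the most damaging (lowest-numbered) category reached."""
    categories = [CATEGORY_BY_ANNOTATION.get(a, DEFAULT_CATEGORY)
                  for a in annotations]
    return min(categories, default=DEFAULT_CATEGORY)
```

For example, `classify(["synonymous_change", "premature_stop_codon"])` yields 2, because the premature stop codon outranks the synonymous change, while a variant with no annotations at all yields 4.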

The ACMG categories rely heavily on distinguishing novel from known variants, which implies comparison to external variation databases. RUNES currently uses HGMD and dbSNP to fulfill this role, though the current state of these databases limits their utility. They are incomplete (many variants are absent) and can contain misannotations (incorrect identification of a variant) or mis-associations (association of common polymorphisms with disease) (Bell et al.).

One effect of this is that the initial version of RUNES is unable to categorize any variants as Category 5 or Category 6, meaning that most novel variants without clear pathogenicity will end up as Category 4. It is expected that, as these existing resources improve or as additional clinical-grade databases become available, the classification will be updated to include these categories.

Minor Allele Frequency

The Variant Warehouse records a CMH Minor Allele Frequency for all variants observed through CPGM sequencing projects. This frequency value simply records the number of samples that carry each variant along with the total number of samples sequenced to date. These values are recalculated for every variant in the Variant Warehouse after the completion of each RUNES run, so that the value properly records the presence or absence of each variant across every sample represented in the database.
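The bookkeeping described above amounts to a pair of counts per variant. A minimal sketch, assuming the warehouse contents can be viewed as a mapping from sample ID to the set of variants seen in that sample (the sample IDs, variant keys, and function name are hypothetical):

```python
from collections import Counter
from typing import Dict, Set, Tuple


def sample_counts(samples: Dict[str, Set[str]]) -> Dict[str, Tuple[int, int]]:
    """Map each variant to (samples carrying it, total samples sequenced)."""
    total = len(samples)
    carriers: Counter = Counter()
    for variants in samples.values():
        # Each sample holds a set, so it counts at most once per variant.
        carriers.update(variants)
    return {variant: (n, total) for variant, n in carriers.items()}


# Recomputed from scratch over every sample, as after each run.
samples = {
    "S1": {"1:100:C>T", "2:200:A>G"},
    "S2": {"1:100:C>T"},
    "S3": set(),
}
counts = sample_counts(samples)
```

Note that dividing the two counts gives the fraction of samples carrying the variant. Turning that into a per-allele frequency would also need zygosity, which this sketch does not model.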

Symptom and Sign Assisted Genome Analysis (SSAGA)

Mapping the clinical features of a child's likely genetic disease to candidate genes for targeted analysis of material variants is difficult, since there are over 3,500 genetic disorders for which the causal gene is known and thousands of clinical features. Matching of clinical features to diseases to disease genes is performed by entering terms describing the patient's presentation into a novel clinico-pathologic correlation tool (SSAGA, Symptom and Sign Assisted Genome Analysis). It was designed to enable physicians to delimit genome analyses to genes of causal relevance to individual clinical presentations, in accord with published guidelines for genetic testing in children and with NGS. SSAGA currently has a menu of 227 clinical terms, arranged in 9 categories. SNOMED-CT terms map to 591 well-established recessive diseases with known causal genes. Phenotype-to-disease-to-gene mapping was informed by GeneReviews, Online Mendelian Inheritance in Man (OMIM) Clinical Synopsis, MitoCarta and expert physician reviewers. Upon entry of the features of an individual patient, SSAGA nominates the corresponding superset of relevant diseases and genes, rank ordered by number of matching terms. It also contains a freeform text box that allows physicians to enter findings for which no SNOMED term exists, clinical term qualifiers, relevant family history, and specific genes of interest. The diagnostic sensitivity of SSAGA improves with use, through manual updating of mappings in cases where nominations failed to include the causal gene. SSAGA is extensible to additional diseases, genes, and clinical terms. Interpretation of results is guided by the ranking of variant reports yielded by RUNES on SSAGA-prioritized candidate genes.
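The term-to-disease-to-gene nomination described above can be sketched as two lookups and a count. The mapping tables below use placeholder names (`term_a`, `disease_X`, `GENE1`) rather than real clinical content, and the tie-breaking by disease name is an assumption made only to keep the output deterministic.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Placeholder mappings; the real tables are curated by expert reviewers.
TERM_TO_DISEASES: Dict[str, List[str]] = {
    "term_a": ["disease_X", "disease_Y"],
    "term_b": ["disease_X"],
    "term_c": ["disease_Z"],
}
DISEASE_TO_GENES: Dict[str, List[str]] = {
    "disease_X": ["GENE1"],
    "disease_Y": ["GENE2", "GENE3"],
    "disease_Z": ["GENE4"],
}


def nominate(terms: Iterable[str]) -> List[Tuple[str, List[str], int]]:
    """Return (disease, genes, matching term count), most matches first."""
    matches: Counter = Counter()
    for term in set(terms):  # a term entered twice counts once
        for disease in TERM_TO_DISEASES.get(term, ()):
            matches[disease] += 1
    ranked = sorted(matches.items(), key=lambda item: (-item[1], item[0]))
    return [(disease, DISEASE_TO_GENES.get(disease, []), n)
            for disease, n in ranked]


nominations = nominate(["term_a", "term_b"])
```

Here `disease_X` matches both entered terms and is ranked first, followed by `disease_Y` with one match. `disease_Z` is not nominated because none of its terms were entered.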

Variant Integration and Knowledge Interpretation in Genomes (VIKING)

VIKING is a software tool that integrates a patient's nucleotide variant calls with the patient's symptoms as entered into SSAGA and with the characterization of each variant from RUNES. VIKING combines these data sources to present a prioritized, filtered list of candidate variants to be reviewed by an expert clinician, who can then make a molecular diagnosis. VIKING uses the original test order entered by a physician into SSAGA to determine which genes are relevant to the patient's symptoms and masks all variant results that are not material. VIKING then sorts the genes according to how many clinical terms they matched, and ranks the individual variants within each gene by their ACMG category. The end result is that VIKING is able to quickly turn the full list of detected variants for a patient, numbering in the thousands, into a short list of variants most likely to be relevant to diagnosis.
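The masking and ordering described above can be sketched as a filter followed by a sort on a compound key. The record type, field names, and sample data are hypothetical, and the sketch assumes that a lower ACMG category number should be listed first.

```python
from typing import Dict, Iterable, List, NamedTuple


class Call(NamedTuple):
    variant: str        # hypothetical variant identifier
    gene: str
    acmg_category: int  # category assigned during characterization


def prioritize(calls: Iterable[Call],
               gene_term_matches: Dict[str, int]) -> List[Call]:
    """Mask calls outside the relevant genes, then order the remainder."""
    # Keep only variants in genes nominated for this patient's symptoms.
    material = [c for c in calls if c.gene in gene_term_matches]
    # Genes with more matching clinical terms come first; within a gene,
    # variants are ordered by ACMG category.
    return sorted(
        material,
        key=lambda c: (-gene_term_matches[c.gene], c.gene, c.acmg_category),
    )


calls = [
    Call("v1", "GENE2", 4),
    Call("v2", "GENE1", 3),
    Call("v3", "GENE1", 1),
    Call("v4", "GENE9", 2),  # GENE9 is not relevant, so v4 is masked
]
shortlist = prioritize(calls, {"GENE1": 3, "GENE2": 1})
```

The resulting order is `v3`, `v2`, `v1`: both `GENE1` variants precede the `GENE2` variant, and within `GENE1` the category 1 variant precedes the category 3 variant.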

In addition to the integration with SSAGA, VIKING offers clinicians and researchers a variety of options for displaying and filtering variant data, including filtering by minor allele frequency and detecting genes that carry compound heterozygous variants.
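As an illustration of these two filters working together, the sketch below keeps rare heterozygous calls and reports genes that carry at least two of them. The record fields, the `max_maf` threshold of 0.01, and the function name are assumptions for illustration. Without phasing information the result is only a list of candidates, since the two variants would still have to be shown to lie on different copies of the gene.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class VariantCall(NamedTuple):
    variant: str
    gene: str
    zygosity: str  # "het" or "hom" (hypothetical encoding)
    maf: float     # minor allele frequency


def compound_het_candidates(calls: Iterable[VariantCall],
                            max_maf: float = 0.01) -> Dict[str, List[str]]:
    """Return genes with two or more rare heterozygous variants."""
    by_gene: Dict[str, List[str]] = defaultdict(list)
    for call in calls:
        if call.zygosity == "het" and call.maf <= max_maf:
            by_gene[call.gene].append(call.variant)
    return {gene: variants for gene, variants in by_gene.items()
            if len(variants) >= 2}


calls = [
    VariantCall("v1", "GENE1", "het", 0.001),
    VariantCall("v2", "GENE1", "het", 0.004),
    VariantCall("v3", "GENE2", "het", 0.002),
    VariantCall("v4", "GENE2", "het", 0.200),  # too common, filtered out
    VariantCall("v5", "GENE3", "hom", 0.001),  # homozygous, not counted
]
candidates = compound_het_candidates(calls)
```

Only `GENE1` is reported here, because it is the only gene left with two heterozygous variants after the frequency filter is applied.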

Copyright © 1996-2013 The Children's Mercy Hospital