Rapid Understanding of Nucleotide variant Effect Software
(RUNES)
Overview
The Children's Mercy variant characterization pipeline
(RUNES: Rapid Understanding of Nucleotide variant Effect Software)
is a multi-stage analysis pipeline for annotating and classifying
human nucleotide variation detected through short read alignment.
The long-term plan is to continually add characterizations and
variant types as new tools/capabilities appear.
Characterizations of all variants detected through CPGM
sequencing projects are stored in the CMH Variant Warehouse
relational database. Variant characterizations are viewable and
queryable through a lightweight web application.
Characterization stages
Characterization is divided into multiple independent
stages that each record zero or more annotations for each variant
according to the type of characterization being performed by the
stage. At the end of characterization, the annotations for each
variant are aggregated and submitted to a variant classifier, which
assigns an American College of Medical Genetics (ACMG) category to
each variant based on the accumulated annotation evidence. When the
evidence supports more than one category, the most damaging
category achieved becomes the final categorization.
Characterization stages use a variety of software and data from
both internal and external sources. The stages include:
- ENSEMBL Variant Effect Predictor (VEP)
- Comparison with dbSNP
- CMH splice impact evaluator
- CMH transcript context characterizer
- Comparison with Human Gene Mutation Database
(HGMD/GenomeTrax)
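The stage-and-aggregate structure described above can be sketched
roughly as follows. This is a minimal illustration, not the RUNES
implementation: the stage functions, the variant key format, and the
annotation labels are all hypothetical, and each stage is modeled
simply as a function that returns zero or more annotations for a
variant.

```python
from typing import Callable, Dict, Iterable, List

# A stage takes a variant key and returns zero or more annotation labels.
Stage = Callable[[str], List[str]]


def dbsnp_stage(variant: str) -> List[str]:
    """Hypothetical stage: look the variant up in a made-up dbSNP extract."""
    known = {"chr1:12345:A>G": ["dbsnp_pathogenic"]}
    return known.get(variant, [])


def splice_stage(variant: str) -> List[str]:
    """Hypothetical stage: look the variant up in a made-up splice table."""
    known = {"chr2:500:C>T": ["splice_donor_acceptor_overlap"]}
    return known.get(variant, [])


def characterize(variants: Iterable[str], stages: Iterable[Stage]) -> Dict[str, List[str]]:
    """Run every stage independently and pool the annotations per variant."""
    stage_list = list(stages)
    aggregated: Dict[str, List[str]] = {}
    for variant in variants:
        annotations: List[str] = []
        for stage in stage_list:
            # A stage may contribute nothing for a given variant.
            annotations.extend(stage(variant))
        aggregated[variant] = annotations
    return aggregated


annotations_by_variant = characterize(
    ["chr1:12345:A>G", "chr2:500:C>T", "chr3:42:G>A"],
    [dbsnp_stage, splice_stage],
)
```

Because the stages do not depend on one another, a new
characterization can be added by appending another function to the
list, which matches the stated plan of adding characterizations over
time.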
ACMG Classification
Variant classification is the final stage of variant
characterization and consists of assigning an interpretive category
representing clinical significance to each variant. Every variant
receives a classification. RUNES uses the categories recommended by
the American College of Medical Genetics [2]; these are listed
along with the criteria used for including a variant in each
category:
| Category | Description | Criteria |
| --- | --- | --- |
| 1 | Previously reported, recognized cause of the disorder | HGMD variant type of 'Disease Mutant'; dbSNP SNP Clinical Significance of 'pathogenic' |
| 2 | Novel, of a type expected to cause the disorder | loss of initiation; premature stop codon; disruption of stop codon; whole transcript deletion; frameshifting in/del; disruption of splicing through deletion causing CDS/intron fusion; overlap with splice donor or acceptor sites |
| 3 | Novel, may or may not be causal | non-synonymous substitution; in-frame in/del; disruption of polypyrimidine tract; overlap with 5' exonic, 5' flank or 3' exonic splice contexts |
| 4 | Novel, probably not causal of disease | all variants not in categories 1 - 3; synonymous AA changes; overlap with 5' intronic or 3' flank splice contexts; pyrimidine substitutions in polypyrimidine tract, other intronic variants; dbSNP GMAF of greater than 0.02 |
| 5 | Known neutral variant | not used |
| 6 | Not known/expected to cause disease but associated with a clinical presentation | not used |
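One way to read the table is as a lookup from annotation evidence to
category, with the most damaging category winning. The sketch below
assumes that "most damaging" means the lowest category number among
categories 1 to 4 and that a variant with no matching evidence falls
into Category 4 ("all variants not in categories 1 - 3"). The
annotation labels are hypothetical names invented for this example,
only a subset of the criteria is shown, and the source does not say
how a dbSNP GMAF above 0.02 interacts with more damaging evidence, so
here it is treated as ordinary Category 4 evidence.

```python
from typing import Iterable

# Hypothetical annotation labels mapped to the categories in the table above.
CATEGORY_BY_ANNOTATION = {
    "hgmd_disease_mutant": 1,
    "dbsnp_pathogenic": 1,
    "loss_of_initiation": 2,
    "premature_stop_codon": 2,
    "stop_codon_disruption": 2,
    "whole_transcript_deletion": 2,
    "frameshift_indel": 2,
    "splice_donor_acceptor_overlap": 2,
    "non_synonymous_substitution": 3,
    "in_frame_indel": 3,
    "polypyrimidine_tract_disruption": 3,
    "synonymous": 4,
    "dbsnp_gmaf_above_0.02": 4,
}

# Every variant receives a classification; unmatched evidence defaults to 4.
DEFAULT_CATEGORY = 4


def classify(annotations: Iterable[str]) -> int:
    """Return the most damaging (lowest-numbered) category supported."""
    categories = [
        CATEGORY_BY_ANNOTATION.get(label, DEFAULT_CATEGORY)
        for label in annotations
    ]
    return min(categories, default=DEFAULT_CATEGORY)
```

Under these assumptions, a variant annotated as both `synonymous` and
`premature_stop_codon` is assigned Category 2, and a variant with no
annotations at all is assigned Category 4.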
The ACMG categories rely heavily on distinguishing novel from
known variants, which implies comparison to external variation
databases. RUNES currently uses HGMD and dbSNP to fulfill this
role, though the current state of available databases limits their
utility. Existing databases are incomplete (many variants are
missing) and can contain misannotations (incorrect identification
of a variant) or mis-associations (association of common
polymorphisms with disease) (Bell et al.).
One effect of this is that the initial version of RUNES is
unable to categorize any variants as Category 5 or Category 6,
meaning that most novel variants without clear pathogenicity will
end up as Category 4. As these existing resources improve, or as
additional clinical-grade databases become available, the
categorization is expected to be updated to include these
categories.
Minor Allele Frequency
The Variant Warehouse records a CMH Minor Allele Frequency
for all variants observed through CPGM sequencing projects. This
frequency value simply records the number of samples that have each
variant in them along with the total number of samples sequenced to
date. These values are recalculated for every variant in the
Variant Warehouse after the completion of each RUNES run so that
the value properly records the presence or absence of each variant
across every sample represented in the database.
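The recalculation described above amounts to counting, for each
variant, how many samples carry it, and pairing that count with the
total number of samples. A minimal sketch, assuming the warehouse
contents can be viewed as a mapping from sample identifier to the set
of variants observed in that sample (the function name, sample names,
and variant keys are hypothetical):

```python
from collections import Counter
from typing import Dict, Set, Tuple


def recompute_sample_counts(
    variants_by_sample: Dict[str, Set[str]],
) -> Dict[str, Tuple[int, int]]:
    """Map each variant to (samples carrying it, total samples sequenced)."""
    total_samples = len(variants_by_sample)
    carriers: Counter = Counter()
    for variants in variants_by_sample.values():
        # Each sample contributes at most once per variant because
        # its variants are held in a set.
        carriers.update(variants)
    return {
        variant: (count, total_samples)
        for variant, count in carriers.items()
    }


counts = recompute_sample_counts(
    {
        "sample_1": {"chr1:12345:A>G", "chr2:500:C>T"},
        "sample_2": {"chr1:12345:A>G"},
        "sample_3": set(),
    }
)
```

Recomputing from the full mapping after every run, rather than
incrementing stored counts, keeps the denominator correct for variants
that were absent from the newly added samples.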
Symptom and Sign Assisted Genome Analysis (SSAGA)
The mapping of the clinical features of a child's likely
genetic disease to likely candidate genes for targeted analysis of
material variants is difficult, since there are over 3,500 genetic
disorders for which the causal gene is known and thousands of
clinical features. Matching of clinical features to diseases to
disease genes is performed by entering terms describing the
patient's presentation into a novel clinico-pathologic correlation
tool (SSAGA, Symptom and Sign Assisted Genome Analysis). It was
designed to enable physicians to delimit genome analyses to genes
of causal relevance to individual clinical presentations, in accord
with published guidelines for genetic testing in children and with
NGS. SSAGA currently has a menu of 227 clinical terms, arranged in
9 categories. SNOMED-CT terms map to 591 well-established recessive
diseases with known causal genes. Phenotype-to-disease-to-gene
mapping was informed by GeneReviews, Online Mendelian Inheritance
in Man (OMIM) Clinical Synopsis, MitoCarta, and expert physician
reviewers. Upon entry of the features of an individual patient,
SSAGA nominates the corresponding superset of relevant diseases and
genes, rank ordered by number of matching terms. It also contains a
freeform text box that allows physicians to enter findings for
which no SNOMED term exists, clinical term qualifiers, relevant
family history, and specific genes of interest. The diagnostic
sensitivity of SSAGA improves with use, by manual updating of
mappings in cases where nominations failed to include the causal
gene. SSAGA is extensible to additional diseases, genes, and
clinical terms. Interpretation of results is guided by the ranking
of variant reports yielded by RUNES on SSAGA-prioritized candidate
genes.
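The phenotype-to-disease-to-gene nomination described above can be
illustrated with two lookup tables and a ranking step. This is only a
sketch under stated assumptions: the term, disease, and gene names are
placeholders rather than real SNOMED-CT terms or curated mappings, and
ties are broken alphabetically, which the source does not specify.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Placeholder mappings; the real mappings are curated by expert reviewers.
TERM_TO_DISEASES: Dict[str, Set[str]] = {
    "term_a": {"disease_x", "disease_y"},
    "term_b": {"disease_x"},
    "term_c": {"disease_z"},
}
DISEASE_TO_GENES: Dict[str, Set[str]] = {
    "disease_x": {"GENE1"},
    "disease_y": {"GENE2"},
    "disease_z": {"GENE3"},
}


def nominate_genes(entered_terms: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (gene, number of matching terms), best match first."""
    matched_terms_by_gene: Dict[str, Set[str]] = defaultdict(set)
    for term in entered_terms:
        for disease in TERM_TO_DISEASES.get(term, ()):
            for gene in DISEASE_TO_GENES.get(disease, ()):
                matched_terms_by_gene[gene].add(term)
    ranked = sorted(
        matched_terms_by_gene.items(),
        key=lambda item: (-len(item[1]), item[0]),
    )
    return [(gene, len(terms)) for gene, terms in ranked]


nominations = nominate_genes(["term_a", "term_b"])
```

Updating the mappings after a missed diagnosis, as the text describes,
would correspond here to adding an entry to one of the two tables.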
Variant Integration and Knowledge Interpretation in
Genomes (VIKING)
VIKING is a software tool that integrates a patient's
nucleotide variant calls with the patient's symptoms as entered
into SSAGA and with the characterization of each variant from RUNES.
VIKING combines these data sources to present a prioritized,
filtered list of candidate variants to be reviewed by an expert
clinician who can then make a molecular diagnosis. VIKING uses the
original test order entered by a physician into SSAGA to determine
what genes are relevant to the patient's symptoms and masks all
variant results that are not material. VIKING then sorts the genes
according to how many clinical terms they matched, while ranking
the individual variants within each gene by their ACMG category. The
end result is that VIKING is able to quickly turn the full list of
detected variants for a patient, numbering in the thousands, into a
short list of variants most likely to be relevant to diagnosis.
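The masking and ordering steps described above can be sketched as a
filter followed by a single sort. The record layout and names below
are hypothetical, and the sketch assumes that a lower ACMG category
number ranks first within a gene and that genes with equal term counts
are ordered alphabetically.

```python
from typing import Dict, Iterable, List, NamedTuple


class Variant(NamedTuple):
    key: str            # hypothetical variant identifier
    gene: str           # gene the variant falls in
    acmg_category: int  # category assigned by the classifier


def prioritize(
    variants: Iterable[Variant],
    term_matches_by_gene: Dict[str, int],
) -> List[Variant]:
    """Mask variants outside the nominated genes, then order the rest."""
    relevant = [v for v in variants if v.gene in term_matches_by_gene]
    return sorted(
        relevant,
        key=lambda v: (
            -term_matches_by_gene[v.gene],  # genes with more matches first
            v.gene,                         # keep a gene's variants together
            v.acmg_category,                # most damaging category first
        ),
    )


shortlist = prioritize(
    [
        Variant("v1", "GENE2", 2),
        Variant("v2", "GENE1", 4),
        Variant("v3", "GENE1", 1),
        Variant("v4", "GENE9", 1),
    ],
    {"GENE1": 2, "GENE2": 1},
)
```

In this example the variant in `GENE9` is masked even though it is
Category 1, because that gene was not nominated for the patient's
symptoms.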
In addition to the integration with SSAGA, VIKING offers
clinicians and researchers a variety of options for displaying and
filtering variant data, including filtering by minor allele
frequency and the detection of genes carrying compound
heterozygous variants.
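The two filters mentioned above might look like the following sketch.
The source does not define either filter precisely, so several things
here are assumptions: the record layout is hypothetical, the frequency
is taken as carrier samples divided by total samples, and a gene is
flagged as a compound heterozygote candidate when it carries two or
more heterozygous variants in the patient. Without phasing information
such a gene is only a candidate, since the variants may lie on the
same chromosome copy.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Call(NamedTuple):
    key: str              # hypothetical variant identifier
    gene: str
    zygosity: str         # "het" or "hom"
    carrier_samples: int  # samples in the warehouse carrying the variant
    total_samples: int    # samples sequenced to date


def filter_by_frequency(calls: Iterable[Call], max_frequency: float) -> List[Call]:
    """Keep calls whose sample frequency does not exceed the threshold."""
    kept = []
    for call in calls:
        if call.total_samples:
            frequency = call.carrier_samples / call.total_samples
        else:
            frequency = 0.0  # no samples recorded yet
        if frequency <= max_frequency:
            kept.append(call)
    return kept


def compound_het_candidate_genes(calls: Iterable[Call]) -> Set[str]:
    """Return genes with at least two heterozygous calls."""
    het_counts: Dict[str, int] = defaultdict(int)
    for call in calls:
        if call.zygosity == "het":
            het_counts[call.gene] += 1
    return {gene for gene, count in het_counts.items() if count >= 2}


calls = [
    Call("v1", "GENE1", "het", 1, 100),
    Call("v2", "GENE1", "het", 2, 100),
    Call("v3", "GENE2", "het", 40, 100),
    Call("v4", "GENE2", "hom", 1, 100),
]
rare_calls = filter_by_frequency(calls, max_frequency=0.05)
candidate_genes = compound_het_candidate_genes(rare_calls)
```

Applying the frequency filter first, as in the usage lines, means that
common variants do not contribute to the heterozygous count for a
gene.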