Review Article Volume 10 Issue 2
1Unigem, Scientific Director, Colombia
2Unigem, Laboratory Coordinator, Colombia
3Unigem, Colombia
4Unigem, Director, Colombia
Correspondence: Beatriz H Aristizabal B, UNIGEM, Cll 19 A N° 44-25 cons 2301, Edificio Salud y Servicios, Ciudad del Rio, Medellín, Antioquia, Colombia, Tel 5745401100
Received: February 05, 2018 | Published: March 6, 2018
Citation: Beatriz HAB, Olga LRC, Claudia PBH, et al. A complete look with a magnifying glass of genomic and exome sequencing: what are we receiving in our clinical reports?. J Anesth Crit Care Open Access. 2018;10(2):52–58. DOI: 10.15406/jaccoa.2018.10.00358
The objective of this study was to review the ability of a laboratory's exome-sequencing test to detect known and novel sequence variants, and to identify the critical factors that influence the interpretation of a clinical exome test. Methods: We selected the guideline papers and followed their validation strategy, the essential considerations for sequencing analysis and the detection of known variants, and an interpretation-based approach that assessed the relative ability to identify and interpret disease-causing variants, by analyzing and comparing the results and all the reports. The study presents a detailed approach to exome analysis and reporting. Conclusion: The analysis provides an assessment of the critical areas that influence the interpretation of an exome test, including comprehensive phenotype capture, assessment of clinical overlap, availability of parental data, and the limitations of database updates. This review can be used to inform improvements in the phenotype-driven interpretation of medical exomes in clinical and research settings.
NGS, Genome, Exome, guidelines
DNA, Deoxyribonucleic acid; NGS, Next generation sequencing; QC, Base call quality score; HGVS, Human Genome Variation Society; NCBI, National Center for Biotechnology Information; LRG, Locus Reference Genomic; UTR, untranslated region; ACMG, American College of Medical Genetics and Genomics; GUS, genes of uncertain significance; CAP, College of American Pathologists; CDC, Centers for Disease Control and Prevention
The exome is the part of the genome made up of exons, the DNA segments that are transcribed and give rise to proteins. Studying the exome is one of the most complete and complex ways to study our DNA. Exons are the coding regions that provide the information for protein synthesis, whereas introns are non-coding regions interspersed within genes that have other functions.1 Clinical laboratories are rapidly implementing next-generation sequencing (NGS)-based tests to diagnose genetic disorders. Whereas targeted NGS gene panels are tuned to genes associated with a specific disease area, whole-exome sequencing (WES) tests assess a broad range of known and presumed phenotypes and genotypes.
The human exome consists of approximately 180,000 exons, which make up about 1% of the total genome (about 30 megabases of DNA). Whole-exome sequencing covers all the coding regions of the genome, which harbor an estimated 95% of disease-related mutations.2,3 The coverage and cost of exome sequencing are favorable: it is more comprehensive than sequencing specific genes or panels and is therefore cost-effective and timely.4
For disorders that require the detection of heterozygous germline variants, somatic mutations in tumors or heteroplasmic mitochondrial variants, exome sequencing must comply with international standards and should ideally reach a coverage above 200X. In addition, Sanger confirmation strategies are not necessary.4
When should sequencing be ordered?
Whole-exome sequencing should be ordered to confirm or exclude clinical diagnoses in complex diseases; when there are more than three suspected diagnoses; in degenerative and neurodegenerative diseases; in chronic diseases without a definite diagnosis; in rare and orphan diseases; and in cancer or a family history of cancer.
Genomic sequencing can identify circulating tumor DNA in blood, allowing early detection of cancer, as well as pharmacogenomic studies and monitoring of the response to treatment. Among the roughly 20,000 genes in the genome, hundreds can cause unwanted cell division, genetic diseases and cancer. Exome sequencing of the 4,900 genes directly related to disease makes it possible to quickly identify which genes are altered and their specific mutations.5
The use of WES as a diagnostic test changed the testing strategy from focusing on a few genes known to cause a disorder or phenotype to sequencing all genes in the genome and focusing the analysis on the groups of genes that may directly explain the individual's phenotype (Figure 1).6 Each laboratory should validate its NGS process and establish its sensitivity and specificity.7,8
Figure 1 8
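As an illustration of the validation step, the following minimal Python sketch compares the variants called in a reference sample with a trusted set of known variants and derives sensitivity, specificity and positive predictive value. The variant-key format and the derivation of true negatives from a count of assessed positions are simplifying assumptions made for this example, not a prescribed validation protocol.

```python
def validation_metrics(truth: set, calls: set, assessed_positions: int) -> dict:
    """Compare a call set with a reference (truth) set from a validation sample.

    truth, calls: sets of variant keys, e.g. (chrom, pos, ref, alt) tuples (assumed format).
    assessed_positions: number of positions evaluated in the validated region; true
    negatives are derived from it, assuming one position per variant (a simplification).
    """
    tp = len(truth & calls)   # known variants that were detected
    fp = len(calls - truth)   # calls absent from the reference set
    fn = len(truth - calls)   # known variants that were missed
    tn = assessed_positions - tp - fp - fn
    if tn < 0:
        raise ValueError("assessed_positions is smaller than the number of variant sites")
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("need at least one known variant and one call")
    return {
        "sensitivity": tp / (tp + fn),                                # known variants detected
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),  # non-variant sites left uncalled
        "ppv": tp / (tp + fp),                                        # calls that are real
    }


if __name__ == "__main__":
    truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr2", 300, "G", "A"), ("chr3", 400, "T", "C")}
    calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr2", 300, "G", "A"), ("chr4", 500, "A", "C")}
    print(validation_metrics(truth, calls, assessed_positions=1000))
```

In practice the reference set, the validated region and the variant types (SNVs, indels) would be defined in the laboratory's validation plan; the arithmetic above only shows how the two reported figures relate to true and false calls.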
Essential considerations for sequencing analysis
According to the guidelines of the American College of Medical Genetics and Genomics,9 and the Association for Molecular Pathology,10 the following requirements must be fulfilled in order for a study to be qualified and valid:8,11-13
Metrics: The report must contain the essential elements, including structured results, interpretation, references, the methodology used and the appropriate disclaimers of the test. These elements are emphasized in the CAP regulations and the CLIA standards for next-generation sequencing. The metrics used for analytical validation must be included in the report.
Base call quality score (Q score): The Q score is a Phred-scaled measure of the probability that a base has been called incorrectly, Q = -10 log10(P), so a higher Q score means a lower probability of error. For example, a T called with a Q score of 30 has a probability of 0.001 of being incorrect (99.9% accuracy). Any base with a Q score <20 should be considered low quality, and any variant identified from such bases should be considered a false positive (Table 1).
Q score | Incorrect base call probability
Q40 | 1 in 10,000
Q30 | 1 in 1,000
Q20 | 1 in 100
Q10 | 1 in 10
Table 1 Base call quality score
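The Phred relation behind Table 1 can be checked with a short Python sketch. It also decodes a FASTQ quality line and flags bases below Q20; the Phred+33 encoding offset is an assumption (it is the common modern convention, but older data may use a different offset).

```python
import math
from typing import List


def q_to_error_prob(q: float) -> float:
    """Probability that a base call is wrong: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_q(p: float) -> float:
    """Inverse relation: Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def decode_fastq_quals(qual_line: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality line, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in qual_line]


def low_quality_positions(quals: List[int], threshold: int = 20) -> List[int]:
    """0-based positions of bases below the Q20 cut-off named in the text."""
    return [i for i, q in enumerate(quals) if q < threshold]


if __name__ == "__main__":
    for q in (40, 30, 20, 10):
        print(f"Q{q}: error probability {q_to_error_prob(q):g}")  # reproduces Table 1
    quals = decode_fastq_quals("II5+#")  # [40, 40, 20, 10, 2]
    print(quals, low_quality_positions(quals))  # positions [3, 4] are below Q20
```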
Read depth
Read depth, or coverage, is conventionally written as a number followed by "X". It is the number of independent reads aligned at a locus of interest, and it is frequently expressed as an average or as a percentage of the target. For example, the clinical report must state the coverage of the test: 150X with 95-98% coverage.
Low coverage (less than 70X) risks missing variants and assigning incorrect zygosity, and it reduces the ability to filter out artifacts. Laboratories should define a minimum coverage for variant detection, based on diagnostic approaches that guarantee adequate analytical performance for the report. The detection of heterozygous germline variants requires a minimum coverage of 100X for the proband, and a coverage greater than 100X is required to detect mixed variants or mosaicism (somatic tumor samples, mitochondrial heteroplasmy or germline mosaicism).
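A minimal Python sketch of how the two coverage figures a report states (average depth and percentage of the target at a minimum depth) can be computed from per-base depths. It assumes the per-base depths over the targeted bases have already been produced by an upstream tool; the 100X and 70X defaults are the thresholds given in the text.

```python
from statistics import mean
from typing import List


def coverage_summary(depths: List[int], min_depth: int = 100, risk_depth: int = 70) -> dict:
    """Summarize per-base read depth over the targeted bases.

    depths: depth at each targeted base (assumed to be precomputed).
    min_depth: minimum depth for heterozygous germline calls (100X in the text).
    risk_depth: depth below which variants are at risk of being missed (70X in the text).
    """
    if not depths:
        raise ValueError("no targeted bases supplied")
    at_min = sum(1 for d in depths if d >= min_depth)
    return {
        "mean_depth": mean(depths),
        "pct_at_or_above_min": 100.0 * at_min / len(depths),
        "below_min": [i for i, d in enumerate(depths) if d < min_depth],
        "below_risk": [i for i, d in enumerate(depths) if d < risk_depth],
    }


if __name__ == "__main__":
    print(coverage_summary([150, 160, 90, 200, 120, 60]))
```

A high mean depth can still hide individual bases well below the minimum, which is why both the average and the percentage of the target covered are worth reporting.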
Variant reads
This is the number of sequence reads that support the presence of a variant. Because of the NGS error rate, a variant call supported by fewer than 5 reads is considered a false positive.
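The 5-read threshold interacts with depth and allele fraction. The Python sketch below applies the threshold and estimates, under a simple binomial model, the probability that a true variant reaches it. The model is an assumption for illustration: it treats reads as independent samples of the two alleles and ignores sequencing error, mapping bias and PCR duplicates.

```python
from math import comb

MIN_VARIANT_READS = 5  # from the text: fewer than 5 supporting reads -> treat as false positive


def passes_read_support(alt_reads: int, min_reads: int = MIN_VARIANT_READS) -> bool:
    """True if a call has enough variant-supporting reads to be reported."""
    return alt_reads >= min_reads


def detection_probability(depth: int, allele_fraction: float = 0.5,
                          min_reads: int = MIN_VARIANT_READS) -> float:
    """P(at least min_reads variant reads at a site) under a binomial model.

    allele_fraction: 0.5 for a heterozygous germline variant, lower for mosaic
    or somatic variants.
    """
    p_miss = sum(
        comb(depth, k) * allele_fraction ** k * (1 - allele_fraction) ** (depth - k)
        for k in range(min(min_reads, depth + 1))
    )
    return 1.0 - p_miss


if __name__ == "__main__":
    for depth in (10, 30, 100):
        print(f"heterozygous, {depth}X: {detection_probability(depth):.4f}")
    for depth in (100, 200, 500):
        print(f"5% mosaic, {depth}X: {detection_probability(depth, 0.05):.4f}")
```

Under this model a heterozygous variant is almost always detected at 100X, while a variant present in 5% of reads is often missed at that depth, which is consistent with the text's call for higher coverage in mosaicism.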
Terminology: Variants must be classified using the five-tier terminology: pathogenic, probably pathogenic, uncertain significance, probably benign and benign.
Nomenclature: Variants must be named following the Human Genome Variation Society (HGVS) sequence variant nomenclature (http://www.hgvs.org/mutnomen), stating the version used, and names can be checked with Mutalyzer (https://mutalyzer.nl).14
Clinical reports should include the reference sequence and avoid ambiguous naming of variants at the DNA level, and should provide both the coding and the protein nomenclature to assist functional interpretation (for example, "g." for genomic sequence, "c." for coding DNA sequence, "p." for protein, "m." for mitochondrial sequence, etc.). The reference sequence must be complete and taken from the NCBI RefSeq database (http://www.ncbi.nlm.nih.gov/RefSeq/) with its version number, or from the Locus Reference Genomic (LRG) database (http://www.lrg-sequence.org).
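To make the reference-sequence requirement concrete, the hypothetical Python checker below verifies only the overall shape of a variant name, "<reference>:<prefix>.<change>": that a recognized prefix is present and that a RefSeq accession carries a version number. It does not validate the change description itself, which is what a dedicated tool such as Mutalyzer is for; the accession patterns covered are a simplified assumption.

```python
import re
from typing import List, Optional

# Hypothetical shape check; the accession patterns are simplified.
VARIANT_NAME = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+|LRG_\d+(?:t\d+|p\d+)?)"  # RefSeq (NM_, NC_, NP_...) or LRG
    r"(?:\.(?P<version>\d+))?"                            # RefSeq version number
    r":(?P<prefix>[gcnmrp])\."                            # sequence type prefix
    r"(?P<change>.+)$"
)

PREFIX_MEANING = {
    "g": "genomic", "c": "coding DNA", "n": "non-coding DNA",
    "m": "mitochondrial DNA", "r": "RNA", "p": "protein",
}


def check_variant_name(name: str) -> List[str]:
    """Return problems found in a variant name (empty list means the shape looks fine)."""
    match = VARIANT_NAME.match(name)
    if match is None:
        return ["missing or malformed reference sequence or prefix"]
    problems = []
    # LRG records are not versioned, so only RefSeq accessions need a version here.
    if not match["accession"].startswith("LRG_") and match["version"] is None:
        problems.append("RefSeq accession has no version number")
    return problems


def sequence_type(name: str) -> Optional[str]:
    """Describe the sequence type given by the prefix, or None if the name is malformed."""
    match = VARIANT_NAME.match(name)
    return PREFIX_MEANING[match["prefix"]] if match else None


if __name__ == "__main__":
    for name in ("NM_000492.3:c.1521_1523del", "NM_000492:c.1521_1523del", "c.1521_1523del"):
        print(name, check_variant_name(name) or "ok", sequence_type(name))
```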
Genomic coordinates should be given and defined according to the reference genome build (e.g., hg19) or to a reference genomic sequence that covers the entire gene, including the 5′ and 3′ untranslated regions (UTRs) and the promoter.
A reference transcript should be used for each gene and stated in the report when variants are described, typically taken from LRG,10 the CCDS database, the Human Gene Mutation Database (http://www.hgmd.cf.ac.uk), ClinVar (http://www.ncbi.nlm.nih.gov/clinvar) or a locus-specific database.
However, laboratories must evaluate the clinical impact of the variant on its alternative transcripts for clinical interpretation. Because not all types of variants (e.g., complex variants) are covered by the HGVS recommendations, the ACMG recommends three exceptions to the HGVS nomenclature rules:15,16
3.1) "X" is still considered acceptable for reporting nonsense variants, in addition to "*" and "Ter";
3.2) reporting the number of the exon in which the variant is located is recommended; and
3.3) the term "pathogenic" is recommended instead of terms describing affected function.
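Because "X", "*" and "Ter" may all appear for stop codons, a laboratory may want to normalize them before comparing reports. The Python sketch below is a hypothetical helper that recognizes a simple single-residue nonsense change in any of the three notations and rewrites it to HGVS stop notation ("Ter" with three-letter codes, "*" with one-letter codes). The pattern is deliberately simplified and does not cover frameshifts or other complex protein changes.

```python
import re
from typing import Optional

# Simplified pattern for a single residue changed to a stop codon, written with
# "*", "Ter" or "X" (the ACMG exception). Optional parentheses must be balanced.
NONSENSE = re.compile(
    r"^p\.(?P<paren>\()?"
    r"(?P<ref>[A-Z][a-z]{2}|[A-Z])"   # three-letter or one-letter reference residue
    r"(?P<pos>\d+)"
    r"(?P<stop>\*|Ter|X)"
    r"(?(paren)\))$"                  # closing parenthesis only if one was opened
)


def normalize_nonsense(protein_change: str) -> Optional[str]:
    """Rewrite a nonsense change to HGVS stop notation; None if not recognized."""
    match = NONSENSE.match(protein_change)
    if match is None:
        return None
    stop = "Ter" if len(match["ref"]) == 3 else "*"
    core = f"{match['ref']}{match['pos']}{stop}"
    return f"p.({core})" if match["paren"] else f"p.{core}"


if __name__ == "__main__":
    for change in ("p.Arg553X", "p.Arg553*", "p.(Arg553Ter)", "p.R553X", "p.Arg553His"):
        print(change, "->", normalize_nonsense(change))
```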
Literature and databases: The population frequency of a variant must be taken into account before it is reported (Table 2).
Database type | Database
Population database | Exome Aggregation Consortium
Disease database | ClinVar
Sequencing database | NCBI Genome
Table 2 Sequencing databases15
Data storage and traceability of patient reports
NGS generates a massive amount of data files with differing information content and sizes. Laboratories should state explicitly in their policies which files will be retained. We recommend that the laboratory consider storing, for a minimum of 2 years, a file type that allows regeneration of the primary results as well as reanalysis with improved analytic pipelines (e.g., BAM or FASTQ files with all reads retained). In addition, variant significance may be reinterpreted every year in order to re-evaluate variants of uncertain significance (VUS).
Computational analysis and prediction programs (in silico):
A variety of tools is available for in silico analysis. The algorithms differ in how they assess the effect of a variant at the nucleotide and amino acid levels. They fall into two main categories: tools that predict whether a missense change damages the function or structure of the protein, and tools that predict an effect on splicing (Table 3).
Prediction | Name | Website
Missense prediction | ConSurf | http://bental.tau.ac.il/new_ConSurfDB/
Missense prediction | FATHMM | http://fathmm.biocompute.org.uk/fathmmMKL.htm
Missense prediction | Mutation Assessor | http://www.ngrl.org.uk/Manchester/page/missense-prediction-tools
Missense prediction | PANTHER | http://pantherdb.org/data/
Missense prediction | PhD-SNP | http://snps.biofold.org/phd-snp/phd-snp.html
Missense prediction | SIFT | http://sift.jcvi.org/
Missense prediction | SNPs&GO | http://snps.biofold.org/snps-and-go/snps-and-go.html
Missense prediction | AlignGVGD | http://p53.iarc.fr/AGVGDMethod.aspx
Missense prediction | Mutation Taster | http://www.mutationtaster.org/
Missense prediction | PolyPhen-2 | http://genetics.bwh.harvard.edu/pph2/
Missense prediction | Condel | https://omictools.com/consensus-deleteriousness-score-of-missense-snvs-tool
Missense prediction | CADD | http://cadd.gs.washington.edu/
Missense prediction | nsSNPAnalyzer | https://omictools.com/nssnpanalyzer-tool
Missense prediction | Provean | http://provean.jcvi.org/index.php
Splice site prediction | GeneSplicer | http://www.cbcb.umd.edu/software/GeneSplicer/gene_spl.shtml
Splice site prediction | Human Splicing Finder | http://www.umd.be/HSF3/
Table 3 In silico predictive algorithms12
These predictions rely on the evolutionary conservation of the affected residue and on the biochemical consequences of the change. In general, most algorithms predict the relationship with disease with a sensitivity of 65-80%; among them are PolyPhen, SIFT and Mutation Taster.
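Because individual predictors are imperfect and share inputs, their outputs are usually combined into a single computational criterion (PP3 or BP4 in Tables 4 and 5) rather than counted tool by tool. The Python sketch below shows one way to do this. The harmonized labels "deleterious" and "tolerated", and the rule of requiring unanimous agreement among at least three tools, are assumptions of this example, not requirements stated in the guidelines.

```python
from typing import Dict, Optional


def computational_evidence(predictions: Dict[str, str], min_tools: int = 3) -> Optional[str]:
    """Return "PP3", "BP4" or None from several in silico predictions.

    predictions: tool name -> "deleterious" or "tolerated" (hypothetical labels; each
    tool's own score must first be mapped to them). The result is a single criterion,
    applied once, because the tools are not independent.
    """
    if len(predictions) < min_tools:
        return None
    verdicts = set(predictions.values())
    unknown = verdicts - {"deleterious", "tolerated"}
    if unknown:
        raise ValueError(f"unrecognized labels: {sorted(unknown)}")
    if verdicts == {"deleterious"}:
        return "PP3"
    if verdicts == {"tolerated"}:
        return "BP4"
    return None  # tools disagree: neither criterion is applied


if __name__ == "__main__":
    print(computational_evidence({"SIFT": "deleterious", "PolyPhen-2": "deleterious",
                                  "MutationTaster": "deleterious"}))
```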
The bioinformatic analysis of NGS, designed to convert signals into data, interpret the information and turn it into clinical application, is conceptualized as primary, secondary and tertiary analysis, as shown in Figure 2.
Proposed criteria for interpretation of variants
The following approach evaluates the evidence for variants in primarily Mendelian disorders; it is not intended for somatic or pharmacogenomic variants, or for variants associated with complex multigenic disorders. Variants in genes not yet definitively linked to disease ("genes of uncertain significance", GUS) require special consideration and follow-up, since they may help identify new disease genes.
The determination of pathogenicity is made independently of whether the variant is the cause of the patient's disease. There are two sets of criteria: one for classifying pathogenic or probably pathogenic variants (Table 4) and one for classifying benign or probably benign variants (Table 5).13
The criteria are combined according to the scoring rules shown in Table 6,13 which allow some flexibility in the classification of the variant.
Evidence strength | Code | Criterion
Very strong | PVS1 | Null variant (nonsense, frameshift, canonical ±1 or 2 splice sites, initiation codon, single or multiexon deletion) in a gene where loss of function (LOF) is a known mechanism of disease
Strong | PS1 | Same amino acid change as a previously established pathogenic variant regardless of nucleotide change
Strong | PS2 | De novo (both maternity and paternity confirmed) in a patient with the disease and no family history
Strong | PS3 | Well-established in vitro or in vivo functional studies supportive of a damaging effect on the gene or gene product
Strong | PS4 | The prevalence of the variant in affected individuals is significantly increased compared with the prevalence in controls
Moderate | PM1 | Located in a mutational hot spot and/or critical and well-established functional domain (e.g., active site of an enzyme) without benign variation
Moderate | PM2 | Absent from controls (or at extremely low frequency if recessive) (Table 6) in Exome Sequencing Project, 1000 Genomes Project, or Exome Aggregation Consortium
Moderate | PM3 | For recessive disorders, detected in trans with a pathogenic variant
Moderate | PM4 | Protein length changes as a result of in-frame deletions/insertions in a nonrepeat region or stop-loss variants
Moderate | PM5 | Novel missense change at an amino acid residue where a different missense change determined to be pathogenic has been seen before. Example: Arg156His is pathogenic; now you observe Arg156Cys. Caveat: Beware of changes that impact splicing rather than at the amino acid/protein level.
Moderate | PM6 | Assumed de novo, but without confirmation of paternity and maternity
Supporting | PP1 | Cosegregation with disease in multiple affected family members in a gene definitively known to cause the disease. Note: May be used as stronger evidence with increasing segregation data.
Supporting | PP2 | Missense variant in a gene that has a low rate of benign missense variation and in which missense variants are a common mechanism of disease
Supporting | PP3 | Multiple lines of computational evidence support a deleterious effect on the gene or gene product (conservation, evolutionary, splicing impact, etc.)
Supporting | PP4 | Patient's phenotype or family history is highly specific for a disease with a single genetic etiology
Supporting | PP5 | Reputable source recently reports variant as pathogenic, but the evidence is not available to the laboratory to perform an independent evaluation
Table 4 Classification criteria of pathogenic variants13
Evidence strength | Code | Criterion
Stand-alone | BA1 | Allele frequency is >5% in Exome Sequencing Project, 1000 Genomes Project, or Exome Aggregation Consortium
Strong | BS1 | Allele frequency is greater than expected for disorder
Strong | BS2 | Observed in a healthy adult individual for a recessive (homozygous), dominant (heterozygous), or X-linked (hemizygous) disorder, with full penetrance expected at an early age
Strong | BS3 | Well-established in vitro or in vivo functional studies show no damaging effect on protein function or splicing
Strong | BS4 | Lack of segregation in affected members of a family
Supporting | BP1 | Missense variant in a gene for which primarily truncating variants are known to cause disease
Supporting | BP2 | Observed in trans with a pathogenic variant for a fully penetrant dominant gene/disorder; or observed in cis with a pathogenic variant in any inheritance pattern
Supporting | BP3 | In-frame deletions/insertions in a repetitive region without a known function
Supporting | BP4 | Multiple lines of computational evidence suggest no impact on gene or gene product (conservation, evolutionary, splicing impact, etc.). Caveat: As many in silico algorithms use the same or very similar input for their predictions, each algorithm cannot be counted as an independent criterion. BP4 can be used only once in any evaluation of a variant.
Supporting | BP5 | Variant found in a case with an alternate molecular basis for disease
Supporting | BP6 | Reputable source recently reports variant as benign, but the evidence is not available to the laboratory to perform an independent evaluation
Supporting | BP7 | A synonymous (silent) variant for which splicing prediction algorithms predict no impact to the splice consensus sequence nor the creation of a new splice site AND the nucleotide is not highly conserved
Table 5 Criteria for classifying benign variants13
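The population-frequency criteria in Tables 4 and 5 (BA1, BS1 and PM2) lend themselves to a simple rule. The Python sketch below applies them to one variant. The disorder-specific maximum expected frequency must be supplied by the caller, and the "extremely low frequency" cut-off for recessive disorders is a hypothetical parameter, since the criteria as quoted do not fix a single number.

```python
from typing import List

BA1_THRESHOLD = 0.05  # stand-alone benign: allele frequency > 5% (Table 5)


def frequency_criteria(allele_freq: float, max_expected_freq: float,
                       recessive: bool = False, rare_cutoff: float = 1e-4) -> List[str]:
    """Population-frequency criteria for one variant.

    allele_freq: frequency in a population database (0.0 if absent from controls).
    max_expected_freq: highest frequency compatible with the disorder (disorder-specific,
        supplied by the caller; used for BS1).
    rare_cutoff: hypothetical "extremely low frequency" threshold for PM2 in recessive disorders.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    if allele_freq > BA1_THRESHOLD:
        return ["BA1"]
    if allele_freq > max_expected_freq:
        return ["BS1"]
    if allele_freq == 0.0 or (recessive and allele_freq <= rare_cutoff):
        return ["PM2"]
    return []


if __name__ == "__main__":
    for af in (0.10, 0.01, 0.0, 0.00005):
        print(af, frequency_criteria(af, max_expected_freq=0.001, recessive=True))
```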
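Classification | Combination of criteria
Pathogenic | (i) 1 Very strong (PVS1) AND (a) ≥1 Strong (PS1-PS4) OR (b) ≥2 Moderate (PM1-PM6) OR (c) 1 Moderate (PM1-PM6) and 1 Supporting (PP1-PP5) OR (d) ≥2 Supporting (PP1-PP5); OR (ii) ≥2 Strong (PS1-PS4); OR (iii) 1 Strong (PS1-PS4) AND (a) ≥3 Moderate (PM1-PM6) OR (b) 2 Moderate (PM1-PM6) AND ≥2 Supporting (PP1-PP5) OR (c) 1 Moderate (PM1-PM6) AND ≥4 Supporting (PP1-PP5)
Probably pathogenic | (i) 1 Very strong (PVS1) AND 1 Moderate (PM1-PM6); OR (ii) 1 Strong (PS1-PS4) AND 1-2 Moderate (PM1-PM6); OR (iii) 1 Strong (PS1-PS4) AND ≥2 Supporting (PP1-PP5); OR (iv) ≥3 Moderate (PM1-PM6); OR (v) 2 Moderate (PM1-PM6) AND ≥2 Supporting (PP1-PP5); OR (vi) 1 Moderate (PM1-PM6) AND ≥4 Supporting (PP1-PP5)
Benign | (i) 1 Stand-alone (BA1); OR (ii) ≥2 Strong (BS1-BS4)
Probably benign | (i) 1 Strong (BS1-BS4) and 1 Supporting (BP1-BP7); OR (ii) ≥2 Supporting (BP1-BP7)
Table 6 Rules for combining criteria to classify sequence variants13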
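The combining rules can be expressed as a small function. The Python sketch below follows the ACMG/AMP combining rules of reference 13, using this document's "probably" terminology. Treating the evidence as contradictory (and therefore of uncertain significance) only when both a pathogenic-side and a benign-side classification are reached is an interpretive assumption of this sketch; laboratories may apply stricter conflict rules.

```python
from collections import Counter
from typing import Iterable


def _tally(codes: Iterable[str]) -> Counter:
    """Count criteria by evidence strength; each criterion code is counted once."""
    counts = Counter()
    for code in set(codes):
        if code == "PVS1":
            counts["PVS"] += 1
        elif code == "BA1":
            counts["BA"] += 1
        elif code[:2] in ("PS", "PM", "PP", "BS", "BP"):
            counts[code[:2]] += 1
        else:
            raise ValueError(f"unknown criterion: {code}")
    return counts


def classify(codes: Iterable[str]) -> str:
    c = _tally(codes)
    pvs, ps, pm, pp = c["PVS"], c["PS"], c["PM"], c["PP"]
    ba, bs, bp = c["BA"], c["BS"], c["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    probably_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    probably_benign = (bs >= 1 and bp >= 1) or bp >= 2

    path_side = "Pathogenic" if pathogenic else ("Probably pathogenic" if probably_pathogenic else None)
    benign_side = "Benign" if benign else ("Probably benign" if probably_benign else None)
    if path_side and benign_side:
        return "Uncertain significance"  # contradictory evidence
    return path_side or benign_side or "Uncertain significance"


if __name__ == "__main__":
    for case in (["PVS1", "PS3"], ["PS1", "PM2"], ["BS1", "BP4"], ["PM2"]):
        print(case, "->", classify(case))
```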
Complete genome and exome sequencing is the ideal diagnostic test for complex diseases. It is cost-effective, but it requires analysis with high-quality parameters and the application of diagnostic algorithms and tools to transform the data into valid clinical applications.
Professional societies and agencies such as the College of American Pathologists (CAP), the US Centers for Disease Control and Prevention (CDC), the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology have established regulations and guidelines for performing the test to high quality standards and for analyzing the data, which must be followed by all institutions that carry out this type of study.
The use of these tests allows for targeted treatment, with greater benefit for the population.17 Knowing one's own genetic information helps guide the allocation of economic resources according to genomic profiles of health and disease.18
In Colombia, these studies are performed to a high standard in a few accredited laboratories, which report the quality parameters applied to each study and the metrics used, and make the complete bioinformatic data available to the clinician for future analyses. Population genomic studies are also key to identifying the appropriate interventions and medicines for each ethnic group.
The authors thank the College of American Pathologists (CAP), the US Centers for Disease Control and Prevention (CDC), the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology for developing the model and guidelines for this next-generation sequencing technology.
The authors certify that they have NO affiliations with or involvement in any organization or entity with any financial interest in the subject matter or materials discussed in this manuscript.
©2018 Beatriz, et al. This is an open access article distributed under the terms of the, which permits unrestricted use, distribution, and build upon your work non-commercially.