New DNA Reference Map May Aid in Identifying SMN1 Mutations
A research team in the U.S. has developed a new DNA reference, or benchmark, map with detailed sequence information on 273 genes, including SMN1, the disease-causing gene in spinal muscular atrophy (SMA). Such information was missing from previous benchmark maps because these genes are repetitive, highly variable among people, or located in hard-to-assess regions.
“Some of these genes, which have previously been very difficult to access, are suspected to have some connection to disease,” Justin Zook, PhD, one of the study’s co-senior authors and a biomedical engineer at the National Institute of Standards and Technology (NIST), said in a NIST press release.
“Others have very clear clinical importance,” such as SMN1, “a gene we characterized that is directly associated with spinal muscular atrophy,” Zook added.
This improved reference map may help labs and clinics to sequence these genes more accurately from a person’s sample, which is critical to identify those with relevant mutations that may cause disease.
Its development was detailed in the study, “Curated variation benchmarks for challenging medically relevant autosomal genes,” published in the journal Nature Biotechnology.
DNA sequencing technologies typically read out small fragments of DNA, and then attempt to place them together correctly, similar to a puzzle. Reference genomes are nearly complete genomes (a genome being all the genetic information of an organism), stitched together from several people’s DNA.
Given the high level of genetic similarities between humans, sequencing a person’s genome is a matter of laying out the pieces of genetic information based on where they match up with the reference sample.
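The puzzle analogy above can be illustrated with a deliberately simplified sketch. It assumes reads match the reference exactly, which real data do not, and the sequences and the function name place_reads are invented for illustration. It is not the method used in the study. It also hints at why repetitive genes are hard: a read that fits in more than one place cannot be placed with confidence.

```python
def place_reads(reference, reads):
    """Return {read: position} for reads that match the reference exactly once."""
    placements = {}
    for read in reads:
        first = reference.find(read)
        # Skip reads that are absent or that match in more than one place.
        if first == -1 or reference.find(read, first + 1) != -1:
            continue
        placements[read] = first
    return placements


# Toy reference in which the stretch "ACGTTG" occurs twice (a repeat).
reference = "ACGTTGCAGGCTAACGTTGCATT"
reads = ["GGCTAA", "ACGTTG", "GCATT"]

placements = place_reads(reference, reads)
print(placements)
```

In this toy case, the read that falls inside the repeated stretch is left unplaced, while the other two reads each have a single possible location.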
These benchmarks have led to the discovery of new gene variants, “enabling highly accurate clinical genome sequencing and advancing our detection and understanding of the impact of many genomic variations on human disease at scale,” the researchers wrote.
While 80% to 90% of the millions of genetic changes that are present in the human genome have already been characterized, those that remain still pose a challenge to current sequencing technology.
Some of these variants are located in medically relevant genes that are often repetitive, have many natural variations among people, and/or are in hard-to-assess regions.
As such, in the clinical setting, DNA tests for these genes often need targeted designs and can involve multiple technologies, and “are only applied when suspicion of a specific disorder is high,” the researchers added.
The Genome in a Bottle (GIAB) consortium is a NIST-hosted collaborative effort aimed at improving DNA sequencing technologies and making them practical for clinical use.
The consortium, involving NIST researchers and colleagues at Baylor College of Medicine and DNAnexus, has now applied a cutting-edge method called haplotype-resolved whole-genome assembly to characterize 273 of 395 challenging genes excluded from previous benchmarks.
They focused on a particular human genome sample, named HG002, whose donor consented to making their genetic code public through the Personal Genome Project. The new benchmark was carefully compiled so that it complements previous GIAB benchmarks.
A key element was high fidelity, or HiFi, sequencing, which can sequence longer stretches of DNA: tens of thousands of nucleotides — the building blocks of DNA — at a time versus a hundred nucleotides with conventional methods.
“Instead of having a thousand-piece puzzle, where you have these little, tiny pieces that you have to put together, it’s more like having a hundred-piece puzzle where you have bigger pieces that you can put together,” Zook said.
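A rough back-of-the-envelope calculation shows how much read length changes the size of the puzzle. The sketch below assumes a human genome of about 3.1 billion nucleotides and uses 20,000 nucleotides as a stand-in for "tens of thousands". Both numbers are illustrative assumptions, and real sequencing reads overlap and cover the genome many times over.

```python
import math

GENOME_LENGTH = 3_100_000_000  # approximate human genome size in nucleotides
SHORT_READ = 100               # "a hundred nucleotides" per conventional read
LONG_READ = 20_000             # assumed stand-in for "tens of thousands"


def pieces_for_one_pass(genome_length, read_length):
    """Minimum number of non-overlapping reads needed to span the genome once."""
    return math.ceil(genome_length / read_length)


short_pieces = pieces_for_one_pass(GENOME_LENGTH, SHORT_READ)
long_pieces = pieces_for_one_pass(GENOME_LENGTH, LONG_READ)

print(f"Short reads: {short_pieces:,} pieces")
print(f"Long reads:  {long_pieces:,} pieces")
print(f"Reduction:   {short_pieces // long_pieces}x fewer pieces")
```

Under these assumptions, the longer reads cut the number of pieces by a factor of 200, and each piece carries far more context for deciding where it belongs.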
Researchers specifically combined HiFi with hifiasm, a state-of-the-art software tool that allows both copies of each gene (one from the mother and one from the father) to be read.
Previous methods sequenced a combination of both copies, increasing the chances of creating errors and missing important details unique to each gene copy.
The consortium’s approach, which also took advantage of the strengths of previously established sequencing methods, allowed scientists to decode the sequences of more than 20,000 genetic variants across the 273 genes, with higher accuracy than could be achieved using a single method.
The sequence of the SMN1 gene was also fully resolved with this approach and included in the new benchmark. The loss of exon 7 in both copies of SMN1 is the most common cause of SMA. Exons are the sections of a gene that contain the information to generate proteins.
Of note, rarer cases of SMA are caused by single-nucleotide changes in the SMN1 gene. As such, having an accurate benchmark for SMN1 enables better detection of these disease-causing mutations.
In addition to SMA, the team characterized variants in genes connected to other health conditions, such as heart disease, diabetes, and celiac disease.
To the researchers’ surprise, their method also identified errors in widely used human reference genomes within several medically relevant genes. Some highly similar genes in these reference genomes were found to be false duplications, which could cause sequencing methods to misread genes linked to serious disorders, Zook said.
With the comprehensive characterization of these genes, the scientists proposed corrections to the reference genomes they used.
The new benchmark is now publicly available for labs to use. Interested researchers or clinicians would first need to sequence HG002 samples, which can be accessed through the NIST Office of Reference Materials, and then check their results against the benchmark.
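Checking results against a benchmark amounts to asking how many of a lab's variant calls agree with the benchmark, and how many benchmark variants the lab missed. The sketch below is a simplified illustration of that idea only. The variant coordinates are invented, the function name compare_to_benchmark is hypothetical, and real comparisons rely on dedicated benchmarking tools and handle complications this sketch ignores, such as genotypes and different representations of the same variant.

```python
def compare_to_benchmark(calls, benchmark):
    """Compare two sets of variants given as (chromosome, position, ref, alt)."""
    true_pos = calls & benchmark    # called and present in the benchmark
    false_pos = calls - benchmark   # called but absent from the benchmark
    false_neg = benchmark - calls   # in the benchmark but not called
    precision = len(true_pos) / len(calls) if calls else 0.0
    recall = len(true_pos) / len(benchmark) if benchmark else 0.0
    return {
        "true_positives": len(true_pos),
        "false_positives": len(false_pos),
        "false_negatives": len(false_neg),
        "precision": precision,
        "recall": recall,
    }


# Invented example variants, not real benchmark data.
benchmark = {
    ("chr1", 101, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr1", 377, "G", "A"),
    ("chr1", 512, "T", "C"),
}
lab_calls = {
    ("chr1", 101, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr1", 377, "G", "A"),
    ("chr1", 600, "A", "T"),  # not in the benchmark
}

summary = compare_to_benchmark(lab_calls, benchmark)
print(summary)
```

In this made-up case, three of the lab's four calls agree with the benchmark and one benchmark variant is missed, so both precision and recall come out at 0.75.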
“Forming benchmarks from a haplotype-resolved whole-genome assembly may become a prototype for future benchmarks covering the whole genome,” the researchers wrote.
This may help improve clinical diagnoses, provide a better understanding of the heritability of multiple diseases, and potentially help in developing treatments.