The human genome requires revision. But how can we ensure fairness?

January 29, 2023




In June 2000, then-US President Bill Clinton sat smiling beside the leaders of the Human Genome Project. In genetic terms, he said, all people, regardless of ethnicity, are about 99.9% the same. That was the message when the first draft of the human genome sequence was unveiled at the White House.


That sequence eventually became the original human reference genome: a single string of As, Ts, Cs and Gs. Since its release in 2003, the reference has revolutionised genome sequencing and helped researchers identify hundreds of disease-causing mutations. Yet there is an irony at its heart: the code that is supposed to represent the human species is based mostly on just one individual from Buffalo, New York.

Although all people are very similar, "one individual is not indicative of the whole," says Pui-Yan Kwok, a genomics researcher based at the University of California, San Francisco, and Academia Sinica in Taiwan. As a result, most genome sequencing is fundamentally skewed.


Because of this bias, only some kinds of genetic variation can be detected, leaving some patients without a diagnosis and sometimes without the right care. In addition, people who share less ancestry with the man from Buffalo will probably gain less from the approaching era of precision medicine, which promises to personalise treatment.

To counter this, scientists have begun assembling reference genomes for particular nations, including South Korea, Japan, Sweden, Denmark and the United Arab Emirates. They believe this will benefit their populations, but critics worry that it could leave migrant patients worse served than locals in their healthcare systems. Now a massive new initiative, the human pangenome, is proposing a different approach that aims to represent the world's diversity.

The term "precision medicine," also known as "personalised medicine," has long been popular in medicine, and it certainly sounds appealing. "Getting the right drug to the right patient at the right time is the goal," says Neil Hanchard, a medical scientist at the US National Human Genome Research Institute.

Standard genome sequencing, however, leaves out a lot of variation that could be related to illness. Typically, it involves fragmenting DNA into tiny pieces known as "short reads," sequencing them, and then assembling them into a genome using the reference as a map.

This makes it relatively simple to identify single nucleotide variants (SNVs), such as a change from a C to a T in a gene's code. Structural variants (SVs) are harder to detect. These are larger regions of variation, hundreds or thousands of base pairs long, in which sections may be missing, inverted or relocated. In such cases the short reads cannot easily be mapped to the reference, and "a big number," according to Kwok, are discarded.




This means conventional genome sequencing is biased towards the SVs present in the reference. If your SVs are different, your sequence will not accurately reflect your own variation. That is bad news, because we hope these differences between individuals will help explain why, for instance, one person responds well to a drug while another does not.
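The mapping bias described above can be illustrated with a toy sketch. Everything in it is made up for illustration: the 24-letter "reference", the two sample genomes, the read length and the one-mismatch tolerance are assumptions, and real aligners use indexed, gapped alignment rather than this brute-force scan. The point is only that reads carrying a single-letter change still find their place on the reference, while reads drawn from an inserted segment the reference lacks find no place at all and are dropped.

```python
# Toy model of reference-guided short-read mapping (illustrative only).
REFERENCE = "ACGTTGCAAGCTTAGGCTCCATGA"   # hypothetical 24-letter reference
NOVEL_INSERT = "GGGGCCCCGGGG"            # hypothetical segment absent from the reference

# Sample 1 differs from the reference by one letter (an SNV at index 10, C -> T).
snv_sample = REFERENCE[:10] + "T" + REFERENCE[11:]
# Sample 2 carries a structural variant: the novel segment inserted at index 12.
insertion_sample = REFERENCE[:12] + NOVEL_INSERT + REFERENCE[12:]


def short_reads(genome, length=8, step=4):
    """Cut a genome into overlapping fixed-length 'short reads'."""
    return [genome[i:i + length] for i in range(0, len(genome) - length + 1, step)]


def map_read(read, reference, max_mismatches=1):
    """Return (position, mismatches) of the best placement, or None if unmapped."""
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


def unmapped_reads(sample):
    """List the reads from a sample that cannot be placed on the reference."""
    return [r for r in short_reads(sample) if map_read(r, REFERENCE) is None]


print("SNV sample, discarded reads:", unmapped_reads(snv_sample))
print("Insertion sample, discarded reads:", unmapped_reads(insertion_sample))
```

In this toy setup every read from the SNV sample maps, so the single-letter change is visible as a mismatch. Half of the reads from the insertion sample are discarded, and the sequence they carry never reaches the analysis.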

Kwok's work gives an indication of how many SVs go unnoticed. In 2019, his team examined samples from 154 individuals from around the world and found that the reference genome was missing 60 million base pairs of information, with much more still to be discovered. A follow-up study of 338 people, which looked only for extra inserted DNA, found nearly 130,000 additional sequences.


SVs also seem to occur at different frequencies in different populations. According to Kwok, this means that when a person's short reads are mapped to the reference, "there will be greater misalignment" if they "come from a population significantly distinct from the individual from whom the genome reference is produced." As a result, he warns, we could overlook risk variants in regions that the reference does not cover.

This lack of representation is a problem across genomics. Even for SNVs, which have been studied more thoroughly, there are large gaps in the data. Hanchard and his colleagues, for instance, recently studied 426 people from 50 ethnolinguistic groups across Africa and discovered more than 3 million novel SNVs, most of them from communities that had never been sampled before. "We haven't even touched [SVs], but our early data shows it's going to be more of the same," says Hanchard.

Such data gaps directly affect medical outcomes. For instance, if a person with a rare variant develops a rare illness, there is a strong possibility that the variant is to blame. But often we do not know whether a variant is truly rare or just common in communities that have not been thoroughly studied. In such cases, doctors cannot make a diagnosis. "That happens a lot more for those with non-European background," says Hanchard.

That will only matter more as we enter the era of precision medicine. Our understanding of variation among populations of European heritage is now good enough that we can start using it for precision medicine, according to Kári Stefánsson, whose Reykjavik-based biotechnology company DeCode Genetics specialises in linking genetic variants to disease. But "we do not have the same type of statistics" for other groups, he adds. As a result, healthcare disparities will grow beyond what they are today.

Although there is no genetic basis for meaningfully dividing people into races, some think it makes sense to build references that capture the variation within particular populations, such as ethnic groups and nation states. Denmark is one of the countries that now has its own reference.


"What we observe is that there's a lot of variance that [has only been found in] the Danish population," says computational biologist Simon Rasmussen of the University of Copenhagen, who led the work. The compelling argument for a local reference is that one based on Danes is ideally placed to benefit the Danish healthcare system.

Critics argue that national genomes focus too much on population characteristics rather than individual differences. National genomes might "keep the notion of race alive," says medical anthropologist Emma Kowal of Deakin University in Victoria, Australia. And defining genomes by country ultimately leads to exclusion, says Jenny Reardon, a sociologist of the biosciences at the University of California, Santa Cruz. In essence, she says, "we are selecting who is Danish and who is not."



Rasmussen acknowledges that the reference would be less useful for the 15% of Danes who are immigrants or their descendants. Samples from people of mixed ancestry were even excluded when the reference was being put together. Because the first reference was never used in the clinic, owing to permission issues, Rasmussen and his colleagues want to build another. "We want to use a different [selection] strategy," he says. Exactly how has not yet been decided.

National genomes are not the only option, though. Rather than zooming in on individual populations, the Human Pangenome Reference Consortium prefers to zoom out, overlaying many genomes to build a reference with diversity built in: a pangenome. The consortium recently released a preprint of the first version of such a reference.

The draft, which consists of 47 highly detailed genomes, is the first instalment of 350 genomes that will be sequenced to capture the most common variation worldwide. According to consortium member Karen Miga of the University of California, Santa Cruz, "this is not a standard that has ever been accomplished before."

But the project's goals go beyond sequencing more diverse data. "We need to come up with a better data structure to encapsulate that information," says Ting Wang, a colleague of Miga's at Washington University School of Medicine in St. Louis, Missouri.


One such data structure is a genome graph. Whereas the current reference is just one long string of letters, a genome graph represents the differences between genomes as detours from an otherwise shared path. This will allow scientists and clinicians to map short reads to the path that best matches their sample.
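A genome graph of this kind can be sketched in a few lines. The node names, the sequences and the helper functions below are hypothetical, chosen only to show the idea: a shared segment, an optional inserted segment that only some genomes carry, and another shared segment. Listing every path is workable only for a toy graph like this one; it is not how production graph aligners operate, and it is not the consortium's own data structure.

```python
# Toy genome graph: nodes hold sequence, edges say which node may follow which.
NODES = {
    "shared_1": "ACGTTGCAAGCT",   # segment present in every genome
    "insert": "GGGGCCCCGGGG",     # detour: segment carried by only some genomes
    "shared_2": "TAGGCTCCATGA",   # segment present in every genome
}
EDGES = {
    "shared_1": ["shared_2", "insert"],  # go straight on, or take the detour
    "insert": ["shared_2"],
    "shared_2": [],
}


def enumerate_paths(node, edges):
    """Yield every path from `node` to a node with no outgoing edges."""
    if not edges[node]:
        yield [node]
        return
    for nxt in edges[node]:
        for rest in enumerate_paths(nxt, edges):
            yield [node] + rest


def spell(path, nodes):
    """Concatenate the node sequences along a path into one genome string."""
    return "".join(nodes[name] for name in path)


def paths_containing(read, start="shared_1"):
    """Return the paths through the graph whose spelled sequence contains the read."""
    return [p for p in enumerate_paths(start, EDGES) if read in spell(p, NODES)]


# A read spanning the start of the insertion fits only the detour path.
print(paths_containing("AGCTGGGG"))
# A read spanning the plain junction fits only the direct path.
print(paths_containing("AGCTTAGG"))
# A read from a shared segment fits both paths.
print(paths_containing("ACGTTGCA"))
```

A read that has no home on a single linear string can be placed on a path that includes the detour, while reads from shared regions remain compatible with every path.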

A logical question is how to decide who gets to represent the world. The initial genomes were chosen for their high technical quality, but the consortium will eventually need to select new samples. According to Miga, the great majority of the genomes being added are of African descent, since Africa is the continent where humankind originated.

Reardon agrees that 350 people would represent the world better than one, but she notes that "[the consortium] have made certain decisions concerning groupings." Who did they sample? Who did they not sample? If the reference includes only a subset, one could argue, someone will be excluded.


Miga does not dispute that. "[We are] actually attempting to capture common variance at a global level, so things you would see very regularly," she explains. That means common variation is recorded but rare variation is not. "If you're searching for something incredibly unique, that is not our charge right now," she says.

In an ideal world, each person's genome would be sequenced without any reference at all. This has long been touted as the ideal, trouble-free option, but few people think it will ever happen. "It's not a simple task, and I don't see it being any easier in ten years," says Hanchard.

Rather than adopting a broad, global pangenome, countries may be drawn to a reference that is more tailored to their people and that they maintain and manage themselves. "We don't really expect anybody other than the Danes to build a Danish reference genome," says Rasmussen, who expects the next version to be overseen by Denmark's state-run National Genome Centre, possibly as part of the EU's Genome of Europe initiative.

Hanchard sees the value of regional or local references. The pangenome, he says, will not contain all of the variation. He is a member of H3Africa, a group working to benefit Africa through genomics, which is looking at creating an Africa-specific genome graph. He also believes that all of these references will ultimately come together.

Asked about his hopes for the future of genomics, he mentions recognising and understanding variation as it applies to him or to anyone else with Jamaican ancestry. "I would love to get to a place where everyone feels represented and that this is for them," he adds, as much as it is for any single group. The most important fact is that we are all members of one humanity.



This article is republished from The Guardian under a Creative Commons licence.
