A group of independent international researchers has released its full analysis of newly uncovered metagenomic data collected by the Chinese Center for Disease Control and Prevention in January and February of 2020. According to the group's analysis, the data closely links SARS-CoV-2 to the genetic tracks of wild animals, particularly raccoon dogs sold at the Huanan Seafood Wholesale Market in Wuhan, China, the early epicenter of the COVID-19 pandemic.
The full analysis provides additional compelling evidence that the pandemic coronavirus made its leap to humans through a natural spillover, with a wild animal at the market acting as an intermediate host between the virus's natural reservoir in horseshoe bats and humans. It was authored by 19 scientists, led by Michael Worobey, an evolutionary biologist at the University of Arizona; Kristian Andersen, a virologist at the Scripps Research Institute in California; and Florence Débarre, a theoretician who specializes in evolutionary biology at France's national research agency, CNRS.
Prior to the release of the full analysis late Monday, information on the findings had been made public only through media reports and statements from the World Health Organization, which was briefed on the analysis last week. But the raw metagenomic data behind the analysis is still not publicly available. It was briefly posted on a public genetic database called the Global Initiative on Sharing Avian Influenza Data (GISAID) as recently as earlier this month, and the international researchers were able to download it during that window of availability. But administrators for the database quickly removed the data after its discovery, saying the removal was at the request of the submitter, a researcher at China CDC.
Dark data
Researchers at China CDC have since indicated to the international researchers and the WHO that they intend to share the data, which supports a scientific manuscript currently undergoing peer review at a scientific journal. But the international researchers note that there is no timeline for the release of the data or stated plans if their manuscript is not accepted for publication.
Throughout the pandemic, efforts to investigate SARS-CoV-2's origins have been thwarted by stonewalling from China, which holds to an unsupported hypothesis that the virus originated outside its borders.
In introductory remarks to the newly released analysis, the researchers argue that, while they're honoring GISAID's terms of use, it is long past due for this data to be available to the public and scientific community. They called on both GISAID and colleagues in China to make it available.
"The GISAID terms of use do not preclude the public discussion of data as long as the data generators are acknowledged and best efforts have been made to collaborate with the contributors," they wrote in defense of releasing their full analysis. "CCDC [China CDC] has thus far declined to collaborate on this. We respect our CCDC colleagues’ right to be first to publish a manuscript on their own data and do not plan to submit a paper that would compete with their manuscript currently undergoing review." Still, they argued that by GISAID allowing China CDC to remove the genetic data from public view amid peer review, the database is essentially granting China CDC an embargo, which is a departure from GISAID's stated mission to rapidly overcome such hurdles for sharing virological data.
"Samples from the Huanan Market were collected in January and February 2020 and, given their importance to understanding the origin of the pandemic, we feel this is an unreasonable amount of time to have passed," the researchers wrote.
Data context
They also highlighted that the metagenomic data briefly posted on GISAID is only part of the genetic data China CDC holds, and the rest has not been shared with the international community. Metagenomic data from other market sampling has yet to be made public, they note.
The data the group has been able to get its hands on so far, however, paints a nearly complete picture of how the devastating pandemic began. The metagenomic data came from around 50 data files, which are listed in the analysis's Appendix B but are currently not publicly available. The data consists of metagenomic sequences from some of the swabs and wastewater sampling that China CDC collected around the Huanan market after it was shut down on January 1, 2020. These swabs were previously reported; in February 2022, China CDC researchers released a preprint study on 1,380 environmental and animal samples taken from the market.
The preprint study was led by George Gao, then-director of China CDC. It indicated that environmental swabs were positive for SARS-CoV-2 and contained human genetic material but that the swabs of animals in the market—including mostly rabbits, stray cats, snakes, and hedgehogs—were all negative. Given those findings, Gao and colleagues concluded that humans—not animals—brought the virus into the large market, which then acted as an amplifier of infection due to the large number of people who visited the market daily. China previously suggested that the virus was introduced to the country on imported frozen foods sold at the market.
Still, that preprint data indicated that SARS-CoV-2-positive samples were predominantly in the southwestern zone of the market, where live mammals were sold. Other investigations have since found the same, including the Joint WHO-China study and an analysis published last July in Science by Worobey and colleagues. In Figure 4 of the Science article, Worobey and his co-authors showed that the southwest corner of the market had the highest density of SARS-CoV-2-positive environmental samples and was also where illegally sold wild mammals were held. That includes raccoon dogs, one of which was photographed in 2014 by one of the study authors, Edward Holmes, a biologist at the University of Sydney. The study also found that some of the earliest human cases of COVID-19 clustered in the western portion of the market, around where the live animals were housed.
Genetic tracks
The newly released analysis includes additional genomic data from around 50 of the SARS-CoV-2-positive swabs taken by China CDC from stalls in that southwestern corner, as well as elsewhere in the market. In contrast to the preprint by Gao and colleagues, the metagenomic data indicates that the swabs in the southwest zone were not only positive for SARS-CoV-2 and some human genetic material, they were also brimming with genetic material from wild animals, some of which are known to be susceptible to SARS-CoV-2 infection. These included raccoon dogs, Siberian weasels, Amur hedgehogs, hoary bamboo rats, Malayan porcupines, dogs, Himalayan marmots, and masked palm civets.
The cache of data included metagenomic information from six samples, taken from two stalls, that had high levels of raccoon dog genetic material. Raccoon dogs are known to be susceptible to SARS-CoV-2 infections and known to shed high levels of viral particles. In particular, one sample, Q61, taken from a cart, contained 1,252 genetic fragments with 100 percent identity to the raccoon dog genome but contained zero sequences that had such a perfect match to the human genome. When the researchers took a closer look at the genetic fragments to see what was encoded, they found a mix of genes that are continuously active as well as tissue-specific genes, such as ones involved in mucus production and smell receptors. These findings hint that the swabs were picking up raccoon dog nasal excretions, which may have been more likely in animals sick with a respiratory virus, like SARS-CoV-2.
Such close commingling of genetic material from wild animals and SARS-CoV-2 in an area of the market with the highest density of virus-positive samples, and around which many of the earliest COVID-19 cases were identified, makes a compelling argument that a natural spillover occurred and, specifically, occurred in this area of the market, the researchers argue.
In the concluding remarks of their analysis, Worobey, Andersen, Débarre, and colleagues briefly summarize how this data fits with other data regarding the origin of the pandemic. They highlight again that the Huanan market was the initial epicenter of the pandemic, with most of the early cases having a direct link to the market or occurring in the close surrounding area. While SARS-CoV-2 samples and cases were found throughout the market, the highest concentration of positive samples centered around stalls with wild animals, many of which are known to be susceptible to the virus. Further, many of the earliest human cases also surrounded this area in the western zone of the market.
They also note that a separate genetic study published last year in Science found that there were two genetic lineages of SARS-CoV-2 in the early days of the pandemic: lineage A and lineage B. The two lineages suggest that the virus spilled over into humans on two separate occasions, days to weeks apart from each other. Both lineages were found in the market. The study authors, led by Jonathan Pekar at the University of California, San Diego, concluded, based on a series of modeling analyses, that it was most likely that SARS-CoV-2 lineage B jumped into humans at the Huanan market in mid-November 2019 and that lineage A jumped to humans at the Huanan market in late November.
Next steps
The new data is unlikely to sway some staunch supporters of the competing "lab leak" hypothesis, which holds that SARS-CoV-2 made its way into humans via a biosafety breach at a virology lab in Wuhan. There is no direct evidence for this, and virologists, geneticists, and the US intelligence community largely agree that SARS-CoV-2 was neither developed as a biological weapon nor genetically engineered. While lab leak proponents argue that the market was merely a superspreader site for the virus, the existence of two early SARS-CoV-2 genetic lineages, both linked to the market, strains this hypothesis. For it to be true, two separate accidents would have had to infect lab workers, who then both happened to spread the virus in this one specific wildlife market.
The authors of the new analysis argue that a more evidence-based scenario is that a group of illegally sold wild mammals brought the virus into the market in late 2019, where they continually shed infectious virus, providing numerous opportunities for the virus to adapt and jump to humans over the course of weeks to possibly months. Genetic data indicates it spilled over twice from the southwest corner of the market, where live animals were sold, before radiating out of the market. This is a similar scenario to the spillover of SARS-CoV-1, which caused the SARS outbreak of 2003. Studies suggest it spilled over from masked palm civets and potentially other wild animals, including a raccoon dog, at a wild animal market like the Huanan market. And MERS-CoV, which causes Middle East respiratory syndrome (MERS), is known to spread to people via dromedary camels, an intermediate host for the virus.
With the new genetic data, Worobey, Andersen, Débarre, and colleagues say their analysis can help trace back genetically related wild animals that may have carried the virus into the market, though time is slipping away.
"Further studies on the origin of SARS-CoV-2 should include investigation of all the supply chains of the stalls identified here as selling wildlife where SARS-CoV-2 was detected, as well as population genetic studies of wildlife farms supplying the market and of wild populations in the vicinity of Wuhan and beyond," they write. "However, as the events in question occurred over three years ago, the window of opportunity for these investigations is closing."