SARS-CoV-2 UK Variant

Last update: Jan 2021

In autumn 2020, new lineages of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) started spreading in the United Kingdom. These lineages were dubbed 501Y Variant 1 and 501Y Variant 2. While the expansion of Variant 1 was limited, 501Y Variant 2 spread across the UK and to over 50 other countries (as of January 2021). As a result, 501Y Variant 2 has attracted much attention from the scientific community and the public alike, and it goes by several names, such as B.1.1.7, 20B/501Y.V1, VOC-202012/01, or the "UK variant."

Leung et al. used mathematical modeling to investigate the transmissibility of the two SARS-CoV-2 501Y lineages and thus assess the danger they could pose to public health. They estimated 501Y Variant 1 and Variant 2 (B.1.1.7) to be 10% and 75% more transmissible, respectively, than the 501N (old) lineage.

The following is a review of "Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom, October to November 2020" by Leung, K., Shum, M. H., Leung, G. M., Lam, T. T., & Wu, J. T., published in 2021 under the Creative Commons Attribution 4.0 International License. The paper is available on the Eurosurveillance website.


The SARS-CoV-2 genome is an RNA sequence around 30,000 nucleotides long that codes for the different proteins making up the virus. Figure 1 illustrates the coronavirus genome, with the mutations present in 501Y Variant 2 (B.1.1.7) marked by colored lines. Red lines indicate amino acid substitutions, while white lines indicate deletions.

The spike protein sequence, shown in red, is crucial for the virus's function. Spike proteins assemble into trimers (groups of three) to create the spikes located on the virus's surface. Spikes contain the receptor-binding domains, which attach to receptors (called ACE2) on the surface of human cells during infection. For that reason, any mutation that changes the structure of the spike protein could change the virus's infectivity, i.e., how well it can attach to human cells.

Notably, SARS-CoV-2 501Y Variant 1 and Variant 2 are characterized by a mutation causing the N501Y amino acid substitution in the receptor-binding domain of the spike protein. This means that the asparagine (N) at position 501 of the spike protein has been replaced by tyrosine (Y). This substitution is thought to be key to the increased contagiousness of the two 501Y lineages.
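The substitution notation described above (original amino acid, position, replacement amino acid) can be unpacked mechanically. The following is a minimal Python sketch; the function names and the small lookup table are illustrative assumptions, and the table covers only the residues named in this review.

```python
import re

# One-letter amino acid codes, limited to the residues named in this review.
AMINO_ACIDS = {"N": "asparagine", "Y": "tyrosine", "P": "proline", "H": "histidine"}


def parse_substitution(label):
    """Split a label such as 'N501Y' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


def describe(label):
    """Spell out a substitution label in words."""
    original, position, replacement = parse_substitution(label)
    old = AMINO_ACIDS.get(original, original)
    new = AMINO_ACIDS.get(replacement, replacement)
    return f"{old} at position {position} replaced by {new}"


print(describe("N501Y"))  # asparagine at position 501 replaced by tyrosine
```

Deletions such as Δ69/Δ70 follow a different notation and are deliberately not handled by this sketch.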

Throughout this review, the 501Y lineage (Y at position 501) refers to the new virus variant(s), and the 501N lineage (N at position 501) refers to the older variant (before the substitution).

Mutations in the 501Y Lineages

501Y Variant 1 appeared before Variant 2. It spread in Wales, where it was detected in less than 2% of all sequenced samples. 501Y Variant 2 (B.1.1.7) was detected later, and its proportion in the studied samples grew from 0.1% in October to 49.5% in November 2020 (data from GISAID as of December 2020).

The rapid spread of 501Y Variant 2 was observed throughout England in November, hinting at higher transmissibility of this lineage compared to the 501N (no substitution) lineage. As shown in Table 1, 501Y Variant 2 carries more genetic changes than Variant 1. The most notable mutation is the aforementioned N501Y in the spike protein. Moreover, the P681H mutation and the deletion of the 69th and 70th amino acids (Δ69/Δ70) of the spike protein could also influence viral function.

Previous studies of the SARS-CoV-2 receptor-binding domain suggest that 501Y might increase the binding to the human angiotensin-converting enzyme 2 (ACE2), which is the functional receptor on the surface of human cells through which SARS-CoV-2 enters the host. The shape of the 501Y spike protein might also facilitate viral entry and infection.

Leung et al. decided to investigate whether the above mutations in the 501Y lineages influence the transmissibility of the coronavirus.

Table listing the mutations

Table 1. Mutations present in SARS-CoV-2 501Y Variant 1 and Variant 2

SARS-CoV-2 Phylogeny

Figure 2 shows the phylogeny of SARS-CoV-2 as of December 2020.

To create this maximum likelihood tree, the scientists used 7,003 viral genome sequences of SARS-CoV-2 from the GISAID database, which contains submissions from laboratories worldwide. The tree was built using FastTree version 2.1, which uses models of nucleotide and amino acid evolution to approximate the origin of, and the relationships between, the respective strains based on their sequence similarity.

The bar at the bottom shows the length of a branch corresponding to 0.0002 substitutions per site. So the longer the branch of a particular strain, the more substitutions it harbors.
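To get a feel for the scale bar, a branch length in substitutions per site can be converted into an approximate number of substitutions per genome. The sketch below assumes the rounded genome length of 30,000 nucleotides quoted earlier in this review; the function name is illustrative.

```python
GENOME_LENGTH = 30_000  # approximate SARS-CoV-2 genome length in nucleotides


def expected_substitutions(branch_length, genome_length=GENOME_LENGTH):
    """Convert a branch length (substitutions per site) to substitutions per genome."""
    return branch_length * genome_length


# A branch as long as the scale bar corresponds to roughly six substitutions.
print(round(expected_substitutions(0.0002), 1))  # 6.0
```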

Relationship between different lineages shown in the form of phylogenetic trees

Figure 2. Phylogenetic trees of the SARS-CoV-2 lineages

501Y Variant 1 (purple) and Variant 2 (red) are shown originating from the 20B clade. You can see that the 501Y Variant 2 (B.1.1.7 or 20B/501Y.V1) lineage, which harbors more genomic changes (listed in green), has longer branches than Variant 1. The asterisk next to each lineage indicates support of over 98% for grouping the genomes of that 501Y lineage together, as calculated using the Shimodaira-Hasegawa test, which assesses the statistical significance of a given topology (shape of a tree). In other words, the two clades (representing the lineages) are statistically well supported.

Apart from the UK variants, we currently know of two other lineages that contain the 501Y mutation. Those lineages, shown in orange, have been present for several months in 2020 within South Africa and Australia. Notably, though, they do not contain the Δ69/Δ70 deletion.

Transmissibility of SARS-CoV-2 501Y Variant 1 and Variant 2

Leung et al. calculated the transmissibility of 501Y Variant 1 and Variant 2 relative to 501N, working under the assumption that it might differ due to the N501Y substitution and other genomic changes harbored by the 501Y lineages. For simplicity, the authors abbreviated 501N to N, 501Y Variant 1 to Y1, and 501Y Variant 2 to Y2 in their calculations.

They defined the comparative transmissibility of each of the two lineages as the ratio of its basic reproductive number to the basic reproductive number of the N lineage. The basic reproductive number, denoted R0, is the average number of secondary infections arising from a single infection in a fully susceptible population (no immunity). R0 depends on the typical duration of an infection, the number of people an infected person comes into contact with, et cetera. For example, an R0 of 10 means that one infected patient is expected to infect 10 other people.

Using R0, Leung et al. defined the comparative transmissibility of Y1 as σY1 = R0Y1/R0N, and the comparative transmissibility of Y2 as σY2 = R0Y2/R0N. Thus, the comparative transmissibility shows how many times as many people a single person infected with a 501Y lineage is expected to infect compared to a person infected with the 501N lineage.
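The ratio defined above is simple enough to write out directly. In the Python sketch below, the function names are illustrative and the R0 value of 1.2 for the N lineage is a purely hypothetical number chosen for the example; it is not taken from the paper.

```python
def comparative_transmissibility(r0_variant, r0_reference):
    """Return sigma, the ratio of a variant's R0 to the reference lineage's R0."""
    if r0_reference <= 0:
        raise ValueError("reference R0 must be positive")
    return r0_variant / r0_reference


def variant_r0(sigma, r0_reference):
    """Invert the definition: recover the variant's R0 from sigma and the reference R0."""
    return sigma * r0_reference


R0_N = 1.2  # hypothetical reference value, for illustration only
r0_y2 = variant_r0(1.75, R0_N)
print(round(r0_y2, 2))                                      # 2.1
print(round(comparative_transmissibility(r0_y2, R0_N), 2))  # 1.75
```

Because sigma is a ratio, it says nothing about the absolute value of either R0; the same sigma applies whether the reference lineage is spreading quickly or slowly.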

To calculate the transmissibility of the SARS-CoV-2 501Y lineages, Leung et al. used the prediction model they developed in 2016 when assessing the transmissibility of antiviral-resistant influenza strains. The calculation was based on the viral generation time, which is the time between a person becoming infected and that person infecting someone else. The time series of confirmed COVID-19 deaths in the UK was also used for the estimation.

The team inferred the comparative transmissibility working under six assumptions, some of which are: (1) the N, Y1, and Y2 strains circulated locally at the same time during the investigation period, (2) the effectiveness of non-pharmaceutical interventions was identical for all strains, and (3) the likelihood of getting chosen for the viral sequencing was the same for all strains' infections.

Given those assumptions, the next generation matrices (NGM; used to calculate R0) of infections by strains Y1 and Y2 were σY1 and σY2 times that of strain N. The proportion of strain j among all new COVID-19 infections occurring at time t is denoted ρj(t). It is estimated by the following formula:

Formula for estimating ρj(t)

where gj is the generation time distribution for strain j, and i(t) is the total incidence rate. In the calculations, gj was assumed to be identical for all strains, with a mean of 5.4 days.
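To illustrate how a higher relative reproductive number changes the proportion of a strain over time, here is a toy discrete-time renewal simulation in Python. It is a sketch under stated assumptions and not the authors' formula: the gamma-shaped generation time weights (shape 4, truncated at 21 days), the starting share of 0.1%, and the reproduction number of 1.0 for strain N are all illustrative choices. Only the 5.4-day mean and the shared generation time distribution come from the text above.

```python
import math


def generation_time_weights(mean=5.4, shape=4.0, max_days=21):
    """Discretized gamma-shaped weights for days 1..max_days, normalized to sum to 1."""
    scale = mean / shape
    raw = [a ** (shape - 1.0) * math.exp(-a / scale) for a in range(1, max_days + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def simulate_variant_share(sigma, initial_share=0.001, days=56, r_n=1.0):
    """Daily share of a variant whose reproduction number is sigma times that of strain N."""
    g = generation_time_weights()
    # Seed both strains with constant incidence over one full generation window.
    past_n = [1.0 - initial_share] * len(g)
    past_y = [initial_share] * len(g)
    shares = []
    for _ in range(days):
        # Renewal step: new infections are caused by earlier infections,
        # weighted by how long ago those infections occurred.
        pressure_n = sum(w * past_n[-a] for a, w in enumerate(g, start=1))
        pressure_y = sum(w * past_y[-a] for a, w in enumerate(g, start=1))
        new_n = r_n * pressure_n
        new_y = sigma * r_n * pressure_y
        past_n.append(new_n)
        past_y.append(new_y)
        shares.append(new_y / (new_n + new_y))
    return shares


shares = simulate_variant_share(sigma=1.75)
print(f"share after day 1: {shares[0]:.4f}, after day 56: {shares[-1]:.4f}")
```

With sigma equal to 1, the share stays flat at its starting value; any sigma above 1 makes the variant's share grow, which is the signal the model fits to the sequence data.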

Now, let Zdj be the number of strain j sequences sampled on day d, and let ĩ(t) be a reliable proxy of the incidence rate i(t), so that ρj(t) can be replaced by its approximation ρ̃j(t). The comparative transmissibilities σY1 and σY2 can then be inferred from the following likelihood function using Markov Chain Monte Carlo (MCMC) sampling.

Likelihood function for estimating the comparative transmissibility σY1 and σY2
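As a rough illustration of how daily sequence counts can be scored against model proportions, the Python sketch below computes a multinomial log-likelihood. The multinomial form is an assumption made for this example, since the likelihood function itself is not reproduced in this review, and the counts and proportions are made up. The MCMC step is omitted.

```python
import math


def multinomial_log_likelihood(daily_counts, daily_proportions):
    """Sum over days of the log-probability of the observed strain counts.

    daily_counts:      one tuple of counts per day, e.g. (N, Y1, Y2)
    daily_proportions: one tuple of model proportions per day, in the same order
    """
    total = 0.0
    for counts, props in zip(daily_counts, daily_proportions):
        n = sum(counts)
        # Log of the multinomial coefficient n! / (z_1! z_2! ...).
        total += math.lgamma(n + 1) - sum(math.lgamma(z + 1) for z in counts)
        for z, p in zip(counts, props):
            if z > 0:
                if p <= 0.0:
                    return float("-inf")  # observed a strain the model rules out
                total += z * math.log(p)
    return total


# Made-up counts for two days, ordered as (N, Y1, Y2).
counts = [(90, 5, 5), (80, 5, 15)]
close_fit = [(0.90, 0.05, 0.05), (0.80, 0.05, 0.15)]
poor_fit = [(0.50, 0.25, 0.25), (0.50, 0.25, 0.25)]
print(multinomial_log_likelihood(counts, close_fit) >
      multinomial_log_likelihood(counts, poor_fit))  # True
```

In a full analysis, the model proportions would depend on the candidate values of σY1 and σY2, and a sampler would favor the values that give a higher likelihood.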

Using this estimation method, the researchers found σY1 (the comparative transmissibility of 501Y Variant 1) to be 1.10. The 95% credible interval (CrI) was 1.06–1.13, meaning that, given the model and the data, there is a 95% probability that σY1 lies within the 1.06–1.13 range. The estimate for σY2 (the comparative transmissibility of 501Y Variant 2) was 1.75 (95% CrI: 1.70–1.80).

Thus, we can say that the R0, meaning the (average) number of secondary infections resulting from a single infection in a perfectly susceptible population, is 10% (95% CrI: 6–13%) higher for 501Y Variant 1, and 75% (95% CrI: 70–80%) higher for 501Y Variant 2 (B.1.1.7) when compared to the 501N lineage.
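The two steps above, summarizing posterior samples as a credible interval and converting a ratio into a percentage increase, can be sketched in Python as follows. The function names and the simple rank-based percentile rule are illustrative assumptions; the paper does not state how its intervals were computed from the samples.

```python
def credible_interval(samples, level=0.95):
    """Equal-tailed interval taken from the sorted samples (simple rank rule)."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * n)]
    upper = ordered[min(n - 1, int((1.0 - tail) * n))]
    return lower, upper


def percent_increase(sigma):
    """Express a comparative transmissibility as a percentage increase in R0."""
    return (sigma - 1.0) * 100.0


# The reported estimate and interval bounds for 501Y Variant 2.
print([round(percent_increase(s)) for s in (1.75, 1.70, 1.80)])  # [75, 70, 80]
```

Given a list of sampled σ values from an MCMC run, calling credible_interval on that list returns the bounds that would then be passed through percent_increase.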

In other words, the researchers estimated that in a population where nobody is immune, a person infected with the new 501Y Variant 2 (B.1.1.7) could infect around 75% more people than a person infected with the 501N (old) lineage.

The researchers also used data from Wales alone to estimate the comparative transmissibility in that region, obtaining a σY1 value of 1.14 (95% CrI: 1.11–1.19). σY2 could not be estimated due to insufficient 501Y Variant 2 data from Wales.

Observed and Fitted SARS-CoV-2 Lineage Proportions

After estimating the transmissibility of the new 501Y lineages, the researchers compared the spread predicted by their model with the actual breakdown of SARS-CoV-2 lineages detected in UK patients over time. Figure 3 shows the proportions of the three strains in autumn 2020 across three locations within the UK, with the average for the whole region shown in the topmost graphs (Fig. 3A). On the left-side graphs, you can see that the number of collected sequences varied greatly from day to day. To make the data easier to analyze, the daily data were used to calculate the weekly average strain proportions illustrated in the right-side graphs.
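The daily-to-weekly aggregation described above can be sketched in Python as follows. The review does not say whether the authors pooled the counts within a week or averaged the daily proportions; this example pools counts by ISO week, and the counts themselves are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def weekly_proportions(daily_counts):
    """Pool daily strain counts by ISO week and return per-week strain proportions.

    daily_counts maps a date to a dict of strain name -> number of sequences.
    """
    weekly = defaultdict(lambda: defaultdict(int))
    for day, counts in daily_counts.items():
        iso = day.isocalendar()
        week = (iso[0], iso[1])  # (ISO year, ISO week number)
        for strain, n in counts.items():
            weekly[week][strain] += n
    result = {}
    for week, counts in sorted(weekly.items()):
        total = sum(counts.values())
        if total:
            result[week] = {strain: n / total for strain, n in counts.items()}
    return result


# Made-up counts: two days in one week, one day in the next.
example = {
    date(2020, 11, 2): {"N": 8, "Y2": 2},
    date(2020, 11, 3): {"N": 6, "Y2": 4},
    date(2020, 11, 9): {"N": 5, "Y2": 5},
}
for week, props in weekly_proportions(example).items():
    print(week, {strain: round(p, 2) for strain, p in props.items()})
```

Pooling counts gives days with many sequences more weight than days with few, which dampens the day-to-day noise visible in the left-side graphs.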

Figure 4 shows the observed 501Y strain proportions as red and yellow data points. The model fit line, with the shaded credible interval regions, is based on the researchers' approximations of the comparative transmissibility. You can see that the data points follow the researchers' mathematical model fit, but many fall outside of its credible interval region.

SARS-CoV-2 lineages proportions by day and by week

Figure 3. Observed proportions of SARS-CoV-2 lineages from the infected UK patients

Figure 4. Observed and mathematically modeled SARS-CoV-2 501Y lineage proportions

Investigating the Potential Impact of Generation Time

Leung et al. explored whether the faster spread of the 501Y lineages compared to the 501N lineage might have been due to a shorter generation time rather than a higher R0. In a sensitivity analysis, this time assuming an identical R0 for the three strains (in the previous calculations, it was the generation time that was assumed identical), they estimated the mean generation time of 501Y Variant 2 to be 44% shorter than that of 501N. However, the 501Y Variant 1 estimates could not be inferred by this method. Hence, the researchers returned to their original theory, claiming that the higher transmissibility of the 501Y variants resulted from higher R0 values.


Leung et al. estimated the basic reproductive number (the number of secondary infections arising from a single infection) of SARS-CoV-2 501Y Variant 2 (B.1.1.7, 20B/501Y.V1, or VOC-202012/01) to be 1.75 times that of the 501N lineage. This would mean that 501Y Variant 2 is 75% more transmissible than 501N. Correspondingly, Variant 2 had become the dominant strain in England by the end of December 2020. This urgently calls for measures curbing the spread of the new variant within the UK and across the world. Indeed, the UK government has recently implemented new restrictions, and many countries, including Japan, have stopped receiving visitors from the UK to control the spread of SARS-CoV-2 501Y Variant 2.

Restrictions notwithstanding, to date (December 2020), Variant 2 has been detected in countries such as Italy, Denmark, Spain, Japan, Singapore, Hong Kong, and the US, to name a few.

As shown by the example of 501Y Variant 1 in Wales, which had only slightly increased transmissibility compared to the 501N lineage and was not widespread, not all lineages containing the 501Y mutation spread fast. However, autumn saw South Africa detect a 501Y lineage of SARS-CoV-2 without the Δ69/Δ70 deletion (unlike Variant 2 in this study), which has been spreading successfully since October 2020. The SARS-CoV-2 phylogeny analysis shows that the South African variant is of a different origin than 501Y Variant 2 and contains different mutations. The collection of more sequence data will allow researchers to assess the transmissibility of this distant variant in the future.

Limitations & Further Research

Leung et al. identified several limitations of their study. GISAID, from where the SARS-CoV-2 sequences were obtained, is a public database, susceptible to the selection bias of the genomic sequences that individual researchers choose to submit.

Since the proportion of 501Y Variant 2 sequences varied greatly depending on location and sampling time after 16th November, researchers had to reduce their investigation period to a short interval from 22nd September to 16th November 2020.

The comparative transmissibility estimate rested on the assumption that the three SARS-CoV-2 strains co-circulated locally within the UK. However, as indicated by the phylogenetic analysis, 501Y Variant 1 and Variant 2 were geographically separated between Wales and England. This would not affect the estimated R0 values as long as the R0 of the 501N variant, on which the calculation is based, does not change.

Another assumption made in the calculations was that the age-specific susceptibility to SARS-CoV-2 was identical for 501N, 501Y Variant 1, and 501Y Variant 2. However, due to insufficient data, the authors could not confirm whether this assumption holds in reality. As explored in the introduction, the 501Y mutation could increase binding to human ACE2 receptors on the cell's surface. If that were the case, the authors theorize that children might be more susceptible to 501Y Variant 2.

Yet another assumption was that recovering from a SARS-CoV-2 infection with one strain made a person immune to reinfection with any of the other strains. However, given the numerous mutations harbored by 501Y Variant 2, such as the Δ69/Δ70 deletion, immune escape during reinfection is a possibility, which would render this assumption false.

Lastly, the authors suggest that further studies investigating the influence of mobility and population mixing on higher transmissibility results of 501Y Variant 2 are needed. More immunogenomic studies investigating the possibility of the 501Y variants causing reinfection or escaping vaccinations due to many mutations are also desired.



[1] Leung, K., Shum, M. H., Leung, G. M., Lam, T. T., & Wu, J. T. (2021). Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom, October to November 2020. Eurosurveillance, 26(1), 2002106.

[2] Jonathan Corum and Carl Zimmer. 2021. Inside the B.1.1.7 Coronavirus Variant. [Accessed 21 January 2021].

[3] Xiong, X., Qu, K., Ciazynska, K.A. et al. A thermostable, closed SARS-CoV-2 spike protein trimer. Nat Struct Mol Biol 27, 934–941 (2020).

[4] Kai, H., Kai, M. Interactions of coronaviruses with ACE2, angiotensin II, and RAS inhibitors—lessons from available evidence and insights into COVID-19. Hypertens Res 43, 648–654 (2020).

[5] Price, M. N., Dehal, P. S., & Arkin, A. P. (2009). FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Molecular biology and evolution, 26(7), 1641–1650.

[6] Planet P. J. (2006). Tree disagreement: measuring and testing incongruence in phylogenies. Journal of biomedical informatics, 39(1), 86–102.

[7] Dietz K. (1993). The estimation of the basic reproduction number for infectious diseases. Statistical methods in medical research, 2(1), 23–41.

[8] Deng, Y., You, C., Liu, Y., Qin, J., & Zhou, X. H. (2020). Estimation of incubation period and generation time based on observed length-biased epidemic cohort with censoring for COVID-19 outbreak in China. Biometrics, 10.1111/biom.13325. Advance online publication.

[9] Diekmann, O., Heesterbeek, J. A., & Roberts, M. G. (2010). The construction of next-generation matrices for compartmental epidemic models. Journal of the Royal Society, Interface, 7(47), 873–885.

[10] Hespanhol, L., Vallio, C. S., Costa, L. M., & Saragiotto, B. T. (2019). Understanding and interpreting confidence and credible intervals around effect estimates. Brazilian journal of physical therapy, 23(4), 290–301.


Figure 1. Jonathan Corum and Carl Zimmer. 2021. Inside the B.1.1.7 Coronavirus Variant. [Accessed 21 January 2021].

Figures 2-4 and Table 1. Leung, K., Shum, M. H., Leung, G. M., Lam, T. T., & Wu, J. T. (2021). Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom, October to November 2020. Eurosurveillance, 26(1), 2002106. Used under the Creative Commons Attribution 4.0 International License; graphics were not modified from the original.