Abstract
COVID-19 caused by SARS-CoV-2 was first identified in Japan on January 15th, 2020, soon after the pandemic originated in Wuhan, China. Subsequently, Japan experienced three distinct waves of the outbreak in the span of a year and has been attributed to new exogenous strains and evolving existing strains. Japan engaged very early on in tracking different COVID-19 strains and have sequenced approximately 5% of all confirmed cases. While Japan has enforced stringent airport surveillance on cross-border travelers and returnees, some carriers appear to have advanced through the quarantine stations undetected. In this study 30493 genomes sampled in Japan were analyzed to understand the strains, heterogeneity and temporal evolution of different SARS-CoV-2 strains. We identified 12 discrete strains with a substantial number of cases with most strains possessing the spike (S) D614G and nucleocapsid (N) 203_204delinsKR mutations. 155 distinct strains have been introduced into Japan and 39 of them were introduced after strict quarantine policy was implemented. In particular, the B.1.1.7 strain, that emerged in the United Kingdom (UK) in September 2020, has been circulating in Japan since late 2020 after eluding cross-border quarantine stations. Similarly, the B.1.351 strain dubbed the South African variant, P.1 Brazilian strain and R.1 strain with the spike E484K mutation have been detected in Japan. At least 14 exogenous B.1.1.7 sub-strains have been independently introduced in Japan as of late March 2021, and these strains carry mutations that give selective advantage including N501Y, H69_V70del, and E484K that confer increased transmissibility, reduced efficacy to vaccines and possible increased virulence. Furthermore, various strains, which harbor multiple variants in the PCR primers and the probe developed by National Institute of Infectious Disease (NIID), are emerging. It is imperative that the quarantine policy be revised, cross-border surveillance reinforced, and new public health measures implemented to mitigate further transmission of this deadly disease and to identify strains that may engender resistance to vaccines.
Introduction
SARS-CoV-2, the etiological agent of COVID-19 was first identified in Wuhan, China in late 2019 before rapid worldwide transmission in the first quarter of 2020. In just a year, the number of confirmed cases exceeded 148 million globally with 3.1 million deaths as of April 27th, 20211 The actual number of infections are likely much higher by accounting for asymptomatic cases and mild disease that do not get tested and under reporting due to social stigma and discrimination associated with COVID-19 infections2. The novel coronavirus quickly spread to neighboring Japan, with a confirmed case of an individual on Jan 15, 2020 with travel history to Wuhan3. Soon after the identification of the first case, a major initial transmission catastrophe was averted when the government decreed the Diamond Princess cruise ship with infected patients to be anchored off the coast of Japan and mandated a 14-day quarantine for all passengers4. However, the number of confirmed infections has surpassed 572,000 with over 10 thousand reported fatalities in Japan1.
The COVID-19 pandemic has exerted an unprecedented stress on global health systems and has created a ripple effect touching every rubric of human life. Particularly the impact on healthcare (including mental health) and the underlying social, political, psychological and economic disruptions have had profound ramifications. In Japan, the dichotomy between the public health safety and societal and economic dynamics forced the postponement of the much-awaited summer Olympics and Paralympics games in Tokyo. Nonetheless, Japan implemented a stringent surveillance process at airports and seaports at the very early stages of the outbreak to monitor and quarantine travelers and repatriates with COVID-19 infection. In short, the rigorous surveillance process required a negative COVID-19 test prior to boarding, saliva antigen testing at the cross-border port-of-entry and, if positive, polymerase chain reaction (PCR) confirmation and sequencing at the National Institute of Infectious Diseases. While the surveillance process has been largely successful with 2392 patients detected and intercepted at quarantine stations by end of March 20215, there appear to be some patients harboring exogenous strains with different haplotypes of the virus who were undetected at the port-of-entry. Japan has been vested and engaged from the beginning of the pandemic to monitor genomic changes in SARS-CoV-2 and has sequenced remarkable 30493 genomes which are approximately 5% of all confirmed cases. Notwithstanding the rigor of the public health measures, three discrete waves of the disease have been observed in Japan and it’s plausible that different viral strains may have contributed to each spike.
Mutations are inevitable as viruses evolve as a mechanism to cope with selective pressure and confer selective advantage. COVID-19 is the first pandemic to occur after inexpensive sequencing technologies became widely available. SARS-CoV-2 accumulates mutations at the rate of 1.0 x 10-3 (per site/genome/year), that corresponds to 2.5 mutations in a month6. However, as number of the infections increases, the number of variants increase proportionally. SARS-CoV-2 has accumulated and established multiple mutations within a year from the first published report of D614G in early April 20207. New strains with spike N501Y mutation such as B.1.1.7 from England, B.1.351 from South Africa, P.1 from Brazil and R.1 have recently emerged8. Previously, N501Y has been functionally characterized and was reported to cause higher binding with ACE29 and another study reported that B.1.1.7 carrying H69_V70del in addition to N501Y possess a higher infectivity rate of 75%10. Furthermore, B.1.351, P.1 and R.1 strains contain spike E484K and appear to confer reduced efficacy to the currently available vaccines11,12.
In this work we have evaluated the publicly available SARS-CoV-2 genomes in Japan to elucidate different viral strains that were exogenously introduced, to understand community transmission patterns and to delineate founder strains that further evolve within a community.
Methods
Between 1 February 2020 and 5 April 2021, genomes were downloaded from the publicly available viral sequencing repositories, Global Initiative on Sharing All Influenza Data (GISAID)13, National Center for Biotechnology Information (NCBI), and NGDC Genome Warehouse, and the National Microbiology Data Center (NMDC); of the 939410 genomes downloaded, 30493 were from specimens collected in Japan. Low quality genomes with gaps and ambiguous bases over 50 base pairs in length (excluding at the start and the end of the genomes) were discarded. After completing quality control, variant analysis was performed on 638543 including 29679 Japanese genomes with the method described previously6. In brief, genomes were first aligned to the reference genome NC_045512 using The European Molecular Biology Open Software Suite (EMBOSS) needle14 with open gap penalty of 100 to filter out spurious frameshifts. Next, differences with the reference genome were extracted and annotated using gene definitions of SARS-CoV-2 as different variant types including missense, synonymous, non-coding and indels. All the obtained variants including global cases are stored in Supplemental Table S1. Subsequently, hierarchical clustering was performed on Japanese domestic cases to organize strains with similar haplotypes to construct a variant graph (Figure 1B).
Haplotype defining major strains were extracted from the variant graph in Figure 1B. Numbers of confirmed cases and deaths in Japan obtained from Our World in Data15 were shown in Figure 2A. Monthly occurrences of each strain defined by the haplotype and its active period with evolutionary relationships are illustrated in Figure 2B.
A candidate parent of a particular strain was identified by the maximum variant approach using the haplotype of the query strain16. A parental strain should have a haplotype which is a subset of the query strain. Among ancestors obtained in the previous step, the closest ancestor is the one with maximal number of matches between haplotypes. The resulting parent and child relationships for all Japanese domestic cases are obtained in Supplemental Table S2.
Data on monthly COVID-19 positive cases at airport quarantine centers in Japan and number of passengers though immigration were obtained from a report released by the Japanese Government5,17 (Figure 4A). For each strain, we sought to identify a plausible parent including exogenous strains, which share more common variants than any domestic strains. Number of exogenous strains for each month is represented in Figure 4B. Among the exogenous strains, strains whose all probably ancestors were found after April 15th were considered to have advanced through quarantine stations undetected (Table 2).
Strains which NIID_N qPCR probe fails to detect were identified by counting mutations in primers and probe sites. If any given strain has mutations involving multiple bases in the probe or in the primers, the strain was considered undetectable (Table 3).
Results
30493 genomes obtained between January 2020 and April 2021 were used for this analysis, accounting for about 5% of the confirmed cases in Japan (Figure 1A). We identified that frequently observed variants are shown in variant graph (Figure 1B). Spike protein D614G followed by ORF1ab P4715L (RdRp P323L), 5’-UTR 241C>T, nucleocapsid protein (N) 203_204delinsKR and ORF1ab L16L (313C>T) are the most common variants among strains in Japan, representing over 90% of all variants. L16L is unique to Japanese D614G_KR and its derivative strains while other three variants are common to all global D614G clade derived strains. Among the top non-synonymous variants, ORF1ab S1361P, P3371S, A4815V, S1261P, P5968L, R7085I, N P151L, M234I and spike M153T, are highly specific to Japanese strains. P3371S, which is located in 3CLPro, is reported to reduce the protease activity18.
By and large, strains circulating in Japan belong to one of the 12 strains as shown in Table 1; number of samples belong to the strain, defining haplotype, and other common variants are displayed with associated PANGO lineages19. The most frequently observed strain is D614G_KR_M234I, which harbors P5968L, R7085I, and N: M234I and most likely evolved from D614G_KR in Japan in March 2020. Figure 2A depicts number of positive cases and deaths during the period along with important events. As shown in Figure 2B, the Wuhan strain (wild type strain) and F6302L strain were introduced into Japan in January 2020 followed by L84S, and L3606F strains in February. In March, D614G related strains D614G, D614G_KR, D614G_Q57H were introduced into Japan. In April, a surge in D614G_KR occurred along with the emergence of D614G_KR_P3771S. In May, D614G_KR_P3771S_M153T emerged from the parental strain D614G_KR_P3771S and started gaining momentum during the summer. Although D614G_KR_P3371S_M153T was first observed in May, transmission route does not originate from a single founder(Figure 3). Therefore, the actual emergence appears to be at an earlier time. Significant decrease in numbers were observed in June marking the end of the first wave. In July, Japan experienced the second wave of the pandemic with three predominant strains detected: D614G_KR_P3771S, its offspring D614G_KR_P3771S_M153T, and D614G_KR_M234I. Although infection rates plateaued in October, Japan entered the third wave in November. While there is a paucity of available genomes in November, D614G_KR_M234I appears to be the dominant strain in the third wave. R1 and B117 strains were introduced in late 2020 and started gaining momentum in the fourth wave replacing existing strains.
2392 positive cases were detected and intercepted at airport quarantine stations by the end of March 2021 since its inception in March 2020; of which 871 samples were sequenced(Figure 4A). In total, we have identified that 155 distinct strains have been introduced into Japan. However, the number of potentially undetected cases that evaded the surveillance scrutiny during the same time period was 39 after eliminating ambiguous cases as shown in Table 2. Most of the exogenous strains appears to have been introduced into Japan before strict airport surveillance policy was implemented as shown in Figure 4B. Among the 39 cases, 14 independent B.1.1.7 strains advanced through the quarantine checkpoints while 76 individuals carrying the strains were intercepted and quarantined. Besides spike N501Y variant, it is quite concerning that spike E484K mutation has been reported to reduce vaccine efficacy12. Furthermore, strains harboring spike E484K, namely, B.1.351, the South African strain, P.1, the Brazilian strain, and recently identified R.1 strain have been introduced into Japan. The R.1 strains were first observed in Niigata Prefecture in November (GISAID: EPI_ISL_895012); however, this strain has an extra mutation ORF1ab T265I compared with EPI_ISL_1123466, which is identified in late December as seen in Table 2. We were not able to identify the progenitor R.1 strain in any of the high-quality genomes in the cohort. However, we discovered a probable ancestor in a genome from United Kingdom (UK) (GISAID: EPI_ISL_675164) collected in November 14 which had long undetermined bases beyond the quality threshold of this study. To be exact, EPI_ISL_675164 has 4456C>T (ORF1ab A1397A) synonymous mutation, which Japanese R.1 strains are missing. UK has high volume of SARS CoV-2 sequencing, and no other R.1 strain was uncovered in November in UK. Furthermore, a sample from a Japanese quarantine station obtained from a traveler from Liberia in December (GISAID: EPI_ISL_736897) belongs to R.1. Combining these facts, it is likely that the R.1 strain may have originated elsewhere other than UK, where genomic surveillance is not frequently conducted.
Since qPCR tests have been performed to detect SARS-CoV-2, it is concerning that emerging strains are not detectable with qPCR tests. In Japan, NIID_N qPCR test kit was developed to detect SARS-CoV-220. Although it is well designed and captures most of the strains reported today, there exists strains which harbor multiple mutations in primer and probe sites as shown in Table 3.
Discussion
In this work we have identified multiple independently introduced discrete strains of SARS-CoV-2 in Japan which may have contributed to community spread. We also observed further evolution of exogenous founder strains after their introduction into Japan. It is quite intriguing that D614G_KR is dominant despite the fact that comparable numbers of D614G and D614G_Q57H strains were introduced into Japan. For unknown reasons, strains other than D614G_KR have not been able to establish and propagate in Japan21.
The airport quarantine stations have intercepted 2392 individuals carrying SARS-CoV-2 strains. For landlocked countries with contiguous borders, such quarantine strategy is not easy to implement. While cross-border surveillance is easier to implement in an island such as Japan, the study identified 39 cases, which had advanced through airport quarantine checkpoints and introduced to the Japanese population (Table 2). Since only 5% of confirmed cases have been sequenced, the actual number of individuals and SARS-CoV-2 strains passing undetected at quarantine stations is likely to be larger. Airport quarantine checkpoints have been utilizing antigen testing on saliva samples to improve lead times and throughput since July 27th 2020. The number of positive cases detected at borders decreased in August in spite of the slight increase in number of people who entered Japan as seen in Figure 4A. It is more difficult to detect pre-symptomatic or asymptomatic patients with an antigen test on a saliva sample. It is concerning that the antigen test has low positive agreement rate of 55.2% with respect to qPCR resulting in high false negative rates for nasopharyngeal samples 22. In fact, four distinct B.1.1.7 strains have been identified and the number of people infected with B.1.1.7 has alarmingly augmented. Mandatory qPCR should be introduced to identify and intercept emerging strains such as B.1.1.7, B.1.351, P.1. and R.1. and any novel future strains. Additionally, global genomic data should be vigilantly monitored to identify new variants that potentially may be hard to discover with a traditional antigen or an existing qPCR test. As the virus accumulates more mutations, it is not possible to capture all the strains with a single probe test; therefore, it is advisable to utilize multiple probe qPCR.
Besides test accuracy at quarantine stations, compliance among travelers is another possibility of cases being undetected. Public health offices are urging cross-border travelers to be monitored for 14 days but there is no guarantee that they have followed the proper self-quarantine protocols. In fact, approximately 20% of cross border travelers have been lost for follow-up 23. Furthermore, some travelers enter Japan from exempt countries viewed as low risk; similarly, airline employees had been exempted from testing. During 14-day self-quarantine period, one can transmit to his/her family members or cohabitants, who can further transmit to people outside the household. Thus, it would be ideal to conduct interim and exit testing to further reduce infected passengers from circulating exogenous strains24.
There are several limitations of this study. While the genomes from airport quarantine checkpoints were released in a timely manner; the genomes from Japanese domestic samples lagged behind by months. Furthermore, more precise information on collection dates is needed; most of the Japanese samples only carry year and month but not the date. Moreover, specimen collection locations are not available on most instances. Therefore, it is challenging to evaluate spatial propagation of the virus and to assess the effect of a ‘go-to’ travel campaign to encourage business and leisure travel in an attempt to rescue the ailing travel industry. Therefore, while the study could not reveal unambiguous evidence, the campaign may have contributed to the resurgence of the virus across Japan and exacerbated the situation25.
It is imperative that the government mitigate the spread of new strains carrying the spike N501Y and E484K by reinforcing quarantine policies, further ramping up genome sequencing and large scale antigen testing for asymptomatic population26. In particular, wide spread of E484K variant strains may undermine vaccination efforts and discourage Japanese citizens from getting the vaccines. More aggressive genome sampling would facilitate understanding transmission dynamics of the virus including origins, routes, and rates, which is critical to its containment. Additional genomic analysis with clinical data as well as employing animal models will provide insights on infectivity, susceptibility, treatment resistance, vaccine evasion as well as the inferred pathogenicity and virulence27,28.
The COVID-19 pandemic is not over yet. Most citizens are not expected to be immunized by summer 2021, a time when Japan is hoping to host the postponed Olympic and Paralympics games. The onus is on the government of Japan to successfully executing these events while ensuring the public health safety of its citizens. The COVID-19 pandemic won’t be the last pandemic to affect mankind; therefore, reflections on the lessons learned through this ordeal should be used to ameliorate the effects of the next pandemic.
Data Availability
All genome data used in the study is listed in supplemental table and available at GISAID and NCBI. COVID-19 case and death number in japan is available at Our World in Data.
Funding
The authors received no specific funding for this work.
Authors Contributions
T.K. conception of work. T.K. and R.T. acquisition and analysis of data. ALL interpretation of data. T.K. and R.T. drafted the work. D.W. and J.S. substantial revision. ALL reviewed the manuscript.
Completing interests
The authors declare no competing interests.
Supplemental Materials
Supplemental Table S1: Genomes Used in Analysis
Supplemental Table S2: Parent and child relationships among Japanese SARS-CoV-2 strains
Acknowledgements
We gratefully acknowledge the authors, and the originating and submitting laboratories providing sequences from GISAID’s EpiFlu™ Database, NCBI and the National Microbiology Data Center (NMDC) is based. The list of genomes is provided in Supplemental Table S1.
Footnotes
・Analysis update with newly available genomes in japan ・Reclassify discrete strains (added B117, R1, and D614G_KR_M234I_A90T strains. removed D614G_KR_V70I and E972K strains)