ABSTRACT
Background The earliest coronavirus disease-2019 (COVID-19) cases in Central Asia were announced in March 2020 by Kazakhstan. Despite the implementation of aggressive measures to curb infection spread, gaps remain in the understanding of the clinical and epidemiologic features of the regional pandemic.
Methods We did a retrospective, observational cohort study of patients with laboratory-confirmed COVID-19 in Kazakhstan between February and April 2020. We compared demographic, clinical, laboratory and radiological data of patients with different COVID-19 severities on admission. Univariable and multivariable logistic regression was used to assess factors associated with disease severity and death. Whole-genome SARS-CoV-2 analysis was performed in 53 patients without a recent history of international travel.
Findings Of the 1072 patients with laboratory-confirmed COVID-19 in March-April 2020, the median age was 36 years (IQR 24–50) and 484 (45%) were male. On admission, 683 (64%) participants had mild, 341 (32%) moderate, and 47 (4%) severe-to-critical COVID-19 manifestation; 20 deaths (1.87%) were reported at study exit. Multivariable regression indicated increasing odds of severe disease associated with older age (odds ratio 1.05, 95% CI 1.03-1.07, per year increase; p<0.001), the presence of comorbidities (2.13, 95% CI 1.07-4.23; p<0.031) and elevated white blood cell count (WBC, 1.14, 95% CI 1.01-1.28; p<0.032) on admission, while older age (1.09, 95% CI 1.06-1.12, per year increase; p<0.001) and male sex (5.97, 95% CI 1.95-18.32; p<0.002) were associated with increased odds of death. The Kazakhstan SARS-CoV-2 isolates grouped into seven distinct lineages O/B.4.1, S/A.2, S/B.1.1, G/B.1, GH/B.1.255, GH/B.1.3 and GR/B.1.1.10.
Interpretation Older age, comorbidities, increased WBC count, and male sex were risk factors for COVID-19 disease severity and mortality in Kazakhstan. The broad SARS-CoV-2 diversity suggests multiple importations and community-level amplification, likely predating the declaration of state emergency. Continuous epidemiologic and genomic surveillance may be critical for a better understanding of the regional COVID-19 dynamics.
INTRODUCTION
Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the cause of the coronavirus disease-2019 (COVID-19), within months of emergence from Wuhan, China, rapidly spread exacting a devastating human toll across the globe 1. While the search for effective treatments continues and vaccines have commenced early implementation 1, it is imperative that up-to-date information be available from diverse populations on the disease epidemiology, clinical presentation, and population-specific characteristics influencing COVID-19 prevention, treatment, and vaccine strategies. This is especially critical in low- and middle-income countries (LMIC), where epidemiologic surveillance is often constrained due to resource shortages 2.
Kazakhstan was the first among the Central Asian LMICs to initiate COVID-19 screening in early 2020, with the first confirmed cases identified in mid-March, and a country-wide emergency state declared on 16 March 2020 3. Kazakhstan neighbours Russia, China, and the Central Asian states, and harbours an extensive ground and airway transit network, setting it in a vulnerable position for both COVID-19 importation and rapid community spread. Despite the initial swift public health response, Kazakhstan has encountered major barriers regarding case reporting and attempts have been made by both the government and public to increase transparency over COVID-19-related morbidity and mortality 3,4. Thus a recent study from Kazakhstan has described the clinical characteristics of COVID-19 in children 5, corroborating a relatively mild disease course for pediatric COVID-19.
However, little is still known about the clinical attributes and molecular epidemiology of COVID-19 across varied age and ethnic groups, information that is urgently needed to guide the public health authorities and clinicians in the wake of the on-going pandemic. Here, we examined data from hospitalized patients with laboratory-confirmed COVID-19 during the first months of the pandemic to explore demographic, clinical and laboratory features and factors associated with COVID-19 disease severity and death in Kazakhstan. We also analyzed whole-genome SARS-CoV-2 data to characterize the regional community-level virus diversity.
METHODS
Study design and participants
Medical records were obtained for individuals who presented with COVID-19-like symptoms or had a suspected exposure to SARS-CoV-2 between 20 February and 30 April 2020, as reported to the Republican Centre for Health Development by hospitals in 14 regions and 3 major cities (Fig 1A). During this period, all subjects with suspected or confirmed COVID-19 infection were hospitalized in specialized “provisional” clinics. The retrospective cohort study was approved by the Research Ethics Board of Semey Medical University as an anonymized epidemiological study, for which the requirement for informed consent was waived due to the pandemic state and urgent need to collect and analyze data. Virological samples used in genomic studies were anonymized and all genomic study procedures were approved by the Ethics Committee of the National Centre for Biotechnology.
Data collection
Epidemiological, demographic, clinical, laboratory, treatment, and outcome data were extracted from de-identified electronic medical records using a standardised data collection form. Radiological data were only available in the form of a radiologist’s final diagnosis; a detailed description of imaging scans was unavailable. All data were entered into a computerized database and cross-checked by two physicians (MG and RI), and subsequently by a third researcher (SY).
Laboratory procedures
Laboratory confirmation of SARS-CoV-2 infection was done using real-time quantitative - PCR (qPCR) on nasopharyngeal swabs by the regional National Centre of Expertise (NCE) laboratories using the Beijing Genomics Institute (BGI) kit (Shenzhen, China) targeting the Orf1ab locus A. Laboratory examinations included complete blood counts, blood chemical analyses, coagulation testing, liver and renal function assessment, and measurements of electrolytes, C-reactive protein, creatinine, D-dimer and lactate.
Virus genome sequencing and bioinformatics
Nasopharyngeal swabs were collected from 53 randomly selected symptomatic patients, who had a positive SARS-CoV-2 PCR test and were hospitalized in the capital city, Nur-Sultan between 22 March and 9 May (Fig 1B). Travel history was available for 49 patients, none of whom had a history of recent international travel. Full description of the sequencing methods involving Sanger and Next Generation Illumina Sequencing is provided in appendix pp. 2-4.
Phylogenetic analyses
The global SARS-CoV2 phylogeny, consisting of 5022 genomes representative of the total repository of 66,940 sequences, and associated metadata were downloaded from Nextstrain6 on 8 July 2020; the tree was reconstructed using the Nextstrain/ncov pipeline. To visualize the phylogenetic relationship of the Kazakhstan isolates in the context of possible origins of importation, we created a tree using 165 sequences from the Global Initiative on Sharing All Influenza Data (GISAID) 7 satisfying one or all of the following criteria: i) proximity to the Kazakhstan sequences, as determined by calculating the pairwise distance between all samples in the Global GISAID Phylogeny using snp-dists 8 and selecting the global strains closest to the Kazakhstan isolates (n=21); ii) representing non-redundant SARS-CoV-2 strains from regions, where international travel was reported from in the current cohort (see Fig 3) (n=63); and iii) randomly selected SARS-CoV-2 full-length sequences from each of the clades observed in Kazakhstan (appendix p. 5) per month between 31 December 2019 and 9 May 2020 (n=81). Sequences were aligned to the Wuhan reference using MAFFT 9, the alignment checked for discrepancies and ends trimmed manually to match the reference, followed by reconstruction of a maximum likelihood phylogeny in IQTREE 10 using the GTR + F + I model and ultrafast bootstrap option with 1,000 replicates. The phylogenetic trees were visualized using ggtree 11, and sequences classified using the GISAID and dynamic nomenclature systems 12.
Definitions
A patient was assigned a laboratory-confirmed COVID-19 diagnosis if their medical record contained at least one positive SARS-CoV-2 PCR test result. Fever was defined as axillary temperature of at least 37□3°C. The degree of COVID-19 severity was defined following the interim WHO guidelines 13. To avoid inconsistencies arising from varied terminology in electronic records, patients described as having “asymptomatic”, “presymptomatic” and “mild” COVID-19 severity were combined into one “mild COVID-19 disease” category. To facilitate comparisons with the literature, the term “non-severe” COVID-19 was used for both “mild” and “moderate” disease, while the term “severe” COVID-19 was reserved for the pooled “severe” and “critical” groups. Recent international travel history was defined as travel outside of Kazakhstan within two weeks prior to hospitalization or symptom onset, whichever was first.
Statistical analysis
Variables were excluded prior to analysis if data were available for less than 40% of patients in any comparison group. We used the two-sided Mann-Whitney U, χ2, or Fisher’s exact tests to compare differences between groups, as appropriate. For uni- and multivariable logistic regression analyses, patients with mild and moderate disease severity were pooled into one “non-severe” category and compared with the “severe” (severe-critical) patient group. Risk factors associated with disease severity and death and their odds ratios (OR) were first analyzed by univariable logistic regression. In the multivariable logistic regression analyses, to avoid model overfitting due to the limited size of endpoint events, four variables were chosen for the analysis of disease severity (n=47 for severe-critical patients) and two variables were ultimately included in the non-survivor analysis (n=20 for non-survivors). Variables for multivariable analyses were selected based on significance in the univariable model (p<0.05), low collinearity, sample size constraints for each variable, and prior knowledge from the literature. Statistical analyses were performed using IBM SPSS V.23 and R V.3.6.3.
Role of the funding source
The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. All authors had full access to all the data in the study and the lead authors (SY, MG, RI) had final responsibility for the decision to submit manuscript for publication.
RESULTS
Medical record data were obtained for a total of 1960 subjects with COVID-19-like clinical symptoms or suspected exposure to SARS-CoV-2 via a close contact or travel. After excluding 96 patients, for whom a SARS-CoV-2 PCR result was unavailable, there were 1864 patients with a known PCR result (Fig 1A). Our analysis focused on the 1072 patients, who had a confirmed SARS-CoV-2 PCR+ diagnosis, representing 32% of all (n=3402) patients with laboratory confirmed COVID-19 (Fig 2A), who had been hospitalized in Kazakhstan as of April 30, 2020 14.
The largest number of PCR-confirmed patients (207, 19%) were admitted in the capital city and affiliated municipality (Nur-Sultan, Akmola region), followed by Almaty city and region (127, 12%) (Fig 2B). 344 (32%) and 728 (68%) patients were hospitalized in March and April, respectively, and all patients had been discharged or died by 5 May 2020. 439 (41%) of all PCR+ cases were identified through COVID-19 screening of inbound travelers and through tracing activities, while 633 (59%) patients were admitted based on the presence of COVID-19-like symptomatology. Professional occupation data were available for 350 patients, of whom 91 (26%) were identified as health care workers. Recent international travel was reported by 170 (16%) patients; most reported travel was from Russia (58%) and Europe (30%), via airplane (59%) and ground transportation (41%) (Fig 3). The comparisons of patient demographic and clinical characteristics on admission grouped by disease severity and death are shown in Tables 1 and 2, respectively.
Of all 1072 patients, 63·8 % had mild, 31·8% moderate, 3·83% severe 0·56% critical severity of disease on admission. A total of 20 deaths were reported: 1 (0·1%) in a patient with mild symptoms, 4 (1·2%) in patients with moderate disease, and 15 (31%) among patients with severe-critical disease (Table 1). The median age of the cohort was 36 years and 8·8 and 7·6% of the patients were younger than 15 and older than 65 years, respectively. Patients with moderate and severe-critical disease were older than those with mild disease by a median of 6 and 26 years, respectively (Table 1). Compared to survivors, non-survivors were older by a median of 30 years, with most exceeding 50 years of age, and were more likely to be male (Table 2).
Most patients (80%) were Kazakh; patients with severe-critical disease and non-survivors were less likely to be Kazakh, compared to non-severe patients (Table 1) and survivors (Table 2), respectively. The proportion of patients with comorbidities increased with disease severity, and 80% (16/20) of the non-survivors had a comorbidity. Presentation of most clinical symptoms and signs differed across disease severity categories (Table 1).
Compared to survivors, non-survivors were more likely to have fever, respiratory abnormalities, low blood pressure, and myalgia/fatigue on admission (Table 2). Disease severity was associated with altered markers of coagulation (prothrombin time, fibrinogen), liver (albumin, alanine aminotransferase (ALT), aspartate aminotransferase (AST), bilirubin) and kidney (blood urea nitrogen (BUN), creatinine) function, and changes in blood glucose, C-reactive protein and minerals (potassium, calcium). Severe-critical patients were also more likely to have elevated neutrophil-to-lymphocyte ratios (33% of the patients) compared to mild and moderate severity patients (16 and 14%, respectively). Compared to survivors, non-survivors were more likely to have leukocytosis (7·5 vs 22% of the patients, respectively), lower platelet counts and altered fibrinogen, albumin, AST, bilirubin, glucose, BUN, creatinine, and sodium levels. D-dimer and lactate measurements were excluded from the analysis due to the small proportion of patients with available data. Chest X-ray and/or computed tomography (CT) data were available for 418 (39%) patients, of whom 57% had abnormalities defined as “pneumonia”(31%) or “bronchitis”(26%) by the radiologists in charge. Pneumonia was most frequently diagnosed in severe-critical patients (91%, Table 1) and non-survivors (80%, Table 2) compared to patients with non-severe disease and survivors.
Antibiotics and anticoagulant or antiplatelet therapy were more likely administered to patients with severe-critical disease, while significantly fewer patients with severe-critical disease received antiviral medications compared to patients with non-severe disease (Table 1). Compared with survivors, non-survivors were more likely to receive ribavirin and lopinavir-ritonavir and anticoagulant/antiplatelet therapy (Table 2). Oxygen therapy was administered to a total of 64 (6%) patients, chiefly those with severe-critical disease (65%) and all non-survivors (100%). Mechanical ventilation was performed on 27 (3%) patients, of whom 40% had severe-critical disease and 90% were non-survivors (Tables 1 and 2).
In the univariable analysis of COVID-19 severity predictors, older age, non-Kazakh ethnicity, comorbidities, elevated white blood cells (WBC), high neutrophil-to-lymphocyte ratio (NLR), lower haemoglobin, lower albumin and elevated creatinine were all associated with severe disease. We included 1072 patients (1024 non-severe and 47 severe-to-critical patients) in the multivariable analysis, with two continuous and two categorical variables (age, ethnicity, comorbidities, and white blood cells (WBC). Older age (OR 1·05 [95% CI 1·03-1·07], p<0.001), comorbidities (OR 2·13 [95% CI 1·07-4·23], p<0·031) and elevated WBC (OR 1·14 [95% CI 1·01-1·28], p<0·032) were associated with increased odds of severe disease (Table 3).
In the univariable analysis of COVID-19 mortality predictors, older age, male sex, non-Kazakh ethnicity, comorbidities, elevated WBC, decreased platelet counts, lower albumin and elevated creatinine were associated with death. 1072 patients (1052 survivors and 20 non-survivors) were included in the multivariable analysis and a backward elimination strategy was used to identify variables describing the best model. Older age (OR 1·09 [95% CI 1·06-1·12], p<0·001) and male sex (OR 5·97 [95% CI 1·95-18·32], p<0·002) were associated with increased odds of death (Table 4).
Genomic data were generated for the viral samples acquired from all 53 patients (median age 21 years (IQR 21-34), 66% male, 94% Kazakh, see Fig 1B and appendix p. 5). Compared to the Wuhan-1 reference, there were a maximum of 12 single-nucleotide polymorphisms (SNPs) per virus genome (median 9 SNPs, IQR 8-10, see appendix p. 6), corroborating little variation observed among the SARS-CoV-2 strains 15,16. The Kazakhstan (Kaz) strains grouped into 7 distinct global lineages: O/B.4.1 (n=27), S/A.2 and S/B.1.1 (19), GH/B.1.255 and GH/B.1.3 (n=5), G/B.1 (n=1) and GR/B.1.1.10 (n=1) (Fig 4A, appendix p.6). Kaz_O/B4.1 isolates clustered with predominantly Middle Eastern sequences that also included European isolates (Fig 4B), Kaz_S/A2-B.1.1 viruses were closely related to the strains from Spain, Britain, and Russia, although this clade appears to have predominantly Asian origins (Fig 4B). Kaz_GH/B.1.255-B.1.3 viruses were nested with Mexican and Argentinian viruses deriving ancestral lineages from both the Middle East and Europe. Kaz_G/B.1 and Kaz_GR/B.1.1.10 viruses cluster closely with North-Western/Western European and Southern European viruses, respectively (Fig 4B).
DISCUSSION
Here we detailed the clinical features of 1072 patients with laboratory confirmed COVID-19 and described the genetic diversity of circulating SARS-CoV-2 in Kazakhstan. Consistent with data from other cohorts1,17–22, older age, comorbidities, and elevated WBC on admission were associated with higher odds of severe disease, while higher odds of death were associated with older age and male sex. Additional factors observed in patients with severe disease included non-Kazakh ethnicity, high NLR, lower haemoglobin, lower albumin and elevated creatinine. Other factors commonly observed in non-survivors included non-Kazakh ethnicity, comorbidities, elevated WBC, decreased platelet counts, lower albumin and elevated creatinine. Our genomic findings point at several independent importations of SARS-CoV-2, followed by community-level amplification early in the pandemic.
A highlight of the current cohort is a lower ratio of patients with severe-critical to non-severe (moderate and mild) COVID-19 compared to other cohorts, despite an early implementation in Kazakhstan of the WHO disease classification guidelines13. Thus, the proportion of severe-critical cases in our study was 4%, which is approximately 5-fold lower than that reported elsewhere 17,21,23. At the same time, the overall mortality rate in our cohort (1.9%) is consistent with the estimates of crude confirmed case fatality risk among COVID-19 patients presenting a wide spectrum of disease manifestations 24, but is 10-fold lower than the rates typically described for hospitalized patients 25,26-probably because in Kazakhstan all laboratory-confirmed COVID-19 patients, regardless of disease manifestation on admission, were admitted to hospitals. As expected, in our cohort mortality correlated with disease severity; the risk of death was low in patients with mild disease, similar to that reported typically (∼0.1%) 17, but it escalated dramatically to a 4-fold higher risk in patients with severe-critical disease compared to severe COVID-19 elsewhere (e.g. 8% in 17). The high mortality associated with severe-critical COVID-19 in this cohort could be due to a combination of mis-categorization of patients manifesting severe COVID-19 symptoms into non-severe disease categories, and subsequent “saturation” of the severe-critical disease category by patients with unfavorable prognosis, and a shortage of healthcare resources in a rapidly escalating pandemic 2,3.
Consistent with other studies, including those from the neighbouring Uzbekistan and Russia 1,17–20,24,27, older age was a key contributor to COVID-19 severity and mortality, and the non-survivor group consisted predominantly of older men. The cohort’s median age (36 years) was higher than the Kazakhstan median (31 years)28 and identical to that of COVID-19 patients in Uzbekistan 27. The cohort sex ratio and ethnic make-up were consistent with the recent census data 28,29, whereby most Kazakhstan residents are Kazakh (68·5%), followed by peoples of Russian (18·9%) and of other ancestries (12·6%), principally Uzbeks, Uyghurs, Ukrainians, Tatars and Germans. In univariable analyses, non-Kazakh ethnicity was associated with both COVID-19 severity and death, although this effect was abrogated in multivariable models, suggesting that non-Kazakh ethnicity is linked to other risk factors, consistent with augmented risk of COVID-19 infection and more severe disease among ethnic minority groups in Western countries 30. A biological predisposition of non-Kazakh people to a more severe COVID-19 disease should be addressed by future investigations.
Overall, both our clinical and laboratory findings were in line with earlier research. Notwithstanding, there are differences that may have arisen due to the heterogeneity in data collection by facilities across the country. For example, unlike in other studies 17,22, thrombocytopenia in our cohort was uncommon, but, consistently, lower absolute platelet counts were associated with death. Further, differences in other blood count parameters were not significant among the patient sub-categories, but consistent changes associated with COVID-19 severity were observed in blood biochemistry, coagulation and liver and renal function tests, inflammatory markers and electrolytes 17,22,25. These observations emphasize the need for careful re-examination of the diagnostic test reference ranges utilized across the country in the context of COVID-19 patient management. Predictably, the prevalence of radiology-confirmed pneumonia increased with disease severity; based on radiological assessment, bronchitis was a common diagnosis in non-severe COVID-19, consistent with bronchial wall thickening in ∼20% of COVID-19 cases, indicating an effect of SARS-CoV-2 with or without a respiratory coinfection31.
We identified representatives of five of the eight global SARS-CoV-2 clades circulating in Kazakhstan early in the pandemic. Most isolates (46/53) belonged to clades O or S - descendants of the early lineages that originated in Asia and were globally prevalent prior to the appearance of the D614-G614 mutation that swept through Europe, reaching a ∼67% frequency among European sequences by mid-March 32. Since our viral isolates were sampled after the appearance of the G clade mutation in Europe, the high prevalence of O and S sub-types in Kazakhstan suggests an early importation, perhaps weeks prior to the declaration of the international travel ban, and subsequent amplification through community spread. Notably, the Kazakhstan S lineage isolates clustered with viruses of from Europe, such as Spain in the case of lineage S/A.2, while the Kazakhstan O clade isolates, clustering with Middle Eastern (particularly, Iranian) strains, all belonged to lineage B.4.1, which arose uniquely in Kazakhstan. These observations are consistent with the evidence that areas of southern Europe and the Middle East represented early points of SARS-CoV-2 introduction and spread 33. Additionally, the presence of three clade G lineages in Kazakhstan at diverse timepoints, between 25 March and 9 May, indicates multiple independent importations from Europe and the Americas.
Patient travel histories suggest that during the study period COVID-19 infections were imported mainly from the neighboring Russia, and from Europe. Only 1% of the patients reported travel from China, where case numbers had substantially decreased by March 2020 34. Remarkably, 41% of the international travelers used train or bus, and other means of ground transportation, which has implications for public health policies aimed to curtail COVID-19 spread on public transit.
Our study has several limitations. First, our cohort represents only a third of all COVID-19 cases confirmed by PCR in March-April in Kazakhstan. Second, the quality of our dataset is largely dependent on the quality of the medical record data, which were collected in emergency settings by clinicians and which could have increased data heterogeneity. Third, some data, such as laboratory findings for CRP, were only available for a subset of participants, while other data, such as D-dimer levels and information on clinical outcomes or complications were available for <40% of patients within the comparison groups and were excluded from the analyses. Fourth, death reporting may have been delayed or more deaths may have occurred after April 30, resulting in under-estimation of mortality. Fifth, laboratory confirmation of COVID-19 was done using a commercially available PCR supplied centrally to all testing sites, but we were unable to access data on the intra- and inter-laboratory performance of this test. Sixth, our genomic analysis was limited to a qualitative assessment and cannot fully recapitulate the country-wide SARS-CoV-2 diversity due to the small sample size and geographic constraints.
To conclude, we are hopeful that these clinical and virologic data on COVID-19 will help shape public health policies and therapeutic and vaccine interventions in Kazakhstan. Our findings could ultimately be applicable to other neighbouring countries that are similar in their ethnic and socio-cultural profiles and healthcare structures, meriting consideration by a broad spectrum of international public health authorities and policy makers.
Data Availability
The retrospective cohort study was approved by the Research Ethics Board of Semey Medical University as an anonymized epidemiological study, for which the requirement for informed consent was waived due to the pandemic state and urgent need to collect and analyze data. Virological samples used in genomic studies were anonymized and all genomic study procedures were approved by the Ethics Committee of the National Centre for Biotechnology (Nur-Sultan, Kazakhstan).
DECLARATIONS
Contributors
SY, MG and RI: conceived, designed, and implemented the study, drafted the manuscript. SY, MG, RI: analyzed the patient data. SY: performed statistical analyses and contributed to genomic analyses. SVG, DB: contributed to statistical analyses and performed genomic analyses. AS and the CGRG group performed sequencing experiments and did initial genomic data analysis. KSM provided laboratory advice, and clinical guidance and contributed to data analysis. the SCERG group extracted data from electronic medical records. RI, YZ: facilitated data acquisition, and supervised the study. All authors contributed to data interpretation, critically reviewed the manuscript draft, and approved the final version for submission.
Declaration of interests
The authors declare that they have no competing interests.
Data sharing
The datasets from this study will be made available, wherever possible, when not otherwise restricted by ongoing collaborative research, on appropriate request to the corresponding author. SARS-CoV-2 genomic data have been submitted to GISAID under accession IDs# EPI_ISL_435045-435048; EPI_ISL_454497-454520; EPI_ISL_454571-454604.
Funding
Genomic studies were funded by the Ministry of Education and Science of the Republic of Kazakhstan (Grant# AP08052352).
Supplementary information
All supplementary information can be found in the Appendix.
Acknowledgements
We thank all the participants, public health, and clinical and laboratory staff, who have been involved in COVID-19 diagnostic testing, clinical care and case management in Kazakhstan. The authors acknowledge the contributions from other laboratories to GISAID (see appendix pp. 10-22).