Making sense of the Global Coronavirus Data: The role of testing rates in understanding the pandemic and our exit strategy

The Coronavirus disease 2019(COVID-19) outbreak has caused havoc across the world. Subsequently, research on COVID-19 has focused on number of cases and deaths and predicted projections have focused on these parameters. We propose that the number of tests performed is a very important denominator in understanding the COVID-19 data. We analysed the number of diagnostic tests performed in proportion to the number of cases and subsequently deaths across different countries and projected pandemic outcomes. We obtained real time COVID-19 data from the reference website Worldometer at 0900 BST on Saturday 4th April, 2020 and collated the information obtained on the top 50 countries with the highest number of COVID 19 cases. We analysed this data according to the number of tests performed as the main denominator. Country wise population level pandemic projections were extrapolated utilising three models - 1) inherent case per test and death per test rates at the time of obtaining the data (4/4/2020 0900 BST) for each country; 2) rates adjusted according to the countries who conducted at least 100000 tests and 3) rates adjusted according to South Korea. We showed that testing rates impact on the number of cases and deaths and ultimately on future projections for the pandemic across different countries. We found that countries with the highest testing rates per population have the lowest death rates and give us an early indication of an eventual COVID-19 mortality rate. It is only by continued testing on a large scale that will enable us to know if the increasing number of patients who are seriously unwell in hospitals across the world are the tip of the iceberg or not. Accordingly, obtaining this information through a rapid increase in testing globally is the only way which will enable us to exit the COVID-19 pandemic and reduce economic and social instability.


Introduction
The Coronavirus disease 2019(COVID-19) outbreak has caused havoc across the world after it was first reported in Wuhan, China 1,2 . Subsequently, research on COVID-19 has exploded to understand the new disease and its impact on mankind [3][4][5][6][7][8][9][10][11][12][13][14][15][16] . However, the number of baseless articles resulting in fake news articles has also gone up exponentially [17][18][19] . A number of models have been adapted by policymakers to predict the course of COVID-19 across the world 4,6,13,20 . The reason for such models is to ensure that healthcare systems can plan services to help them cope with the demands of this new disease which is resulting in serious cases leading to hospitalisation 3,8 . Core elements of the prediction models have been the number of cases and deaths reported and these studies extrapolated the numbers forward to the population over time 4,6,13,20 . Given the pandemic course of COVID-19, it has become common practice to compare its spread in different countries using case fatality rates 3,4,7,13 . However, such methods only tell us part of the story. Vast differences amongst countries in their testing policies for varied reasons including availability of testing equipment, infrastructure, resources and local governing policies affect case fatality rates. In addition, comparing case fatality rates between countries which are at different stages of the epidemic in their region would be erroneous as rates at the beginning and end would be lower compared to rates at the peak when healthcare services are stretched to their limits. Therefore, the search for a common yardstick or denominator is necessary to compare different countries so that the data can be extrapolated for global comparison. Over the past four weeks, as COVID-19 spread further around the world, testing rates have picked up in most countries. We propose that analysis of the number of diagnostic tests performed in proportion to the number of cases and subsequently deaths in the underlying populations of different countries is the best way to predict what might happen next. We analysed this from the ACALM Big Data research unit.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Methods
We obtained real time COVID-19 data from the reference website Worldometer at 0900 BST on Saturday 4 th April, 2020 and collated the information obtained on the top 50 countries with the highest number of COVID-19 cases 21 . From this source, we obtained many parameters including the number of country wise COVID-19 cases, deaths, tests performed, cases per million population, deaths per million population and tests per million population. China and Saudi Arabia were excluded due to lack of data on number of diagnostic tests performed, therefore numbers 51 and 52 were included in the compiled top 50 list.
We obtained case fatality rates by dividing the number of deaths by the number of cases represented as a percentage. Next, tests per positive case were calculated by dividing the number of tests by the number of cases. We then calculated the number of cases per test and number of deaths per test by dividing the number of cases and deaths respectively, by the number of tests represented as a percentage (a case per test rate and a death per test rate). Subsequently, we obtained the population of these countries (in millions) from the number of cases divided by the number of cases per million. We can obviously obtain more accurate country population statistics from other sources but to maintain our consistency of the data source and methodology (for all countries), we derived the information from this data only. We then analysed the above in three steps.
Firstly, we extrapolated the population level pandemic data for each country in terms of cases and number of deaths according to each country's case per test rate and death per test rate as calculated as a snapshot at the time of obtaining the data.
There are a number of limitations to the methodology used when taking a snapshot of these countries at a point in time, as done above, especially because each country is likely to be on a different part of the pandemic curve and extrapolating to the population level data is not likely to be . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
(which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2020.04.06.20054239 doi: medRxiv preprint accurate. Therefore, we further undertook a consistent adjustment according to countries which performed the most tests; in favour of larger countries with bigger populations we chose an arbitrary cut off of 100,000 tests per country. 15 countries had undertaken more than 100,000 tests and as all these countries showed differences in their cases/test and deaths/test we took the 15 country group as a whole to obtain an adjustment factor according to the cases/test and deaths/test. Using this we derived a case per test rate of 13.53% and a death per test rate of 0.77% for the 15 country group. Based on the above numbers, we extrapolated figures at the population level for all 50 countries to calculate the predicated number of cases and deaths.
We felt it necessary to undertake further analysis, the third analysis, to adjust the data to a country which is progressing towards the latter half of the pandemic curve -South Korea [22][23][24] . Ideally, undertaking this adjustment with data from China would be most appropriate but data for the number of diagnostic tests performed in China was not available. The adjustment factor for South Korea was a case per test rate of 2.23% and 0.04% death per test rate. Hence our country wise population level pandemic projections were based on 1) inherent case per test and death per test rates at the time of obtaining the data (4/4/2020 0900 BST) for each country 2) rates adjusted according to the countries who conducted at least 100000 tests and 3) rates adjusted according to South Korea. Our analyses are shown in the tables and figures. No additional analyses were performed.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Results
Full data obtained on 4/4/2020 are shown in Table 1 for the top 50 countries with highest number of COVID-19 cases in the world. Table 2 shows the countries according to number of tests performed per positive diagnosed COVID-19 cases. Table 3 shows population level pandemic projections for cases and deaths according to each individual country's case per test and death per test rate on 4/4/2020 0900 BST. Table 4 shows population level pandemic projections adjusted for the combined case per test and death per test rate of the 15 countries group that have performed at least 100000 tests. Table 5 shows the population level pandemic projections adjusted for the case per test and death per test rate of South Korea. Figure 1 show a scatter plot to show the relationship between the case fatality rate and the testing rate as a percentage of the total population of the country for the countries which have tested at least 1% of their total population. Italy was excluded from this scatter plot as it was an outlier with a case fatality rate of 12.25%.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not peer-reviewed)
The copyright holder for this preprint . https://doi.org/10.1101/2020.04.06.20054239 doi: medRxiv preprint Discussion COVID-19 statistics are complex and comparing different countries based on number of total cases, deaths and/or case fatality rate does not show the complete picture (Table 1). A common denominator is required to make senses of these numbers and we propose that this denominator is the number of diagnostic tests performed. In our analyses we showed the deaths and cases in relation to the number of tests performed and presented population level pandemic projections based on these. This is particularly relevant in the current environment where testing parameters vary across different countries leading to non-uniformity in projections. It is important to discuss each of our different analyses in turn, the rationale, drawbacks and what it means for different countries.
As table 2 shows, the number of tests per positive case is an important parameter because it is an indication of how widely the testing policy of the respective country has followed the advice from the World Health Organisation (WHO) 1  high as the raw data obtain from this source but for consistency in dealing with all the raw data in the same way, we analysed according to the data obtained from Worldometer. Of course similar bias could be inherent for the testing data for all countries but in our defence we have treated all the raw data obtained in the same way for consistency and have opened up the data for scrutiny.
Another important factor to consider here is the testing policy followed in these countries. Are the countries at the top of this table testing a cohort of people who have a low possibility of carrying this . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not peer-reviewed)
The copyright holder for this preprint . https://doi.org/10.1101/2020.04.06.20054239 doi: medRxiv preprint infection? If only those with symptoms are tested then individuals are more likely to test positive for COVID-19 leading to a low test per positive number. If these countries test the sickest of patients as you would expect in countries with the largest populations and limited testing kits to do, then high testing rates per positive case is even more remarkable as it may suggest lower virus rates compared to other countries but this cannot be concluded from this study. Furthermore, the reliability of local testing kits is an important factor as there are a number of reports of COVID-19 patients testing negative numerous times before a positive test 11 . South Korea can be considered as an exception to this because following an explosion of cases initially, they embarked on an extensive testing policy along with isolation policies combined with the utility of mobile tech and applications to inform the public about real time locations of positive cases. As such, it is widely accepted that South Korea are further along the pandemic curve and the rates of new cases and deaths have significantly reduced [22][23][24] .
Projections for the pandemic on an individual population level are very important for governments to plan and organise healthcare systems in response. COVID-19 presents a unique problem because there is no immunity for this in the community, nor a vaccination or targeted medical treatment.
Given that this is a highly contagious disease that spreads very quickly, if a large part of the population suffer from the disease in a short space of time, even if majority of cases are mild, a small minority of severe/critical cases will still lead to significant pressures on healthcare systems as now seen in Italy, Spain and the USA (particularly New York). Globally lockdowns have been instated to reduce the spread of infection, allow the healthcare systems to cope with the condition and "flatten the curve" of the pandemic. These were not enforced all at once and the projections in table 3 are based on the data available on 4/4/2020 and a snapshot depending on the actions, policies of individual countries. Much more complex models have been undertaken by different groups which included time as a variable 20 . However, we propose that testing rate is a very important parameter in projecting the outcomes of the pandemic. Therefore, whilst it seems far-fetched to suggest that Indonesia may end up with over 7 million deaths from less than 2000 cases reported so far, we have . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not peer-reviewed)
The copyright holder for this preprint . https://doi.org/10.1101/2020.04.06.20054239 doi: medRxiv preprint to note that only just over 7000 tests have been performed for a country of 283 million. There are a number of factors for low testing rates such as local policies, lack of resources and equipment and it is impossible to discuss them all; we point out that testing rates are extremely important in projecting the outcomes of the pandemic particularly in countries with large populations. In this context, if we look at countries such as the UK and India both of which have tested over 100000 tests, given the large populations and their case per test rate and death per test rate on 4/4/2020, both countries have projections for over 1000000 deaths.
As mentioned the position of any given country on the pandemic curve is important in determining population level projections and since we proposed that testing rates have an impact on projections we adjusted all projections to the combined case per test and death per test rates of all the countries that have performed over 100000 tests. This analyses is shown in table 4. We also felt that projections should be done on the case per test and death per test rate for South Korea given the countries position on the pandemic curve and are shown in table 4 [22][23][24] . Both these analyses are biased in terms of predications for total deaths for countries with larger populations. For example in spite of the case per test and deaths per test rate being low in India, as the population of the country is large, the projections are still over 10 millions deaths as per table 4 (adjustment according to countries which have performed more than 100,000 tests) and 500,000 deaths as per table 5 (South Korea adjustment).
It is also no coincidence that none of the top 10 countries in table 4 or table 5 have tested at least 1% of the total population. We looked at the countries that have tested at least 1% of their populations and looked at their cases fatality rate. We excluded the 10 th country on the list -Italy because of its high case fatality rate of 12.25%. All other countries had a case fatality rate of 3% or lower. We then correlated the case fatality rate with percentage of the population tested as shown in figure 1. This approach showed higher percentage of population tested in countries with lower populations who have tested a higher proportion of their total population, but not in all cases. Both . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not peer-reviewed)
The copyright holder for this preprint . https://doi.org/10.1101/2020.04.06.20054239 doi: medRxiv preprint Germany (population 83 million) and to a lesser extent Australia (population 25million) and have tested more than 1% of their population and showed low case fatality rates (1.4% Germany and 0.54% Australia). Provided that their testing criteria is reliable, these figures may serve as early indicators of the actual mortality rate for COVID-19 and these low figures are encouraging.
None of these methods used for projections are likely to hold true in reality. If we go back to the analyses for South Korea and its projections of approximately 20,000 deaths, there have been only 177 deaths in South Korea so far. It seems highly improbable that for a country where the number of cases and deaths have significantly tailed off would end up with 19,952 deaths. Furthermore, the herd immunity concept has been a strategy to contain disease spread not only for COVID-19 but across a number of pandemics such as Swine Flu 26 . Although it is widely debated as to what percentage of the population would need to be affected by the disease to confer herd immunity, a figure of 60% has been widely used [27][28][29] . Even if we adjust the South Korea figures (table 5) to 60%, we will probably still over estimate the number of deaths.
Where does all of this leave us and what is the point of all these statistics and analyses? Clearly from the example of South Korea we can contain COVID-19 and in spite of differences of the specific policies of lockdown between countries, social distancing and limiting spread are the broad themes to take forward. The analyses in this study highlight the importance of testing as the relevant denominator for which all the COVID-19 data should be related to. The testing policy is advocated strongly by the WHO in their COVID-19 statements 1 . The suggested early indication of a low mortality rate from our analyses, coupled with the fact that COVID-19 is a new disease affecting the globe in a short time, it is highly plausible that the serious cases and deaths we are seeing in the some countries may be the tip of the iceberg of a disease that has spread widely. If we look at influenza data there are millions of cases and up to half a million deaths worldwide every year due to flu and these tend to be seasonal in spite of vaccination programmes and herd immunity to some extent [27][28][29][30] . In the case of COVID-19 we might be experiencing the full whammy of a disease without . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not peer-reviewed)
The copyright holder for this preprint . https://doi.org/10.1101/2020.04.06.20054239 doi: medRxiv preprint immunity, globally all at once resulting in deaths. The magnitude of these deaths in perspective to other diseases such as Influenza may not be high 30 . Our analyses in this study do not prove this theory but the only thing that can is continued extensive and rapid testing across the globe. This may be the only exit strategy to prevent COVID-19 related economic and social breakdown.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not peer-reviewed)
The copyright holder for this preprint . https://doi.org/10.1101/2020.04.06.20054239 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.