## ABSTRACT

This work presents a time series analysis, forecast, and prediction of COVID-19 fatality rates in the African American community. Decision makers and medical providers will find the work useful in improving care for this demographic. Our analysis of COVID-19 cases and deaths spans March 2020 to December 2020. Forecasting models for COVID-19 cases and deaths were built for the total population as well as for Blacks in eight US states with medium to large African American populations. Holt and Holt-Winters exponential smoothing were used for the forecast modelling. The results show strong evidence of a disproportionate impact of COVID-19 in the states considered. Furthermore, we designed, developed, and evaluated a fatality rate predictive model for a Black county. Five learning algorithms were trained and evaluated. Using 9 different criteria for performance comparison, our experiments showed that the decision tree model has a slight edge over the other models for predicting fatality rates in a Black county.

## 1. INTRODUCTION

In the United States, preliminary statistical data show that the Black population was disproportionately affected by the pandemic [6]. Time series analysis points to the fact that the pandemic had a devastating, disruptive, and damaging impact on the Black community [7]. Since the world woke up to the emergence of this disease in late 2019 [8], fatality and mortality rates have for the most part been on an upward trajectory in this community [9]. Counties and zip codes with large Black populations became synonymous with high COVID-19 fatalities. For example, Blacks in Prince George's, Montgomery, and Baltimore counties are 32%, 20.1% and 30.3%, respectively, of the total population of the state of Maryland in the United States. As of January 7, 2021, these three counties ranked first, second and third, respectively, in the number of coronavirus cases in the state. On the fatality count, Montgomery County displaced Prince George's County for the top spot, Prince George's County came second, and Baltimore County retained the third position [10].

Scientific evidence has connected the disproportionate impact of coronavirus cases in the Black community to the poor health care and socio-economic disadvantage of the community [11]. Before the outbreak of the COVID-19 pandemic, researchers found that the life expectancy of Blacks is lower than that of other ethnic groups. One study shows that the average life expectancies of White and Black women are 81.0 years and 78.1 years, respectively. Furthermore, the average life expectancies of White and Black men are 76.1 years and 71.5 years, respectively [12]. The gap in life expectancy between Blacks and Whites is not surprising, because several studies have shown that African Americans bear the burden of most diseases in the US. For example, research on diabetes shows that Blacks have a more than 60% higher chance of being diagnosed with the disease than Whites [13]. They also have a 42% chance of being a new victim of HIV [14]. Furthermore, heart attack data show that this population is 20% more susceptible than Whites [15]. The same narrative has been found to be true for obesity [16] and asthma [17]. Several studies have also shown that African Americans are more likely than Whites to die prematurely from any disease [18].

The prevalent poor health care system, coupled with long-standing pre-existing conditions, has been found to have a drastic, significant and far-reaching impact on the mortality and fatality rates of African American COVID-19 patients [19]. Patients with diseases such as hypertension, diabetes, congestive heart failure, chronic kidney disease and cancer have been found to have higher mortality and fatality rates when they contract COVID-19 [20]. Ferdinand and Nasser argued that the prevalence of cardiovascular disease (CVD) among African Americans, which is directly linked to the poor health care condition of the community, is to blame for the disproportionate number of coronavirus cases in the African American and other minority communities [21]. Socio-economic factors have also been found to contribute to the disproportionality in coronavirus cases in the African American community. Studies have shown that Blacks and Whites have poverty rates of 22% and 9%, respectively [22]. Furthermore, the median household income of Whites is found to be 10 times that of Blacks [23]. During the pandemic, only 20% of the African American population could work from home, compared with 30% of Whites. Furthermore, 34% of African Americans and 14% of Whites are likely to use public transportation [24]. Unhealthy diet [25] and population density [26] are other contributing factors.

Since the outbreak of the coronavirus pandemic, there have been many studies on COVID-19. However, there seems to be a gap in the literature on time series analysis and predictive modeling of fatality rates in the African American community. The vulnerability of the community to the coronavirus pandemic has been a subject of different studies [27]. We believe that all stakeholders should have an answer to the question: is there COVID-19 healthcare disproportionality in the African American community? If the answer is yes, the next question is: how do we predict the COVID-19 fatality rate in the African American community? Such answers will help in improving the quality of care for the affected population. Therefore, using scientific principles, we analyzed COVID-19 datasets from April 12, 2020 to December 25, 2020 to understand hidden patterns, discover knowledge on the disproportionate impact of the coronavirus pandemic, and predict the fatality rate in this disadvantaged demographic [28].

The spread of the coronavirus was controlled in part with measures such as border closures, travel restrictions and airport screening. However, research shows that these measures only slowed the spread of the disease without a far-reaching effect on its impact [2]. To improve mitigation, most governments encouraged handwashing [3], social distancing [4] and mask wearing [5]. We argue that a disproportionate impact of a pandemic is a major setback to an effective universal mitigation strategy. In this paper, our goal is to understand the disproportionate health care impact of COVID-19 in the Black community. To achieve this goal, we consider the following three objectives: 1) COVID-19 Time Series Analysis, 2) COVID-19 Health Care Disproportionality and African American Community, and 3) Predicting COVID-19 Fatality Rate in African American Community. The paper is organized as follows: COVID-19 time series analysis, COVID-19 healthcare disproportionality and predicting the COVID-19 fatality rate in the African American community are discussed in sections 2, 3, and 4, respectively. We conclude our study in section 5 and discuss its implications in section 6. We highlight the limitations of the study and acknowledge our funding source in sections 7 and 8.

## 2. COVID-19 TIME SERIES ANALYSIS

The dataset for this experiment was obtained from The COVID Racial Data Tracker. The repository is a collaboration between the COVID Tracking Project and the Boston University Center for Antiracist Research [29]. COVID-19 datasets from April 12, 2020 to December 25, 2020 were extracted for the states of Florida (FL), Georgia (GA), Maryland (MD), Mississippi (MS), North Carolina (NC), Pennsylvania (PA), South Carolina (SC) and Virginia (VA). We began our investigation by exploring the dataset to see underlying patterns, trends and seasonalities at different timelines.

We plotted area graphs to give a visual representation of the underlying patterns of the dataset. Graphs were drawn for the total cases and deaths in each of the states. Cases and deaths of Blacks were also plotted for each state. Research has shown that graphical data visualization has a direct impact on the effectiveness of data analysis. Visual data exploration and representation has a unique way of making information noticeable, salient and memorable [30]. Most humans are visual learners. An area graph combines the attributes of line and bar charts. In our analysis, states are represented by shaded areas stacked on top of one another. This arrangement shows how the impact of the virus changes over time in each of the states. Our graph is broken down further into yearly quarters. At the end of each quarter, we show the state of the virus in each of the states. Figures 1, 2, 3 and 4 show the area graphs of total cases, Black cases, total deaths and Black deaths, respectively.

As shown in figure 1, the COVID-19 burden of the states followed different trajectories from quarter to quarter. For example, at the end of the first quarter, FL had approximately one hundred and forty thousand total cases. However, at the end of the second quarter, cases in FL had skyrocketed to over seven hundred thousand. This is a fivefold increase. As shown in the graph, the virus timeline was divided into 3 quarters. This is because the pandemic reached the United States sometime in March 2020 [31], although it had begun in China in 2019 [32] and the World Health Organization declared it a pandemic on March 11, 2020 [33].

The trajectory of the graph shows that the first quarter of COVID-19 started at the beginning of April. Therefore, April to June is the first COVID-19 quarter. As shown in the graphs, in the first quarter the virus had little impact in all the states. The major impact of the virus became obvious in the second quarter (July to September). The virus slowed down early in the second quarter but gained momentum towards the end of the quarter. The brief period when the virus slowed down may coincide with the beginning of summer in the US. However, a further look at the graph shows that the assumption that high summer temperatures would slow the virus did not hold. By August, the situation in FL was alarming; the virus seemed to have found an abode in the Sunshine State.

As the total number of COVID-19 cases was rising in each of these states, the fatality rate was also on the increase. Figure 3 shows that at the end of the first quarter, PA was hardest hit by coronavirus deaths. The state recorded approximately 6 500 deaths. The number of COVID-19 victims in NC was a little above 1 000, while FL was at 3 400. However, at the end of the third quarter, FL took the lead with more than 14 000 deaths (an increase of around 300%). It seems PA had been able to bend the curve; its death count increased to only 8 000 (an increase of around 23%). Except for GA, which was close to FL in death counts, the other states closed the second quarter with approximately 3 000 COVID-19 victims. By December 16, FL, PA and GA led the death tolls and recorded more than 20 thousand, 13 thousand and 10 thousand deaths, respectively. The other states were at around 4 000 to 5 000.

## 3. COVID-19 HEALTH CARE DISPROPORTIONALITY (CHCD) AND AFRICAN AMERICAN COMMUNITY

The time series analysis shows the underlying patterns of the waves of COVID-19 in the states under investigation. Figures 1 to 4 suggest that there is a disproportionality between the percentage of Blacks who contracted COVID-19 and the percentage who died of it. For example, as of December 13, 2020, in FL, the total cases and Black cases were 1 155 335 and 146 128, respectively; Black cases were 12.65% of the total cases. However, the total COVID-19 deaths and Black deaths were 20 490 and 3 461, respectively; Black deaths were 16.89% of the total deaths. Also, in GA there were total cases and Black cases of 488 338 and 132 709, respectively; Black cases were 27.18% of the total cases. However, there were total deaths and Black deaths of 10 228 and 3 567, respectively; Black deaths were 34.87% of the total deaths. In FL and GA, the results suggest a disproportionality of 4.24 and 7.70 percentage points, respectively. Table 1 was created for all eight states in our study.

We define COVID-19 Health Care Disproportionality (CHCD) as the difference between the percentage of Black deaths to total deaths and the percentage of Black cases to total cases. If the former exceeds the latter, that is, if CHCD is positive, it suggests that there is a disproportionality in COVID-19 impacts.

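$$\mathrm{BTDR} = \frac{\text{Black deaths}}{\text{Total deaths}} \times 100 \tag{1}$$

$$\mathrm{BTCR} = \frac{\text{Black cases}}{\text{Total cases}} \times 100 \tag{2}$$

$$\mathrm{CHCD} = \mathrm{BTDR} - \mathrm{BTCR} \tag{3}$$

where BTDR is the Black/Total Death Ratio, BTCR is the Black/Total Case Ratio and CHCD is the COVID-19 Health Care Disproportionality. Using Eq. (3), Table 1 shows the result of CHCD as of December 13, 2020. The table suggests that MD, NC and SC are doing worse than the other states in the proportion of Blacks who survived COVID-19 when compared with the number of those who contracted it.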
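As a worked illustration of the CHCD definition, the short Python sketch below recomputes the FL and GA values from the December 13, 2020 counts quoted in this section. The function names are ours and purely illustrative; the sign convention (death share minus case share) is inferred from the FL and GA figures reported above.

```python
def share_percent(black_count, total_count):
    """Percentage of a total count attributed to the Black population."""
    return 100.0 * black_count / total_count


def chcd(black_cases, total_cases, black_deaths, total_deaths):
    """CHCD in percentage points: death share (BTDR) minus case share (BTCR)."""
    btcr = share_percent(black_cases, total_cases)
    btdr = share_percent(black_deaths, total_deaths)
    return btdr - btcr


# Counts as of December 13, 2020, taken from the text of this section.
fl = chcd(black_cases=146_128, total_cases=1_155_335,
          black_deaths=3_461, total_deaths=20_490)
ga = chcd(black_cases=132_709, total_cases=488_338,
          black_deaths=3_567, total_deaths=10_228)

print(f"FL CHCD: {fl:.2f} percentage points")
print(f"GA CHCD: {ga:.2f} percentage points")
```

A positive value means that Blacks account for a larger share of deaths than of cases in that state.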

### 3.1 Forecasting CHCD

Suppose there is no vaccination and other factors remain the same; our objective in this section is to demonstrate that CHCD will continue in the Black community. To achieve this objective, we will: 1) build models capable of forecasting COVID-19 total cases and the number of Blacks who will likely contract COVID-19, 2) build models to forecast COVID-19 total deaths and Black deaths, and 3) compute a COVID-19 Health Care Disproportionality table.

We designed, developed, and evaluated 8 forecasting models. The proposed models forecast total cases, total deaths, Black cases and Black deaths using Holt and Holt-Winters exponential smoothing. Our forecast extends to the end of the first quarter of 2021. Exponential smoothing is a univariate time series forecasting methodology. Its distinguishing feature is an exponentially decaying weighted average of past observations: the most recent observations are assigned more weight than older observations. This approach makes it more reliable and accurate than the moving average for forecasting a wider range of time series.

We experimented to determine CHCD for the first quarter of 2021 in the states under investigation. As shown in figure 5, the dataset went through a pre-processing stage to make it suitable for the experiment. At the design and development stage, forecasting models were designed and developed using the Holt and Holt-Winters methodologies. We evaluated the performance of each model at the evaluation stage. The best models were selected at the select best models stage. The Black/Total Death Ratio (BTDR) and Black/Total Case Ratio (BTCR) were computed at the compute BTDR and BTCR stage. Finally, a CHCD table was created at the CHCD table stage.

#### 3.1.1 Simple Exponential Smoothing

Simple exponential smoothing does not consider trend or seasonality in forecasting. This limits the effectiveness of its application. In the simplest form, the forecasted value *y*′_{T+h|T} is equal to the last observed value *y*_{T} for *h = 1, 2*, …:

$$y'_{T+h|T} = y_T \tag{4}$$

Equation 4 can be replaced by an equally weighted average of all past observations:

$$y'_{T+h|T} = \frac{1}{T}\sum_{t=1}^{T} y_t \tag{5}$$

We can improve on equation 5 by applying decaying weights to past observations. This is the intuition of exponential smoothing:

$$y'_{T+1|T} = \alpha\,y_T + \alpha(1-\alpha)\,y_{T-1} + \alpha(1-\alpha)^2\,y_{T-2} + \cdots \tag{6}$$

where y_{1}, …, y_{T} are the T observations. The decay rate is represented by the parameter *α*, where 0 ≤ *α* ≤ 1. As *α* moves towards 1, the most recent observations are given more weight, making the learning rate faster. On the other hand, a value close to 0 reduces the learning rate because more weight is given to older observations.
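The effect of the decay rate can be sketched in a few lines of Python. The recursion below is the standard recursive form of simple exponential smoothing, which is equivalent to the decaying weighted average described above when the level is initialized with the first observation; that initialization and the example series are our assumptions, not values from the study.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast from simple exponential smoothing.

    The level starts at the first observation (an assumed initialization)
    and is updated as: level = alpha * y + (1 - alpha) * level.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


# Hypothetical cumulative counts, for illustration only.
observations = [100.0, 120.0, 150.0, 190.0]
for alpha in (0.1, 0.5, 0.9):
    forecast = simple_exponential_smoothing(observations, alpha)
    print(f"alpha={alpha}: forecast={forecast:.1f}")
```

With α = 1 the forecast reduces to the last observed value, and with α = 0 it never moves away from the initial level.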

#### 3.1.2 Holt (Double Exponential Smoothing)

Holt exponential forecast is an extension of the simple exponential forecast methodology. It includes the trend smoothing parameter *β* for the trend *b*_{t} in addition to the decay rate *α* for the smoothing factor at level *l*_{t}. This improves the effectiveness and accuracy of its forecasting capability.

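The Holt forecasting equation can be defined as

$$y'_{t+h|t} = l_t + h\,b_t \tag{7}$$

The estimated forecast *y*′_{t+h|t} consists of the level *l*_{t} and the trend *b*_{t} for *h = 1, 2*, …, given *t* observations.

The level *l*_{t} can be expressed as

$$l_t = \alpha\,y_t + (1-\alpha)(l_{t-1} + b_{t-1}) \tag{8}$$

The trend *b*_{t} can also be expressed as

$$b_t = \beta\,(l_t - l_{t-1}) + (1-\beta)\,b_{t-1} \tag{9}$$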
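A minimal Python sketch of Holt's method follows. It uses the standard level and trend recursions; the initialization (level equal to the first observation, trend equal to the first difference) and the example values are assumptions made for illustration and are not necessarily those used in our experiments.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Forecast `horizon` steps ahead with Holt's linear trend method.

    Assumed initialization: level = first observation,
    trend = difference between the first two observations.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # Level update: blend the new observation with the previous projection.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Trend update: blend the latest level change with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical cumulative counts growing by 2 per step.
print(holt_forecast([10.0, 12.0, 14.0, 16.0, 18.0], alpha=0.8, beta=0.2, horizon=3))
```

On a perfectly linear series the method simply continues the line, which makes it a convenient sanity check before fitting real case or death counts.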

#### 3.1.3 Holt-Winters (Triple Exponential Smoothing)

There are two variations of the Holt-Winters seasonal method: additive and multiplicative. Each variant consists of a forecast equation for *y*′_{t+h|t} and three smoothing equations for the level *l*_{t}, the trend *b*_{t} and the seasonal component *s*_{t}. The corresponding smoothing parameters are α, β and γ, respectively.

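For time *t* and seasonal frequency *m*, the Holt-Winters additive method is:

$$y'_{t+h|t} = l_t + h\,b_t + s_{t+h-m(k+1)} \tag{10}$$

where *k* is the integer part of (*h* − 1)/*m*. The level *l*_{t}, trend *b*_{t} and seasonal component *s*_{t} can be represented as equations 11, 12 and 13 respectively.

$$l_t = \alpha\,(y_t - s_{t-m}) + (1-\alpha)(l_{t-1} + b_{t-1}) \tag{11}$$

$$b_t = \beta\,(l_t - l_{t-1}) + (1-\beta)\,b_{t-1} \tag{12}$$

$$s_t = \gamma\,(y_t - l_{t-1} - b_{t-1}) + (1-\gamma)\,s_{t-m} \tag{13}$$

The Holt-Winters multiplicative variant can be represented as

$$y'_{t+h|t} = (l_t + h\,b_t)\,s_{t+h-m(k+1)} \tag{14}$$

$$l_t = \alpha\,\frac{y_t}{s_{t-m}} + (1-\alpha)(l_{t-1} + b_{t-1}) \tag{15}$$

$$b_t = \beta\,(l_t - l_{t-1}) + (1-\beta)\,b_{t-1} \tag{16}$$

$$s_t = \gamma\,\frac{y_t}{l_{t-1} + b_{t-1}} + (1-\gamma)\,s_{t-m} \tag{17}$$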
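The additive variant can be sketched in plain Python as shown below. The recursions are the standard additive Holt-Winters updates; the initialization from the first two seasons and the example series are our assumptions for illustration, and a production analysis would normally rely on a tested forecasting library.

```python
def holt_winters_additive(series, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast for a series with seasonal period m.

    Assumed initialization: level = mean of the first season,
    trend = change in seasonal mean between the first two seasons divided by m,
    seasonal components = first-season deviations from the initial level.
    """
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    level = sum(series[:m]) / m
    trend = (sum(series[m:2 * m]) / m - level) / m
    seasonals = [y - level for y in series[:m]]  # seasonals[i] holds s_i
    for t in range(m, len(series)):
        y, s_prev = series[t], seasonals[t - m]
        prev_level, prev_trend = level, trend
        level = alpha * (y - s_prev) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonals.append(gamma * (y - prev_level - prev_trend) + (1 - gamma) * s_prev)
    last = len(series) - 1
    forecasts = []
    for h in range(1, horizon + 1):
        k = (h - 1) // m
        # Reuse the most recent estimate of the matching seasonal component.
        forecasts.append(level + h * trend + seasonals[last + h - m * (k + 1)])
    return forecasts


# Hypothetical series: constant level 10 with a repeating pattern of period 3.
demo = [12.0, 9.0, 9.0] * 4
print(holt_winters_additive(demo, m=3, alpha=0.5, beta=0.3, gamma=0.4, horizon=4))
```

For a series with a fixed seasonal pattern and no trend, the forecast reproduces the pattern, which is the behavior the seasonal component is meant to capture.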

### 3.2 Evaluation

For each of the states, we forecasted the total number of people who may contract COVID-19 as well as the number of those who would be Black. Also, for each state, we forecasted the number of people who may die of COVID-19, and a separate forecast was made for the Blacks who may die of it. Holt and Holt-Winters were used as our forecasting models. Performance evaluation was done with the Mean Absolute Percentage Error (MAPE). The forecasting error *e*_{t} is given as the difference between the estimated value *y*′_{t} and the actual value *y*_{t},

$$e_t = y'_t - y_t \tag{18}$$

*p*_{t} is the percentage ratio between the error of the model and the actual value.

$$p_t = 100\,\frac{e_t}{y_t} \tag{19}$$

The MAPE is the mean of the absolute values of *p*_{t} over the *n* forecasted points:

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\lvert p_t \rvert \tag{20}$$

The MAPE has been effectively used in evaluating the accuracy of forecasting models. In predicting infant mortality rate, Purwanto et al. compared the effectiveness of ARIMA, Neural Network and Linear Regression using MAPE [34].
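The MAPE computation and the lower-is-better model selection rule can be sketched as follows. The function names and the numbers in the example are hypothetical and serve only to illustrate the calculation; they are not results from this study.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        if y == 0:
            raise ValueError("MAPE is undefined when an actual value is zero")
        total += abs(100.0 * (y_hat - y) / y)
    return total / len(actual)


def select_model(actual, forecasts_by_model):
    """Return the name of the model with the lowest MAPE and all scores."""
    scores = {name: mape(actual, f) for name, f in forecasts_by_model.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical held-out values and forecasts from two candidate models.
actual = [100.0, 200.0, 400.0]
candidates = {
    "Holt": [110.0, 180.0, 440.0],
    "Holt_W": [105.0, 190.0, 420.0],
}
best, scores = select_model(actual, candidates)
print(best, scores)
```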

Results of our forecast are shown graphically in figures 5 to 12. As shown, the graphs are divided into 2020 and 2021. The forecast is for the first quarter of 2021. As proposed, we experimented with the Holt and Holt-Winters exponential smoothing forecast methodologies. For each state in our study, we forecasted total cases and deaths. We also forecasted Black cases and deaths. We compared the performance of the two forecasting models in each of the states. Performance evaluation was based on MAPE. The effectiveness of COVID-19 forecast modeling using MAPE has been shown [35].

The performance comparison is shown in table 2. The table shows that in most of the states, Holt-Winters exponential smoothing outperformed Holt exponential smoothing. This suggests that seasonality is a factor in most of the states.

As shown in table 2, in most of the states, except for a few outliers, the Holt-Winters (Holt_W) models have a lower MAPE than the Holt models. Notably in MD, the fact that Holt outperformed Holt-Winters in forecasting deaths suggests that seasonality might not factor into the forecast of deaths in this state. The MAPE values for SC are in double digits for both models, suggesting that our models did not completely capture all the underlying time series patterns of the state. The better model is defined as the one with the lower MAPE. Therefore, model selection was based on MAPE performance. Using this approach, we obtained table 3.

As shown in table 3, Holt-Winters (HW) outperformed Holt for most of the case and death series. Table 4 shows the forecasted values from the selected models. For example, Holt-Winters was selected as the better model for the total cases in FL because it has a lower MAPE than Holt. The forecasted value for FL turned out to be 2 399 349. In SC, Holt-Winters was also selected as the better model to forecast the total cases, with a forecasted value of 509 004.

As shown in table 4, SC is the only state with a negative CHCD result. This result is not surprising because we found in table 2 that its forecast may not be accurate. The MAPE values of the Holt and Holt-Winters models for the state are in double digits.

## 4. PREDICTING COVID-19 FATALITY RATE IN AFRICAN AMERICAN COMMUNITY

Using time series analysis and forecast modeling, we have shown that there is strong evidence of disproportionality in the impact of the coronavirus pandemic on the African American community. This disproportionality suggests that a universal model of the pandemic in the US may be inadequate for predicting the fatality rate in the Black community. Therefore, the next stage of our investigation is to design, develop and evaluate a COVID-19 fatality predictive model capable of predicting fatalities in the Black community.

The dataset for this experiment was obtained from the Johns Hopkins COVID-19 repository [36], the US Census Bureau [37] and the US Centers for Disease Control and Prevention (CDC) [38]. The data consisted of the fatality rates of the pandemic in each county of the US. Since the focus of our work is on the Black community, we targeted counties that we considered a true representation of the Black community. We assumed a Black county to be any county in the US whose population is at least 45% Black. Based on the number of ethnic groups in the US, we believe that this is a fair assumption. Figure 13 shows the geographical location of these counties on the US map. The experimental design flowchart is shown in figure 14.

### 4.1 Z Standardization

The dataset consisted of predictors with varying magnitudes. Therefore, feature scaling was done to put all predictors on the same scale. This ensures that all predictors are equally important. Min-max normalization is a very popular normalization approach; however, Z standardization is preferred because of its ability to handle outliers.

Mathematically,

$$z = \frac{x - \mu}{\sigma} \tag{21}$$

where *z*, *x*, *μ* and *σ* are the standardized value, predictor value, mean and standard deviation, respectively. The dataset was then split into 75% and 25% for training and testing, respectively.
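The standardization and the 75/25 split can be sketched with the Python standard library as follows. The use of the population standard deviation, the shuffling seed and the example values are assumptions made for illustration only.

```python
import random
import statistics


def z_standardize(values):
    """Scale a predictor to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population form; an assumption here
    if sigma == 0:
        raise ValueError("cannot standardize a constant predictor")
    return [(x - mu) / sigma for x in values]


def train_test_split(rows, train_fraction=0.75, seed=0):
    """Shuffle the rows reproducibly and split them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical predictor values.
predictor = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = z_standardize(predictor)
train, test = train_test_split(z_scores)
print(z_scores)
print(len(train), len(test))
```

A common refinement, not necessarily the procedure followed in this study, is to estimate μ and σ on the training portion only and reuse them for the test portion.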

### 4.2 Learning Algorithm

Five different learning algorithms were trained and evaluated to predict the fatality rate in the Black community, namely: Linear Regression, Decision Tree, K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Neural Network (NN). We evaluated the performance of the models using the Mean Square Error (MSE).

#### 4.2.1 Linear Regression

Linear Regression maps a linear relationship between a response variable *Y* and feature vectors **X**.

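A simple linear regression can be represented as

$$Y = \beta_0 + \beta_1 X + \varepsilon \tag{22}$$

To accommodate *p* predictors X, the simple linear regression is extended to a multiple linear regression

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon \tag{23}$$

The regression coefficients are estimated as

$$\hat{\beta} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} \tag{24}$$

Then

$$y' = \mathbf{X}\hat{\beta} \tag{25}$$

where *y*′ is the estimated response variable.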

Therefore, the Residual Sum of Squares (RSS) error is

$$\mathrm{RSS} = \sum_{i=1}^{n}(y_i - y'_i)^2 \tag{26}$$

Substituting equation 25 into equation 26,

$$\mathrm{RSS} = \sum_{i=1}^{n}\Big(y_i - \hat{\beta}_0 - \sum_{j=1}^{p}\hat{\beta}_j x_{ij}\Big)^2 \tag{27}$$

For this study, ridge regression was used. Ridge regression is a variant of multiple linear regression with a shrinkage penalty parameter λ. An additional constraint was added to the RSS:

$$\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2 \tag{28}$$

As shown in equation 28, ridge regression seeks to improve the fit of the multiple linear regression with the additional penalty λ. The penalty term has no effect when λ = 0, which produces the multiple linear regression. However, the impact of the penalty grows as λ → ∞, and the ridge coefficient estimates tend to zero. For this model, the value of λ was set to 1.0E-8.
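The role of the penalty λ can be illustrated with a small closed-form ridge fit in plain Python. This is a sketch under stated assumptions: the predictors and response are centered so that the intercept is not penalized, the linear system is solved by a hand-written Gaussian elimination, and the data are hypothetical. It is not the RapidMiner implementation used in the study.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ridge_fit(X, y, lam):
    """Return (intercept, coefficients) minimizing RSS + lam * sum(coef^2)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations with the penalty added on the diagonal: (X'X + lam I) b = X'y
    gram = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
             for k in range(p)] for j in range(p)]
    rhs = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    coefs = solve_linear_system(gram, rhs)
    intercept = y_mean - sum(c * m for c, m in zip(coefs, x_means))
    return intercept, coefs


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
y = [1.0 + 2.0 * a + 3.0 * b for a, b in X]
print(ridge_fit(X, y, lam=1.0e-8))  # close to ordinary least squares
print(ridge_fit(X, y, lam=1.0e6))   # coefficients shrink towards zero
```

With a penalty as small as 1.0E-8 the fit is practically indistinguishable from ordinary least squares, while a very large penalty drives the slope estimates towards zero.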

A feature selection strategy was used to determine all necessary and sufficient predictors for the ridge regression. Our feature selection strategy uses M5 Prime. The M5 Prime feature selection model is a variant of the decision tree that builds multiple linear regressions at each node. Predictors with a low standardized coefficient are eliminated to improve the Akaike Information Criterion (AIC),

$$\mathrm{AIC} = 2k - 2\ln(L) \tag{29}$$

where *k* and *L* are the number of predictors and the likelihood of the model, respectively.

#### 4.2.2 Decision Tree

A Decision Tree is a tree in which every internal node represents a test of a predictor and every leaf node gives a classification or regression value. In this study, we built a regression decision tree. A prediction is made by beginning at the root, testing attribute values at every node, and moving down the matching branch until a leaf node is reached, which gives the predicted value.

Suppose *X*_{1}, *X*_{2}, …, *X*_{P} are *P* predictors in a feature vector *X*. The goal of a decision tree is to divide the predictor space into *J* distinct, non-overlapping regions *R*_{1}, *R*_{2}, …, *R*_{J}. A decision tree model is optimized when the Residual Sum of Squares error (*RSSE*) is at the minimum. Given a predictor *X*_{j} and a cutpoint *s*, recursive binary splitting divides the predictor space into the regions

$$R_1(j,s) = \{X \mid X_j < s\} \quad \text{and} \quad R_2(j,s) = \{X \mid X_j \ge s\} \tag{30}$$

to obtain the minimum *RSSE*. An optimized decision tree seeks the *j* and *s* that minimize

$$\sum_{i:\,x_i \in R_1(j,s)} (y_i - \hat{y}_{R_1})^2 + \sum_{i:\,x_i \in R_2(j,s)} (y_i - \hat{y}_{R_2})^2 \tag{31}$$

where *ŷ*_{R1} and *ŷ*_{R2} are the mean estimated responses for the regions *R*_{1} and *R*_{2}, respectively. In this study, 0.01, 2 and 4 were selected as the hyperparameter values for minimal gain, minimal leaf size and minimal size for split, respectively.
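A single step of recursive binary splitting can be sketched as an exhaustive search over predictors and cutpoints. The sketch below is illustrative only: it tries observed predictor values as candidate cutpoints (many implementations use midpoints instead), enforces a minimal leaf size of 2 as in our hyperparameter setting, and runs on hypothetical data.

```python
def rss(values):
    """Residual sum of squares of the values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y, min_leaf_size=2):
    """Return (j, s, total_rss) for the split X_j < s versus X_j >= s
    that minimizes the summed RSS of the two regions, or None."""
    best = None
    for j in range(len(X[0])):
        for s in sorted({row[j] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[j] < s]
            right = [y[i] for i, row in enumerate(X) if row[j] >= s]
            if len(left) < min_leaf_size or len(right) < min_leaf_size:
                continue
            total = rss(left) + rss(right)
            if best is None or total < best[2]:
                best = (j, s, total)
    return best


# Hypothetical counties: the response jumps when the first predictor reaches 4.
X = [[1, 5], [2, 3], [3, 8], [4, 1], [5, 9], [6, 2]]
y = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
print(best_split(X, y))
```

A full regression tree applies the same search recursively inside each resulting region until a stopping rule, such as the minimal gain or minimal size for split, is met.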

#### 4.2.3 Support Vector Machine (SVM)

The SVM sets up a separating hyperplane with a maximal margin by picking a subset SV ⊂ X called support vectors. The optimization problem is given by Equation 32. The SVs are used to compute the normal vector **w** of the hyperplane and the bias *b* that satisfy the constraints of the optimization problem.

$$\min_{\mathbf{w},\,b}\; \frac{1}{n}\sum_{i=1}^{n}\max\big(0,\; 1 - y_i(\mathbf{w}\cdot\mathbf{x}_i - b)\big) + \lambda\lVert\mathbf{w}\rVert^{2} \tag{32}$$

where *y*_{i} is the class to which point *x*_{i} belongs, **w** is the normal vector, and λ is a tuning parameter. During the implementation of this model, we used the dot kernel type, a kernel cache of 200, a convergence epsilon of 0.001, a maximum of 100000 iterations and an L pos / L neg of 1.0.

#### 4.2.4 K-Nearest Neighboring (KNN)

KNN is known to be a basic and simple learning algorithm. It works by finding, for each new or test object, the group of K objects in the training data that are nearest (most similar) to it. For the most part, the Euclidean distance formula is used to characterize the distance between data points [39]. The Euclidean distance is mathematically represented as

$$d(x, y) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2}$$

where *d* represents the distance, and *x* and *y* are two data points with *n* features each.

For this experiment, 10 was chosen as the value of K; in addition, we used MixedMeasures as the measure type.
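A KNN regressor with Euclidean distance can be sketched as follows. The sketch assumes purely numeric predictors and averages the responses of the K nearest training points; the MixedMeasures setting used in our experiment also handles nominal attributes, which this illustration does not. The data are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length numeric points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train_X, train_y, query, k):
    """Predict the response for `query` as the mean of its k nearest neighbours."""
    if k < 1 or k > len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: euclidean_distance(pair[0], query))
    return sum(target for _, target in ranked[:k]) / k


# Hypothetical standardized predictors and fatality rates.
train_X = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
train_y = [1.0, 3.0, 10.0, 12.0]
print(knn_predict(train_X, train_y, query=(0.5, 0.0), k=2))
```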

#### 4.2.5 Neural Network (NN)

A Neural Network (NN) is a set of interconnected artificial neurons modeled after the human brain. It is capable of pattern recognition and knowledge discovery in a dataset. In its simplest form, it consists of a single neuron (perceptron). A more complex network comprises several layers of neurons. The multilayer feedforward network, or multilayer perceptron, is partitioned into three layers: the input layer, the hidden layer, and the output layer [7]. For this study, our NN comprises 2 hidden layers, with 200 training cycles, a learning rate of 0.01, a momentum of 0.9, and an error epsilon of 1.0E-4.

### 4.3 Performance Comparison

For a better analysis and model comparison of the results of our experiment, we will evaluate their performances using the following criteria.

#### 4.3.1 Root Mean Squared Error (RMSE)

RMSE is the standard deviation of the residuals (prediction errors). It describes the spread of the residuals and how concentrated the data are around the line of best fit.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$

where *n* is the number of examples, and *y*_{i} and *ŷ*_{i} are the true and predicted values of the response variable, respectively.

#### 4.3.2 Relative Error (RE)

When used as a measure of precision, it is the ratio of the absolute error of a measurement to the measurement being taken [8]

$$\mathrm{RE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\lvert y_i - \hat{y}_i \rvert}{\lvert y_i \rvert}$$

where |*y*_{i} − *ŷ*_{i}| is the absolute error, |*y*_{i}| is the actual value, and *n* is the number of records.

#### 4.3.3 Absolute error (AE)

It is the difference between the measured or inferred value of a quantity *x*_{0} and its actual value x.

#### 4.3.4 Relative Error Lenient (REL)

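The average of the absolute deviation of the predicted value from the actual value divided by the maximum of the actual value and the prediction [8].

$$\mathrm{REL} = \frac{1}{n}\sum_{i=1}^{n}\frac{\lvert y_i - \hat{y}_i \rvert}{\max(\lvert y_i \rvert, \lvert \hat{y}_i \rvert)}$$

where |*y*_{i} − *ŷ*_{i}| is the absolute error, |*y*_{i}| is the absolute actual value, |*ŷ*_{i}| is the absolute predicted value and *n* is the number of records.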

#### 4.3.5 Relative Error Strict (RES)

The average of the absolute deviation of the prediction from the actual value divided by the minimum of the actual value and the prediction [8].

$$\mathrm{RES} = \frac{1}{n}\sum_{i=1}^{n}\frac{\lvert y_i - \hat{y}_i \rvert}{\min(\lvert y_i \rvert, \lvert \hat{y}_i \rvert)}$$

where |*y*_{i} − *ŷ*_{i}| is the absolute error, |*y*_{i}| is the absolute actual value, |*ŷ*_{i}| is the absolute predicted value and *n* is the number of records.

#### 4.3.6 Root Relative Squared Error (RRSE)

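RRSE is relative to what the error would have been if a simple predictor had been used. The relative squared error takes the total squared error and normalizes it by dividing by the total squared error of the simple predictor; RRSE is its square root.

$$\mathrm{RRSE} = \sqrt{\frac{\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

where *y*_{i} is the measured value, *ŷ*_{i} is the predicted value, and *ȳ* is the mean of the measured values, which serves as the simple predictor.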

#### 4.3.7 Squared Error (SE)

It is the average squared difference between the estimated values and the actual values.

$$\mathrm{SE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

where *y*_{i} is the actual value and *ŷ*_{i} is the predicted value.

#### 4.3.8 Correlation

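Correlation measures the linear relationship between two variables.

$$R_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2\,\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

where *R*_{xy} is the correlation coefficient, *x*_{i} are the values of the x variable in a sample, *x̄* is the mean of the values of the x variable, *y*_{i} are the values of the y variable in a sample, and *ȳ* is the mean of the values of the y variable.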

#### 4.3.9 Squared Correlation

It represents the proportion of the variance of a dependent variable that is explained by an independent variable or variables in a regression model.

$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$$

where *R*^{2} is the squared correlation, RSS is the sum of squares of the residuals and TSS is the total sum of squares.

#### 4.3.10 Prediction average

It is the average of all the predictions. All the predicted values are added, and the sum is divided by the total number of predictions.

$$\bar{\sigma} = \frac{1}{n_{\sigma}}\sum_{i=1}^{n_{\sigma}}\sigma_i$$

where *σ*_{i} is the *i*-th prediction and *n*_{σ} is the number of predictions.
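Several of the criteria above can be computed together, as in the following Python sketch. It follows the verbal definitions given in this section (for example, lenient and strict relative errors divide by the larger and the smaller of the actual and predicted magnitudes, respectively) and assumes that no actual or predicted value is zero. The dictionary keys and the example values are ours and are not taken from Table 5.

```python
import math


def regression_metrics(actual, predicted):
    """Compute a subset of the comparison criteria for paired values."""
    n = len(actual)
    abs_errors = [abs(y - p) for y, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    rss_value = sum(e ** 2 for e in abs_errors)              # residual sum of squares
    tss_value = sum((y - mean_actual) ** 2 for y in actual)  # total sum of squares
    pairs = list(zip(abs_errors, actual, predicted))
    return {
        "rmse": math.sqrt(rss_value / n),
        "absolute_error": sum(abs_errors) / n,
        "relative_error": sum(e / abs(y) for e, y, _ in pairs) / n,
        "relative_error_lenient": sum(e / max(abs(y), abs(p)) for e, y, p in pairs) / n,
        "relative_error_strict": sum(e / min(abs(y), abs(p)) for e, y, p in pairs) / n,
        "rrse": math.sqrt(rss_value / tss_value),
        "squared_error": rss_value / n,
        "r_squared": 1.0 - rss_value / tss_value,
        "prediction_average": sum(predicted) / n,
    }


# Hypothetical actual and predicted fatality rates.
metrics = regression_metrics([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

By construction, the lenient relative error never exceeds the plain relative error, which in turn never exceeds the strict relative error.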

Using the above criteria, table 5 shows the performance comparison of the models.

As shown in Table 5, based on all evaluation criteria, the Decision Tree model, which has the lowest RMSE (0.83), is the model that fits best for this study. The models were implemented in Python, Tableau and RapidMiner.

## 5. CONCLUSION

In this study, we have analyzed the impact of COVID-19 on the African American community. Time series analysis was used to show the disproportionality in COVID-19 impact. The trajectory of the coronavirus pandemic was visualized using area graphs. For a better understanding of the time series, timelines were shown in months as well as in quarters of 2020. We studied the trajectory of total cases, Black cases, total deaths, and Black deaths. The time frame of our work spans March 13 to December 16, 2020. We computed the COVID-19 Health Care Disproportionality for the time frame.

Furthermore, we designed, developed and evaluated eight COVID-19 forecasting models using the Holt and Holt-Winters exponential smoothing forecasting methodologies. Forecasts were made for the total cases, Black cases, total deaths and Black deaths. Using MAPE, we built a model selection table containing the best forecasting results. A forecast table was then built for total cases, Black cases, total deaths and Black deaths, showing the COVID-19 health care disproportionality for each state. The results of our forecast modeling suggest that COVID-19 Health Care Disproportionality will continue to the end of the first quarter of 2021.

As shown in the study, our experimental results suggest that there is strong evidence of a disproportionate COVID-19 impact in the Black community. We argued that a universal model of the COVID-19 fatality rate in the US may be inadequate for predicting the fatality rate in the Black community. Therefore, a fatality rate predictive model was designed, developed and evaluated specifically for Black counties. Based on the US ethnic composition, we assumed a Black county to be a county with at least a 45% Black population. Decision Tree, Support Vector Regression, Neural Network, K-Nearest Neighbors and Ridge Regression learning algorithms were trained and evaluated. The outcome of our experiment showed that the Decision Tree model had the best performance in predicting fatality rates in a Black county.

## 6. IMPLICATION OF STUDY

This study has the following implications:

The coronavirus pandemic disproportionally impacted the Black community.

The Decision Tree has the best performance in modelling the fatality rate in the Black community. The tree suggested that Black and senior citizens with pre-existing conditions living in the state of Georgia are the most vulnerable.

If healthcare disproportionality continues, the impact of the next pandemic in the Black community should be a concern to all stakeholders.

## 7. LIMITATION OF STUDY

This study has the following limitations.

The study was conducted in December 2020, before the introduction of vaccines. The rate of vaccination in each state will have a major effect on the accuracy of our model.

Since the study was limited to selected states and counties, experimental results may be different in states that do not have a large population of African Americans.

## Data Availability

The dataset is available from the COVID Racial Data Tracker, the CDC, the US Census Bureau and the Johns Hopkins COVID-19 Repository.

https://covidtracking.com/race

## 8. ACKNOWLEDGMENTS

This work is funded by the National Science Foundation grant number 2032345.

## Footnotes

1 timothy.oladunni{at}udc.edu

2 sourou.tossou{at}udc.edu

3 max.denis{at}udc.edu

4 eososanya{at}udc.edu

5 joseph.adesina{at}nwu.ac.za