## I Introduction

The novel coronavirus disease (COVID-19) is one of the most contagious diseases to have hit the planet in the past decades [wu2020new]. In little over four months since the virus was first detected in mainland China, it has spread to more than 170 countries, infected more than 549,136 people, and taken more than 24,863 lives as of the last week of March 2020.
As governments and health organizations scramble to contain the spread of the coronavirus, they need all the help they can get, including from artificial intelligence (AI). Though current AI technologies are far from replicating human intelligence, they are proving helpful in tracking the outbreak, diagnosing patients, disinfecting areas, and speeding up the search for a cure for COVID-19. Countries such as Italy, Germany, and Spain suffered because they underestimated the speed of the outbreak and could not predict its effects on the country. Over the last few years, AI methods have been successfully applied to various predictive tasks such as stock value [akita2016deep], sales [ali2018crm], and weather [xiao2018data], as well as to predicting epidemic spread [lu2019epidemic].
Figure 1: An AI method for predicting country-wise risk category by combining COVID-19 trend and weather data.
In this paper, we propose an AI-guided method to predict long-term country-specific risk. The primary challenges of such methods are:

Small dataset: The majority of machine learning (ML) algorithms demand a large volume of data for training. The COVID-19 dataset is less than 90 days long, and it is difficult to train accurate AI methods on such a small volume of data.

Uncertain data: The virus is very new to researchers, and the majority of the parameters that can be used to predict the outbreak and risk factors are unknown. The trend also differs between countries, so a generic AI tool may not be suitable for tracking all trends. State-of-the-art deep neural networks also fail because of the uncertainty in the data. This observation encourages us to design shallow neural networks optimized for country-specific data.

We propose to use local trend data and weather data with a shallow Long Short-Term Memory (LSTM) based neural network, combined with a feature selection method and fuzzy rules, to predict the long-term risk of a country (Fig. 1). The country-specific neural network is optimized using Bayesian optimization. Next, we discuss the related works and the gaps bridged by the proposed method.

## II Related Works

We note three communities of related work: (A) AI in epidemic research, (B) research on COVID-19, and (C) multivariate regression in AI. These are discussed below:

(A) AI based epidemic researches: Real-time epidemic forecasting has attracted many researchers because of its growing applicability. Jia et al. [jia2019predicting] proposed a neural network for predicting outbreaks of hand-foot-mouth disease. Hamer et al. [hamer2020spatio] used ML algorithms to predict the spatio-temporal epidemic spread of pathological diseases. AI tools for predicting outbreaks of cardiovascular diseases [mezzatesta2019machine, jhuo2019trend], influenza [kumar2020outbreak], and epidemic diarrhea [machado2019identifying] have also been proposed. A review of AI applications to such prediction is reported in [philemon2019review]. A collective learning based approach [abdulkareem2020risk] has been proposed to identify individual risk. In the last few years, machine learning analysis has been used to predict epidemiological characteristics of the Ebola virus (EBOV) outbreak in West Africa [forna2019case], and such analysis is also used in [dallatomasina2015ebola] to assess the risk of Nipah virus. Plowright et al. [plowright2019prioritizing] proposed a surveillance method to monitor Nipah virus in India. Recently, Seetah et al. [seetah2020archaeology] proposed a method for predicting future Rift Valley fever virus outbreaks. The majority of these algorithms combine statistical and machine learning methods in a decision-making application to predict future growth based on past incident data.

(B) Researches on COVID-19: The recent COVID-19 outbreak has attracted many researchers looking for ways to help with the recovery. Rao et al. [rao2020identification] proposed methods to detect COVID-19 patients using a mobile phone. Yan et al. [yan2020prediction] built a predictive model for the early detection of high-risk patients before they progress from mildly to critically ill. Recently, numerous research articles have been published on epidemic prediction of the coronavirus pandemic [peng2020epidemic, zhao2020preliminary, chen2020data, li2020scaling, hilton2020estimation, kastner2020viewing, jia2020prediction, zhao2020tracking, zeng2020predictions, buizza2020probabilistic]. Researchers have focused on designing new paradigms based on AI-driven tools [fong2020finding, santosh2020ai] that combine ML algorithms and different modalities of data. An improved adaptive neuro-fuzzy inference system (ANFIS) methodology is proposed in [al2020optimization]. The algorithm is based on a flower pollination algorithm (FPA) enhanced with the salp swarm algorithm (SSA) and estimates the confirmed cases in the next ten days. Li et al. [li2020covid] developed a regression model to calculate the exponential growth of COVID-19 infection based on the total number of daily diagnosed cases outside China. In [araujo2020spread], analysts obtained projections from 10 familiar machine learning and statistical ecological niche models and examined them against large-scale climatological variation.

(C) Multivariate Regression in AI: Forecasting is the key task in time series study [dong2019partial]. Time series analysis for business prediction helps to forecast probable future values in a practical field of industry [moews2019lagged, thomas2019time, lorenzo2019some, bandara2019sales]. The approach is also applicable in health, for example to predict the health condition of a person from the last diagnosis data [cui2019prediction]; that method uses a feature attention mechanism to predict future health risks. Other health areas such as antibiotic resistance outbreaks [jimenez2020feature] and influenza outbreaks [tapak2019comparative, su2019forecasting] also use multivariate regression models. Different algorithms such as deep neural networks [ochodek2020deep, hu2020efficient], long short-term memory (LSTM) models [wen2019real], and gated recurrent unit (GRU) based models [yuan2019novel] have been successfully applied to various forecasting tasks. These methods achieve low estimation error and running time on data sets suited to neural networks, namely multivariate, sequential, time-series data.

Gap bridged by our method: The main challenge of predicting the long-term risk of a country is addressed by combining different weather data with the daily case data, choosing a feature selector, designing a dynamic shallow recurrent neural network (RNN) optimized for each individual country, and combining it with fuzzy rules. It is noted in [santosh2020ai] that a custom network for each sample can be a suitable solution for such data, which inspired us to design an optimized network for each country. The problem of insufficient data is addressed by choosing an optimized shallow network, and the problem of predicting local trends is addressed by optimizing the neural networks for individual countries. This introduces a new way to predict an epidemic outbreak and correlate it with the risk of a country.

## III Proposed Model

The proposed framework consists of four modules, as shown in Figure 2: (1) a feature selection module, (2) a network search module, (3) a local trend prediction module, and (4) a fuzzy rule-based risk assessment module. We first discuss the background of RNNs and then describe these modules.

Figure 2: Modules of the proposed framework.

### III-A Background

We propose to use a shallow long short-term memory (LSTM) network with a few layers. Like the GRU, the LSTM is a variation of the RNN. Fundamentally, an RNN handles a sequence by maintaining a recurrent hidden state whose activation at each time step depends on that of the previous time step. Formally, given a set of inputs $x = (x_1, x_2, \ldots, x_T)$, the RNN estimates its hidden state $h_t$ by

$$h_t = \begin{cases} 0, & t = 0 \\ \phi(h_{t-1}, x_t), & \text{otherwise} \end{cases} \tag{1}$$

where $\phi$ is the nonlinear function. The LSTM has an output $y = (y_1, y_2, \ldots, y_T)$. The hidden states are updated by

$$h_t = g(W x_t + U h_{t-1}) \tag{2}$$

where $g$ is a bounded function. A general RNN estimates the conditional probability of each input state as

$$p(x_t \mid x_1, \ldots, x_{t-1}) = g(h_t) \tag{3}$$

The LSTM is adaptive and captures dependencies at different time scales. Commonly used RNN variations such as the LSTM use gates and memory cells for sequence prediction. The LSTM starts with a forget gate layer $f_t$ that uses a sigmoid function $\sigma$ combined with the previous hidden layer $h_{t-1}$ and the current input $x_t$ as:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{4}$$

where $W$ is the weight and $b$ is the bias. A tanh layer creates the candidate value $\tilde{C}_t$, which represents a tanh cell, as:

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \tag{5}$$

This information is passed to the next cell as:

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \tag{6}$$

where $*$ denotes element-wise multiplication and $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$ is also a sigmoid function (the input gate). Finally, this information is passed to the next hidden layers as:

$$h_t = o_t * \tanh(C_t) \tag{7}$$

where $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ is also a sigmoid function, known as the output gate. The graphical representation of the LSTM is presented in Figure 3.

Figure 3: LSTM. $i_t$ is the input gate, $f_t$ is the forget gate, and $o_t$ is the output gate; $C_t$ is the cell state and $\tilde{C}_t$ is the update cell.

We have used a similar structure of LSTM modules as the building blocks of the proposed system.
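As a rough illustration of the gate updates described in this background section, the following is a minimal pure-Python sketch of one LSTM time step. It follows the standard LSTM formulation (forget, input, and output gates plus a tanh candidate cell); the function names, the `params` layout, and the all-zero weights in the usage lines are assumptions made for illustration and are not taken from the authors' implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps a gate name to (W, b) acting on [h_prev, x_t]."""
    v = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f = [sigmoid(z) for z in affine(*params["f"], v)]    # forget gate
    i = [sigmoid(z) for z in affine(*params["i"], v)]    # input gate
    g = [math.tanh(z) for z in affine(*params["c"], v)]  # candidate cell values
    o = [sigmoid(z) for z in affine(*params["o"], v)]    # output gate
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed cell state.
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t

# Hypothetical toy setup: 2 hidden units, 1 input feature, all-zero weights.
hidden, inputs = 2, 1
zero = ([[0.0] * (hidden + inputs) for _ in range(hidden)], [0.0] * hidden)
params = {name: zero for name in ("f", "i", "c", "o")}
h, c = lstm_step([1.0], [0.0, 0.0], [1.0, -1.0], params)
```

With all-zero weights every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved, which makes the role of the forget gate easy to see.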

### III-B Feature selection

We hypothesize that not all features are linked with the prediction variable. The data contains three main quantities for the risk categorization of a country: the number of cases ($N_c$), the number of deaths ($N_d$), and the number of recovered ($N_r$). The active cases ($N_a$) are calculated by $N_a = N_c - N_d - N_r$. Features are selected by the backward elimination method: we calculate the p-value of each feature with respect to $N_a$ using ordinary least squares (OLS) regression and employ a threshold (0.05) for choosing features. Algorithm 1 demonstrates the method.

### III-C Network search

Let $\lambda_1, \ldots, \lambda_n$ be the hyperparameters of a learning algorithm $A$ and $\Lambda_1, \ldots, \Lambda_n$ be the domains of these parameters. The dataset ($D$) is divided into train ($D_{train}$) and test ($D_{test}$) sets. The hyperparameter space is $\Lambda = \Lambda_1 \times \cdots \times \Lambda_n$. The algorithm with hyperparameters $\lambda \in \Lambda$ is trained on $D_{train}$. The test error $E(A_\lambda, D_{train}, D_{test})$ is the error on $D_{test}$ of the parameter $\lambda$. The hyperparameter is optimized for a given dataset ($D$) by minimizing:

$$f_D(\lambda) = E(A_\lambda, D_{train}, D_{test}) \tag{8}$$

We use the root-mean-square error (RMSE) on the validation set to choose the best architecture. Hence, the problem can be defined as:

$$\lambda^{*} = \underset{\lambda \in \Lambda}{\arg\min}\; \mathrm{RMSE}(A_\lambda, D_{train}, D_{test}) \tag{9}$$

In general, hyper-parameter search can be very expensive because the model must be trained and evaluated for each combination of parameters. Search algorithms such as random search and grid search are better than manual setup but are computationally expensive when the dataset is large and the hyper-parameter search space is wide. These methods do not use previous outcomes to choose the next set of parameters, so they spend most of the time evaluating bad parameters. In our case, the RMSE of a set of parameters ($\lambda$) is estimated by the conditional probability $p(\mathrm{RMSE} \mid \lambda)$. The method selects the set of hyper-parameters predicted to perform best according to this probability. Table I summarizes the parameters and the search space used in our method.

Parameter | Description | Distribution/Selection | Values |
---|---|---|---|
Learning rate | Minimum learning rate | Log uniform | 1e-1 to 1e-7 |
Hidden layers | Number of layers in the network | Discrete numeric | 1 to 20 |
Hidden state | Number of memory cells in each layer | Discrete numeric | 1 to 200 |
Activation | Activation in each layer | Category | {ReLU, sigmoid, tanh} |
Batch size | Batch size during training | Discrete numeric | 1 to 10 |
Dropout | Dropout size before dense layer | Log uniform | 0 to 0.5 |
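To make the search space above concrete, here is a minimal sketch that encodes it as a Python dictionary and draws configurations from it. It implements plain random search, i.e. the baseline the text contrasts with Bayesian optimization, not the authors' optimizer. The names `SEARCH_SPACE`, `sample`, `random_search`, and `toy_objective` are hypothetical; in practice the objective would train a network and return its validation RMSE. Dropout is sampled uniformly here as an assumption, because a log-uniform distribution is undefined at a lower bound of 0.

```python
import math
import random

# Search space transcribed from the table; the tuple encoding is an assumption.
SEARCH_SPACE = {
    "learning_rate": ("log_uniform", 1e-7, 1e-1),
    "hidden_layers": ("int", 1, 20),
    "hidden_state": ("int", 1, 200),
    "activation": ("choice", ["relu", "sigmoid", "tanh"]),
    "batch_size": ("int", 1, 10),
    "dropout": ("uniform", 0.0, 0.5),  # assumption: uniform, since log(0) is undefined
}

def sample(space, rng):
    """Draw one hyperparameter configuration from the space."""
    config = {}
    for name, (kind, *args) in space.items():
        if kind == "log_uniform":
            lo, hi = args
            config[name] = math.exp(rng.uniform(math.log(lo), math.log(hi)))
        elif kind == "uniform":
            config[name] = rng.uniform(*args)
        elif kind == "int":
            config[name] = rng.randint(*args)  # inclusive on both ends
        elif kind == "choice":
            config[name] = rng.choice(args[0])
    return config

def random_search(objective, space, n_iter, seed=0):
    """Evaluate n_iter random configurations and keep the one with the lowest score."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_iter):
        config = sample(space, rng)
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score

def toy_objective(config):
    """Hypothetical stand-in for 'train the network and return validation RMSE'."""
    return abs(math.log10(config["learning_rate"]) + 3) + config["hidden_layers"]

best_config, best_score = random_search(toy_objective, SEARCH_SPACE, n_iter=50)
```

A Bayesian optimizer would replace the independent draws in `random_search` with proposals informed by the scores of earlier configurations.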

First, the local weather and COVID-19 trend data of an individual country are used to automatically design the desired neural network. Next, the network optimized for that country is used to predict the number of cases ($N_c$), the number of deaths ($N_d$), and the number of recovered ($N_r$). These predictions are used in the next module to decide the risk of the country.

### III-D Fuzzy rule-based risk categorization

The predicted numbers of cases ($N_c$), deaths ($N_d$), and recovered ($N_r$) are used to predict the risk of the country. We define three categories of risk: (1) high risk (HR), (2) medium risk (MR), and (3) recovering (RE). First, we calculate the death rate, new case rate, and recovery rate as:

(10) |

(11) |

(12) |

Next, three Gaussian fuzzy membership functions are defined to represent the risk measurement of these parameters, as shown in Figure 4. The final risk class is estimated by imposing the rules defined in Table II.

Figure 4: Fuzzy membership functions for death rate, case rate, and recovery rate.

Death rate | Case rate | Recovery rate | Decision |
---|---|---|---|
High | High | Low | HR |
Low | High | Low | HR |
High | High | High | HR |
Low | High | High | HR |
High | Low | High | MR |
High | Low | Low | MR |
Low | Low | Low | MR |
Low | Low | High | RE |
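The rule base above can be sketched in a few lines of Python. The rules are transcribed from the table, but everything else is an assumption made for illustration: rates are taken to be already scaled to [0, 1], the Gaussian membership parameters in `MEMBERSHIP` are hypothetical (the paper's values are not given in the text), and the min/max combination is one common way of evaluating fuzzy rules rather than the authors' confirmed choice.

```python
import math

def gaussian(x, mean, sigma):
    """Gaussian membership degree of x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)

# Hypothetical membership parameters (mean, sigma) for rates scaled to [0, 1].
MEMBERSHIP = {"Low": (0.0, 0.3), "High": (1.0, 0.3)}

# Rule base from the table: (death, case, recovery) -> decision.
RULES = {
    ("High", "High", "Low"): "HR",
    ("Low", "High", "Low"): "HR",
    ("High", "High", "High"): "HR",
    ("Low", "High", "High"): "HR",
    ("High", "Low", "High"): "MR",
    ("High", "Low", "Low"): "MR",
    ("Low", "Low", "Low"): "MR",
    ("Low", "Low", "High"): "RE",
}

def classify_risk(death_rate, case_rate, recovery_rate):
    """Return the risk class whose strongest rule fires most strongly."""
    rates = (death_rate, case_rate, recovery_rate)
    scores = {}
    for labels, decision in RULES.items():
        # Rule strength: minimum membership over the three antecedents (fuzzy AND).
        strength = min(gaussian(r, *MEMBERSHIP[lab]) for r, lab in zip(rates, labels))
        # Class score: strongest rule for that class (fuzzy OR).
        scores[decision] = max(scores.get(decision, 0.0), strength)
    return max(scores, key=scores.get)

print(classify_risk(0.1, 0.9, 0.1))    # high case rate
print(classify_risk(0.9, 0.1, 0.2))    # low case rate, high death rate
print(classify_risk(0.05, 0.05, 0.95)) # low death and case rates, high recovery
```

Note that every rule with a high case rate maps to HR, so in this rule base the case rate alone determines whether a country is classified as high risk.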

## IV Results and Discussion

We conducted various experiments using different baseline algorithms and our proposed method, and analyzed the results from different perspectives. First, we present the effectiveness of the feature selection method. Next, we discuss the results of the proposed network optimization and compare the method with the baselines. Finally, we conclude the article with our findings.

### IV-A Dataset

We use a COVID-19 dataset (https://github.com/datasets/covid-19) consisting of date, country, confirmed cases, recovered cases, and total deaths. We combine it with weather data (https://darksky.net/) consisting of humidity, dew, ozone, precipitation, maximum temperature, minimum temperature, and UV. We consider the mean and standard deviation over different cities of a country. The data runs from 22-01-2020 to 10-03-2020.

### IV-B Feature selection

Here we discuss the results of the feature selection method. The active case feature is selected for all countries, as expected. The second most frequently selected feature is ozone, which is a new finding. Humidity, dew, and temperature also play a role in COVID-19 outbreak trends. Figure 5 shows the number of countries that select each feature.

Figure 5: Number of countries that select a particular feature.
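For reference, the backward elimination procedure used by the feature selection module (OLS p-values with a 0.05 threshold) might be sketched as follows. This is a dependency-free illustration, not the authors' Algorithm 1: the helper names are hypothetical, the toy data is constructed by hand, and the p-values use a normal approximation instead of the Student t distribution that a statistics package would normally apply.

```python
import math
from statistics import NormalDist

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(n):
            if r != c:
                factor = a[r][c]
                a[r] = [v - factor * w for v, w in zip(a[r], a[c])]
    return [row[n:] for row in a]

def ols_p_values(X, y):
    """Two-sided p-value of each feature in an OLS fit with an intercept."""
    rows = [[1.0] + list(r) for r in X]
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [t - sum(b * v for b, v in zip(beta, r)) for r, t in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    z = [beta[i] / math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    # Assumption: normal approximation instead of the Student t distribution.
    return [2.0 * (1.0 - NormalDist().cdf(abs(v))) for v in z[1:]]

def backward_elimination(X, y, names, threshold=0.05):
    """Repeatedly drop the feature with the largest p-value above the threshold."""
    keep = list(range(len(names)))
    while keep:
        p = ols_p_values([[r[i] for i in keep] for r in X], y)
        worst = max(range(len(keep)), key=lambda i: p[i])
        if p[worst] <= threshold:
            break
        keep.pop(worst)
    return [names[i] for i in keep]

# Hand-built toy data: y depends on x1 only; x2 is unrelated to y.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x2 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
y = [2.0 * a + e for a, e in zip(x1, noise)]
selected = backward_elimination(list(zip(x1, x2)), y, ["x1", "x2"])
```

In the toy data the unrelated column `x2` is eliminated in the first round and `x1` is retained.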

### IV-C Network optimization

We use 300 iterations of Bayesian optimization, with 0.1 Gaussian noise added to the data. The last 10 days of data are used for validation and the rest for training. Each network generated by Bayesian optimization is trained for a maximum of 5000 iterations, with early stopping after a 100-epoch delay on the validation loss. During optimization, RMSE is minimized over the validation set. Each country's data is used individually to generate the country-specific optimized network. The majority of the networks are optimized with a few layers and hidden units and ReLU activation. The distribution of the parameters over all the generated networks is shown in Figure 6. The dropout is chosen to be zero most of the time.

Figure 6: Distribution of the parameters of the 170 optimized country-specific LSTMs.

Case study (China): Here we discuss the optimization output of the network using the China dataset. The optimization ends with 2 hidden layers with 178 hidden nodes in each layer. The network is optimized with a learning rate of 0.008, dropout = 0, batch size = 1, and ReLU activation. Figure 7(a) shows the minimum RMSE over iterations and (b) shows the RMSE at each iteration. Panels (c) and (d) show the distribution of the number of layers and hidden units and the distribution of learning rate and batch size.

Figure 7: (a) Minimum validation accuracy over iterations, (b) loss over iterations, distribution of (c) number of layers and hidden nodes, and (d) learning rates and batch size.
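The validation protocol described in this section (hold out the last 10 days, train for at most 5000 epochs, stop after 100 epochs without improvement in validation loss) can be sketched as below. The function names and the toy loss curve are hypothetical, and the sketch assumes that the training "iterations" mentioned in the text are epochs; a real run would replace `toy_step` with one epoch of network training followed by a validation pass.

```python
def split_last_days(series, holdout=10):
    """Keep time order: all but the last `holdout` points are used for training."""
    return series[:-holdout], series[-holdout:]

def train_with_early_stopping(step, max_epochs=5000, patience=100):
    """Call step(epoch) -> validation loss until no improvement for `patience` epochs.

    Returns (best loss, epoch of best loss, number of epochs actually run).
    """
    best_loss, best_epoch = float("inf"), -1
    for epoch in range(max_epochs):
        loss = step(epoch)
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_loss, best_epoch, epoch + 1

def toy_step(epoch):
    """Hypothetical validation loss: improves until epoch 50, then worsens."""
    return abs(epoch - 50) + 1.0

# Hypothetical 49-point daily series standing in for one country's data.
train, valid = split_last_days(list(range(49)))
best_loss, best_epoch, epochs_run = train_with_early_stopping(toy_step)
```

Splitting by position rather than at random matters for time series, because a random split would let the model see days that come after the ones it is asked to predict.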

### IV-D Training

Each country-specific network is trained using its own case and weather data. Although the network is validated by predicting active cases during optimization, the same network is used to predict deaths, recoveries, and the current number of cases. The networks are trained for a maximum of 5000 epochs with the early stopping mechanism used during optimization. The last 10 days of data are used for testing and the rest for training. Figure 8 shows the training loss over epochs and the active case prediction using the network optimized for China. The loss decreases steadily during training.

Figure 8: Case study of the network training for China. (a) Training loss during training, and (b) active case prediction on validation data.

### IV-E Prediction accuracy

Here we discuss the prediction accuracy of the proposed method. The final fuzzy rule-based classification depends on the death rate, case rate, and recovery rate. First, the model chosen for a country is trained to predict these three values. We calculate the root-mean-square error (RMSE) on the validation data to evaluate the methods. We compare against baseline algorithms such as linear regression, lasso linear regression, and ridge regression, where a single model is used to predict the values for all the countries. Such methods perform very poorly because of the small dataset. We also compare the method with some advanced neural networks: a variation of LSTM combined with a fully convolutional network [karim2019multivariate], a variation of residual RNN [goel2017r2n2], and GRU [althelaya2018stock]. Very deep networks also fail to predict accurately using such a small dataset. A Bayesian optimized shallow GRU performs close to our method. We use all the features for the baseline comparison except in the Bayesian optimization-based GRU. The results are summarized in Table III.

Method | COVID-19 Case (RMSE) | Recovered (RMSE) | Death (RMSE) |
---|---|---|---|
Linear Regression | 3895.0 | 1951.5 | 247.5 |
Lasso Linear Regression | 3804.5 | 1805.3 | 222.2 |
Ridge Regression | 3900.0 | 1705.2 | 267.3 |
Elastic Net [hans2011elastic] | 3671.3 | 1607.2 | 304.2 |
LSTM-FCNs [karim2019multivariate] | 3293.3 | 747.2 | 211.0 |
Residual RNN [goel2017r2n2] | 3905.4 | 1207.3 | 178.6 |
GRU [althelaya2018stock] | 3603.3 | 1105.2 | 247.3 |
GRU+Bayesian | 2803.2 | 911.5 | 224.2 |
Proposed | 2300.8 | 700.3 | 123.7 |
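The RMSE values reported above follow the usual definition, the square root of the mean squared difference between actual and predicted values. A minimal sketch is given below; the function name and the four-point series are hypothetical and are not taken from the paper's data.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))

# Hypothetical case counts over four days and a model's predictions for them.
actual = [100, 110, 120, 130]
predicted = [98, 113, 118, 133]
error = rmse(actual, predicted)  # sqrt((4 + 9 + 4 + 9) / 4) = sqrt(6.5)
```

Because RMSE is expressed in the same unit as the target, the three columns of the table are not directly comparable with each other: case counts are much larger than death counts, so their errors are larger too.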

Figure 9: 10-days-ahead trend prediction in China: COVID-19 cases (row 1), recovered (row 2), and deaths (row 3).

### IV-F Risk classification accuracy

A fuzzy rule-based method is used to classify the risk of each country into three classes, as discussed earlier (HR, MR, and RE). We predict these risk classes 10 days ahead. The accuracy is calculated in the standard manner against a manual ground truth extracted from the trend data.

Figure 10 shows the confusion matrix of the classification over 170 countries. The method produces relatively lower accuracy for the MR class because of incorrect trend rate predictions. We achieve 78% average accuracy over all the country-specific datasets.

Figure 10: Confusion matrix of the risk classes for the risk prediction 10 days ahead for 170 countries.

### IV-G Computational cost

All the experiments are carried out on an Intel(R) Xeon(R) Gold 6154 CPU with 128 GB of RAM and an NVIDIA Quadro RTX 6000 GPU of capacity 64 GB. The method takes 72 computational hours for feature selection, network optimization, training, and evaluation.

## V Conclusion

In this paper, we have proposed a Bayesian optimization guided shallow LSTM for predicting the country-specific risk of the novel coronavirus (COVID-19). We have combined trend data and weather data to predict different parameters for the risk classification task. We also propose to use a country-specific optimized network for accurate prediction and note that this is suitable when the dataset is small and uncertain. Looking across all the optimized LSTMs, we also note that in the majority of cases a small neural network performs well on the data, rather than a deep one. The method can be useful for predicting the long-duration risk of an epidemic like COVID-19.

There are several avenues for future work. Next, we plan to explore combining different modalities of data such as flights, travelers, business, and tourists. The method can also be used to predict the economic effects of such epidemics.
