Updated on 2020-03-23

Covid19 growth analysis

Simple growth analysis of Covid19 based on Johns Hopkins University data. It examines the speed of the spread across countries and provides a basic prediction model.

See a full notebook to replicate these results. It uses mostly plain Julia with PlotlyJS.

Overview of the data

The data consists of three main time series for each country:

  • total cases

  • recovered

  • deaths

We can also compute another data series based on these, which is the active cases.

active = total - recovered - deaths
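As a minimal sketch of the formula above, here is the same computation in Julia. The numbers are made up for illustration and do not come from the Johns Hopkins data.

```julia
# Hypothetical cumulative counts for one country, one entry per day.
total     = [100, 150, 230, 340]
recovered = [  5,  10,  20,  35]
deaths    = [  1,   2,   4,   7]

# Element-wise subtraction with broadcasting; all three vectors must have the same length.
active = total .- recovered .- deaths
```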

In this analysis we will primarily look at active cases and deaths. Active cases give us a good understanding of how affected the society currently is and how much the healthcare system has to deal with. Deaths, on the other hand, are the most reliable measure, since they are more likely to be detected than cases when testing is limited.

We can start by plotting the whole history of active cases to get a gist of the data.

[Plot: active cases over time, by country]

We can see the spike in China which shows the effect of containment measures there, but not much more than that. Growth of different countries looks roughly exponential.

Now to see a bit better what is going on, let us align all the countries so that we plot them from the time they get 150 cases. We will also switch the y axis to logarithmic, to better see the exponential growth.
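The alignment step could look roughly like the sketch below. The function name `align_from_threshold` is hypothetical and is not taken from the notebook; it assumes each country's history is a plain vector of daily counts.

```julia
# Drop the days before a series first reaches `threshold` cases, so that
# index 1 of the result is the first day at or above the threshold.
# Returns an empty vector if the threshold is never reached.
function align_from_threshold(history::Vector{Int}, threshold::Int=150)
    start = findfirst(x -> x >= threshold, history)
    start === nothing ? Int[] : history[start:end]
end
```

For example, `align_from_threshold([10, 80, 150, 300, 600])` keeps only the last three entries, so every country can be plotted against "days since 150 cases".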

[Plot: active cases by country, aligned from the day each reached 150 cases, logarithmic y axis]

Looks like there is something in common across the growth in different countries. On the logarithmic scale we can see that the number of active cases in many countries looks almost like a line, so indeed the growth is exponential there.
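One way to check the "almost like a line" observation numerically is to fit a straight line to the base-2 logarithm of the counts. This is a different technique from the 2-day averaging used later in this post, and the helper name `log2_slope` is made up for this sketch. It assumes every count is positive.

```julia
using Statistics

# Least-squares slope of log2(cases) against the day index.
# A roughly constant slope means exponential growth, and 1/slope is the
# doubling time in days. All counts must be positive for the logarithm.
function log2_slope(history::Vector{Int})
    days = collect(1.0:length(history))
    logs = log2.(history)
    cov(days, logs) / var(days)
end
```

A series that doubles every day, such as `[100, 200, 400, 800]`, gives a slope of 1, that is, one doubling per day.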

It is interesting that almost all countries, regardless of their population, have a similar rate of growth. This may be because population has only a linear effect on growth, which is not significant next to exponential spread, even with only a few centers of spread.

Are the containment measures working yet?

Let us look at some subset of countries that have recently introduced some containment measures.

[Plot: active cases for countries that recently introduced containment measures]

As we can see, all of these countries proceed along pretty much the same exponential, and none of them shows visible signs of slowing down. We need to acknowledge that any containment measure will take some time to manifest its effects, so let us hope we see these effects in the coming days.

Let us now estimate the rate of this exponential growth. Is it changing with time?

To make it somewhat intuitive, we will use "days to double" as the growth rate. This rate changes a lot from day to day, so we will take the average of the last 2 days.

Lower numbers mean faster growth; 0 means that there was not enough data on that day to estimate the growth rate.
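A per-day version of this estimate might look like the sketch below. The function name `doubling_days` and the `window` keyword are hypothetical, and the exact conditions under which the notebook reports 0 are an assumption here: this sketch returns 0 when there is too little history, when a count in the window is zero, or when the count is not growing.

```julia
using Statistics

# Days-to-double for every day of a series, from the mean of the last
# `window` daily growth ratios. Entries stay at 0.0 where the rate cannot
# be estimated (assumed convention, see the text above).
function doubling_days(history::Vector{Int}; window::Int=2)
    result = zeros(Float64, length(history))
    for t in (window + 1):length(history)
        recent = history[t-window:t]
        any(x -> x <= 0, recent) && continue      # avoid division by zero
        mean_ratio = mean(recent[i+1] / recent[i] for i in 1:window)
        mean_ratio > 1 || continue                # no growth, nothing to double
        result[t] = 1 / log2(mean_ratio)
    end
    result
end
```

The first `window` entries are always 0 because there are not yet enough earlier days to form the ratios.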

[Plot: days to double active cases, by country]

We can see that the data is pretty noisy, so it is hard to tell yet whether the doubling periods are growing. For the countries in question, the cases double every 1.5 to 4.5 days, which is very fast.

One common concern is that not enough testing is happening, so many more cases are present than reported. We can look at the deaths data directly as well, which is more likely to be accurate. The issue is that since there are fewer deaths, there is less data to compute statistics from.

[Plot: deaths by country]

There does not seem to be a significant difference between the growth rates of cases and deaths. One point to keep in mind is that estimating the death rate is more complex, since it requires taking into account recovery times and other effects.

Here is a list of the number of days to double for countries with enough data:

Country           Days to double deaths
Brazil            1.7
Belgium           1.8
Switzerland       2.3
Canada            2.3
US                2.6
Spain             2.6
Netherlands       2.7
United Kingdom    3.0
Algeria           3.1
Indonesia         3.4
France            3.4
San Marino        3.6
Germany           4.1
Philippines       4.1
Italy             4.5
Sweden            5.0
Japan             6.3
Iraq              8.2
Korea South       8.3
Iran              8.6
China             215.4

What will happen next?

Given these rates, how will the number of cases change going forward? Taking the estimated doubling rates of cases in different countries, we can project the numbers below, starting from 10 days ago and going 10 days into the future. Current totals in these projections are slightly lower than the actual ones, since we take into account only countries with a higher number of cases.

[Plot: projected number of cases, from 10 days ago to 10 days ahead]

For each country, the prediction formula in Julia is as follows, based on 2-day averaging.

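using Statistics  # provides mean

# Days needed for the count to double, from the mean of the last two daily growth ratios.
function estimated_rate(history::Vector{Int})
    mean_ratio = mean(history[end-i+1]/history[end-i] for i in 1:2)
    1 / log2(mean_ratio)
end

# Extrapolate the latest count `days_ahead` days forward at the estimated doubling time.
function predict(history::Vector{Int}, days_ahead::Int)
    rate = estimated_rate(history)
    round(Int, history[end]*2^(days_ahead/rate))
end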

Is this projection realistic?

To check how realistic this projection is, we can see how it would have performed with older data when trying to predict the cases on the same day. This will not account for any newly introduced measures, but it also discounts any additional countries.
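A check of this kind could be sketched as follows. The two prediction functions are repeated from the previous section so that the block runs on its own. The `backtest` helper is hypothetical and may differ from what the notebook does: for each lag it cuts the history off that many days before the last entry and predicts the last day's value from what was known then.

```julia
using Statistics

# The two functions from the previous section, repeated for self-containment.
function estimated_rate(history::Vector{Int})
    mean_ratio = mean(history[end-i+1] / history[end-i] for i in 1:2)
    1 / log2(mean_ratio)
end

function predict(history::Vector{Int}, days_ahead::Int)
    rate = estimated_rate(history)
    round(Int, history[end] * 2^(days_ahead / rate))
end

# Hypothetical helper: entry `lag` is the prediction for the final day made
# with data that ends `lag` days earlier. Each truncated history needs at
# least 3 entries, so max_lag must not exceed length(history) - 3.
function backtest(history::Vector{Int}, max_lag::Int)
    [predict(history[1:end-lag], lag) for lag in 1:max_lag]
end
```

Comparing the returned values with `history[end]` shows how far off the model would have been at each lag. On a series that doubles exactly every day, every prediction matches the final value.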

Below we can see how this simple model would have predicted the active number of cases today.

[Plot: predictions of today's active cases made from older data]

There is quite significant variance, but the model does not seem to be too biased one way or another: given more averaging, the line would be almost horizontal. We can also see that it performs quite well 6 days ahead.

Try playing around with the data yourself and let me know if you encounter some new sources.