Friday, May 29, 2020

29/5/20: COVID19 Data: One Hell of a Mess

I have compiled a summary of all COVID19 data for the top 50 countries (all countries with more than 10,000 recorded cases as of May 29, 2020). Here is the data, ordered alphabetically, in 2 tables:

So, here are some interesting observations:

  • Out of 50 countries, only 11 are statistically 'average' or statistically 'normal'. The other 39 countries are statistically distinct from the average. Note: I am using a 95 percent confidence level, adjusting for non-normal distribution.
  • Of these 39 countries, 21 are performing significantly better than average in terms of pandemic severity (in official numbers terms), and 9 are performing significantly worse.
  • The remaining 9 countries present an ambiguous case when compared to the average.
Key takeaway from this: there is, basically, no point in talking about a 'normal' experience with COVID19 numbers. The dynamics of this pandemic are extremely VUCA: the high volatility, uncertainty, complexity and ambiguity of the data and its dynamics imply that cross-country comparisons are best handled with extreme care and on a case-by-case basis, rather than by reference to global averages.
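To make the classification above more concrete, here is a minimal sketch of one way such a sorting could be done. It is not necessarily the method used for the tables: the percentile bootstrap (as the adjustment for non-normality), the rule for combining metrics into 'better', 'worse', 'average' and 'ambiguous', the function names and the country values are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean.

    Resampling avoids assuming the data are normally distributed.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper


def classify_metric(value, lower, upper):
    """Severity metrics: below the band is 'better', above it is 'worse'."""
    if value < lower:
        return "better"
    if value > upper:
        return "worse"
    return "average"


def classify_country(labels):
    """Combine per-metric labels into one verdict (an assumed rule)."""
    kinds = set(labels)
    if kinds == {"average"}:
        return "average"
    if {"better", "worse"} <= kinds:
        return "ambiguous"  # better on one metric, worse on another
    return "better" if "better" in kinds else "worse"


# Hypothetical placeholder values, NOT the figures from the tables.
data = {
    "cases_per_million": {"A": 10, "B": 40, "C": 50, "D": 50, "E": 60, "F": 90},
    "deaths_per_million": {"A": 90, "B": 10, "C": 50, "D": 50, "E": 60, "F": 40},
}

# One confidence band for the cross-country mean of each metric.
bands = {m: bootstrap_mean_ci(list(v.values())) for m, v in data.items()}
verdicts = {
    c: classify_country([classify_metric(data[m][c], *bands[m]) for m in data])
    for c in data["cases_per_million"]
}
print(verdicts)
```

In this sketch a country only counts as 'average' if every metric falls inside the band, which is one reason so few countries would end up in that group.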

The non-normality of the data is severe and should steer analysis toward the median, rather than the average, as a more valid (but still poor) measure of central tendency.
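A quick illustration of why the median holds up better under this kind of skew. The numbers below are hypothetical placeholders, not values from the tables, and the `skewness` helper is an illustrative name using the simple population (Fisher-Pearson) formula.

```python
import statistics


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(x - mean) ** 2 for x in values])
    m3 = statistics.fmean([(x - mean) ** 3 for x in values])
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sample: most values are small, two are very large.
values = [2, 3, 3, 4, 5, 6, 8, 40, 95]

summary = {
    "mean": statistics.fmean(values),
    "median": statistics.median(values),
    "skewness": skewness(values),
}
print(summary)
```

Two large values are enough to pull the mean well above where most of the sample sits, while the median stays with the bulk of the observations.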

Incidentally, this calls into question any linear models that are being fitted to the COVID19 data, as, for example, in the case of the infamously bizarre research from JP Morgan claiming no changes in R0 rates during and after lockdowns.

Here is an illustrative case: Russia. Russian stats on COVID19 have been thoroughly washed through the Western media with the usual scepticism and allegations that the Kremlin is manipulating the data. Statistically, however, Russia is an outlier that is close to some semblance of a norm (especially considering the median).

Here is a summary:

In other words, Russia is somewhat 'normal' in the number of cases detected per 1 million of population, and in deaths per 1 million of population, but 'abnormal' in having a low reported death rate per case identified. There are 7 countries amongst the top 50 case countries that have lower death rates per 1,000 cases, but statistically, there are 20 countries that are indistinguishable from Russia in terms of deaths per 1,000 cases reported.
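For reference, the three measures used in this comparison are simple ratios. The sketch below spells them out; the function name and the input counts are hypothetical and do not correspond to any country in the tables.

```python
def covid_rates(cases, deaths, population):
    """Return the three normalised measures discussed above."""
    return {
        "cases_per_million": cases * 1_000_000 / population,
        "deaths_per_million": deaths * 1_000_000 / population,
        "deaths_per_1000_cases": deaths * 1_000 / cases,
    }


# Hypothetical counts, for illustration only.
example = covid_rates(cases=100_000, deaths=2_000, population=50_000_000)
print(example)
```

Note that the first two measures depend on population, while the third depends only on how many cases were detected, so it is the one most exposed to differences in testing intensity.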

Go figure... the data is a fine mess.

Update: for the sake of explanation, the above 'exercise' using Russia is not to imply that all is 'normal' in Russian stats in some ethical or policy-based sense. It is simply to show that even outlier cases in the data, like Russia, can be understood to be 'normal' based on a simplistic use of statistics. COVID19 pandemic data across a range of countries is of deeply questionable value due to the lack of standardised methodologies for collecting, identifying and reporting data, due to the endogeneity problem between reported cases and the number and quality of tests deployed, due to weaker reporting systems across a wide range of economies, and potentially due to political manipulation of methodologies and reported statistics in a number of countries and sub-national jurisdictions.

