# How statistics lie to you

In 2002, Andrew Pole, a statistician for the American retail store chain Target, devised an algorithm to determine the statistical probability of customers being pregnant based upon their purchasing patterns.

Target would then send coupons for pregnancy-related items to the customers it judged highly likely to be pregnant, since they were more inclined to buy such goods. The algorithm was an immense success for the chain, and nowhere is this more evident than in a story that followed its implementation.

An infuriated man walked into a Target store just outside Minneapolis, demanding to see the manager. He complained that his teenage daughter had been receiving coupons for baby-care items, and he was visibly offended by what he saw as encouragement for her to become pregnant.

The manager apologised, and called the man a few days later to apologise once more. This time, however, the man who had been so angry was himself apologetic. He told the manager that he owed him an apology: ‘there had been some activities’ in his house that he had not been fully aware of, and his daughter was pregnant and due the following August.

The algorithm that Target had deployed had worked out that a teenage girl was pregnant before her father knew, purely from patterns in her purchases. Statistics are a powerful tool.

In today’s world, statistics sit at the epicentre of reporting on the coronavirus pandemic, and they are often presented to provoke a particular response from the audience, whether panic or laxness. The result has been the manipulation of statistics on an unprecedented scale, most visibly in coronavirus death toll predictions, particularly in the early stages of the pandemic.

For example, on the 5th of May it was reported that the UK’s Covid-19 death toll (32,313) had surpassed that of Italy (29,209) to become the highest in Europe. This led many critics of the government’s coronavirus response to call for a public inquiry into its handling of the pandemic.

However, this raises two questions: are these statistics reliable, and is the demand for an inquiry into the government’s handling of the pandemic really justified?

The short answer is no, or at least not for that reason. Justifying such an inquiry purely on this empirical data is questionable, because the methodologies used to calculate coronavirus death tolls differ greatly between countries. The UK, for instance, counts the deaths of individuals who have ‘Covid-19’ mentioned anywhere on their death certificate.

By contrast, Spain counts only the deaths of individuals who had previously tested positive for the virus. If the same method were used in both countries, Spain’s death toll might well be significantly higher than that of the UK.

Using these statistics to judge the quality of governments’ responses to the pandemic is also unwise. Many extraneous variables involved in the spread of the virus are not controlled for in these figures, so some countries are predisposed to accumulating more coronavirus cases and deaths than others. These variables include population density, median age, movement in and out of the country, the number of tests conducted and so on.
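As a rough illustration of how much a single uncontrolled variable can matter, the sketch below scales the 5th of May death tolls quoted earlier by population size. The population figures are approximate 2019 estimates that I have assumed for the example; they do not come from the reporting discussed here, and the function name `per_100k` is purely illustrative. Scaling by population does nothing to correct for the differences in counting methodology, so this is one adjustment among many rather than a fair comparison.

```python
# Death tolls as quoted in this article for the 5th of May.
deaths = {"UK": 32_313, "Italy": 29_209}

# ASSUMPTION: approximate 2019 population estimates, not taken from the article.
population = {"UK": 66_800_000, "Italy": 60_400_000}


def per_100k(count, pop):
    """Return the number of deaths per 100,000 inhabitants."""
    return count / pop * 100_000


rates = {country: per_100k(deaths[country], population[country]) for country in deaths}

# How much larger the UK figure is than Italy's, before and after scaling.
raw_ratio = deaths["UK"] / deaths["Italy"]
rate_ratio = rates["UK"] / rates["Italy"]

for country, rate in rates.items():
    print(f"{country}: {deaths[country]:,} deaths, {rate:.1f} per 100,000")
print(f"UK/Italy ratio, raw counts: {raw_ratio:.3f}")
print(f"UK/Italy ratio, per capita: {rate_ratio:.3f}")
```

Under these assumed populations, the gap between the two countries narrows considerably once population size is taken into account, which is exactly why a headline built on raw totals deserves scrutiny.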

Evidently, the surface-level coronavirus statistics displayed to us do not paint an accurate picture of the state of the world, and they encourage a rather superficial understanding of this crisis. I would go as far as to say that many of the statistics and graphs presented simply lie to you.

This represents a much larger problem, however.

The seemingly innumerable statistics in media coverage of this pandemic that are either manipulated or not truly representative draw attention to how easily statistics can mislead the masses. Moreover, the variety of ways in which statistics can be manipulated, and just how often this is done, is frightening.

To give an example, in 1995 the UK Committee on Safety of Medicines issued a warning on certain new types of birth control pill, stating that they increased the likelihood of life-threatening blood clots by 100%. What this truly meant was that the new pill carried around a 2 in 7,000 chance of developing blood clots, compared with 1 in 7,000 for the older generation of pill. The warning drew a great deal of controversy, because the same change can be described as a relative increase of 100% or as an absolute increase of about 0.014 percentage points. Each framing has its flaws, and the two provoke very different reactions from the public.
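The two framings can be computed side by side from the same pair of risks. This is a minimal sketch using the approximate 1 in 7,000 and 2 in 7,000 figures quoted above; the function and variable names are my own and purely illustrative.

```python
from fractions import Fraction

# Approximate risks quoted above: 1 in 7,000 (older pill), 2 in 7,000 (newer pill).
baseline_risk = Fraction(1, 7000)
new_risk = Fraction(2, 7000)


def relative_increase(old, new):
    """Change in risk expressed as a fraction of the old risk."""
    return (new - old) / old


def absolute_increase(old, new):
    """Change in risk expressed in plain probability units."""
    return new - old


relative = relative_increase(baseline_risk, new_risk)
absolute = absolute_increase(baseline_risk, new_risk)

print(f"Relative increase: {float(relative) * 100:.0f}%")
print(f"Absolute increase: {float(absolute) * 100:.3f} percentage points")
print(f"Extra cases per 7,000 women: {float(absolute * 7000):.0f}")
# Relative increase: 100%
# Absolute increase: 0.014 percentage points
# Extra cases per 7,000 women: 1
```

All three lines describe the same change. Only the first makes a headline, and only the last tells a reader how many additional people are actually affected.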

This case shows just how problematic statistics can be, and how easily they can be framed to provoke a certain response. The consequences can be serious: in this instance, 13,000 unwanted pregnancies in the following year.

Statistics, then, are an incredibly powerful tool, but they can be presented in a misleading manner and interpreted in a multitude of ways. It is therefore vital that we are as critical of empirical evidence as we are of anecdotal evidence, so that statistics do not lie to us.