The Mathematics of Covid testing

As England approached the end of its latest lockdown last weekend, the Government went to great lengths to urge continued caution.  They have known for some time that, paradoxical though it may seem, the most dangerous time for the spread of the disease is when infection rates are starting to move lower and restrictions are eased, as people lower their guard and rush to resume a more normal life.  Hence all the talk a week ago about still needing to take extreme care, still needing to obey the regulations and observe social distancing.

And hence too in particular the urgent calls “to protect the NHS”, and dire warnings that it was still “at serious risk of being overwhelmed”.  This message was carefully chosen by Westminster;  the NHS is widely revered in the UK and appeals to protect it were seen by the Government as possibly the best way to get the seriousness of their message home and to ensure public compliance with the ongoing restrictions.

The messaging was not without its problems though.  Leaving aside those who responded that “the NHS is there to protect us not the other way round”, there is the awkward fact that NHS bed occupancy is not particularly high for the season, and that in most of the country there is ample capacity and no real sign that physical resources in the hospitals are at breaking point.

But in the Government’s defence, there is one NHS resource which is under great pressure.  And that is the availability of front-line NHS staff.  At any given time a significant number of hospital workers are at home self-isolating after a positive Covid test – and with self-isolation lasting 14 days, with no early return for those who subsequently test negative, this is indeed causing capacity concerns for the NHS.

Of course NHS staff, by the nature of their work, will come into contact with those suffering from Covid more than most of us, and a higher incidence of catching the disease themselves is one of the expected, indeed inevitable, results.  But even so, the number unable to go in to work is surprisingly high.  And the reason for this is a mixture of political choice and the underlying mathematics of testing.

Like many tests, the test for Covid is trying to determine an unobservable fact, in this case whether or not the person being tested has the disease, by testing for an observable one, in this case the presence of various indicators.  And this leads to two possible ways in which the test can give the wrong answer:  it can say that someone who is clear has the disease (a false positive), and it can say that someone who does in fact have the disease is clear (a false negative).

The two types of error, saying someone has a condition when they do not, and saying someone does not have it when they do, are called Type I and Type II errors respectively by statisticians, and there is no easy solution to the issues they raise for any test which is non-trivial.  In particular, it is usually not possible to recalibrate a test so as to reduce both errors simultaneously.  One can always fine-tune the trigger levels of a test to reduce the number of cases of one of the two types of error, but this usually comes at the expense of increasing the other – for example in this test one could reduce the number of false positives by raising the level of virus one needed to detect before declaring a patient infected, but this would increase the number of false negatives (infected people passed as clear because their level of virus fell below the trigger).

Given this swings and roundabouts effect, one usually tries to optimise a test by minimising the combined incidence of both errors – or rather minimising the cost of the combination of the two errors.  This then requires an assessment of the relative damage done by a false positive and a false negative, which is where science gives way to political judgment.
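The trade-off just described can be illustrated with a toy model.  The sketch below is not the real Covid test:  it assumes, purely for illustration, that the test measures a single signal which is normally distributed around 0 in clear people and around 3 in infected people, and the names (error_rates, expected_cost, best_threshold) and the cost weights are hypothetical.  It shows that raising the trigger level lowers false positives while raising false negatives, and that the cost-minimising trigger moves when the relative cost attached to the two errors changes.

```python
from statistics import NormalDist

# Hypothetical signal distributions (illustrative only, not real assay data).
CLEAR = NormalDist(mu=0.0, sigma=1.0)      # signal measured in uninfected people
INFECTED = NormalDist(mu=3.0, sigma=1.0)   # signal measured in infected people


def error_rates(threshold):
    """Return (false_positive_rate, false_negative_rate) for a trigger level."""
    false_positive_rate = 1.0 - CLEAR.cdf(threshold)   # clear people above the trigger
    false_negative_rate = INFECTED.cdf(threshold)      # infected people below the trigger
    return false_positive_rate, false_negative_rate


def expected_cost(threshold, prevalence, cost_fp, cost_fn):
    """Expected cost per person tested, weighting each error by its assumed cost."""
    fp_rate, fn_rate = error_rates(threshold)
    return (1.0 - prevalence) * fp_rate * cost_fp + prevalence * fn_rate * cost_fn


def best_threshold(prevalence, cost_fp, cost_fn):
    """Grid-search trigger levels from -2.00 to 5.00 in steps of 0.01."""
    candidates = [i / 100 for i in range(-200, 501)]
    return min(candidates, key=lambda t: expected_cost(t, prevalence, cost_fp, cost_fn))


if __name__ == "__main__":
    for t in (1.0, 1.5, 2.0, 2.5):
        fp, fn = error_rates(t)
        print(f"trigger {t:.1f}: false positives {fp:.1%}, false negatives {fn:.1%}")
    # Same test, two different (assumed) weightings of the two errors.
    print("health-first trigger: ", best_threshold(0.005, cost_fp=1, cost_fn=1000))
    print("economy-first trigger:", best_threshold(0.005, cost_fp=1, cost_fn=50))
```

The test itself is identical in both runs;  only the weights change.  Treating a false negative as far more costly than a false positive pushes the chosen trigger lower, which is exactly the judgment that the science alone cannot supply.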

Throughout the crisis, the Government’s instinct has been to prioritise the health issues.  And given this, the political assessment has been that false positives are not very costly in health terms (they merely require the person to self-isolate unnecessarily) and false negatives are potentially very costly indeed (they allow someone with the disease to resume normal life and so be a risk to others).  So the natural bias of the politicians has been to ask the testers to set the trigger levels for testing positive quite low.  This would result in very few people being given the all-clear when they have the disease (as desired), but more people being told they have the disease, and so have to self-isolate, when in fact they don’t.

It is important to note that this is a political choice not a scientific one:  the Government may claim to be “following the science” but the science is neutral on which type of error should be minimised.  A different emphasis from the Government, for example to prioritise economic issues, might lead to more pressure on the testers to avoid false positives and the resulting self-isolations, and a different set of trigger levels for the test.

Many statistical tests display this element of choice.  A good example is the level of evidence required for a guilty verdict in criminal law.  This too has both types of error – guilty people being found innocent, and innocent people being found guilty – and in this case the damage done to the innocent individual who is falsely found guilty is so much higher than the damage done by acquitting a felon that in most countries the law sets the test to be very heavily biased towards finding people innocent unless there is no reasonable doubt of their guilt.  But it does undoubtedly mean that lots of people are acquitted who shouldn’t have been.

A more interesting example, because different societies have come to different conclusions, is the prevention of people driving while drunk.  Society would like to stop intoxicated people from driving.  But intoxication and so unfitness to drive is a matter of degree not an absolute yes/no decision, and is in marginal cases difficult to observe.  And the various tests can only measure observables, such as the amount of alcohol in someone’s blood, usually quoted as milligrammes of alcohol per 100 millilitres of blood.  So the question becomes, where does one set the trigger point for turning the observable (the level of alcohol) into the unobservable (are you safe to drive)?  Too high, and you allow a larger number of people who are unfit to drive onto the roads.  Too low, and you penalise those who are actually safe.

And different societies have set different trigger points.  In much of Europe, the limit is 50mg/100ml, which many countries think a fair compromise between personal liberty and society’s safety.  In Norway and Sweden, the limit is 20, reflecting the greater sense of society, and perhaps a greater acceptance of personal sacrifice for communal gain.  In much of the UK (though not Scotland), the limit is 80, perhaps in this case reflecting a stronger libertarian feeling in the general population.

The important point is that none of these national limits is in any sense “wrong”.  They merely reflect the choice made by each society on where the balance between the two types of errors – sober people being found guilty of driving while drunk, and drunk people being allowed to drive – should be struck.

To return to the Covid tests, a combination of the (still) extremely low incidence of the disease (probably well under 1% of the population), and the emphasis on avoiding false negatives (the political element of the test) has meant that (a) the great majority of people tested do not have Covid, but (b) a significant number of them will nonetheless test positive.

How serious is this?  Unfortunately, even if the false positive rate is very low in percentage terms, the mathematics produces a surprisingly large number of such “clear but tested positive” cases.  The table below has three inputs:  the infection rate (a sensible estimate is about 0.5%) and estimates of the false positive and false negative rates for the main test used in the UK[1].  And the outcome is that while 5,000 people per million can be expected to have Covid, nearly 14,000 will test positive (the number in red in the lower table below).  And even worse, the number of those testing positive who actually have the disease is as low as 4,000 people (the number in blue) – or fewer than 1 in 3 of them.

In other words, the majority of people who test positive will not in fact be infected.  As the table shows, with this choice of inputs the test gives an “error rate” of over 70% (that is, over 70% of the people testing positive are free of the disease) – and to repeat, this is despite the fact that the quoted false positive rate is quite possibly as low as 1%.
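The arithmetic behind these figures can be sketched in a few lines of Python.  The 0.5% infection rate and the 1% false positive rate are the figures quoted in the text;  the 20% false negative rate is an assumption, inferred from the statement that 4,000 of the 5,000 infected people per million test positive.  The function name screening_outcomes and its parameters are illustrative only.

```python
def screening_outcomes(population, prevalence, false_positive_rate, false_negative_rate):
    """Split a tested population into true and false positives.

    Returns a dict of head counts plus the share of positive tests
    that belong to people who are in fact clear of the disease.
    """
    infected = population * prevalence
    clear = population - infected
    true_positives = infected * (1.0 - false_negative_rate)   # infected and flagged
    false_positives = clear * false_positive_rate             # clear but flagged
    total_positives = true_positives + false_positives
    # Guard against a test that never returns a positive at all.
    share_false = false_positives / total_positives if total_positives else None
    return {
        "infected": infected,
        "true_positives": true_positives,
        "false_positives": false_positives,
        "total_positives": total_positives,
        "share_of_positives_not_infected": share_false,
    }


if __name__ == "__main__":
    # 20% false negative rate is an inferred assumption, not a quoted figure.
    result = screening_outcomes(
        population=1_000_000,
        prevalence=0.005,
        false_positive_rate=0.01,
        false_negative_rate=0.20,
    )
    for name, value in result.items():
        if name.startswith("share"):
            print(f"{name}: {value:.1%}")
        else:
            print(f"{name}: {value:,.0f}")
```

With these inputs the sketch gives 5,000 infected, 4,000 true positives, 9,950 false positives and 13,950 positives in total per million tested, so roughly 71% of positive results belong to people who are clear.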

On one level this is a puzzling outcome, a numerical oddity but perhaps little more.  Statisticians and probabilists will recognise it as a result of the mathematics of conditional probabilities, in which p(D|+ve), the probability of having the disease given a positive test, depends on three things:  the prevalence of the disease and the rates of both types of error.  But for non-mathematicians, the crucial point is that it stems from the fact that the number of people who do actually have the disease is so low.  Indeed, were one to test a population with no incidence of the disease at all – say New Zealand’s – one might still expect a number of false positives, and in such a case the error rate would be 100%.
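For readers who want the formula, the conditional probability follows from Bayes' theorem.  The symbols are introduced here for illustration and do not appear in the sources:  \pi is the prevalence of the disease, \alpha the false positive rate and \beta the false negative rate.  The worked line uses the 0.5% and 1% figures from the text together with an assumed 20% false negative rate, inferred from the 4,000-in-5,000 detection figure quoted earlier.

```latex
P(D \mid +) = \frac{(1-\beta)\,\pi}{(1-\beta)\,\pi + \alpha\,(1-\pi)}

P(D \mid +) = \frac{0.8 \times 0.005}{0.8 \times 0.005 + 0.01 \times 0.995}
            = \frac{0.004}{0.01395} \approx 0.287
```

Setting \pi = 0 makes the numerator vanish while the denominator stays at \alpha, so every positive result is then a false one, which is the 100% error rate of the disease-free population.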

But on another level, it poses a real challenge for the NHS, and for the Government.  The NHS understandably wants to test its front-line staff at very regular intervals, both to reassure the staff themselves over their health and to preserve a Covid-free workforce.  And because they want to identify as many carriers as possible, the test has been set to minimise false negatives.  But this results in a higher proportion of false positives, a higher proportion of staff self-isolating … and the pressure on the NHS’s capacity that the Government was at such pains to warn us all about last weekend.


[1] Figures taken from an internet trawl of medical papers on the subject, of which literally thousands have appeared in the last 8 months.  The trawl was admittedly unscientific, though there does seem to be a good degree of agreement on the order of magnitude of the numbers, especially for the percentage of false positives, the more important rate for this discussion.