7 March 2024

Cory Doctorow: Health data – it isn’t just Palantir or bust

The huge possibilities that NHS health data affords are beset by huge risks. But handing it all over to companies like Palantir isn’t the only option, says Cory Doctorow.

For people who care about evidence-based medicine – and who wants non-evidence-based medicine? – the health data contained in the databases of NHS trusts presents a tantalising treasure: a vast trove of population-scale data. And it’s not just any population: it’s an ethnically diverse group whose subgroups are big enough to offer subtle insights.

There’s just one problem: the same factors that make this data so valuable also make it terribly dangerous. The exciting population-scale insights in the data are built from terrifying individual disclosures that, once made, can never be taken back. Data is easy to gather into one place, but once it leaks, it’s impossible to un-leak it.

In the old days, well-meaning – but extremely wrong – researchers proposed that this conundrum could be solved by “anonymising” health records before handing them over to scientists for analysis. The problem is that anonymisation is very hard in the short term, and impossible in the long term. De-anonymising data is surprisingly easy: if you know Tony Blair’s date of birth (a matter of public record) and the two dates during his term in office on which he was treated for a heart condition (ditto), you can pick him out of any “anonymised” pool of NHS data in seconds, and then discover all those facts about his health that aren’t a matter of public record.


In the short term, there are doubtless many Britons in the NHS data who don’t have readily available re-identification hooks like these. But in the long term, we all do, because other parties will inevitably make disclosures or suffer leaks that can be merged with previously “anonymised” datasets. A leak from TfL, Addison Lee or Uber showing who took which journeys can be automatically cross-referenced with hospital or GP prescribing data to reveal which journeys correlate with which “anonymous” patients who saw a prescriber on those dates, and which addresses they travelled from to attend those appointments.
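The cross-referencing described above is what privacy researchers call a linkage attack. Below is a toy sketch of one in Python, assuming an “anonymised” table that still carries appointment dates and a home postcode district, plus a leaked journey log from a ride-hailing firm. Every record, name and field here is invented for illustration, and real attacks work on far larger and messier tables, but the principle is the same: join the two datasets on the details they share.

```python
from collections import defaultdict

# Hypothetical "anonymised" prescribing records: names stripped, but each row
# still carries an appointment date and the patient's home postcode district.
anonymised_visits = [
    {"patient_id": "P001", "visit_date": "2023-03-14", "home_district": "N1"},
    {"patient_id": "P001", "visit_date": "2023-06-02", "home_district": "N1"},
    {"patient_id": "P002", "visit_date": "2023-03-14", "home_district": "SE5"},
    {"patient_id": "P003", "visit_date": "2023-06-02", "home_district": "E8"},
]

# Hypothetical leaked journey log: named riders, with the date of each trip
# and the district it started from (assumed here to be the rider's home).
leaked_journeys = [
    {"rider": "Alice Example", "date": "2023-03-14", "pickup_district": "N1"},
    {"rider": "Alice Example", "date": "2023-06-02", "pickup_district": "N1"},
    {"rider": "Bob Example", "date": "2023-03-14", "pickup_district": "SE5"},
]


def fingerprints(rows, key, date_field, place_field):
    """Group rows by `key` into a set of (date, place) pairs per person."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[key]].add((row[date_field], row[place_field]))
    return groups


def link(anon_rows, leaked_rows):
    """Map each pseudonymous patient to a named rider when exactly one fits."""
    patients = fingerprints(anon_rows, "patient_id", "visit_date", "home_district")
    riders = fingerprints(leaked_rows, "rider", "date", "pickup_district")
    matches = {}
    for patient_id, visits in patients.items():
        # Riders whose journeys account for every one of this patient's visits.
        candidates = [name for name, trips in riders.items() if visits <= trips]
        if len(candidates) == 1:
            matches[patient_id] = candidates[0]
    return matches


print(link(anonymised_visits, leaked_journeys))
# → {'P001': 'Alice Example', 'P002': 'Bob Example'}
```

Patient P003 stays hidden only because this particular leak happens not to cover them; the next leak, from some other company, might.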

Attempts at large-scale anonymisation always end in disgrace. Privacy researchers have made something of a sport of it, showing theoretical proofs that they can pick individuals out of large datasets with three, two, or sometimes just one other piece of identifiable information. Given all this, it’s bizarre that so many medical researchers continue to insist that if they just tried really hard, they’d find a way to anonymise population-data studies without creating population-scale risks. Even more bizarre are the plans to flog NHS data to foreign military surveillance giants like Palantir, with the promise that anonymisation will somehow keep Britons safe from a company that is literally named after an evil, all-seeing magic talisman employed by the principal villain of Lord of the Rings (“Sauron, are we the baddies?”).

I think that medical researchers’ long, one-sided love affair with anonymisation is down to wishful thinking. There is so much information in the NHS records that could save lives, and the idea that we can’t safely retrieve it is heartbreaking for anyone who truly cares about human wellbeing.

Thankfully, we don’t have to choose between research and privacy. At the start of the Covid pandemic, Dr Ben Goldacre and his team at Oxford created OpenSAFELY, a “Trusted Research Environment” that allows researchers to write programs that analyse NHS data in situ. These programs were dispatched to run against the data held by NHS trusts, and the system returned the results to the researchers without ever letting them handle the data – which never left the trusts’ own servers.

OpenSAFELY was a smashing success. Goldacre’s team – and the researchers they served – published more than 60 blockbuster papers that quickly and reliably established badly needed facts about the pandemic. These insights were critical to the global response to the disease.

Goldacre’s approach was so successful that the Secretary of State for Health and Social Care commissioned him to produce a report explaining how the programme could be expanded into a national research service that would allow researchers to routinely, safely and rigorously pose questions of the NHS’s priceless data without exposing Britons to unmeasurable risks.

The Goldacre Review makes a case for turning our NHS into another kind of national treasure: a font of insights that improve and save lives, safely and cost-effectively. This is within our grasp today. The British Medical Association and the Conference of England LMC Representatives have endorsed OpenSAFELY and condemned Palantir. The idea that we must either let Palantir make off with every Briton’s most intimate health secrets or doom millions to suffer and die of preventable illness is a provably false choice.

The real choice is this: pursue Trusted Research Environments, a Made-in-Britain strategy that is the envy of scientists the world over, or pursue another round of chumocracy, letting deep-pocketed – and frankly sinister – private intelligence companies take control.

Part of campaign

Protect NHS patient data
