Northernlands 2 - In conversation with...
Paul Connell & Marc Farr chat with Ben Goldacre, asking him about Open Prescribing and his latest piece of work on Open Safely
This transcript comes from the captions associated with the video above. It is "as spoken".
Hi everyone. Welcome to Northernlands 2
which is pioneering and online
And this is our Open Data Saves Lives session.
This is the first session and I'd like to say welcome to Ben Goldacre
and Marc Farr. In a minute they're going to talk about and tell us
who they are, but before that my name is Paul Connell, founder of
ODI Leeds. So I guess you can say, ODI Leeds is my fault.
So, over to you Marc. Tell us where you are
and what you're doing here.
Hi Paul. I'm the chief analytical officer for East Kent
hospitals, which is a big acute trust based across
that side of Kent from Margate right across to Ashford,
including Canterbury and various other things, and I'm also the
regional chair for Analytics for Kent and Medway. I get involved in
a whole range of other things with people like yourselves and
Ben and Health Foundation and others, but that's my day job.
Ben?
My name is Ben Goldacre. I trained as a doctor at Oxford
and London, then trained in psychiatry at the Maudsley,
trained in epidemiology, which is kind of applied medical statistics,
at London School of Hygiene.
And then I came to Oxford four years ago to set up something
called the data lab and we're quite an odd group in the sense
that we're a truly mixed team of software developers, traditional
academic researchers and clinicians, and we set out to
build live interactive data-driven tools and services as
well as traditional academic research papers. And then I guess
alongside that I also do showing off stuff. So I write a column in the
Guardian, and wrote books called Bad Science and stuff like that.
Fantastic, so odd people showing off. I feel comfortable with that,
that's good. So Open Data Saves Lives, that became part of ODI Leeds'
strategy last year and basically it's Marc's fault.
So Marc came and spoke kindly at the Northernlands 1 conference.
He put that in my head about Open Data Saves Lives as
something we could use to engage with people in the health
and social care sector and take all the stuff we've been doing
and apply it to the health sector. But we didn't know how
important it would become in
2020. It was something we started in 2019. We started work
on it. And then since the start of this year with the pandemic,
we've been doing a weekly session around data, health and
what we can do to help and share. And we've all been
looking on with awe and admiration at the work you've
been doing at Open Safely, Ben.
So tell us about it. Tell us what it does and why it's so amazing.
Oh thanks. Alright, you have to promise me
that we can talk about Open Prescribing at some point,
'cause that is actual open data. Open Safely is interesting
because it's the absolute opposite. It is the most
closed data imaginable, and rightly so. It's individual
patients' full electronic health record. So we've built a very
highly secure and unprecedented electronic health records
analytics platform that's running across the full primary
care data, pseudonymised, of 40% of the population. That's
about 20-something million people.
And, in order to achieve that, we had
kind of various goals. First up, we knew we couldn't do it in the
traditional model of doing a large data extract and
researchers downloading that onto their local hard drive and
then running some Stata code or some other code in the kind
of walled garden stats package. We couldn't do that because that
would be monstrously insecure. But it's also inefficient and
expensive to ship 40 billion rows of data, which is what
we're working across.
So instead we built a secure analytics platform inside the
data center of the EHR vendor, TPP. So that's where the data
already resides and that brings a number of really important
positive benefits. First of all, it's very cheap and efficient.
You're not shipping 40 billion rows around the country, even
down the fattest N3 pipe or post-N3. It's still at the
scale where you'd want to drive. Well, I mean, you're not really
allowed to drive around too much at the moment, but
it's a scale of data where in peacetime you stick a hard drive
on the passenger seat and drive to go and get it.
Secondly, it allows you access to near real time data which is
really important in a pandemic instead of large episodic
extracts. It also, however, brings enormous security benefits, so
you're already in an ISO 27001 compliant environment.
You're in a place where you've got logs of
everything that happens, so you can be certain that everything
is safe and secure, and you are not exposed to the risks of
people doing reidentification attacks on data, which is what
you get with big downloaded data sets. I think everybody understands
about the risks of re-identification attacks, or should I explain
them briefly?
You've got 47 seconds to do it.
Alright, take my wife. So if you want to find my wife in a GP research
data set, even though it's taken her name and her date of birth off,
you just look for somebody who had twins in 2013, a baby in 2014
and whose event codes moved from London to Oxford around about
2015, and then from publicly available information you've found
my wife. Having then re-identified my wife in a
pseudonymous data set, you then discover everything else that's
attached to her pseudo ID,
including all of the chlamydia episodes after we got married,
or nocturnal enuresis, bed wetting, aged 14, or any of these
other things which do not apply to my wife. To be absolutely clear.
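The linkage attack described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the pseudo IDs and the event labels (such as "twin_birth") are all invented for the example and are not real clinical codes or real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    pseudo_id: str  # pseudonym: name and date of birth already removed
    year: int
    code: str       # hypothetical event label, not a real clinical code


# Entirely made-up event-level data for three pseudonymous patients.
EVENTS = [
    Event("p001", 2013, "twin_birth"),
    Event("p001", 2014, "single_birth"),
    Event("p001", 2015, "moved_london_to_oxford"),
    Event("p001", 2016, "asthma_review"),
    Event("p002", 2013, "twin_birth"),
    Event("p002", 2015, "flu_vaccine"),
    Event("p003", 2014, "single_birth"),
    Event("p003", 2015, "moved_london_to_oxford"),
]

# Facts an attacker could learn from publicly available information.
KNOWN_FACTS = {
    (2013, "twin_birth"),
    (2014, "single_birth"),
    (2015, "moved_london_to_oxford"),
}


def matching_pseudo_ids(events, known_facts):
    """Return pseudo IDs whose history contains every known fact."""
    facts_by_patient = {}
    for event in events:
        facts_by_patient.setdefault(event.pseudo_id, set()).add(
            (event.year, event.code)
        )
    return sorted(
        pid for pid, facts in facts_by_patient.items() if known_facts <= facts
    )


def linked_records(events, pseudo_id):
    """Everything else attached to a re-identified pseudo ID."""
    return [event for event in events if event.pseudo_id == pseudo_id]


matches = matching_pseudo_ids(EVENTS, KNOWN_FACTS)
if len(matches) == 1:
    # A unique match means the pseudonym no longer protects this person.
    for event in linked_records(EVENTS, matches[0]):
        print(event.year, event.code)
```

A handful of innocuous facts is enough to single out one pseudo ID, and every other row attached to it is then exposed.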
But I'm just giving you examples of first of all how
easy it is to identify somebody and also the catastrophe of what
you can discover about them. But we also wanted to go beyond just
having even a generic trusted research environment,
because we thought
we could do better. So the most disclosive data, the riskiest
data, is the event level data, where it's like one row and it
says this pseudonym ID on this date at this time had a blood
pressure test and it was 140 systolic over 70
diastolic. Those kinds of single events. Now when you're doing
epidemiology research, you don't necessarily need all of that
very granular detail. For example, you might want to build
a data set that's one row per patient. Rather than one row per
event. And for each patient you might want to, say, create a
variable that's one if they've had a high blood pressure
reading in the last six months, and zero if they've not.
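As a rough illustration of that reduction from one row per event to one row per patient, here is a minimal Python sketch. The event rows, the 140/90 cut-off used to call a reading "high", the 183-day window and the function name are all assumptions made for the example; none of them come from the platform itself.

```python
from datetime import date, timedelta

# Made-up event-level rows: (pseudo_id, date, systolic, diastolic).
EVENTS = [
    ("p001", date(2020, 5, 1), 150, 95),
    ("p001", date(2019, 1, 10), 120, 80),
    ("p002", date(2019, 6, 1), 160, 100),  # high, but older than six months
    ("p003", date(2020, 4, 20), 118, 76),
]

# Assumed thresholds for a "high" reading, for illustration only.
SYSTOLIC_THRESHOLD = 140
DIASTOLIC_THRESHOLD = 90


def one_row_per_patient(events, index_date, window_days=183):
    """Collapse event rows into one row per patient with a 0/1 flag."""
    window_start = index_date - timedelta(days=window_days)
    flags = {}
    for pseudo_id, when, systolic, diastolic in events:
        flags.setdefault(pseudo_id, 0)
        in_window = window_start <= when <= index_date
        is_high = (
            systolic >= SYSTOLIC_THRESHOLD or diastolic >= DIASTOLIC_THRESHOLD
        )
        if in_window and is_high:
            flags[pseudo_id] = 1
    return [
        {"pseudo_id": pid, "high_bp_last_6m": flag}
        for pid, flag in sorted(flags.items())
    ]


rows = one_row_per_patient(EVENTS, index_date=date(2020, 6, 1))
for row in rows:
    print(row)
```

The researcher only ever sees the derived flag, not the dates and values of the individual readings.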
We didn't want to create a world in which researchers had the full
power of a versatile query language against the most
disclosive data, so instead we built a platform where
researchers never have to do that, never, never should be
able to do that. Never, never do that. So instead you describe
your study population, and that's essentially just a very
stripped down representation of the SQL query that you want to
do. And that runs against the real data to extract your true
data set and then you can run your Stata code or your R code
or whatever. And there's lots of really clever fancy bits
in between there. So for example, we wanted to push
forward from a world in which people do electronic health
records research in silos for the most part, never sharing
their code lists, which is what they used to identify outcome
variables or drug exposures or disease exposures, never sharing
code. We wanted to push people towards open ways of working,
but not in a kind of
shaking fists kind of way, but just make it easy for people to
do the right thing. So we built Open Safely in that image.
Anytime you want to run code against the platform or
define a cohort, you have to do that by writing it in a GitHub
repo. You can clone a template right now, and if anybody wants
to at home you can go to Open Safely's GitHub repos and you
can clone that template and build your own study definition.
Then we're also trying to bring sort of computational data
science techniques into being the norm for EHR research.
For example, you write your study definition and you don't just
say I want this outcome variable to be: "Have you got high blood
pressure or not?" You also say "I want this outcome variable to be
high blood pressure." Using the following code lists in the
following date range to say one or zero and I'm expecting the
prevalence of that to be 15% of
the population. We then automatically generate a
simulated data set. So it's not synthetic data that respects the
co-segregation of exposure and outcome variables. It's just a
simulated data set, randomly generated, so then you can
develop your Stata or R code and it will check on GitHub that it
passes tests. Then you push it to run on the real data and then
you get your summary results out and then somebody manually... well
two people manually look at them to make sure they're not
disclosive, and then they get released out so that the rest of
the team can take a look.
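The idea of declaring an expected prevalence and getting a randomly generated data set back can be sketched as follows. This is not the platform's actual interface: the function name, the dictionary of expectations and the second variable ("died", with a 2% rate) are hypothetical, chosen only to show the technique. Each variable is drawn independently, so, as described above, the output does not preserve any relationship between exposures and outcomes.

```python
import random


def simulate_cohort(n_patients, expectations, seed=0):
    """Generate one row per simulated patient.

    `expectations` maps a variable name to its expected prevalence.
    Every variable is drawn independently of the others.
    """
    rng = random.Random(seed)  # seeded so that test runs are repeatable
    rows = []
    for patient_id in range(n_patients):
        row = {"patient_id": patient_id}
        for name, prevalence in expectations.items():
            row[name] = 1 if rng.random() < prevalence else 0
        rows.append(row)
    return rows


# 15% is the prevalence quoted in the talk; "died" is a made-up extra variable.
EXPECTATIONS = {"high_bp": 0.15, "died": 0.02}

cohort = simulate_cohort(10_000, EXPECTATIONS)
observed = sum(row["high_bp"] for row in cohort) / len(cohort)
print(f"simulated high_bp prevalence: {observed:.3f}")
```

Analysis code can be written and tested against rows like these, which have the right shape and rough proportions, before it is ever run against real records.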
And it's all built
in a glorious giant collaboration, and I cannot tell
you how much fun it's been. It's bordering on a love affair
between our teams, actually, and it feels to me like
a massively missed opportunity to get EHR
software developers working closely with EHR researchers.
It's sort of a real no-brainer, and actually I feel like it's
as with a lot of things during Covid and I hope we can talk
about this in more detail later, it's been great to see
the pace picking up and some unhelpful barriers cautiously
and thoughtfully worked around during the period of Covid.
So Marc, you run data at a massive hospital. What's it been like in
a massive hospital dealing with data and Covid?
I mean, I thought, like you, it was sort of beholden on me to
do kind of first line research. So I went to a place in Austria
where everybody caught Covid and then... But the weird, the weird
thing for us is that because a lot of people say to us, well
how do you cope with situations like this? Well hospitals they
sort of do that a lot you know
we're ready for planes to fly into bridges and natural
disasters and oil spills and chemical issues and all this.
So we're quite good at putting up an operational control center,
producing Gold Command and Silver Command, Bronze Command
and all those sorts of things, but it's like nothing else we've
ever seen. And a lot of the issues for me, and I agree
with a lot of what Ben said and I'll give you some examples in a
minute of things that I think are really positive and have
kind of accelerated, but we had
to receive national modelling advice and then work out really
quickly that it didn't really work for us, so then had to
stand up local modeling resources to build models that
were more akin to us. That didn't take sort of national
averages and just apply them crudely down into Kent. So we had
a whole stream of modelling work that started really quickly. Now
are we going to put up a military hospital? A kind of
Nightingale for Kent, as it were, and how many ventilators
would we need and where would we get them from? And it all felt
really difficult and we thought we were going to have hundreds
and hundreds of ICU beds and
as of today, I think we've got less than 40 people in an ITU
bed on a ventilator, so we, you know, we converted theaters
into ITUs which we're now converting back the other way,
and it's not, it's not out of any mistakes that people made,
it's just it's been so new that we had to kind of work in
such an agile way with the data particularly on the modelling.
I agree with Ben that we've got some stuff done that's really
positive. For almost two years I've been trying to get the NHS
to agree to share data with the police.
To analyze intimate partner violence.
I felt really strongly about this and there
was a lot of reporting through April in the Sunday papers about
domestic abuse, and you know, calls to support centers and so on.
And we managed to make the case that it's reasonable to
link these two datasets together. Done securely, done appropriately,
along similar lines to... similar, but different
to what Ben's done. And we got that agreement. We've got that
agreement for three out of four hospitals in Kent within a week,
and I'm going to try and knock the
last one over soon. And we had some initial results back from
the police, just a kind of a, you know, a kind of linking two
files together level before we even start trying to do any sort
of predictive modeling. And it looks like it's really, really
powerful, the data. So getting stuff done and getting it done
quickly, and sensitively, I completely agree with Ben.
That's been that's been really key for us.
I mean, we've been talking, you know, one of the concepts is how
do we now get that? And how do we write it up? So you're doing it
in Kent. How does that happen in Carlisle, so it just
happens without all of us getting involved? And it links to your
point, Ben, it's forcing people to be open in how they access
the data, how they access your
work. And that's fundamentally what Open Data Saves Lives is
about, so I guess we're talking about getting people to use the
web. So when you write a blog post about that, Marc, and we put
the governance framework around it and you write a little story
blog, and a technical blog and you put the code on GitHub, how do
we get that deployed in Carlisle next year without having to
spend a lot of money on consultants which you know
that's the dream, really?
Yeah, I mean we've put a lot of stuff out publicly, so all of
the joint data control agreements that we've had have been
through 25 lawyers. Everything that we run through
the committee I chair is all publicly available, so we set
up new local data sets for Covid, not because the regulators
asked us to, but because it seemed like a sensible thing to do.
We shared, through your good selves, data
dictionaries for all of that. We've got, through the webinars,
we've got people kind of adopting that, and,
you know, giving us advice on where is the latest decent list
of where care homes are because that was something we had to
look at really urgently, really quickly and just getting UPRN
data for care homes would have just been turgid and taken
months and months. And we kind of got that done quite quickly.
Which do you think is the best, by the way, Marc? Did you get the CQC
care homes list matched to UPRN?
Yes, I think that was,
and again that was through just contacts of Paul's, we got to that
list much quicker than we would have done historically, so
that's quite useful. That published data that the CQC do.
We went through geography at Open Data Saves Lives
in the session and it was
how the UK has managed to shoot itself in the foot about
geography and how the...
No. I agree, I think there's a
there's some really interesting cultural stuff in there as well.
Like for example, there's often a lot of anxiety and lack of
clarity around licensing for things like geodata, and
that's almost worse than just a hard yes or a hard no to a given
data set, in particular because it lets people who are being
miserly off the hook 'cause they can live in the shadows.
But also it creates anxiety where
you might have one individual in an organization
who says, look, I've looked at the licence. It's fine.
We can reuse this. It's OK. Don't worry about it. And their boss,
who's a generalist and legitimately across 14 different
areas, just goes 'Oh God! I don't know, you'd
better go and get that reviewed by legal' and then reviewed by legal
means it's in a huge queue and reviewed by legal is expensive and
slow and all that.
Yes, it's so much easier to say no.
We could have just rolled over on the IG on the police thing
and the thing I always say, it would be famous last words, but
if you kind of go up into the legality of it and the ICO, it's
about if you're doing a good thing or not. If you boil it down.
The language as you become... actually becomes more loose.
So are you doing something that you would
defend in court and would help you sleep easy in your bed?
So trying to address the risk of domestic abuse
during lockdown feels like a good thing to do. I'm not
building a mailing list for a drug company. I feel really
confident that that's a good thing to do, but it's really
easy in law to say it's all a bit difficult actually, and we need
to come up with a way of consenting every single patient
before we do it. You know, we're trying to get the British Red Cross
at the moment to
look after some of the patients that come into our Emergency
Department who've got quite chaotic lives and they're definitely
coming from a good place to try and do it, but I've had to push
and push and push for us to do that, and we're now doing it
because in law you kind of drift into 'did we consent them?
did we? Were they able to consent well enough for us to
record that they consented enough?' and I think that's part
of what Open Data Saves Lives is, to give a kind of robust defense
against this kind of lazy
privacy concern angle.
Yeah, I think you do need to be,
you know, good people can do bad things for the right reasons and
I think there has actually been a lot of
sloppiness around IG over the years. I mean, the thing that I
actually find really dispiriting is that there are a lot of
projects out there which are to my mind
monstrously insecure by design, but which have
ring-binders filled to overflowing with signed-off paperwork saying
that they are all fully legal under IG. And actually,
it's interesting. It cuts both ways.
You can have bad IG that blocks good things. You can have
bad IG that permits very bad things. And I do think
again there's a really interesting cultural point here,
which is, first of all,
during this period of turbulence, and it's not, it's
not anarchy, it's flexibility. There's a lot of really good
stuff being done on the hoof by people going "OK. I'm going to
pick up the phone. I'm going to talk to the person in the local
authority or the hospital who normally I'd have to go through
15 different layers to get to. I'm just going to go alright.
Hi is that Barbara? I've got an idea. Can we do a thing?" And
what I'm really hoping is that
I think there's a lot of enthusiasm in the air for
retaining the best of the new norms that have arisen during
covid in the post covid era and I think it would be really
powerful if that happens. I think the other thing that I'm
really hoping to see is... this really speaks to the issues
around public goods, around incentivising innovation
especially with open data and open source tools.
I'm really hoping to see
successful delivery not just rewarded in terms of praise or
some kind of monstrously complicated like, "Oh, you're a
Pathfinder. And now we're going to work out how to deploy it
with, with an innovation deployment opportunity grant
that costs more in person-time to ask for it." Right?
What I'm hoping we will see is
people going OK here's a group of people who delivered
something that's useful. And
we're gonna take the approach of GDS and the best of digital
innovation in the private sector and go you resource teams, not
single pieces of work. And you go: Here's a bunch of people who did
something really useful.
Let's resource them because fundamentally, the big
challenge that I think government and the health
service face is...
You're trying to get a whole bunch of new behaviours
around technical skills happening out in
government in the health service. It's really difficult to do that.
You're struggling uphill against people having
line managers who are generalists who don't understand
the technical work that their immediate reports are engaged in
and all of that.
One of the best things you can do, as well as trying to create
the perfect job descriptions and the perfect formal job ladder,
is you could also say: right. We found people who are doing
something useful out in the
system. They're definitely delivering 'cause they've got a
proven track record of shipping outputs that worked. Now let's
resource them so that they can be number one taken off any jobs
that don't use the technical skills they have that we want to
see exploding across the system.
Resource them so that other people can go and sit next to
them and watch how they work and learn from them so that can spread.
And let's resource them to either produce
essentially propaganda or educational materials, either
themselves or with somebody else. And actually, frankly, I
think the thing we're also really overdue for is a bit of
proper knowledge management. There are professions who
unfortunately don't have a good name, they've got names like
librarians or information scientists, but they're not
putting books back on shelves. They are people who have given a
great deal of thought into how to curate complex technical
knowledge in a commons of knowledge to help people find
the information they need at the right time to help stop good ideas
getting lost and all of that. When I was allowed
to go to the pub, I used to get drunk and rant about
that sort of stuff. All I'm asking for is a coherent
information architecture.
Well, do you know what, there's
a thing called the web?
And if you put things on the web people can
find it, you know. Imagine if that health architecture, you
know we just said every project should have three things, which
is a story blog which describes what it is. A technical blog which
says how we did it and then a repo with the data and code and
you link it all.
And then you said, well, actually I can then find it on the web
and then I can look it up.
Well, it links to
a point that Paul and I were talking about where one of
the things we found under Covid is that we've been asked to give
loads and loads of people loads of data all the time, but we
never get any of it back.
And Ben, if you and I run neighboring trusts
I don't get to see your data. Which would be really helpful.
Would be really helpful for me to know in Maidstone if your
numbers are going up and mine are going down. So we had to
create all that ourselves. There's also something
really important, I think, Marc,
around reciprocality. So if you hand over a load of data
you should expect to see the thing that it is used for coming
back to you.
Not locked down on an NHS England Tableau dashboard.
I mean joking aside
if you had a principle where every time you sent some data
a sitrep or whatever, you got everybody else's back. There's
nothing identifiable on any of that and we'll put some
governance around it, but it'd be really, really powerful if I
knew exactly how many people in every school in Ashford had been
tested and what our rates were as we drifted into Maidstone.
Really simple to do. So we ended up...
We did some really good stuff locally without being told or
asked to just link a load of our data together and to create new
datasets that we're all sharing in the open using some of Paul's
tools hopefully, but it's frustrating that you're involved
in this machine of pushing data into a center and nothing useful
comes back. And even if it just went to people like you that
would be a start, but if it kind of came promptly back to us,
as we sent it, that'd be really
really helpful.
Open Prescribing, which we built in my
group in Oxford, is another very good example of that.
[Open] Prescribing's our service that lets any interested person
go and see exactly what each individual GP practice is
prescribing down to the level of individual prescriptions, month
by month, individual practice
level. And Open Prescribing now has 135,000 unique users a year,
15-or-so thousand unique users a month. It's got thousands of people
subscribing to an alert service that's driven off fancy
statistical process control techniques under the bonnet,
but which is nice and friendly and easy to use
when you subscribe. And
we did that very deliberately because we, you know, I've
worked with EHR data in the past. Rich, disclosive, but
pseudonymised very detailed event level data. I wanted to start
building live interactive tools and services like Open Safely,
but we started with Open Prescribing, without being
shackled and slowed by the phenomenal cost and permissions
which are legitimately around closed data. So that was
the only way that we could possibly have got our group
up and running and using kind of agile collaborative approaches
and freely sharing code on GitHub where we've now got
45,000 lines of code in total. Well, I think more relevantly,
1,000 issues opened and closed.
So working with that open data first, it's
probably the richest health data set that's ever been shared as
open data. And we have published
studies showing that in places where Open Prescribing is used
that prescribing improves.
We've got huge numbers of examples of really important
signals that we've detected on
risk, quality and cost effectiveness that simply would
never have been discovered any other way. They were certainly
never detected by the people employed by NHS Business
Services Authority or NHS England to analyze this data.
And why should they be? That's not a criticism of those
organizations, but people working behind closed doors,
but also in limited numbers obviously will never have all of
the perfect answers for every possible user group. For every
possible analytic idea on every
possible aspect. Of every possible dataset. It's just ludicrous
to ever imagine that that's how creative analytics could
possibly work. There's also, I think something really
interesting and political here, which is around where data is and
isn't disclosed. So Marc, you're working in an NHS Trust.
Hospitals are quite large and powerful
organizations, and there's a really interesting historical
anomaly, which is that...
The GP-level data, or practice-level data, it's basically all
published and shared as open datasets. Compare that with
hospitals. No hospital prescribing data. Medicines
usage data, 'cause GP prescribing hasn't really have... No hospital
medicines usage data is shared as open data. I'm happy
to talk about the reasons behind that. We've got a paper hopefully
coming on that in the BMJ soon
All of the Model Hospital data, and this is the flagship
variation in care national analytics program. Not a single
one of the things that it measures about each hospital,
is shared. Now that would be much, much less disclosive, and
I think it's really, really interesting to think why is
GP data always shared as open data which then creates this
really fertile ecosystem of people doing collaborative
analytics to help? And also done in a really positive spirit like
we just, we almost never see anyone misusing Open Prescribing
in a childish way. We never see people going "Is this the worst
GP in Kent?" In fact, we've never seen that, but it's really
interesting. GP data shared, hospital data not. I wonder
if that's a function of their political weight and their size.
We're rabbiting on... God, God, I've got to go, yes. But let's finish on
something positive. So we - myself and Marc - we're launching Open
Data Saves Lives as a bit more of a thing this year and that
exactly that point is to help people and give them
permission to share more, do more and also kill reports. So
rather than writing reports and presentations, powerpoints,
build stuff that fixes stuff. So that's what ODI Leeds is doing.
In Kent, Marc? What are you doing that's so positive out of all this?
I'm still reveling in my success with the police, but
I've also got everyone to... I've got joint data control, which
is a GDPR function signed up across the whole region, so we
got 2 million people,
20-odd NHS organizations all signed up.
No politics. Really good network. Really good relationships.
Everything we're doing is published openly, so that's
really positive and it looks like we've got some big research
bids that we can announce soon to carry on building a linked
data set at a lower level of depth than has been done, only
akin to some of the stuff that Ben's doing. Most people are just
linking SUS data. It's not that interesting, so it's really
exciting. We've got a medical
school opening. We're going to open a data lab with the University and
the medical school, so I'm really excited about that.
How 'bout you Ben? What's next?
We're expanding Open Safely and we're expanding it in various
interesting directions. I'm really, really interested in
seeing a better Commons of knowledge around operational
research, and this is something that me and Marc have talked about
a lot. In fact, we've got a paper coming out in Journal of the
Royal Society of Medicine, I think very, very soon, which is around
how to kind of build...
make operational research great
again. The Lancet used to publish audits in the 80s.
I mean it was a normal thing and now it's all done behind closed
doors and there are pockets of greatness and pockets of
drudgery and manual labor in Excel. We need to
make it a thing which has a Commons of knowledge.
A shelf full of textbooks and courses rather than just going
to sit next to Marc Farr for a while and you'll pick it up.
My aim is that that's something that people who are 20
want to do when they come out of
a university with a good maths degree. That's literally what
I'm trying to do. I'm going around universities saying 'This is cool'
You can meet people like Ben and Paul.
Jesus, you're gonna have to do better than that.
On that bombshell I'd like to say thanks very much. It's been amazing.
Goodbye from Northernlands 2. Thank you Ben. Thank you Marc.
See you soon
Author, campaigner, broadcaster, doctor
Ben Goldacre is a doctor, best-selling author, academic and campaigner. His work focuses on uses and misuses of science and statistics by journalists, politicians, drug companies and quacks. His book Bad Science reached #1 in the UK non-fiction charts and has sold over half a million copies worldwide. He has published extensively in all major newspapers and various academic journals, and appears regularly on radio and TV from Newsnight to QI. He has written government papers and reports on evidence based policy, founded a successful global campaign for research transparency, and currently works as an academic in the University of Oxford, where he runs the EBMdataLab building live data tools to make science and medicine better, like OpenPrescribing and OpenTrials. His blog is at www.badscience.net and he is @bengoldacre on twitter.
Founder, Beautiful Information
On leaving the management consultancy Experian Marc joined UCL’s Centre for Advanced Spatial Analysis as a senior research fellow. It was there that he made initial contact with Dr Foster, analysing the relationship between geodemography and health outcomes. They were looking at the use of postcode level statistics to standardise hospital mortality rates alongside variables such as age, sex and diagnosis to enable hospitals to be compared with one another.
In 2004 Marc joined Dr Foster as Director of Product Development where he oversaw the company’s development of tools across clinical benchmarking, financial management and health needs mapping. In 2007 Marc was made Honorary Professor at UCL in the field of Geomatic and Civil Engineering. In 2010 he joined East Kent Hospitals University NHS Foundation Trust where he is Director of Information responsible for informatics, coding and clinical systems. Marc is a graduate of the King’s Fund future leaders course and was named in the HSJ Top 50 Innovators in Health 2013.
Founder, ODI Leeds
Paul is an entrepreneur and specialist in innovation. His experience and knowledge mean that he has developed a unique skill-set in the field of Open Innovation, Data & Smart & Future Cities.
ODI Leeds is a pioneer node of the Open Data Institute. It was created to explore and deliver the potential of open innovation with data at city scale. It works to improve lives, help people and create value.
DataCity is a Data as a Service (DaaS) company that is using Big Data and AI to understand the economy in real time.
Northernlands 2 is a collaboration between ODI Leeds and The Kingdom of the Netherlands, the start of activity to create, support, and amplify the cultural links between The Netherlands and the North of England. It is with their generous and vigorous support, and the support of other energetic organisations, that Northernlands can be delivered.