Northernlands 2 - Open analytics at the Health Foundation
Description
Why collaboration and working in the open is important in health and beyond
Transcript
This transcript comes from the captions associated with the video above. It is "as spoken".
Hi, my name is Emma Vestesson. I'm a senior data analyst at the
Health Foundation and I'm here to
share some of the work we're doing about open analytics
at the Health Foundation.
The Health Foundation is an independent charity committed
to bringing about better health and health care for people in
the UK. I'm part of the data analytics team at the Health
Foundation and over the last year or so, we've done some
pretty intense work around making our analysis more
reproducible and open.
So there are big challenges facing health and
healthcare in the UK and this was true even before covid
happened. But at the same time we're seeing some rapid
innovation and use of analytics and data-driven tech in the
healthcare sector. And this carries huge potential to
address some of these challenges. But it does also
carry quite substantial risks if we're not careful with how it's
implemented and our aim is to ensure that data analytics has a
positive impact in the UK's
health. And we're working for a future where everyone's health
and care benefits from analytics and data-driven technology.
So how are we doing this?
We want data analytics to have a positive impact in the
UK's health and there are several ways in which we're
working to achieve this, so we want to have innovations and data
that tackle some of the big challenges facing the sector.
But we also want to influence the national conversation and
be an independent voice, an expert voice on national policy
and system design.
We offer funding and support for innovative work. Because we
want to support better analytical capability.
And we want to help create networks, communities and
partnerships. So what does open analytics mean to us at the
Health Foundation? So we're very committed to being transparent/open
when we're doing analysis 'cause we want people to be able
to look at what we've done, and ask any questions or
pose any challenges. So we want to actively champion good
analytical practice across the sector and spread and share what
we know. So we started doing this by publishing our code on
GitHub around a year ago and last September we had kind of a
mini launch of our own GitHub page, so we now make all of our
analysis code public by default even when it's work-in-progress
and we're working towards exclusively using only open source
languages. But we don't think that sharing code is enough, so
each repo on our GitHub has got a description of what the project
set out to do, information about the data source, whether or not you
can access that, which you should be able to if it's open data,
and what else you need to do to reproduce our analysis, so other
things you might install or
packages you need. So for us working openly is an ongoing
project, and we're kind of using our own team to experiment a bit
and test new and more open ways of working which
sometimes can be a bit challenging, but we feel like we
want to understand how we can work openly as a team.
And then start talking to other people about how to do it and
advise on what they can do.
So in addition to putting things on GitHub we're also
sharing what we did and learned along the way and how
we overcame some of the barriers to working more openly. So
we published a blog post on Medium where we describe the
process we went through as a team to launch our public GitHub
account. And provide a kind of step-by-step guide for how
others can do the same thing.
So the problems we're working
on are kind of big and really require a more
collaborative approach
to actually be tackled and we believe that transparency
and working openly and reproducibly are essential to
achieving this. And we see open coding and open data as a good
analytical practice.
There are so many benefits of working openly. So first of
all, sharing our code helps us avoid duplicating work within
the team, but also allows other researchers and analysts to
review and validate and reuse our code. And this is something
we've definitely done for covid where I know I use code from an
analyst in Nottingham and I know that other people have used our
code. It also makes it a lot easier to share our code and our
work internally so we
know what our colleagues are up to and we can celebrate progress
and it makes it a lot easier to credit people both internally
and externally. I think it also ... openness also just
makes for better science. So there's a permanent record of
the analysis that we've done, and it's reproducible.
I think also personally knowing that my code is going on GitHub
makes me go that extra mile.
Because I just make sure that there are enough comments to
explain what the code is doing and I haven't left things
in there that I don't want anymore and this makes it
easier for other people when they use my code, but it also
makes it a lot easier for future me to go back to a project.
So unsurprisingly, COVID-19 has changed the way we work at the
Health Foundation, but actually we realised that as we're all working
remotely, working openly has become even more important.
We have done a lot of reactive work related to the pandemic using
open data and this has allowed us to inform the national
policy debate and highlight the effect of covid on
vulnerable groups such as care home residents and you can see
here the examples of some charts that we've created using
open data at the Health Foundation.
So even though it was great to use open data for this analysis,
there were some challenges with using open data. And sometimes
we were slightly limited in what we could do, and some of those
problems were related to the data itself, so it might not
have been granular enough, so we would, for example, not often be
able to look at both
sex and age at the same time, and sometimes the data wasn't
updated as frequently as we would have liked so
we would produce some analysis, and even when we
knew that more data would bring value to discussion, we
weren't able to do that.
There are also slightly more technical issues that we had.
So quite often open data isn't shared in a machine readable
format, meaning that there might be multiple tables in one
spreadsheet or there will be random text in places where a
computer wouldn't expect there to be text. It happened a few times
that I went to a website to download a new version of a data
set and it just disappeared and it was being published somewhere
else. And this as an analyst makes it really hard to build a
good workflow because
that data just is no longer there. Any work you've done to put in
kind of automatic downloads or anything, this is gone when
there's no consistency. And it also obviously happened that
data changed format, which again poses a big problem for
an analyst because you need to go back and check your code.
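
Editor's sketch, not part of the talk: one way an analyst might guard a workflow against the problems just described (several tables in one file, stray text around them, and a format that changes without notice) is to look for the table by its expected header and stop with a clear error when it is missing. The column names, the sample text and the helper names `split_tables` and `find_table` are all invented for illustration and are not taken from the Health Foundation's code.

```python
import csv
import io

# Hypothetical schema: the columns this analysis assumes the publisher provides.
EXPECTED_COLUMNS = ["date", "region", "deaths"]


def split_tables(text):
    """Split raw text into blocks of consecutive non-blank lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def find_table(text, expected_columns):
    """Return the rows of the first block whose header matches exactly.

    Raises ValueError when no block matches, so a changed format fails
    loudly instead of silently producing wrong numbers.
    """
    for block in split_tables(text):
        reader = csv.reader(io.StringIO("\n".join(block)))
        header = [cell.strip() for cell in next(reader)]
        if header == expected_columns:
            return [dict(zip(expected_columns, row)) for row in reader]
    raise ValueError(f"No table with columns {expected_columns} found")


# Invented sample: a title, a source note, the table, then a footnote.
raw = """Weekly figures, provisional
Source: example publisher

date,region,deaths
2020-04-03,North,10
2020-04-10,North,12

Notes: figures may be revised
"""

rows = find_table(raw, EXPECTED_COLUMNS)
```

Here `rows` holds only the two data rows, with the title and footnote ignored. If the publisher renamed or reordered a column, `find_table` would raise an error at download time rather than later in the analysis.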
But we do think that the people who are sharing data
want us to use it.
Especially when things are moving quickly, it's quite easy
to not share data in maybe the best way. So we wrote a blog
with some thoughts on how to collaborate in a time of crisis
and how to share data in a time of crisis with a focus on how to
make sure that if you're sharing something that people can
actually use it. So we wrote our own blog, but we
also know that there's a lot of good work out there on how
to share open data. ODI Leeds has got some open data tips.
The Government Statistical Service has got some
excellent resources, and the Turing Way, which is a slightly
more academic resource, has got excellent advice on how to
make your research reproducible.
So a lot of the call to action is coming from analysts
to kind of improve how we do analysis. But we also think that
data controllers could really, really help speed up the work
and spread open working.
So from a position as someone who actually does
this kind of work.
A lot of us use data from NHS Digital, ONS and it would just
be really, really good if
we could see some processing code, some reference
data and metadata.
As this would really help us with our work but also being
able to be sign-posted to the other people writing code for
the data that you are looking at would be really useful.
So I've focussed quite a lot on code and technology today, but
we don't think that's enough to collaborate effectively.
Based on the work that we've done internally to work more
openly, but also some of the work that we do
to support analysts in the sector, we have a few things
that we think might be stopping people from working openly.
And we think that the first thing you can do is ask yourself
why is my team not working openly?
And we think a big factor is buy-in from management.
Analysts really need to feel safe sharing data and often that
needs to come from somewhere higher up to make sure that
there's that support.
There also needs to be support for the extra work during setup
because when you start working openly, at the beginning it is
a bit of extra work, but it's really worth it and I think in
the long run you're probably
saving time. But also having a community is really important,
so knowing that more people are working the same way provides
some reassurance that even though it's hard, this
is a good thing to do.
So if you want to start working more openly, you need to
acknowledge that working this way might require a culture change.
So what do we do to support analysts? I've talked a lot about
analysis that we've done but a lot of the work that we do is actually
focussed on supporting analysts working in the health and
social care sector. So so far we've funded 43 projects through
four different rounds - something that we call Advancing
Applied Analytics.
And we've had a lot of workshops with these award holders.
So the support doesn't stop with the funding.
We have these workshops and people come together and
troubleshoot issues they've encountered over the course of
their projects. When covid happened, we also set up a
repository of resources for analysts working to support
health and care during the COVID-19 pandemic, and they were
trying to share different solutions that people have
suggested that are
more open.
So we have a few projects that we just want to highlight.
But probably the biggest one being the NHS-R community,
which was, I think probably our big first investment in this area
and it's an initiative that aims to build a community of
analysts in the NHS and to spread the
use of the R language.
And through this we know that there's interest in the system
to work more openly, and we're hoping that our funding program
encourages an open approach to both sharing methodology and code.
I want to end by focussing on where we're going next and have
a call for action. The last few months we've seen some
really remarkable innovation with health data, but we haven't
seen the same thing in social care and this is despite how hard the
pandemic has hit the social care sector. We're doing a few things.
First of all, we're teaming up with Future Care Capital
to do some work around social care. We really think
that open analytics will play a key role in sharing the learning
of these projects. And we're therefore launching a funding
programme in August for projects that can address barriers to
analytical capability in social
care. [inaudible] And if you think you can do some
good work in this area, please join us. Thank you.
-
Emma Vestesson
Senior Data Analyst
The Health Foundation
© Emma Vestesson 2020
Emma is a senior data analyst at the Health Foundation where she works on quantitative evaluations of healthcare interventions. Prior to joining the Health Foundation, Emma worked as a senior data analyst for the Sentinel Stroke National Audit Programme (SSNAP). Previous to this she worked as an economic researcher consultant for the World Intellectual Property Organization. She is an organiser for R-ladies London.
Emma has an undergraduate degree in mathematics and economics from Lund University, Sweden and a Master’s degree in specialized economics from Barcelona Graduate School of Economics, Spain. She is currently doing a part-time PhD in health data science at UCL.
Sponsors
Northernlands 2 is a collaboration between ODI Leeds and The Kingdom of the Netherlands, the start of activity to create, support, and amplify the cultural links between The Netherlands and the North of England. It is with their generous and vigorous support, and the support of other energetic organisations, that Northernlands can be delivered.