Northernlands 2 - Open analytics at the Health Foundation


Why collaboration and working in the open is important in health and beyond


This transcript comes from the captions associated with the video above. It is "as spoken".

Hi, my name is Emma Vestesson. I'm a senior data analyst at the

Health Foundation and I'm here to

share some of the work we're doing about open analytics

at the Health Foundation.

The Health Foundation is an independent charity committed

to bringing about better health and health care for people in

the UK. I'm part of the data analytics team at the Health

Foundation and over the last year or so, we've done some

pretty intense work around making our analysis more

reproducible and open.

So there are big challenges facing health and

healthcare in the UK and this was true even before covid

happened. But at the same time we're seeing some rapid

innovation and use of analytics and data-driven tech in the

healthcare sector. And this carries huge potential to

address some of these challenges. But it does also

carry quite substantial risks if we're not careful with how it's

implemented and our aim is to ensure that data analytics has a

positive impact in the UK's

health. And we're working for a future where everyone's health

and care benefits from analytics and data-driven technology.

So how are we doing this?

We want data analytics to have a positive impact in the

UK's health and there are several ways in which we're

working to achieve this, so we want to have innovations and data

that tackle some of the big challenges facing the sector.

But we also want to influence the national conversation and

be an independent voice, an expert voice on national policy

and system design.

We offer funding and support for innovative work. Because we

want to support better analytical capability.

And we want to help create networks, communities and

partnerships. So what does open analytics mean to us at the

Health Foundation? So we're very committed to being transparent and open

when we're doing analysis 'cause we want people to be able

to look at what we've done, and ask any questions or

pose any challenges. So we want to actively champion good

analytical practice across the sector and spread and share what

we know. So we started doing this by publishing our code on

GitHub around a year ago and last September we had kind of a

mini launch of our own GitHub page, so we now make all of our

analysis code public by default even when it's work-in-progress

and we're working towards exclusively using only open source

languages. But we don't think that sharing code is enough, so

each repo on our GitHub has got a description of what the project

set out to do, information about the data source, whether or not you

can access that, which you should be able to if it's open data,

and what else you need to do to reproduce our analysis, so other

things you might install or

packages you need. So for us working openly is an ongoing

project, and we're kind of using our own team to experiment a bit

and test new and more open ways of working which

sometimes can be a bit challenging, but we feel like we

want to understand how we can work openly as a team.

And then start talking to other people about how to do it and

advise on what they can do.

So in addition to putting things on GitHub, we're also

sharing what we did and learned along the way and how

we overcame some of the barriers to working more openly. So

we published a blog post on Medium where we describe the

process we went through as a team to launch our public GitHub

account. And provide a kind of step-by-step guide for how

others can do the same thing.

So the problems we're working

on are kind of big and really require a more

collaborative approach

to actually be tackled, and we believe that transparency

and working openly and reproducibly are essential to

achieving this. And we see open coding and open data as a good

analytical practice.

There are so many benefits of working openly. So first of

all, sharing our code helps us avoid duplicating work within

the team, but also allows other researchers and analysts to

review and validate and reuse our code. And this is something

we've definitely done for covid, where I know I used code from an

analyst in Nottingham and I know that other people have used our

code. It also makes it a lot easier to share our code and our

work internally so we

know what our colleagues are up to and we can celebrate progress

and it makes it a lot easier to credit people both internally

and externally. I think it also ... openness also just

makes for better science. So there's a permanent record of

the analysis that we've done, and it's reproducible.

I think also personally knowing that my code is going on GitHub

makes me go that extra mile.

Because I just make sure that there are enough comments to

explain what the code is doing and I haven't left things

in there that I don't want anymore and this makes it

easier for other people when they use my code, but it also

makes it a lot easier for future me to go back to a project.

So unsurprisingly, COVID-19 has changed the way we work at the

Health Foundation, but actually we realized that as we're all working

remotely, working openly has become even more important.

We have done a lot of reactive work related to the pandemic using

open data and this has allowed us to inform the national

policy debate and highlight the effect of covid on

vulnerable groups such as care home residents and you can see

here the examples of some charts that we've created using

open data at the Health Foundation.

So even though it was great to use open data for this analysis,

there were some challenges with using open data. And sometimes

we were slightly limited in what we could do, and some of those

problems were related to the data itself, so it might not

have been granular enough, so we would, for example, not often be

able to look at both

sex and age at the same time, and sometimes the data wasn't

updated as frequently as we would have liked so

we would produce some analysis, and even when we

knew that more data would bring value to discussion, we

weren't able to do that.

There are also slightly more technical issues that we had.

So quite often open data isn't shared in a machine readable

format, meaning that there might be multiple tables in one

spreadsheet or there will be random text in places where a

computer wouldn't expect there to be text. It happened a few times

that I went to a website to download a new version of a data

set and it just disappeared and it was being published somewhere

else. And this as an analyst makes it really hard to build a

good workflow because

that data just is no longer there. Any work you've done to put in

kind of automatic downloads or anything, this is gone when

there's no consistency. And it also obviously happened that

data changed format, which again poses a big problem for

an analyst because you need to go back and check your code.
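
As an illustration of the spreadsheet problems described here (this sketch is not part of the talk), the Python below shows one way an analyst might cope with a file that holds several tables plus stray text. Everything in it is an assumption made for the example: the sample data is invented, the column names are made up, and split_tables and find_table are hypothetical helper names, not code from the Health Foundation's GitHub.

```python
import csv
import io

# Invented sample: free text at the top, then two tables separated by blank
# rows, similar to the layout problems described in the talk.
SAMPLE = """Weekly deaths (illustrative figures only)
Source: made-up example

week,age_group,deaths
1,0-64,10
1,65+,25

week,sex,deaths
1,F,18
1,M,17
"""


def split_tables(text):
    """Split CSV text into blocks of rows, using blank rows as separators."""
    blocks, current = [], []
    for row in csv.reader(io.StringIO(text)):
        if any(cell.strip() for cell in row):
            current.append([cell.strip() for cell in row])
        elif current:
            # A blank row closes the block collected so far.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def find_table(blocks, expected_header):
    """Return the data rows of the first block whose first row is the expected header.

    Raising an error when no block matches gives an early warning that the
    publisher has changed the layout, rather than silently reading the wrong cells.
    """
    for block in blocks:
        if block[0] == expected_header:
            return block[1:]
    raise ValueError(
        f"No table with header {expected_header}; the published format may have changed"
    )


if __name__ == "__main__":
    blocks = split_tables(SAMPLE)
    by_age = find_table(blocks, ["week", "age_group", "deaths"])
    print(by_age)
```

Checking the header before using the rows is a deliberate choice: when a publisher renames or reorders columns, the script stops with a clear message, which is the "go back and check your code" step made explicit.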

But we do think that the people who are sharing data

want us to use it.

Especially when things are moving quickly, it's quite easy

to not share data in maybe the best way. So we wrote a blog

with some thoughts on how to collaborate in a time of crisis

and how to share data in a time of crisis with a focus on how to

make sure that if you're sharing something that people can

actually use it. So we wrote our own blog, but we

also know that there's a lot of good work out there on how

to share open data. ODI Leeds has got some open data tips.

The government statistical service has got some

excellent resources, and the Turing Way, which is a slightly

more academic resource, has got excellent advice on how to

make your research reproducible.

So a lot of the call to action is coming from analysts

to kind of improve how we do analysis. But we also think that

data controllers could really, really help speed up the work

and spread open working.

So from a position as someone who actually does

this kind of work.

A lot of us use data managers, Digital ONS and it would just

be really, really good if

we could see some processing code, some reference

data and metadata.

As this would really help us with our work but also being

able to be sign-posted to the other people writing code for

the data that you are looking at would be really useful.

So I've focussed quite a lot on code and technology today, but

we don't think that's enough to collaborate effectively.

Based on the work that we've done internally to work more

openly, but also some of the work that we do

to support analysts in the sector, we have a few things

that we think might be stopping people from working openly.

And we think that the first thing you can do is ask yourself

why is my team not working openly?

And we think a big factor is buy-in from management.

Analysts really need to feel safe sharing data and often that

needs to come from somewhere higher up to make sure that

there's that support.

There also needs to be support for the extra work during setup

because when you start working openly, at the beginning it is

a bit of extra work, but it's really worth it and I think in

the long run you're probably

saving time. But also having a community is really important,

so knowing that more people are working the same way provides

some reassurance that even though it's hard, this

is a good thing to do.

So if you want to start working more openly, you need to

acknowledge that working this way might require a culture change.

So what do we do to support analysts? I've talked a lot about

analysis that we've done but a lot of the work that we do is actually

focussed on supporting analysts working in the health and

social care sector. So so far we've funded 43 projects through

four different rounds - something that we call Advancing

Applied Analytics.

And we've had a lot of workshops with these award holders.

So the support doesn't stop with the funding.

We have these workshops and people come together and

troubleshoot issues they've encountered over the course of

their projects. When covid happened, we also set up a

repository of resources for analysts working to support

health and care during the COVID-19 pandemic, and they were

trying to share different solutions that people have

suggested that are

more open.

So we have a few projects that we just want to highlight.

But probably the biggest one being the NHS-R community,

which was, I think probably our big first investment in this area

and it's an initiative that aims to build a community of

analysts in the NHS and to spread the

use of the R language.

And through this we know that there's interest in the system

to work more openly, and we're hoping that our funding program

encourages an open approach to both sharing methodology and code.

I want to end by focussing on where we're going next and have

a call for action. The last few months we've seen some

really remarkable innovation with health data, but we haven't

seen the same thing in social care and this is despite how hard the

pandemic has hit the social care sector. We're doing a few things.

First of all, we're teaming up with Future Care Capital

to do some work around social care. We really think

that open analytics will play a key role in sharing the learning

of these projects. And we're therefore launching a funding

programme in August for projects that can address barriers to

analytical capability in social

care. [inaudible] And if you think you can do some

good work in this area, please join us. Thank you.

  • Emma Vestesson

    Senior Data Analyst
    The Health Foundation

    © Emma Vestesson 2020

    Emma is a senior data analyst at the Health Foundation where she works on quantitative evaluations of healthcare interventions. Prior to joining the Health Foundation, Emma worked as a senior data analyst for the Sentinel Stroke National Audit Programme (SSNAP). Previous to this she worked as an economic researcher consultant for the World Intellectual Property Organization. She is an organiser for R-Ladies London.

    Emma has an undergraduate degree in mathematics and economics from Lund University, Sweden and a Master’s degree in specialized economics from Barcelona Graduate School of Economics, Spain. She is currently doing a part-time PhD in health data science at UCL.


Northernlands 2 is a collaboration between ODI Leeds and The Kingdom of the Netherlands, the start of activity to create, support, and amplify the cultural links between The Netherlands and the North of England. It is with their generous and vigorous support, and the support of other energetic organisations, that Northernlands can be delivered.

  • Kingdom of the Netherlands