Northernlands 2 - Standards in an emergency

Description

Jeni Tennison from the ODI talks about the need for data standards, not just during a crisis but looking beyond that crisis to the future.

Transcript

This transcript comes from the captions associated with the video above. It is "as spoken".

Thanks very much for having me here at Northernlands 2.

My name is Jeni Tennison. I'm Vice President and Chief Strategy

Advisor at the Open Data Institute and in this session we're

going to talk about the creation of standards and use of

standards in an emergency situation like COVID-19.

I come from a background of using standards, that was how I

got into IT, into development, into software engineering and

into open data. So my involvement with W3C with the

World Wide Web Consortium really was what put me on the path that

I'm on at the moment. I love standards. Standards, I think

make the world go round. They make our lives so much simpler.

But I've been questioning the degree to which standards are

useful when you have to move fast, and that's the real

question that I want to dig into

today. Standards for data we define as documented, reusable

agreements that make it easier to publish, access and use data.

So the crucial thing there is that standards are supposed to

be there to make it easier to work with data. If you have a

standard, then you don't have to think as hard as you would

otherwise about how to exactly publish or use it. You don't

have to go through the same thought processes that

everybody else has gone through

in working out what to do.

And standards aren't just about data formats, and

exactly what schema to use in a CSV file. For example, standards

can be things around identifiers. They can be what

units or measures you use, or what questions you ask.

Standardizing things helps people to understand what you're

talking about. We did a project a while ago on developing

standards that really highlighted how many

different ways standards can be useful, and if you go to

standards.theodi.org you'll see a set of guidance about that.

Now the thing about standards is that they do take time to build,

and crucially, that's because it involves working with and

negotiating with lots of other people who might have their own

ideas about what should be in a standard or how a standard might

be used. It's that time-consuming nature of standards

that makes them so difficult in an emergency situation. In April

we started a project on open data and COVID-19.

Throughout this project, looking at the way in which data has

been used through this emergency, we've noticed a real

lack of standardization. Basically, everyone is doing

their own thing, and that seems unsurprising because of the cost

of that negotiation, the cost of actually developing a

standard, that slows things down. But we're wondering, is there a

point when we should start to standardize data? Is it a case

that there's more haste if you don't, but less speed overall?

And so to dig into these questions a bit more, we're

sharing today two interviews with people who've had to

actually deal with health data during this crisis. First of

all, I'm going to hand over to Olivier, who's our head of R&D

at ODI, to interview developers behind track together, which is

a symptom tracking app. And then following that we'll have an

interview with some of the NHS team about how they've been

using data and the kinds of standards that have come through

that. So first of all, over to Olivier.

Thank you Jeni. So for this interview I'm joined by Rasheed

and Guy Nakamura of the track together team. We've been

working with the team at track together for a few months now.

They contacted us when we started our project around the

month of April and we had a call for anyone needing help with

data around COVID-19 and they were actually one of the first

organizations joining us. Guy, you've been working on a symptom

tracker app for a few months now, sharing some of this data

collected through your app

with academics. Was that always the goal?

Yeah, I'd say so. Um, Rasheed and I started track together about

three months ago, so when COVID-19 was really first coming

to public attention in the UK. We actually started with every

intention of it being a quick weekend project and this

was ultimately because lockdown measures weren't in place. There

was inadequate testing and neither the government nor the

public really had any idea what this virus meant and what we

were facing. So in light of that lack of information, we built

the first web version of

track together with the ultimate ambition of offering

people better visibility of the disease in their communities,

and with every intent to share this data with both academic

institutions and public health authorities around the world.

Thank you. Rash, so when

we started our collaboration it was around opening data or at

least it became about opening data. You've been now publishing

anonymized, open data from the data you've been collecting

through the app. It's gone through several iterations. I

was curious to hear from you, what would you say were the easiest

and hardest facets of that work? Yeah, so the hardest

part was definitely setting up the entire pipeline from

the raw data sets to the open data sets and understanding some of the

pitfalls in it and understanding what can be shared and what

can't be shared. Sensibility I helped us a lot there,

understanding that at country level we can show data but

maybe not at post code level, given that within a post code

and then age, having those two together, you could

potentially identify a single person. That was one of the issues.
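
The risk described here, that post code and age together can single out one person, can be checked before publishing. The following is a minimal sketch, assuming made-up rows and hypothetical field names (`country`, `postcode`, `age`); it is not the track together pipeline. It counts how many records share each combination of fields and flags any combination held by fewer than `k` people.

```python
from collections import Counter


def small_groups(records, keys, k):
    """Return each combination of `keys` that is shared by fewer than k records."""
    counts = Counter(tuple(r[key] for key in keys) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up example rows; the field names are assumptions for illustration.
records = [
    {"country": "UK", "postcode": "LS1", "age": 34},
    {"country": "UK", "postcode": "LS1", "age": 34},
    {"country": "UK", "postcode": "LS2", "age": 71},
]

# Post code plus age: one combination is held by a single record.
risky_fine = small_groups(records, ["postcode", "age"], k=2)

# Country only: every record falls in a group of at least two.
risky_coarse = small_groups(records, ["country"], k=2)
```

An empty result at a given level of detail suggests that level is safer to publish; a non-empty one suggests aggregating further or dropping a field.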

Also we started opening up the data quite late so we

hadn't set up the pipeline. We didn't value open data as much

as we should have from the start. And had we known, we would

have introduced this pipeline with our development pipeline.

So one of the things that was quite easy going forward

was using GitHub, and understanding, with your

guys' help, that that's the platform to go forward. As an

application engineer, I'm very familiar with GitHub, and it was

music to my ears to be able to use it. Thank you Rasheed. So the

next step in the collaboration that we've had and it was no

longer just about you publishing data but really starting a

coordination with a number of other teams. There are quite a

few symptom trackers not just in the UK but internationally,

and your ambition seems to be to harmonize the work done with

COVID-19 symptom tracking data. Guy, I'd like to hear from you

where did this impetus to do so to do the harmonization

come from? And why did you decide to explore

standardization and harmonization rather than if I

can give a provocation rather than merging all the apps into

fewer apps or just one?

Yeah, of course. I think as we gained traction and the severity

of the virus really started to come to light, we realised just how

important the work we've done was. We started to come across

more and more similar initiatives, whether citizen,

government or company led, and each and every one of them had

a very similar ambition to share this data with health

authorities, academics and other various parties. And these

trackers were ultimately popping up everywhere from Brazil to

Vietnam to the UK and US.

Ultimately, while in the UK, collaboration was starting to

take place through the government-led project Oasis,

which we are a part of, kind of led by the J hub and NHSX.

We always had a much broader goal and a more transparent or

should I say open goal than that of Oasis 'cause that data

is ultimately going to the government. And of course we

hope they put it to good use, but we thought there was a

broader need for this as we saw that the virus wasn't just

affecting the UK, the US. It's a global pandemic. I guess why

standardisation rather than the other approach of merging? I

think there are a few reasons for this. One really was the

speed at which we were all moving. I mean, of course,

this was kind of initially a two-man project, but we are a much

broader, kind of volunteer team now.

We were able to deliver really quickly, whereas

governments obviously take time with due diligence and

bureaucracy. So lots of manpower has been invested by various

projects and I think it would have been too tough to ask

to get them to merge into one. Secondly, I think

there was a lot of value in the varied approaches taken by these

trackers. For example, we place a lot of emphasis on our social

media campaign, social influencer campaign, to grow audience, and

that naturally meant that we would be hitting a different

demographic to, say, COVID Symptom Study. And then thirdly,

merging all of this into one app would raise a lot of

questions about who owns the data, where is the data sitting

and how it's ultimately going to be used. So I think there's a

huge benefit in having all these

various actors in the space. A quick follow-up question

to you Guy. If standards are important, why would you say,

Why do you think that one hasn't quite emerged yet? Or has one

emerged yet that you're seeing that needs to get people around

it? No, nothing really. Which is why we've ultimately come to the

ODI for help on this.

I think it's been very difficult 'cause there's been so many

unknowns in this. We've never faced this sort of situation

previously, and as I said, there's so many different actors

in the space, so we are just ultimately a volunteer

organization. But then you've got governments and different

third party companies working on this, and I think also we

haven't had the time for that coordination, as you've rightly

pointed out, this does take time, and we've only been kind of

months into the pandemic so far.

I'd say that is why there hasn't been one yet, but there

is very much a huge need for it.

Great. So I'm going to follow up with a question to you, Rash

again in the same space about stewardship, and organization,

but more about the long-term, because there is a case to

be made, perhaps about the fact that symptom tracking was only

useful at the very beginning to try and understand the pandemic

as it was unfolding. What would you say to someone thinking

that symptom tracking is yesterday's news? And who do you think this

data that you've been collecting is useful for in the long run?

What we found with the

project was how the use of the data evolved

over time, and with different parties coming into play with a

single set of data, you know, the use managed to evolve. Initially

it was a public application where people would be able to see data

within their area, sort of mapping happening. That later

evolved when we had J hub, NHS involved, and some universities;

we were able to provide specific data for that research. And going

forward I think the

applications are fairly endless. There are

applications in business, I think we've seen the emerging

economies, particularly in education, which is one of my

jobs. So in education we've seen people move from schools to

homes and I think in the use of business intelligence you

can see how symptoms, over time, how they changed.

And how that affected children

in schools. And how that shifted an economy. That's just one usage.

I think we'll see a lot of this data being used for

business intelligence, a lot merged with public health

initiatives, understanding demographics. So I think those

are some examples. And that's just with this small data

set. I think going forward, symptom tracking with our

standards that would come into play, I think you get

a framework together to fight any future pandemic, any future

disease that in the future

affects public health.

I think that that's how it's evolved. So something

small has evolved from

you know, pet projects to researchers to

potential commercial

applications. So that's how it is useful in the long term.

That's all we have time for, but this is a

great way to finish on all the possible impact of

this data if it's shared and shared more

consistently. I'm looking forward to working with you

on that. Rasheed, Guy from track together, many, many

thanks for your time. Jeni, back to you.

Thanks Olivier. Now earlier in the week I spoke to Indra Joshi,

who's director of AI at NHSX and Ming Tang, national director of

data and analytics at NHS England and NHS Improvement,

about their experience building the COVID-19 data platform

within the NHS and the wider work also within the NHS needing

to wrangle data.

I wanted to explore with them what they'd learned in that

process, and particularly about whether and how standards are

useful in an emergency.

So Indra, set the scene for us. What have you

been building and why?

Hi Jeni, thanks for that question. So essentially when

the pandemic hit the UK, those of you who know about how

we approach data in the UK know we have multiple different data

sets that are held in multiple different locations, and so one

of the key things we wanted to do, and one of the fundamental

key areas, was to actually ensure everybody understood the numbers

and that the same number was interpreted in the same way.

So what we did was we spoke to

quite a number of people, both across government and

across the NHS and we said look, can we actually bring some of

these slightly disparate datasets together in a way that

will then help some of the strategic decision makers both

in government and in regional teams understand essentially

what the numbers were? So what we did was create a database, for

want of a better word, for different datasets across

England, primarily, so that operational teams could

understand the core fundamentals, such as how many

beds we had. How much oxygen. Ventilators. So that's kind of

in a summary what we've done.

Ming, we often hear that NHS data is a bit of a mess.

To what extent was standardisation or lack of standardisation a

problem in the data that you're bringing together like this?

Thanks, Jeni. It was actually a big consideration. Some of the

things that Indra just mentioned around counting beds and

ventilators and all those things are strict counts, but we

don't have a lot of electronic means to gather that

information automatically, so we had to create data collections.

Therefore, any data collection that you create you have to

make sure you've got a consistent way of counting

things so that the people that were replying are actually

counting them correctly. So we had to think about creating

definitions. How we then use that information to make sure

that we could compare apples with apples, things that were

physical equipment. When they weren't counts, we also had to

think about how do we de-identify the information, collect

patient information so that datasets [?]. It's really

important that we have that information 'cause that gave us

a wealth of experience of how the covid treatment was going or

what was happening to patients. So we had to be very

careful, even with the covid regulations, that we de-identified

that, created a covid-specific pseudonym so that we could link

datasets together but without

any of the privacy concerns really. So we had to have

quite a wide-ranging discussion with the IG colleagues to make

sure we standardised how we de-identified data, how we then

processed it, and then allowed linkages. All of that's really

important. And yes, you're quite right. NHS data is usually

quite messy. What we have learned is about the core

reference datasets that we hold, like organization codes.

Places like care homes, where you would think it would be

quite easy to have a registry: we didn't have one of those at

the beginning of the pandemic, but we now do. So we have had to

create a lot of those registries, which has been fantastic, and

then making sure of the consistency, not just in the

definition of counts but the definition of when

testing results are happening, within 24 hours, within 48 hours.

All those kind of clear definitions so that we can

enable consistent analysis was really important as well, so

that's kind of the things we

were looking at. And I think we've learned a lot and I'm

really proud of the pace at which we actually delivered some

of the data and analysis as

part of this experience, I guess.
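
The idea of a project-specific pseudonym that lets datasets be linked without carrying the original identifier can be sketched as follows. This is a minimal illustration, not the NHS method: it assumes a keyed hash (HMAC-SHA256) as the pseudonym function, and the key, field names and rows are all made up for the example.

```python
import hashlib
import hmac


def pseudonym(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using a keyed hash."""
    normalised = identifier.strip().upper().encode("utf-8")
    return hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()


def deidentify(rows, key):
    """Replace the hypothetical `patient_id` field with a pseudonym."""
    out = []
    for row in rows:
        clean = {k: v for k, v in row.items() if k != "patient_id"}
        clean["pseudonym"] = pseudonym(row["patient_id"], key)
        out.append(clean)
    return out


# Assumption: one secret key per project, held by whoever does the linking.
PROJECT_KEY = b"example-key-for-illustration-only"

# Made-up rows from two separate collections.
admissions = [{"patient_id": "A123", "ward": "ICU"}]
tests = [{"patient_id": " a123 ", "result": "positive"}]

admissions_p = deidentify(admissions, PROJECT_KEY)
tests_p = deidentify(tests, PROJECT_KEY)

# Link the two de-identified datasets on the pseudonym alone.
results_by_pseudonym = {t["pseudonym"]: t["result"] for t in tests_p}
linked = [
    dict(a, result=results_by_pseudonym.get(a["pseudonym"]))
    for a in admissions_p
]
```

Because the key is specific to one project, the same person gets a different pseudonym under any other key, which limits linkage to the datasets prepared for that purpose. Agreeing the normalisation step (here, trimming and upper-casing) is itself a small piece of standardisation.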

And Indra, obviously the pandemic is an international

phenomenon, but we're not seeing very much

standardisation about the way in which data about the pandemic

is being published by different countries. Having

been on the inside, what are your kind of insights?

Why do you think that is?

I think this could go back to the kind of the

wider issues with data collection. I mean quite often

you know, even inputting at source, I'm not going to go

through the kind of usual problems that we have. But you

know, EHRs aren't designed for us to actually input data in a

way that is easy to do and standardise. And also as

Ming mentioned, there's quite a lot of different

interpretations. So for example here in the UK we call our

ventilated beds O+ or V beds. Now

that's because we're here in England and we've kind of made

that category, but it might not be the same in other countries,

and obviously there's a different language, so I think

there's a huge amount of work still to do around data

standardisation, and we, for one, as Ming mentioned, you know, we

found it quite challenging as well, especially when bringing

people from different backgrounds together. For

example, data scientists working alongside clinicians actually to

understand those disparate datasets, and there's quite a

bit of effort going on internationally, such as

communities like the OHDSI network, which is looking at...

They call them common data models, so countries across

the world can start categorizing in common data

models. We as well are doing quite a lot of work around using

open standards. So again, we're kind of encouraging people

to code in a much more unified language. And then I guess

we all have a role to kind of drive this

forward across multiple channels, both the clinical

community, the research community and as well I would

say the tech community has a big role to play.

Ming, one benefit of standards is that people get familiar with

them and then they can use data that's using those standards

really easily. But when we have a world where we're bringing

together data that isn't in those standards, and people

can't make those same assumptions, what do you think

that people who are producing this data need to do in order to

mitigate the potential for misunderstanding?

I think it's really important that

any data actually has very good metadata associated

with it, and those definitions and standards are

actually published alongside the data so that it can be

interpreted in a particular way. One of the things that we found

during this pandemic was really using the data,

particularly around modeling and making sure that we had clear

sets of assumptions that we were all using and making sure we

were transparent about those assumptions, both between the

national teams and regional teams and the local teams.

Because there is an opportunity to interpret this quite

differently, even in how we count the number of O+ beds

and what an O+ bed is. In different sites they'd be

classified as different things. So making that case, it's really

important for the metadata and definitions, and then for

modeling. What we found was it created an even bigger need for

collaboration. And to do that well we have to have really good

documentation, not just the codification of what we've

modelled but actually documentation around the clear assumptions,

the objective of the model, the hierarchy of the models, their

interdependencies, and really making sure that if we are using

a particular data set, how that data set is refreshed,

replaced during the period which you're looking at.

All of this stuff is really important, because otherwise you

can't actually compare and make best use of the analysis that

comes out of it, 'cause we were using a lot of our modeling to

then drive our operational decision-making. I think

that's really important. We were able to make a case for

lots of local decisions through better use of data that

we all tied together in the platform that we used.
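
Publishing definitions alongside the data, as described above, can be as simple as a small metadata document that travels with each dataset. The following is a minimal sketch under stated assumptions: the column names, the definitions and the `refreshed` field are all hypothetical, and the rows are made up. It also checks that no column is published without a definition.

```python
import json

# Made-up rows; column names are assumptions for illustration.
rows = [
    {"site": "Hospital A", "date": "2020-06-01", "ventilated_beds": 12},
    {"site": "Hospital B", "date": "2020-06-01", "ventilated_beds": 7},
]

# Hypothetical metadata: what each column means and how the data is refreshed.
metadata = {
    "refreshed": "daily",
    "columns": {
        "site": {"definition": "Name of the reporting site"},
        "date": {"definition": "Day the count refers to", "format": "YYYY-MM-DD"},
        "ventilated_beds": {
            "definition": "Beds able to support mechanical ventilation",
            "unit": "beds",
        },
    },
}


def undocumented_columns(rows, metadata):
    """List columns that appear in the data but have no definition."""
    used = {name for row in rows for name in row}
    return sorted(used - set(metadata["columns"]))


missing = undocumented_columns(rows, metadata)

# The metadata can be written out next to the data file as JSON.
metadata_json = json.dumps(metadata, indent=2)
```

Running a check like this before each release means a new column cannot slip out without someone writing down what it counts.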

And finally Indra, are there any other places where you

found standards being used within the health system to

help respond to COVID-19?

Yeah, I mean quite often when we talk about standards the

first thing that springs to mind are things like

interoperability standards, data standards, and the kind of

technical side. One thing we're very conscious about is there's a

wealth of other standards out there, such as the regulations,

and we here in the UK have something called accessibility

standards and clinical safety standards, which I think are

imperative when you're actually building either a device or a

mobile app or remote

monitoring tool for whoever it might be, and the reason

these things are really important to consider is

especially in speed and in haste. Sometimes these things

either get overlooked or they're suddenly a consideration right

at the end of that development

cycle, which fundamentally, when you're looking after people

and I speak with my clinician hat on now, is, you know, human care:

don't do harm, do good. These are things that are fundamental

that we, as clinicians, are trained to believe in, and

sometimes when you kind of mix technology and clinical things,

sometimes those thoughts are afterthoughts, and so one of the

things we've done is we've published what we call a

digital health technology standard, which encompasses

quite a lot of these things.

Going from clinical safety to cybersecurity,

understanding privacy. So we've got the GDPR for example. And

also to consider if your product that you're building is a

medical device. Because as some people may know, the medical

device regulations are changing here in Europe, and so we do

have to be quite mindful about these things. So I always think

it's important to build these things in versus thinking about

them at the end. And so this is why we've published

standards predominantly

around the wider aspects versus just the more technical side so

people can consider these right from the start of that design

process. Thank you so much Indra and Ming for all of

your insights.

So having done those interviews and looked at the way that

standards are being used across different parts of the

system, it really strikes me that the process of standards

that's useful is the one that involves engaging with the range

of needs from different stakeholders and the process of

exposing and resolving different kinds of assumptions and

differences that you have

in order to get to data that is useful and comparable so that you

can bring it together so that you can aggregate data from

different sources so that you can compare, for example how

different countries are doing.

What we've seen is that right at the moment, aggregators are the

standardisers. So for example, with the track together app and

symptom tracking, they talked about project Oasis, a project

that is being run by the NHS to bring together data from a

number of different symptom trackers. And so you create

basically a common way of viewing data from that range of

different applications. Or if you look at the kind of data

that is being published from countries about cases and deaths

and testing and so forth. Currently, the way that most

people are using that data is through the European Centre for

Disease Prevention and Control

activity that is bringing together 500 different datasets

from different kinds of sources in a process that they don't

actually give very much information about in order to

create those comparison graphs and so on. So those

aggregators, the people that are doing the processing of pulling

together data, aggregating it together, are the people who are

in effect creating standard ways of seeing data from, say,

symptom trackers or about cases

and deaths. And they're doing the work of the standards

body if you like. And really what we should be demanding

from those aggregators is that they document the models that

they're coming out with, which naturally will have some

commonalities because they're pulling together data from lots

of different sources and open the code that they're using to

map into those standards. If we do that, then at

least we have a starting point for some standards that perhaps

we could evolve a little bit more.

The second thing that strikes me is that data publishers have been

operating in quite a heads-down way in terms of thinking about

who is going to reuse data, but also what other people are

doing. What other publishers might be doing, and some of that

is justified. Indra talked about how different countries have

very different ways of thinking about and structuring their

information, but it would be a

fairly low effort form of standardization if publishers

just looked at what other publishers are doing and tried

to copy it using the same headers in their CSV files using

the same kind of structures, for example, would give a kind of

gradual convergence towards something that was a bit more

coherent and that people could reuse without having to

understand the depths and complexity of each individual

publication. Again, you know, documenting and publishing the

models that publishers are using and assumptions behind the

figures that they're providing would give us a kind of leg up

to something that was a bit more standard and a bit more coherent.

But then the third thing that really strikes me is what are we

missing by not engaging in a proper standards process? We've

seen a number of areas where publishers aren't providing data

because they're not really thinking about who might reuse

it outside their particular bubble, the particular people

who are shouting loudest in their ears. Right at the moment

we've got news about the way in which data published about

testing in the UK

hasn't revealed the full extent of cases in places like Leicester,

and that's really a deficiency in the way in which data is

being published, because people aren't listening to the needs of

organizations outside of their particular bubble. The same was

true around the publication or lack of publication around

ethnicity data around COVID-19 cases and deaths. Having more

people involved in the process

and engaging more widely with potential re-users is one of the

things you do when you're doing standards, but you don't have to

be doing a standard in order to do that engagement. I do think

that some of these gaps would have been highlighted and caught

much earlier on if there had been a more inclusive process and

a more thoughtful process about the way in which data is being

gathered and published.

So my kind of conclusion around standards development, as I say:

I think that the process of standards development is the

thing that gives it its power, the way that it involves different

stakeholders. And if you can't do that, if you can't do the full

standards process kind of rapidly in an emergency situation, you

can at least talk to other people who might have an idea

about how data could be used and

reused. You can at least look at the way in which other people are

publishing data and aggregators and those who are going through

the exercise of pulling together different kinds of sources can

at least document what they've done and perhaps provide some of

that code as open source so that we can all benefit from it.

But I'd love to hear from you. What do you think should have

been done? What could we do now in order to get better standardised

data to help us

deal with this emergency?

  • Jeni Tennison

    Vice President and Chief Strategy Advisor, Open Data Institute

    Jeni Tennison
    © ODI 2019

    Jeni Tennison is the Vice President and Chief Strategy Advisor of the Open Data Institute. She gained a PhD in Artificial Intelligence, then worked as an independent consultant specialising in open data publishing and consumption. She was the Technical Architect and Lead Developer for legislation.gov.uk before joining the ODI as Technical Director in 2012, becoming CEO in 2016, and Vice President in 2019.

    Jeni sits on the UK's Open Standards Board; the Advisory Board for the Open Contracting Partnership; the Board of Ada, the UK's National College for Digital Skills; the Co-operative’s Digital Advisory Board; and the Board of the Global Partnership for Sustainable Development Data.

Sponsors

Northernlands 2 is a collaboration between ODI Leeds and The Kingdom of the Netherlands, the start of activity to create, support, and amplify the cultural links between The Netherlands and the North of England. It is with their generous and vigorous support, and the support of other energetic organisations, that Northernlands can be delivered.

  • Kingdom of the Netherlands