Northernlands 2 - Standards in an emergency
Description
Jeni Tennison from the ODI talks about the need for data standards, not just during a crisis but looking beyond that crisis to the future.
Transcript
This transcript comes from the captions associated with the video above. It is "as spoken".
Thanks very much for having me here at Northernlands 2. My name is Jeni Tennison. I'm Vice President and Chief Strategy Advisor at the Open Data Institute, and in this session we're going to focus on the creation and use of standards in an emergency situation like COVID-19.
I come from a background of using standards; that was how I got into IT, into development, into software engineering and into open data. My involvement with the W3C, the World Wide Web Consortium, really was what put me on the path that I'm on at the moment. I love standards. Standards, I think, make the world go round. They make our lives so much simpler. But I've been questioning the degree to which standards are useful when you have to move fast, and that's the real question that I want to dig into today.
We define standards for data as documented, reusable agreements that make it easier to publish, access and use data. So the crucial thing there is that standards are supposed to make it easier to work with data. If you have a standard, then you don't have to think as hard as you would otherwise about exactly how to publish or use data. You don't have to go through the same thought processes that everybody else has gone through in working out what to do.
And standards aren't just about data formats and exactly what schema to use in a CSV file. For example, standards can be about identifiers. They can be about what units or measures you use, or what questions you ask. Standardising things helps people to understand what you're talking about. We did a project a while ago on developing standards that really highlighted how many different ways standards can be useful, and if you go to standards.theodi.org you'll see a set of guidance about that.
Now the thing about standards is that they do take time to build, and crucially, that's because building them involves working and negotiating with lots of other people who might have their own ideas about what should be in a standard or how a standard might be used. It's that time-consuming nature of standards that makes them so difficult in an emergency situation. In April we started a project on open data and COVID-19. Throughout this project, looking at the way in which data has been used through this emergency, we've noticed a real lack of standardisation. Basically, everyone is doing their own thing, and that seems unsurprising, because the cost of that negotiation, the cost of actually developing a standard, slows things down. But we're wondering: is there a point when we should start to standardise data? Is it a case that there's more haste if you don't, but less speed overall?
And so to dig into these questions a bit more, we're sharing today two interviews with people who've had to actually deal with health data during this crisis. First of all, I'm going to hand over to Olivier, who's our head of R&D at the ODI, to interview the developers behind Track Together, which is a symptom-tracking app. Then, following that, we'll have an interview with some of the NHS team about how they've been using data and the kinds of standards that have come through that. So first of all, over to Olivier.
Thank you, Jeni. So for this interview I'm joined by Rasheed and Guy Nakamura of the Track Together team. We've been working with the team at Track Together for a few months now. They contacted us when we started our project, around April, when we put out a call for anyone needing help with data around COVID-19, and they were actually one of the first organisations to join us. Guy, you've been working on a symptom tracker app for a few months now, and sharing some of the data collected through your app with academics. Was that always the goal?
Yeah, I'd say so. Rasheed and I started Track Together about three months ago, when COVID-19 was really first coming to public attention in the UK. We actually started with every intention of it being a quick weekend project, and this was ultimately because lockdown measures weren't in place, there was inadequate testing, and neither the government nor the public really had any idea what this virus meant and what we were facing. So in light of that lack of information, we built the first web version of Track Together with the ultimate ambition of offering people better visibility of the disease in their communities, and with every intention of sharing this data with both academic institutions and public health authorities around the world.
Thank you. Rasheed, when we started our collaboration it was around opening data, or at least it became about opening data. You've now been publishing anonymised open data derived from the data you've been collecting through the app, and it's gone through several iterations. I was curious to hear from you: what would you say were the easiest and hardest facets of that work?
Yeah, so the hardest part was definitely setting up the entire pipeline from the raw datasets to the open datasets, understanding some of the pitfalls in it, and understanding what can be shared and what can't be shared. Sensibility I helped us a lot there, in understanding that at country level we can show data but maybe not at postcode level, given that if you have postcode and age together, you could potentially identify a single person. That was one of the issues.
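Rasheed's postcode-and-age example is a classic re-identification risk. One way a publishing pipeline might catch it before release is to count how many records share each combination of quasi-identifiers and flag any combination seen fewer than k times (the idea behind k-anonymity). This is a minimal sketch under stated assumptions: the records, the placeholder postcodes and the threshold K = 2 are all made up, it is not Track Together's actual pipeline, and a real threshold is a policy decision.

```python
from collections import Counter

# Hypothetical symptom-tracker rows: (country, postcode, age).
# Placeholder postcodes and ages, for illustration only.
records = [
    ("UK", "AA1 1AA", 34),
    ("UK", "AA1 1AA", 34),
    ("UK", "AA1 1AA", 71),
    ("UK", "BB2 2BB", 29),
    ("UK", "BB2 2BB", 29),
]

def small_groups(rows, key, k):
    """Return quasi-identifier combinations shared by fewer than k rows.

    A combination seen fewer than k times could point to a single person,
    so it should not be published at that level of detail.
    """
    counts = Counter(key(row) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}

K = 2  # assumed minimum group size for this sketch

# Postcode + age together: a risky combination appears.
risky = small_groups(records, key=lambda r: (r[1], r[2]), k=K)

# Country level only: every group is large enough to publish.
country_level = small_groups(records, key=lambda r: (r[0],), k=K)

print(risky)
print(country_level)
```

Here the single 71-year-old in one postcode is flagged, while the country-level aggregate passes, which mirrors the decision described in the interview.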
Also, we started opening up the data quite late, so we hadn't set up the pipeline. We didn't value open data as much as we should have from the start. Had we known, we would have built this pipeline into our development pipeline. One of the things that was quite easy, going forward, was using GitHub, and understanding, with your help, that that's the platform to go forward with. As an application engineer I'm very familiar with GitHub, so being able to use it was music to my ears.
Thank you, Rasheed. So the next step in our collaboration was no longer just about you publishing data, but really about starting coordination with a number of other teams. There are quite a few symptom trackers, not just in the UK but internationally, and your ambition seems to be to harmonise the work done with COVID-19 symptom-tracking data. Guy, I'd like to hear from you: where did this impetus to do the harmonisation come from? And why did you decide to explore standardisation and harmonisation rather than, if I can give a provocation, merging all the apps into fewer apps, or just one?
Yeah, of course. I think as we gained traction, and the severity of the virus really started to come to light, we realised just how important the work we'd done was. We started to come across more and more similar initiatives, whether citizen-, government- or company-led, and each and every one of them had a very similar ambition to share this data with health authorities, academics and various other parties. These trackers were popping up everywhere, from Brazil to Vietnam to the UK and US. In the UK, collaboration was starting to take place through the government-led project Oasis, which we are a part of, led by the J hub and NHSX. But we always had a much broader goal, a more transparent or, should I say, open goal than that of Oasis, because that data is ultimately going to the government. Of course we hope they put it to good use, but we thought there was a broader need for this, as the virus wasn't just affecting the UK or the US. It's a global pandemic.
Why standardisation rather than the other approach of merging? I think there are a few reasons for this. One really was the speed at which we were all moving. Of course, this was initially a two-man project, but we are a much broader volunteer team now.
We were able to deliver really quickly, whereas governments obviously take time with due diligence and bureaucracy. So lots of manpower had been invested by various projects, and I think it would have been too tough to ask them to merge into one. Secondly, I think there was a lot of value in the varied approaches taken by these trackers. For example, we place a lot of emphasis on our social media and social influencer campaign to grow our audience, and that naturally meant we'd be hitting a different demographic from, say, COVID Symptom Study. And then thirdly, merging all of this into one app would raise a lot of questions about who owns the data, where the data is sitting and how it's ultimately going to be used. So I think there's a huge benefit in having all these various actors in the space.
A quick follow-up question to you, Guy. If standards are important, why do you think one hasn't quite emerged yet? Or has one emerged that you think people need to get behind?
No, nothing really, which is why we've ultimately come to the ODI for help on this.
I think it's been very difficult because there have been so many unknowns in this. We've never faced this sort of situation before, and as I said, there are so many different actors in the space. We are ultimately just a volunteer organisation, but then you've got governments and different third-party companies working on this. And I think also we haven't had the time for that coordination. As you've rightly pointed out, this does take time, and we're only a few months into the pandemic so far. I'd say that is why there hasn't been one yet, but there is very much a huge need for it.
Great. So I'm going to follow up with a question to you, Rasheed, again in the same space about stewardship and organisation, but more about the long term, because there is perhaps a case to be made that symptom tracking was only useful at the very beginning, to try and understand the pandemic as it was unfolding. What would you say to someone who thinks that symptom tracking is yesterday's news? And who do you think the data that you've been collecting is useful for in the long run?
What we found with the project was how the use of the data evolved over time, with different parties coming into play with a single set of data. Initially it was a public application so that people could see data within their area, a sort of mapping. That later evolved: when we had the J hub, the NHS and some universities involved, we were able to provide specific data for their research. Going forward, I think the applications are fairly endless. There are applications in business, and I think we've seen that in emerging economies, particularly in education, which is one of my jobs. In education we've seen people move from schools to homes, and with business intelligence you can see how symptoms changed over time, how that affected children in schools, and how that shifted an economy. That's just one usage.
I think we'll see a lot of this data being used for business intelligence, and a lot of it merged with public health initiatives for understanding demographics. So those are some examples, and that's just with this small dataset. Going forward, I think symptom tracking, with standards in play, could give us a framework to fight any future pandemic, or any future disease that affects public health. I think that's how it's evolved: something small has grown from a pet project, to research, to potential commercial applications. That's how it is useful in the long term.
That's all we have time for, but this is a great way to finish: on all the possible impact of this data if it's shared, and shared more consistently. I'm looking forward to working with you on that. Rasheed, Guy, from Track Together, many, many thanks for your time. Jeni, back to you.
Thanks, Olivier. Now, earlier in the week I spoke to Indra Joshi, who's Director of AI at NHSX, and Ming Tang, National Director of Data and Analytics at NHS England and NHS Improvement, about their experience building the COVID-19 data platform within the NHS, and the wider work within the NHS on wrangling data. I wanted to explore with them what they'd learned in that process, and particularly whether and how standards are useful in an emergency.
So, Indra, set the scene for us. What have you been building and why?
Hi Jeni, thanks for that question. So essentially, when the pandemic hit the UK... those of you who know how we approach data in the UK will know that we have multiple different datasets held in multiple different locations. So one of the fundamental things we wanted to do was to ensure everybody understood the numbers, and that the same number was interpreted in the same way. We spoke to quite a number of people, both across government and across the NHS, and we said: look, can we bring some of these slightly disparate datasets together in a way that will help the strategic decision-makers, both in government and in regional teams, understand what the numbers are? So we created a database, for want of a better word, for different datasets, primarily across England, so that operational teams could understand the core fundamentals, such as how many beds we had, how much oxygen, and how many ventilators. So that's, in summary, what we've done.
Ming, we often hear that NHS data is a bit of a mess. To what extent was standardisation, or lack of standardisation, a problem in the data that you're bringing together like this?
Thanks, Jeni. It was actually a big consideration. Some of the things that Indra just mentioned, around counting beds and ventilators and all those things, are strict counts, but we don't have a lot of electronic means to gather that information automatically, so we had to create data collections.
For any data collection that you create, you have to make sure you've got a consistent way of counting things, so that the people who are replying are actually counting them correctly. So we had to think about creating definitions, and about how we then use that information to make sure that we could compare apples with apples, both for physical equipment and for things that weren't counts. We also had to think about how we de-identify the information. We collected patient information so that datasets [?]. It's really important that we have that information, because it gave us a wealth of insight into how COVID treatment was going and what was happening to patients. So we had to be very careful, even under the COVID regulations, that we de-identified it. We created a COVID-specific pseudonym so that we could link datasets together without really any of the privacy concerns, and we had to have quite a wide-ranging discussion with our information governance (IG) colleagues to make sure we standardised how we de-identified data, how we then processed it, and how we then allowed linkages. All of that's really important.
And yes, you're quite right, NHS data is usually quite messy. What we have learned concerns the core reference datasets that we hold, like organisation codes. Take places like care homes: you would think it would be quite easy to have a registry of them, but we didn't have one at the beginning of the pandemic. Now we do. So we have had to create a lot of those registries, which has been fantastic. And then there's making sure of consistency, not just in the definition of counts, but in definitions like whether testing results are returned within 24 hours or within 48 hours. All those kinds of clear definitions, so that we can enable consistent analysis, were really important as well. So those are the kinds of things we were looking at. I think we've learned a lot, and I'm really proud of the pace at which we actually delivered some of the data and analysis as part of this experience.
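Ming doesn't say how the COVID-specific pseudonym was generated. One common technique for linkable pseudonyms is a keyed hash (HMAC): the same identifier always yields the same pseudonym under a given secret key, so datasets can be joined on it, while anyone without the key cannot recompute or look up the mapping. The sketch below assumes that technique; the key, the identifier format and both datasets are hypothetical, and it should not be read as the NHS's actual method.

```python
import hashlib
import hmac

# Hypothetical secret key held only by the data platform. If it were
# published, pseudonyms could be recomputed from known identifiers.
SECRET_KEY = b"replace-with-a-securely-stored-random-key"

def pseudonym(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable pseudonym from a patient identifier with HMAC-SHA256.

    Identifiers are normalised first so that "999 000 0001" and
    "9990000001" produce the same pseudonym and can be linked.
    """
    normalised = patient_id.replace(" ", "").upper()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()

# Two hypothetical datasets using the same made-up identifier, formatted differently.
admissions = [{"patient_id": "999 000 0001", "bed_type": "oxygen"}]
test_results = [{"patient_id": "9990000001", "result": "positive"}]

# Swap identifiers for pseudonyms before the data leaves each source system.
adm = {pseudonym(r["patient_id"]): r["bed_type"] for r in admissions}
tests = {pseudonym(r["patient_id"]): r["result"] for r in test_results}

# Link the two datasets on the pseudonym alone.
linked = {p: (adm[p], tests[p]) for p in adm.keys() & tests.keys()}
print(list(linked.values()))
```

Using a keyed hash rather than a plain hash matters here: identifier spaces such as fixed-length numbers are small enough that an unkeyed hash could be reversed by hashing every possible value.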
And Indra, obviously the pandemic is an international phenomenon, but we're not seeing very much standardisation in the way in which data about the pandemic is being published by different countries. Having been on the inside, what are your insights? Why do you think that is?
I think this could go back to the wider issues with data collection. Quite often, even when inputting at source... I'm not going to go through the usual problems that we have, but EHRs aren't designed for us to input data in a way that is easy to do and to standardise. And also, as Ming mentioned, there are quite a lot of different interpretations. For example, here in the UK we call our ventilated beds O+ or V beds. That's because we're here in England and we've made that category, but it might not be the same in other countries, and obviously there's a different language. So I think there's a huge amount of work still to do around data standardisation. We, for one, as Ming mentioned, found it quite challenging as well, especially when bringing people from different backgrounds together, for example data scientists working alongside clinicians to understand those disparate datasets. There's quite a bit of effort going on internationally, in communities like the OHDSI ("Odyssey") network, which is looking at what they call common data models, so that countries across the world can start categorising their data into common data models. We are also doing quite a lot of work around using open standards, again encouraging people to code in a much more unified language. And then I guess we all have a role to drive this forward across multiple channels: the clinical community, the research community, and, I would say, the tech community has a big role to play as well.
Ming, one benefit of standards is that people get familiar with them, and then they can use data that uses those standards really easily. But when we're in a world where we're bringing together data that isn't in those standards, and people can't make those same assumptions, what do you think the people who are producing this data need to do in order to mitigate the potential for misunderstanding?
I think it's really important that any data actually has very good metadata associated with it, and that the definitions and standards are published alongside the data, so that it can be interpreted in a particular way. One of the things that we found during this pandemic, in really using the data, particularly around modelling, was the importance of making sure that we had clear sets of assumptions that we were all using, and that we were transparent about those assumptions between the national, regional and local teams. Because there is an opportunity to interpret this quite differently: even in how we count the number of O+ beds, and what an O+ bed is, different sites might classify things differently. So it's really important to have the metadata and definitions. And then for modelling, what we found was that it created an even bigger need for collaboration. To do that well we have to have really good documentation: not just the codification of what we've modelled, but documentation of the clear assumptions, the objective of the model, the hierarchy of the models and their interdependencies, and, if we are using a particular dataset, how that dataset is refreshed or replaced during the period you're looking at. All of this stuff is really important, because otherwise you can't actually compare and make best use of the analysis that comes out of it, and we were using a lot of our modelling to drive our operational decision-making. We were able to make the case for lots of local decisions through better use of data that we all tied together in the platform that we used.
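Ming's advice is to publish definitions and metadata alongside the data. As a sketch of what that can look like in practice, the example below produces a CSV together with a JSON description loosely following the W3C CSV on the Web (CSVW) metadata vocabulary, so that each column carries a datatype and a plain-language definition. The site codes, values, column names and definitions are hypothetical, and the output has not been checked against a CSVW validator.

```python
import csv
import io
import json

# Hypothetical daily bed counts for two sites. Values are illustrative only.
rows = [
    {"site_code": "SITE-A", "date": "2020-07-01", "oxygen_beds_occupied": 12},
    {"site_code": "SITE-B", "date": "2020-07-01", "oxygen_beds_occupied": 7},
]

# Metadata loosely following the CSVW vocabulary: each column gets a
# name, a datatype and a plain-language definition (dc:description).
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "beds.csv",
    "dc:description": "Daily occupied beds with oxygen support, by site.",
    "tableSchema": {
        "columns": [
            {"name": "site_code", "datatype": "string",
             "dc:description": "Organisation code of the reporting site."},
            {"name": "date", "datatype": "date",
             "dc:description": "Reporting date (ISO 8601)."},
            {"name": "oxygen_beds_occupied", "datatype": "integer",
             "dc:description": "Occupied beds where the patient is receiving oxygen. "
                               "State here how your site defines this category."},
        ]
    },
}

def publish(rows, metadata):
    """Return (csv_text, metadata_json) so both can be published side by side.

    The CSV column order is taken from the metadata, so the two cannot drift apart.
    """
    columns = [c["name"] for c in metadata["tableSchema"]["columns"]]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue(), json.dumps(metadata, indent=2)

csv_text, metadata_json = publish(rows, metadata)
print(csv_text)
print(metadata_json)
```

Driving the CSV header from the metadata is a deliberate choice in this sketch: a column cannot be published without a definition sitting next to it.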
And finally, Indra, are there any other places where you've found standards being used within the health system to help respond to COVID-19?
Yeah. Quite often when we talk about standards, the first things that spring to mind are interoperability, data standards and the technical side. One thing we're very conscious of is that there's a wealth of other standards out there, such as regulations, and here in the UK we have accessibility standards and clinical safety standards, which I think are imperative when you're building a device, a mobile app or a remote monitoring tool, for whoever it might be. The reason these things are really important to consider is that, especially when moving at speed and in haste, they sometimes get overlooked, or they suddenly become a consideration right at the end of the development cycle. And fundamentally, when you're looking after people, and I speak with my clinician hat on now, it's about human care: do no harm, do good. These are fundamental things that we, as clinicians, are trained to believe in, and sometimes, when you mix technology and clinical things, those thoughts become afterthoughts. So one of the things we've done is publish what we call the digital health technology standards, which encompass quite a lot of these things.
They go from clinical safety to cybersecurity and understanding privacy, so we've got the GDPR, for example, and also considering whether the product you're building is a medical device. As some people will know, the medical device regulations are changing here in Europe, so we do have to be quite mindful about these things. So I always think it's important to build these things in, rather than thinking about them at the end. That's why we've published standards predominantly around these wider aspects, rather than just the more technical side, so people can consider them right from the start of the design process.
Thank you so much, Indra and Ming, for all of your insights.
So, having done those interviews and looked at the way that standards are being used across different parts of the system, it really strikes me that the useful part of the standards process is the part that involves engaging with the range of needs of different stakeholders, and exposing and resolving the different assumptions and differences that you have, in order to get to data that is useful and comparable. That is what lets you bring data together, aggregate data from different sources, and compare, for example, how different countries are doing.
What we've seen is that, right at the moment, aggregators are the standardisers. For example, with the Track Together app and symptom tracking, they talked about Project Oasis, a project being run by the NHS to bring together data from a number of different symptom trackers, creating basically a common way of viewing data from that range of different applications. Or look at the kind of data that is being published by countries about cases and deaths and testing and so forth. Currently, the way that most people are using that data is through the European Centre for Disease Prevention and Control, which is bringing together 500 different datasets from different kinds of sources, in a process that they don't actually give very much information about, in order to create those comparison graphs and so on. So those aggregators, the people who are pulling data together and aggregating it, are the people who are in effect creating standard ways of seeing data from, say, symptom trackers, or about cases and deaths. They're doing the work of the standards body, if you like. And really, what we should be demanding from those aggregators is that they document the models that they're coming up with, which will naturally have some commonalities because they're pulling together data from lots of different sources, and that they open up the code that they're using to map into those standards. If we do that, then at least we have a starting point for some standards that perhaps we could evolve a little bit more.
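The talk asks aggregators to document their common model and open the code that maps each source into it. A minimal sketch of that idea: the common model and the per-source header mappings are plain, publishable tables, and a small function applies them and fails loudly when a source can't supply a required field. The sources, headers, placeholder country names and figures are all hypothetical, not the ECDC's or Project Oasis's actual schema.

```python
import csv
import io

# Hypothetical common model an aggregator might document: field -> meaning.
COMMON_MODEL = {
    "country": "Country name",
    "date": "Reporting date (YYYY-MM-DD)",
    "new_cases": "Newly confirmed cases reported on that date",
}

# Hypothetical per-source mappings from each publisher's headers to the
# common model. Publishing this table is the "open the code" step.
SOURCE_MAPPINGS = {
    "source_a": {"Country": "country", "Date": "date", "Cases": "new_cases"},
    "source_b": {"nation": "country", "report_date": "date",
                 "confirmed_today": "new_cases"},
}

def to_common(source: str, csv_text: str) -> list[dict]:
    """Rename one source's columns into the common model, dropping unmapped ones."""
    mapping = SOURCE_MAPPINGS[source]
    out = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {mapping[col]: value for col, value in row.items() if col in mapping}
        missing = COMMON_MODEL.keys() - record.keys()
        if missing:
            raise ValueError(f"{source} row lacks {sorted(missing)}")
        record["new_cases"] = int(record["new_cases"])
        out.append(record)
    return out

# Made-up extracts from two publishers with different headers.
a = "Country,Date,Cases\nCountry X,2020-07-01,41\n"
b = "nation,report_date,confirmed_today,notes\nCountry Y,2020-07-01,87,provisional\n"

combined = to_common("source_a", a) + to_common("source_b", b)
print(combined)
```

Because the mappings are data rather than logic buried in a script, other people can read, reuse and argue about them, which is the starting point for a standard that the talk describes.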
The second thing that strikes me is that data publishers have been operating in quite a heads-down way, both in terms of thinking about who is going to reuse their data and about what other people, what other publishers, might be doing. Some of that is justified: Indra talked about how different countries have very different ways of thinking about and structuring their information. But it would be a fairly low-effort form of standardisation if publishers just looked at what other publishers are doing and tried to copy it. Using the same headers in their CSV files, or the same kinds of structures, for example, would give a gradual convergence towards something a bit more coherent, something people could reuse without having to understand the depths and complexity of each individual publication. Again, documenting and publishing the models that publishers are using, and the assumptions behind the figures that they're providing, would give us a leg up to something a bit more standard and a bit more coherent.
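The low-effort convergence described here can start with something as simple as checking your own CSV headers against a peer publisher's before you publish. The sketch below normalises names (case, spaces, hyphens, camelCase) and reports which of your columns already have a counterpart in the peer's header and which don't. Both headers are hypothetical and the normalisation rule is an assumption; deciding whether two columns actually mean the same thing still needs someone to read the other publisher's definitions.

```python
import re

# Hypothetical header already used by a peer publisher we want to align with.
PEER_HEADER = ["date", "area_code", "area_name", "new_cases", "cumulative_cases"]

def normalise(name: str) -> str:
    """Lower-case a column name, turning spaces, hyphens and camelCase into underscores."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())  # AreaName -> Area_Name
    return re.sub(r"[\s\-]+", "_", name).lower()

def align_header(ours, peer):
    """Map each of our columns to the peer column it matches after normalising, or None."""
    peer_by_norm = {normalise(p): p for p in peer}
    return {col: peer_by_norm.get(normalise(col)) for col in ours}

# Hypothetical header from our own draft publication.
ours = ["Date", "area_code", "AreaName", "new cases", "deaths"]
report = align_header(ours, PEER_HEADER)
for col, match in report.items():
    print(f"{col!r:12} -> {match or 'no counterpart: consider a peer name'}")
```

Columns that map cleanly can simply be renamed to the peer's spelling; columns with no counterpart, like `deaths` here, are where documenting your own definition matters most.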
But the third thing that really strikes me is: what are we missing by not engaging in a proper standards process? We've seen a number of areas where publishers aren't providing data because they're not really thinking about who might reuse it outside their particular bubble, beyond the people who are shouting loudest in their ears. Right at the moment we've got news about how the data published about testing in the UK hasn't revealed the full extent of cases in places like Leicester, and that's really a deficiency in the way in which data is being published, because people aren't listening to the needs of organisations outside their particular bubble. The same was true of the publication, or lack of publication, of ethnicity data on COVID-19 cases and deaths. Involving more people in the process and engaging more widely with potential re-users is one of the things you do when you're developing standards, but you don't have to be developing a standard in order to do that engagement. I do think that some of these gaps would have been highlighted and caught much earlier if there had been a more inclusive and more thoughtful process around the way in which data is being gathered and published.
So, my conclusion around standards development: as I say, I think that the process of standards development is the thing that gives it its power, the way that it involves different stakeholders. And if you can't do the full standards process rapidly in an emergency situation, you can at least talk to other people who might have an idea about how data could be used and reused. You can at least look at the way in which other people are publishing data. And aggregators, and those who are going through the exercise of pulling together different kinds of sources, can at least document what they've done, and perhaps provide some of that code as open source so that we can all benefit from it. But I'd love to hear from you. What do you think should have been done? What could we do now in order to get better standardised data to help us deal with this emergency?
Jeni Tennison
Vice President and Chief Strategy Advisor, Open Data Institute
Jeni Tennison is the Vice President and Chief Strategy Advisor of the Open Data Institute. She gained a PhD in Artificial Intelligence, then worked as an independent consultant specialising in open data publishing and consumption. She was the Technical Architect and Lead Developer for legislation.gov.uk before joining the ODI as Technical Director in 2012, becoming CEO in 2016, and Vice President in 2019.
Jeni sits on the UK's Open Standards Board; the Advisory Board for the Open Contracting Partnership; the Board of Ada, the UK's National College for Digital Skills; the Co-operative’s Digital Advisory Board; and the Board of the Global Partnership for Sustainable Development Data.
Sponsors
Northernlands 2 is a collaboration between ODI Leeds and The Kingdom of the Netherlands, the start of activity to create, support, and amplify the cultural links between The Netherlands and the North of England. It is with their generous and vigorous support, and the support of other energetic organisations, that Northernlands can be delivered.