Open Data triage: Deciding which data can be safely released
UK Power Networks and ODI Leeds are working together to define what a "Presumed Open" network operator looks like. One of the most important parts of this journey is assessing our data as a distribution network operator (DNO) so that it can be released safely and deliver the maximum possible value for customers and stakeholders. We are already working to establish new use cases for our data. Defining these user needs is a vital step in the right direction, but we also need a transparent triage process so that stakeholders know exactly what is considered before data is released.
There are a number of possible risks associated with sharing some of our more sensitive data:
- Exposure of data relating to critical national infrastructure to actors with nefarious intentions, putting the integrity of networks and systems at risk.
- Release of personal data, with potentially detrimental consequences for customers.
- Publication of commercially sensitive, high-value or inaccurate data, which could result in loss of intellectual property, financial loss and/or reputational damage.
How open can we be?
One simple approach to dealing with these risks would be to keep the data internal, releasing it to the outside world only where absolutely necessary. This instinct, however, runs directly counter to the principle of presumed open, and would undermine the considerable value that open data can bring.
A more useful framing is the data spectrum, which places data on a range from open (that is to say: clearly licensed and available for all to use) to closed (limited to use within the organisation). Shared data sits between these extremes: it is made available outside the organisation under varying limitations and restrictive terms.
Assessing data publication risks
Put yourself in the shoes of a data manager making an open data release; there is clearly a lot to consider! This is where an Open Data triage process is vital, as it moves the discussion from a simple list of things we can't do to a guide on how we can release as much data as possible while delivering on our responsibilities.
The triage process answers two main questions:
- Positioning: Where does this data release sit on the data spectrum?
- Mitigation: What processing and handling do I need to perform on the data to ensure its safe release?
The first of these questions can be answered in two passes. The first pass applies rules of thumb such as: "Datasets containing personal data can only be released to trusted organisations under very specific circumstances and via a sharing agreement". This gives us a clear starting position. The second pass is more refined and weighs the risks of sharing the data against its benefits. We can drill down into the risk areas identified, mapping out the actual content of the data (e.g. this dataset contains personal names associated with addresses) and the associated risks.
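The two passes can be sketched in Python. This is a minimal illustration under stated assumptions, not UK Power Networks' actual tooling: the `Spectrum` ordering mirrors the open, shared and closed positions described above and the personal-data rule comes from the example in the text, while the other rules, the flag names and the field-level risk register are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Spectrum(IntEnum):
    """Positions on the data spectrum, ordered from most to least restrictive."""
    CLOSED = 0  # limited to use within the organisation
    SHARED = 1  # available externally under limitations or restrictive terms
    OPEN = 2    # clearly licensed and available for all to use


@dataclass
class Dataset:
    name: str
    # Hypothetical content flags describing what the dataset contains.
    contains_personal_data: bool = False
    contains_cni_detail: bool = False
    commercially_sensitive: bool = False


# First pass: hypothetical rules of thumb, each mapping a content flag to the
# most open position it allows. Only the personal-data rule comes from the text.
RULES_OF_THUMB = {
    "contains_personal_data": Spectrum.SHARED,
    "contains_cni_detail": Spectrum.CLOSED,
    "commercially_sensitive": Spectrum.SHARED,
}

# Second pass: a hypothetical field-level register of risky columns.
FIELD_RISKS = {
    "customer_name": "personal data",
    "address": "personal data",
    "substation_location": "critical national infrastructure",
}


def first_pass_position(ds: Dataset) -> tuple[Spectrum, list[str]]:
    """Start from open and apply every triggered rule; the most restrictive wins."""
    position = Spectrum.OPEN
    triggered = []
    for flag, ceiling in RULES_OF_THUMB.items():
        if getattr(ds, flag):
            triggered.append(flag)
            position = min(position, ceiling)
    return position, triggered


def second_pass_risks(columns: list[str]) -> dict[str, str]:
    """Map each column that appears in the risk register to its risk area."""
    return {col: FIELD_RISKS[col] for col in columns if col in FIELD_RISKS}
```

For example, `first_pass_position(Dataset("connections", contains_personal_data=True))` places the dataset at `Spectrum.SHARED` as a starting position, and `second_pass_risks(["customer_name", "feeder_id"])` then pinpoints which columns drive that risk.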
Consistent risk mitigation
With this more detailed understanding of our data, we can consider ways of mitigating risks so that we can share more data, more openly. We may be able to release the data by summarising or otherwise anonymising the information it contains. Alternatively, the value of the dataset may lie in the very elements that make it risky to publish openly, in which case the data could be released under restrictive terms.
In simple terms, consistently applying a transparent, efficient triage process and offering support to data owners means that more of our datasets can move from closed to open safely.
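As one hedged example of mitigation by summarising, the sketch below aggregates hypothetical per-customer consumption records to postcode districts, drops customer names, and withholds any district with too few customers (a common technique known as small-cell suppression). The record layout, the `summarise_by_district` function and the threshold of five are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical minimum group size below which summary rows are withheld.
MIN_GROUP_SIZE = 5


def summarise_by_district(records, min_group_size=MIN_GROUP_SIZE):
    """Aggregate per-customer records to postcode districts, dropping names.

    Each record is assumed to be a dict with 'customer_name', 'postcode' and
    'annual_kwh' keys. Names are never copied into the output, and districts
    with fewer than min_group_size customers are suppressed.
    """
    groups = defaultdict(list)
    for rec in records:
        # Keep only the outward code (e.g. "SE1"), a coarser geography.
        district = rec["postcode"].split()[0].upper()
        groups[district].append(rec["annual_kwh"])

    summary = []
    for district in sorted(groups):
        values = groups[district]
        if len(values) < min_group_size:
            continue  # small-cell suppression
        summary.append({
            "district": district,
            "customers": len(values),
            "total_kwh": sum(values),
        })
    return summary
```

Aggregation with suppression reduces, but does not by itself eliminate, re-identification risk, so the summarised output would still go back through the second pass before release.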
The triage process fulfils another critical role: signposting material such as guidance or case studies that can help educate data owners at the point of need, increasing their knowledge and understanding. As data managers become more proficient and confident in releasing data, they may be able to discard some or all of this "scaffolding". That said, the data release process itself will evolve over time as regulation, corporate culture and the broader landscape continue to change, and the triage process is a way of ensuring consistency across all data releases.
A transparent triage process
There's one final important aspect: the output of the process provides evidence of due diligence. Interestingly, Ofgem has recently been consulting on whether DNOs should treat information created during open data triage as open data. With that in mind, UK Power Networks are planning to publish data triage outputs openly, to demonstrate adherence to the presumed open principle and to contribute to a cross-industry consensus on the handling of open data.
Data triage is still in its infancy in the energy industry. As we continue to define a process that works well for UK Power Networks, we're keen to hear about innovative approaches that other organisations or industries are taking to assess and manage the risks of open data publication. How do you work efficiently whilst remaining open? Do you take a risk-focused approach? How do you balance risk containment with the value of the release? What tried and tested mitigation techniques and tools do you rely on?
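If triage outputs are to be published as evidence of due diligence, recording each decision in a consistent, machine-readable form makes them easier to compare across releases. A minimal sketch, assuming a hypothetical `TriageRecord` schema whose field names are illustrative rather than an agreed industry format:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TriageRecord:
    """Hypothetical record of one triage decision, suitable for open publication."""
    dataset: str
    position: str  # "open", "shared" or "closed"
    risks_identified: list[str] = field(default_factory=list)
    mitigations_applied: list[str] = field(default_factory=list)
    rationale: str = ""


def to_published_json(record: TriageRecord) -> str:
    """Serialise a triage record to stable, human-readable JSON."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)
```

Sorting keys keeps the published files stable, so successive versions of a triage record can be compared line by line.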
We'd love to hear about your experiences with and approach to triaging open data releases. Get in touch via Twitter @odileeds or via email at firstname.lastname@example.org / OpenData@UKPowerNetworks.co.uk
Head of Delivery
Head of Enterprise Data
UK Power Networks