Building electricity network geography
We recently started a project with Northern Powergrid - one of ODI Leeds' sponsors - to visualise and share their Future Energy Scenarios (PDF) forecasts for their network. We had a very productive meeting in our space at Munro House where we found out about the details of the electricity distribution network they manage (covering much of northern Lincolnshire, Yorkshire, and the North East) and the data they would like to publish.
Element Energy helped Northern Powergrid to create these scenario-based forecasts. At differing levels of the Northern Powergrid network, the forecasts provide yearly estimates for parameters such as the numbers of electric vehicles and heat pumps, the levels of domestic and industrial consumption, and the amount of solar and wind power generation.
We were asked to create heat maps showing each of the modelled parameters at the level of Primary Supply Point. Northern Powergrid's existing map of Primary Supply Points was constructed from the postcode sectors of its customers. This map has quite a coarse resolution with large overlapping areas because single postcode sectors are often fed by multiple Primary Supply Points. For a heat map, overlapping areas are undesirable because different values shouldn't be shown on top of each other. We needed a non-overlapping map.
In the UK, the building blocks of census geography are Output Areas. These were last defined by the Office for National Statistics (ONS) in 2011. They are used to build up other geographies such as LSOAs, wards, Parliamentary Constituencies, and Local Authorities. They can get down to quite small areas and it seemed sensible to use these as our building blocks too. You can download the generalised clipped boundaries of Output Areas from the ONS's excellent geoportal in a variety of formats for use in GIS software.
The next step involved using some "closed" data; the postcodes of Northern Powergrid's customers. These data turned out to be around 6.5 million postcodes across Yorkshire, north Lincolnshire, and the North East. For data protection reasons, we only had the postcodes together with the distribution sub-stations each was associated with; we didn't have access to any other customer information as nothing else was necessary for this task.
At this stage we started to encounter some of the typical data cleanliness issues that often come with large legacy datasets. Some postcodes were missing (around 5,400), some were invalid or incomplete (around 5,700), and some had been mis-entered at some point in the past (an unknown number). Given that the missing, invalid, and incomplete postcodes were a small fraction of the overall dataset, we felt happy ignoring those entries.
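As a rough illustration of that first pass, the sketch below sorts raw postcode fields into "missing", "invalid", and "ok". The regular expression is a simplified assumption that only checks the general shape of a UK postcode, and the function names are hypothetical; it is not the exact check we ran.

```python
import re

# Simplified UK postcode shape (an assumption): it checks the format only
# and cannot tell whether a postcode really exists.
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")


def classify_postcode(raw):
    """Return 'missing', 'invalid' or 'ok' for a raw postcode field."""
    if raw is None or not raw.strip():
        return "missing"
    # Upper-case and collapse any run of whitespace to a single space.
    cleaned = " ".join(raw.upper().split())
    return "ok" if POSTCODE_RE.match(cleaned) else "invalid"


def summarise(postcodes):
    """Count how many entries fall into each category."""
    counts = {"missing": 0, "invalid": 0, "ok": 0}
    for postcode in postcodes:
        counts[classify_postcode(postcode)] += 1
    return counts
```

A check like this catches missing and malformed entries, but not a well-formed postcode that is simply the wrong one for a customer.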
Cleaning the data
Finding mistaken postcodes was the next task. How do you tell that a valid postcode is actually not the correct postcode for a customer? The first thing we needed to do was to establish what was reliable. Northern Powergrid had provided us with the coordinates/postcodes for its Primary Supply Points and its distribution (or secondary) sub-stations. But these values had also been collected over many years and, before we could use them, we needed to check how reliable they were. Plotting a histogram of the ground distances between distribution sub-stations and their Primary Supply Points showed that a number of distribution sub-stations were tens to hundreds of kilometres away from their Primary Supply Points. This showed that individual distribution sub-station locations couldn't always be trusted, because of data entry errors. However, groups of distribution sub-stations connect upwards to one Primary Supply Point, and the group as a whole is more reliable than any individual member.
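The distance check can be sketched along these lines. It assumes the locations are available as latitude/longitude pairs (the real data may well use another coordinate system), and the record layout, the 10 km bin width, and the 30 km threshold are all illustrative choices rather than values we used.

```python
from collections import Counter
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def distance_histogram(distances, bin_km=10):
    """Count distances per bin; keys are the lower edge of each bin in km."""
    return Counter(int(d // bin_km) * bin_km for d in distances)


def suspicious_substations(substations, primaries, max_km=30.0):
    """List (sub-station id, distance) pairs further than max_km from their Primary.

    substations: iterable of dicts with keys 'id', 'primary', 'lat', 'lon'
    primaries:   dict mapping Primary name -> (lat, lon)
    """
    flagged = []
    for s in substations:
        if s["primary"] not in primaries:
            continue  # no location recorded for this Primary
        plat, plon = primaries[s["primary"]]
        d = haversine_km(s["lat"], s["lon"], plat, plon)
        if d > max_km:
            flagged.append((s["id"], d))
    return flagged
```

Anything in the long tail of the histogram is a candidate data entry error rather than a real connection.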
We created a map that drew a straight line from the given location of every distribution sub-station to the given location of its Primary Supply Point. The expectation was that although some individual lines would be wrong, each Primary Supply Point should look like a "starburst" with lines radiating outwards unless its coordinates were really wrong. Looking at the map (below) you can see that this is largely the case. In some rural areas the starbursts are quite asymmetric due to the topology e.g. in the Yorkshire Dales. At the coast the starbursts are obviously limited to the coastline. The city of York also has a notable pattern; as you move to the outskirts of the city each starburst gets a stronger "outwards" trend away from the city centre. But in most other places they look "sensible". The one exception was a single Primary Supply Point group - SESSAY BRIDGE / SOWERBY. This starburst had a "comet-like" appearance, suggesting that it was the coordinates of the Primary Supply Point group that were wrong. We were able to fix this one by hand!
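We spotted the "comet" by eye, but the same idea could be approximated numerically. The sketch below is not what we did: it assumes planar coordinates in metres (for example eastings and northings), uses a hypothetical record layout, and measures how far each Primary's stated location sits from the median location of its sub-stations. The median is used because it is not dragged around by a few mis-entered sub-stations.

```python
from math import hypot
from statistics import median


def primary_offsets(substations, primaries):
    """Distance from each Primary's stated location to the median
    location of the sub-stations that connect to it.

    substations: iterable of dicts with keys 'primary', 'x', 'y' (metres)
    primaries:   dict mapping Primary name -> (x, y) in metres
    """
    groups = {}
    for s in substations:
        groups.setdefault(s["primary"], []).append((s["x"], s["y"]))

    offsets = {}
    for name, points in groups.items():
        if name not in primaries:
            continue  # no stated location to compare against
        mx = median(x for x, _ in points)
        my = median(y for _, y in points)
        px, py = primaries[name]
        offsets[name] = hypot(px - mx, py - my)
    return offsets
```

A Primary with a large offset is only a candidate for checking by hand: as noted above, coastal and rural starbursts can be legitimately lopsided.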
Building the map
Next we returned to the customer postcodes. We converted postcodes to Output Area (2011) codes, using the ONS's excellent lookup table. For each Primary Supply Point we found the total number of customers in each Output Area. Where a Primary Supply Point had fewer than 10 customers in an Output Area, we ignored that Output Area for that Primary Supply Point. That helps both with anonymisation and with removing mistaken postcodes (assuming the mistakes are few and not systematic).
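A minimal sketch of the counting step, assuming the customer records have been reduced to (postcode, Primary Supply Point) pairs and the ONS lookup has been loaded into a dictionary. The function names and data layout are hypothetical.

```python
from collections import Counter

MIN_CUSTOMERS = 10  # threshold described in the text


def customers_per_area(records, postcode_to_oa):
    """Count customers for each (Primary, Output Area) pair.

    records:        iterable of (postcode, primary) tuples
    postcode_to_oa: dict mapping postcode -> Output Area code
    """
    counts = Counter()
    for postcode, primary in records:
        oa = postcode_to_oa.get(postcode)
        if oa is not None:  # skip postcodes missing from the lookup
            counts[(primary, oa)] += 1
    return counts


def drop_small_counts(counts, minimum=MIN_CUSTOMERS):
    """Keep only the pairs with at least `minimum` customers."""
    return {pair: n for pair, n in counts.items() if n >= minimum}
```

A single mistyped postcode typically puts one customer in the wrong Output Area, so it falls below the threshold and drops out.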
We merged all the Output Area polygons for each Primary Supply Point using the command line tool ogr2ogr and then included all of those in one GeoJSON map. We also created a non-overlapping Primary-Supply-Point map where we assigned Output Areas to only one Primary Supply Point by picking whichever Primary Supply Point had the most customers in that Output Area. Obviously both are approximations to the real, complicated, geography of electricity distribution - there will be large rural Output Areas where the majority of the area has no electrical connectivity - but they give you the idea.
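The rule for the non-overlapping map can be sketched as follows, assuming customer counts keyed by (Primary, Output Area) pairs. The function name is hypothetical, and the alphabetical tie-break is an assumption made here so that the result is deterministic.

```python
def assign_output_areas(counts):
    """Assign each Output Area to the Primary with the most customers in it.

    counts: dict mapping (primary, output_area) -> number of customers
    Ties go to the alphabetically first Primary (an assumption).
    """
    best = {}  # output_area -> (primary, customer count)
    for (primary, oa), n in counts.items():
        current = best.get(oa)
        if (
            current is None
            or n > current[1]
            or (n == current[1] and primary < current[0])
        ):
            best[oa] = (primary, n)
    return {oa: primary for oa, (primary, _) in best.items()}
```

Each Output Area then appears exactly once, so merging the polygons per Primary Supply Point gives areas that cannot overlap.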
Creating geographies was one part of making a visualisation and we'll be writing up the visualisation tool in a separate blog post later this week.