Project Cygnus - I EPC what you did there

Like one of those compelling novel serialisations, we have another chapter in the scintillating story of Energy Performance Certificate data. You can find the first half of the story in this blog post here, whilst we lay out the second half below. Everything is primed and the data has been dissected, so it is time to put it together in useful ways.


After working our way through the most straightforward research areas, we decided to take a closer look at the documentation of heating systems. We chose this focus not only because it is getting colder by the day, but mostly because heating is one of the four key aspects used to calculate EPC ratings. Because of this, it is one of the few mandatory fields to be filled in during an EPC inspection, making it relatively well documented.

We say relatively because the definition of heating systems and the resultant records vary from a focus on overarching supply systems or fuels, to the actual fixtures like radiators installed in the properties. For now we chose to direct our attention towards the former. We made this decision with England's carbon-neutrality goals in mind, but also (selfishly) in the interest of exploring the opportunities of the Leeds District Heating Scheme. But of course that was easier said than done.

First, we revelled in the beauty of a mandatory field: no missing values! We did of course realise that this was also caused by the extra fixture information. After dropping this, we found that the number of unique values decreased significantly, leaving us with 32. Some of these were simply different versions of the same thing and could be grouped together.

However, there were a few fundamental issues with the data. The first was that some of the observations only included information on fixtures, without giving any information about the fuel or overarching heating system. These were grouped as 'unknown'. The second was that cases providing fixture information with only a hint towards the overarching system (like electric storage heaters) complicated the grouping. While these were aggregated by main fuel type in the ONS report on EPCs, we decided to keep them separate so that we can explore their individual impact on energy efficiency in the future. This decision left us with a somewhat messy visualisation, but that appropriately highlights the need for standardisation throughout EPC data creation. The final 14 groupings of heating systems can be seen in the following graph and can be found on our data mapper.

Distribution of heating systems and fuels among most recent dwelling EPCs, based on open EPC data for Leeds
(from October 2008, last updated 30 June 2020).

When reduced to the groupings chosen in the ONS report, this graph gets a lot clearer, but some extra assumptions need to be made. For example, all the cases that only include electrical fixtures have to be grouped in with electric heating.

Simplified distribution of heating systems and fuels among most recent dwelling EPCs, based on open EPC data for Leeds
(from October 2008, last updated 30 June 2020).

These two graphs show the trade-offs that need to be made when working with EPC data in its current form. Especially when it comes to the column associated with heating system descriptions, this can lead to uncertainties and potentially misleading generalisations. An option to avoid this may be to split the heating system and fixture descriptions into separate columns.
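The grouping step described above can be sketched roughly as follows. This is a minimal illustration, not the rules we actually used: the keyword patterns, the group names, the sample descriptions and the helper names `group_heating` and `simplify` are all hypothetical, and the real data needs far more cases than this.

```python
import re
from collections import Counter

# Hypothetical keyword rules, checked in order. These are NOT the 14 groups
# from the post, just an illustration of keyword-based grouping.
DETAILED_RULES = [
    (r"\bcommunity\b", "community scheme"),
    (r"\bstorage heaters?\b", "electric storage heaters"),
    (r"\bmains gas\b", "mains gas"),
    (r"\blpg\b", "LPG"),
    (r"\boil\b", "oil"),  # word boundaries stop "boiler" matching "oil"
    (r"\belectric\b", "electric"),
]

# Hypothetical fold of a detailed group into a broader fuel group,
# mirroring the extra assumption needed for the simplified graph.
SIMPLIFIED = {"electric storage heaters": "electric"}


def group_heating(description):
    """Return the first matching group, or 'unknown' for fixture-only text."""
    text = description.lower()
    for pattern, group in DETAILED_RULES:
        if re.search(pattern, text):
            return group
    return "unknown"


def simplify(group):
    """Collapse a detailed group into its broader fuel group, if one is defined."""
    return SIMPLIFIED.get(group, group)


# Invented example descriptions, for illustration only.
sample = [
    "Boiler and radiators, mains gas",
    "Electric storage heaters",
    "Room heaters, electric",
    "Community scheme",
    "Boiler and radiators, oil",
    "Radiators",
]

detailed = Counter(group_heating(d) for d in sample)
simplified = Counter(simplify(group_heating(d)) for d in sample)
print(detailed)
print(simplified)
```

Keeping the detailed and simplified mappings separate means the extra assumptions live in one small, visible table rather than being buried in the cleaning step.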

Given the necessary time, resources, and ultimately usable data, the heating system records could be connected with heating costs to deliver some valuable insights into the cost of carbon neutrality. Especially in the context of local geography, there may be promise in exploring the potential of upgrading heating systems.

Green Deal

Given our interest in environmental efforts, we decided to take a closer look at the documentation of government incentives in the EPC data, to paint a picture of how well they actually work. To do this, we first cleaned and graphed the transaction types column of the EPC data set for Leeds. This is also available on the data mapper.

Most recent reasons for EPC Inspections by Dwellings, based on open EPC data for Leeds
(from October 2008, last updated 30 June 2020).

The visualisation above shows that the most prominent reasons for EPC inspections in Leeds were property rentals and marketed sales. The data also includes a range of assessments for government incentives like ECO and FiT, which have been explored by IPPR and Ofgem before. For us, the Green Deal related data is especially interesting, since it captures EPC assessments on the same dwellings before and after its implementation, in the timeframe of 2013 to 2015. To compare these, we filtered the data to retain only reasonable observations of duplicated building reference numbers that were collected right before and after the implementation of the Green Deal. This left us with 490 observations. While this is only a small subset of the openly available EPC data for Leeds, there are still some observations that can be made:

Improvement of energy efficiency after the implementation of the Green Deal over the energy efficiency before it, based on an open EPC data subset of Leeds.

This first visualisation shows that the energy efficiency of buildings generally increases after the implementation of the Green Deal. On average, the increase is about one band. However, it is interesting to note that there are some cases where the energy efficiency has decreased following Green Deal measures. These, and the cases where energy efficiency has increased dramatically, invite further exploration to uncover whether they are merely outliers or opportunities to inform future policy.
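A before-and-after comparison of this kind can be sketched as below. This is a simplified illustration rather than our actual filtering: the records are invented, the transaction type labels are assumptions about how the data is coded, and the helper names are hypothetical. The band thresholds are the standard EPC score ranges (A from 92, B from 81, C from 69, D from 55, E from 39, F from 21, G below that).

```python
from collections import defaultdict

# Lower score bound of each EPC band, from best to worst.
BANDS = [(92, "A"), (81, "B"), (69, "C"), (55, "D"), (39, "E"), (21, "F"), (1, "G")]

# Assumed labels for the two Green Deal transaction types.
BEFORE = "assessment for green deal"
AFTER = "following green deal"


def band_index(score):
    """Map an efficiency score to 0 (band G) up to 6 (band A)."""
    for position, (lower, _letter) in enumerate(BANDS):
        if score >= lower:
            return len(BANDS) - 1 - position
    return 0


def pair_before_after(records):
    """Keep buildings that have a 'before' record dated earlier than an 'after' one."""
    by_building = defaultdict(dict)
    for ref, day, reason, current in records:
        if reason == BEFORE:
            by_building[ref]["before"] = (day, current)
        elif reason == AFTER:
            by_building[ref]["after"] = (day, current)
    return {
        ref: (found["before"][1], found["after"][1])
        for ref, found in by_building.items()
        # ISO dates compare correctly as plain strings.
        if "before" in found and "after" in found
        and found["before"][0] < found["after"][0]
    }


# Invented records: (building reference, inspection date, reason, current score).
records = [
    ("B1", "2013-05-01", BEFORE, 52),
    ("B1", "2014-02-10", AFTER, 66),
    ("B2", "2013-07-15", BEFORE, 60),
    ("B2", "2014-09-01", AFTER, 58),
    ("B3", "2013-08-20", BEFORE, 45),  # no follow-up certificate, so dropped
]

pairs = pair_before_after(records)
band_changes = {
    ref: band_index(after) - band_index(before)
    for ref, (before, after) in pairs.items()
}
print(band_changes)
```

A positive value means the dwelling moved up by that many bands, zero means it stayed in the same band even if the score shifted slightly, and a negative value flags the cases where efficiency dropped.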

Difference of achieved energy efficiency following Green Deal over the potential energy efficiency before its implementation, based on an open EPC data subset of Leeds.
Difference of current energy efficiency over the potential energy efficiency preceding Green Deal implementation, based on an open EPC data subset of Leeds.

The next two graphs show how far the original and the achieved energy efficiency each fall short of the potential that was proposed prior to the implementation of Green Deal measures. They underline the positive effect of the Green Deal, since the distance to the past potential halved from roughly two bands to only one, and the minimum energy efficiency present has increased.

Here it would again be interesting to look at those cases that have surpassed the old potential to find out what exactly they did right. It would also be useful to determine whether the cases at the lower end of the spectrum have improved at all or if their improvement was insignificant.
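The distance to the old potential discussed above comes down to two subtractions per dwelling. The sketch below uses invented scores and hypothetical variable names, purely to show the arithmetic and how dwellings that surpassed their old potential could be flagged.

```python
from statistics import mean

# Invented examples: (building reference, score before, potential before, score after).
dwellings = [
    ("B1", 52, 78, 66),
    ("B2", 60, 75, 58),
    ("B3", 45, 70, 72),
]

# Gap to the potential that was proposed before the Green Deal measures.
gap_before = {ref: potential - before for ref, before, potential, _after in dwellings}
gap_after = {ref: potential - after for ref, _before, potential, after in dwellings}

# A negative gap means the dwelling now scores above its old potential.
surpassed = [ref for ref, gap in gap_after.items() if gap < 0]

# Dwellings whose score did not improve at all.
not_improved = [ref for ref, before, _potential, after in dwellings if after <= before]

print(mean(gap_before.values()), mean(gap_after.values()))
print(surpassed, not_improved)
```

The gaps here are in score points rather than bands, so they would still need converting before comparing them with the graphs.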

These initial insights show that, while the EPC data can be difficult to tame, it holds potential for exploring the effectiveness of government incentives. That exploration, however, could be much more detailed and useful given additional information.

For example, the final set of Leeds-based Green Deal related data used here is only a small subset of the already limited openly available EPC data. This means that there is little room to make representative assumptions about spatial and temporal factors affecting the discovered energy efficiency differences. If the entirety of the EPC data, not only for Leeds but for all of England and Wales, were to be used, there might be potential to reliably pinpoint such trends in Green Deal efficiency. On top of this, the research would benefit from more detailed insight into the local allocation and size of the spend on dwellings that passed the Green Deal assessment. This might offer some explanation for the differences in energy efficiency improvements.

So what happens next?

Making a data-driven tool or visualisation is always a great output. It's shareable, it tells a story, it engages people. But sometimes it isn't possible because the underlying data can't support it. As frustrating as that can be, it is not entirely fruitless. Instead of highlighting an interesting discovery, we can highlight the problems that have been barriers to discovery and then work towards removing those barriers. It might require big, sweeping changes in methods or the introduction of proper standardisation, but the benefits of better quality data are bigger and long-lasting. The dive into EPC data is a great example of this - there is masses of potential in the data, and much to gain from it being better. As the Climate Crisis has brought Net Zero goals forward, now is the time to reevaluate EPC data in a greener and more open context, for the benefit of everyone.