At ODI Leeds we have an Open Data Collaboration Group which meets every 6 weeks or so. We often have representatives from Calderdale Council, Leeds Council, Bradford Council, Stockport Council, Barnsley Council, WYCA, and Yorkshire Water to discuss how we can improve open data, share skills, and share lessons learned. Please let us know if you'd like to join in.
At the start of the year the Open Data Collaboration Group decided to start a project on improving Business Rates (NDR) data, as this is a dataset that Local Authorities receive a lot of Freedom of Information (FoI) requests for. The aim was to see if we could get some useful visualisations and results from the datasets, but we immediately ran into a fairly common issue: every Local Authority did it differently.
Making things consistent
Before we could make a tool we needed to get everything into a consistent format. As there didn't seem to be one already, we created a standard. The standard tries to be pragmatic in some ways: not every field is required and extra fields can be included. In other ways it is pretty strict: it insists on ISO 8601 dates (YYYY-MM-DD), clean currency values, and exact column headings. However, where we've been strict we've also created some simple web-based tools to help. Latitude and Longitude are required columns, so we created a tool to add geography to a CSV file using postcodes. We also made a CSV Cleaner tool that converts dates into ISO 8601 format and "fixes" column headings. So, even if a Local Authority has limited skills and software available internally, these tools should help fill the gap.
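The CSV Cleaner's two jobs, converting dates to ISO 8601 and fixing column headings, can be sketched in a few lines of Python. This is not the tool's actual code: it is a minimal sketch that assumes day-first UK date layouts, and the entries in `HEADING_ALIASES` are hypothetical examples rather than the standard's real column names.

```python
import csv
import io
from datetime import datetime

# Hypothetical examples of heading variants and the standard form each maps to.
# The real standard's column headings may be worded differently.
HEADING_ALIASES = {
    "post code": "Postcode",
    "postcode": "Postcode",
    "start date": "Liability start date",
    "liability start date": "Liability start date",
}

# Date layouts we assume councils might use; UK files are usually day-first.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d/%m/%y", "%d-%m-%Y", "%d %b %Y", "%d %B %Y"]


def to_iso8601(value: str) -> str:
    """Return the value as YYYY-MM-DD if it matches a known layout, else unchanged."""
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return value  # leave unrecognised values for a human to check


def fix_heading(heading: str) -> str:
    """Map a heading to its standard form, ignoring case and extra whitespace."""
    key = " ".join(heading.split()).lower()
    return HEADING_ALIASES.get(key, heading.strip())


def clean_csv(text: str, date_columns: set[str]) -> str:
    """Rewrite a CSV with standard headings and ISO 8601 dates in date_columns."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return text
    header = [fix_heading(h) for h in rows[0]]
    date_idx = {i for i, h in enumerate(header) if h in date_columns}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in rows[1:]:
        writer.writerow([to_iso8601(c) if i in date_idx else c for i, c in enumerate(row)])
    return out.getvalue()


if __name__ == "__main__":
    raw = "Post Code,Start Date\nLS1 4AP,01/04/2017\n"
    print(clean_csv(raw, {"Liability start date"}), end="")
    # Postcode,Liability start date
    # LS1 4AP,2017-04-01
```

Values that match none of the known layouts are passed through untouched rather than guessed at, so that ambiguous dates can be reviewed by a person.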
After we'd defined the standard, we started getting West and North Yorkshire Local Authorities to meet it. Every council was starting from a different point. Some councils hadn't previously published any data at all. Others had published limited amounts of data. Some had been reluctant to include the locations of businesses, or whether premises were empty, and seeing what other Local Authorities were doing helped reassure them. To encourage the Local Authorities to do better, and to credit them when they were doing well, I created a "League Table" that scored them on eight criteria. This encouraged them to meet the standard and also to publish their data in an accessible way. In the past few weeks Leeds, Calderdale, and Stockport have kept improving what they do in a battle for the top spot.
I've also created an online validator that can analyse a Business Rates CSV file and suggest ways it can be improved. If it notices that the dates aren't in the right format, it provides a quick way to send the CSV file to the CSV Cleaner to fix them (using inter-window messaging). If it notices that latitude and longitude are missing (but postcodes exist), it lets you quickly send the file to the Postcodes2LatLon tool to add them.
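The kinds of checks the validator makes could look something like the sketch below. It is a hypothetical illustration, not the validator's source: the column names `Postcode`, `Latitude` and `Longitude` and the date columns passed in are assumptions, and the date test only checks the YYYY-MM-DD shape. It also inspects the file's first bytes for the signatures of XLS (OLE2) and XLSX (ZIP) files, which catches spreadsheets saved with a .csv extension.

```python
import csv
import io
import re

# Shape check only: four digits, two digits, two digits separated by hyphens.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

# Spreadsheets saved with a .csv extension betray themselves in their first bytes:
# legacy .xls files are OLE2 compound documents and .xlsx files are ZIP archives.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"
ZIP_MAGIC = b"PK\x03\x04"


def validate(data: bytes, date_columns: set[str]) -> list[str]:
    """Return suggestions for improving a Business Rates CSV (hypothetical checks)."""
    if data.startswith((OLE2_MAGIC, ZIP_MAGIC)):
        return ["This looks like a spreadsheet, not a CSV file: export it as CSV first."]

    # Assumes UTF-8 (with or without a byte-order mark); undecodable bytes are replaced.
    reader = csv.DictReader(io.StringIO(data.decode("utf-8-sig", errors="replace")))
    headers = set(reader.fieldnames or [])
    rows = list(reader)
    suggestions = []

    # Count non-empty values in each date column that are not YYYY-MM-DD.
    for col in sorted(date_columns & headers):
        bad = sum(1 for r in rows if (r[col] or "").strip() and not ISO_DATE.match(r[col].strip()))
        if bad:
            suggestions.append(f"{bad} value(s) in '{col}' are not YYYY-MM-DD: try the CSV Cleaner.")

    # Postcodes present but coordinates absent: point at the geocoding tool.
    has_postcodes = "Postcode" in headers and any((r["Postcode"] or "").strip() for r in rows)
    has_coords = {"Latitude", "Longitude"} <= headers
    if has_postcodes and not has_coords:
        suggestions.append("Latitude/Longitude are missing but postcodes exist: try Postcodes2LatLon.")
    return suggestions
```

Returning plain suggestions (rather than rejecting the file) mirrors the approach described above: the validator points to a tool that can fix each problem instead of just failing.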
Once you've fixed your file you can then add or update its URL in our Business Rates index file on GitHub and send a pull request. Like the MHCLG's index of Brownfield Sites, this acts as a sort of decentralised register that any Local Authority (or citizen) can help keep up to date.
Cool URIs don't change
Most Local Authorities put each new release of their Business Rates data at a new URL. That means they will need to update the URL in the index each time they issue a new release. That is time-consuming and, in practice, won't happen in many cases. A more efficient approach would be to use "Cool URIs", i.e. fixed URLs for the latest versions of these sorts of datasets, e.g. calderdale.gov.uk/data/business-rates/latest. The exact form of the URL will vary from Local Authority to Local Authority (to suit the different ways their websites work), but it would mean that end-users could reliably get the latest snapshot of the data.
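Behind a fixed "latest" URL, the website only needs to work out which release is the newest. Here is a minimal sketch, assuming a hypothetical naming scheme of `business-rates-YYYY-MM-DD.csv` for each release; because ISO 8601 dates sort in the same order as plain strings, picking the newest is just a `max()`.

```python
import re
from typing import Optional

# Hypothetical naming scheme for dated releases; real council file names will differ.
RELEASE = re.compile(r"^business-rates-(\d{4}-\d{2}-\d{2})\.csv$")


def latest_release(filenames: list[str]) -> Optional[str]:
    """Pick the newest dated release; ISO 8601 dates compare correctly as strings."""
    dated = [(m.group(1), name) for name in filenames if (m := RELEASE.match(name))]
    return max(dated)[1] if dated else None


if __name__ == "__main__":
    files = ["business-rates-2018-10-01.csv", "business-rates-2019-04-01.csv", "notes.txt"]
    print(latest_release(files))
    # business-rates-2019-04-01.csv
```

A web server or content management system could then answer requests for the fixed URL by redirecting to (or serving) the file this returns, so the URL in the index never needs to change.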
Using the data
Now that Local Authorities are starting to publish their Business Rates in a consistent format, we can start to look at what interesting things can be done with the data. Our current visualisation shows the premises on a map, creates a dashboard and graphs from the data, and tries to cross-match against the FSA's Food Hygiene dataset. And, because the data are consistent, we can combine data from multiple Local Authorities too.
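Because every council's file shares the same headings, combining them is a simple append, and a cross-match can key on a normalised postcode. The sketch below is a hypothetical illustration rather than how our visualisation actually matches premises: it assumes `Name` and `Postcode` columns in the Business Rates files, assumes the Food Hygiene records use FSA-style `BusinessName` and `PostCode` fields, adds a hypothetical `Council` column, and treats two records as the same business when their postcodes match and their names agree after ignoring case and "Ltd"/"Limited".

```python
import csv
import io

# Hypothetical column names: "Name" and "Postcode" on the Business Rates side,
# and FSA-style "BusinessName" and "PostCode" on the Food Hygiene side.
COMPANY_SUFFIXES = {"ltd", "limited"}


def load_councils(sources: dict[str, str]) -> list[dict[str, str]]:
    """Append CSVs from several councils; shared headings make this straightforward."""
    rows = []
    for council, text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["Council"] = council  # extra column recording where the row came from
            rows.append(row)
    return rows


def norm_postcode(postcode: str) -> str:
    """Remove all whitespace and upper-case, so 'ls1 4ap' and 'LS14AP' agree."""
    return "".join(postcode.split()).upper()


def norm_name(name: str) -> str:
    """Lower-case, drop full stops, and drop company suffixes such as 'Ltd'."""
    words = name.lower().replace(".", "").split()
    return " ".join(w for w in words if w not in COMPANY_SUFFIXES)


def cross_match(premises: list[dict], hygiene: list[dict]) -> list[tuple[dict, dict]]:
    """Pair premises with hygiene records at the same postcode with the same name."""
    by_postcode: dict[str, list[dict]] = {}
    for record in hygiene:
        by_postcode.setdefault(norm_postcode(record["PostCode"]), []).append(record)
    matches = []
    for p in premises:
        key = norm_postcode(p.get("Postcode") or "")
        if not key:
            continue  # no postcode, nothing to match on
        for record in by_postcode.get(key, []):
            if norm_name(record["BusinessName"]) == norm_name(p.get("Name") or ""):
                matches.append((p, record))
    return matches
```

Exact name matching is deliberately conservative: it will miss businesses that trade under a different name from the ratepayer, but it avoids pairing unrelated premises that merely share a postcode.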
This visualisation is clearly quite basic at the moment and can be improved with time. However, by making sure the cleaned-up source data are available to everyone, someone else could easily take the same data and do their own analysis. If you do, let us know and we can link to it.
Right now there are only 10 Local Authorities with good enough datasets to use in our tool. There are only 46 Local Authorities that currently seem to publish CSV versions of their Business Rates data (although some of those are actually publishing XLS files with .csv as the file extension). It would be fantastic to get more Local Authorities publishing to the standard. If your Local Authority doesn't currently do this, perhaps you could encourage them. Make sure to let them know that we've made tools to help them.