Data Mill North 2.0
The Open Data Collaboration Group came together in the early days of ODI Leeds, led by folk from the sponsor network. Initially made up of local authorities, the group has grown over time to include other sponsors and some non-sponsor organisations as well. The purpose of this collaborative group? Share knowledge and skills, improve open data for the benefit of everyone, and work together to develop best practice. A prime example of this is the Business Rates project, where the group created a data standard so that any local authority publishing to it could easily be included in the visualisation tool. Good things happen when people work together.
The group meets every 6-8 weeks and sometimes sets a topic for discussion (or invites suggestions). The last meeting before lockdown was in late February, when plans to change and improve Data Mill North were first mentioned. At our most recent meeting (6 May), we set the topic to 'Data Mill North 2.0' and took the meeting online to start gathering people's ideas and feedback.
Stephen Blackburn and Hannah Roden of Leeds City Council set the context and described the challenges ahead. As a local authority, Leeds was collecting more data than ever, and that volume would continue to grow. They also had ambitious plans to provide more real-time data and more engaging ways to reach citizens. All of this would have to be supported by solid foundations and data infrastructure.
What does Data Mill North do well?
Data Mill North has been around for nearly 6 years. Initially called 'Leeds Data Mill', it always had the ethos of being broader than just the city council. When it rebranded as Data Mill North in 2016, it expanded its scope to include other local authorities and organisations across the North of England. So this is not a blank slate: Data Mill North in its current state has a lot to offer the data community. It was praised for its broad range of available data (from local authorities to private-sector organisations), the availability of raw data, and good data practice (such as fixed URLs). The general consensus was that, as a platform, it was easy to use, to search, and to get data from. It feels like a place to convene and develop a strong data community, which is one of many strengths that should be preserved. As evidence of the value of publishing open data, attendees cited the reduced cost of handling FOI requests for local authorities (because the data is already published) and the fact that they could see what others (similar local authorities, organisations, etc.) were publishing.
After establishing what works well now, the conversation turned toward what Data Mill North could do better and what else it could offer. Improved communication about geography and the ability to visualise datasets *within* the platform were popular suggestions. We are no strangers to the challenges of different geographies (and what happens when trying to make them work together), so being able to see quickly which geographies a dataset uses (through good metadata) would be useful. Another good suggestion was the creation of schemas, so that related or similar data across local authorities (business rates, for example) can be published in a standard format. This could also apply to any organisation using Data Mill North. Northern Powergrid are a good example, as they work to create and encourage adoption of a 'Distribution Future Energy Scenarios' data standard.
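To illustrate how a platform might work out which geographies a dataset uses, here is a minimal sketch that scans a CSV for values shaped like ONS (GSS) geography codes and counts the three-character prefix, which identifies the type of geography. It assumes codes are one capital letter followed by eight digits (e.g. "E08000035"); the function name `geography_types` and the sample data are hypothetical, and this is not how Data Mill North actually works.

```python
import csv
import io
import re
from collections import Counter

# Assumption: GSS codes are one capital letter followed by eight digits;
# the first three characters (e.g. "E05") identify the geography type.
GSS_PATTERN = re.compile(r"^[A-Z]\d{8}$")


def geography_types(csv_text):
    """Hypothetical helper: count GSS code prefixes found in each CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    found = {}
    for row in reader:
        for column, value in row.items():
            # Skip missing cells (None) and overflow fields (lists).
            if not isinstance(value, str):
                continue
            value = value.strip()
            if GSS_PATTERN.match(value):
                found.setdefault(column, Counter())[value[:3]] += 1
    return found


# Made-up example rows; the codes are illustrative only.
sample = "ward_code,ward_name,count\nE05001411,Example ward,12\nE05001412,Another ward,7\n"
print(geography_types(sample))
```

A summary like this could be stored as metadata, so the catalogue could be searched by geography type without anyone opening the file.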
What we would like to see in Data Mill North 2.0
- Schemas - develop standard formats for some common datasets to ensure consistency and compatibility.
- Validation - add tools to help publishers validate the CSV files they upload. These might help them get dates into the international standard format (ISO 8601), add latitudes and longitudes from postcode fields, or apply other advice from our data tips. They could also suggest improved, standardised column headings.
- Geographies - include ONS geography codes in metadata and/or have the platform recognise them in data files, so that the catalogue can be searched by ONS code.
- API - better support for software to upload/update files and metadata. This is vital so that organisations can build workflows in tools such as FME to automatically publish data - reducing human errors and effort. This should also make bulk uploads easier to do.
- Visualisations - some basic visualisations for each dataset. These might be simple graphs or maps.
- Eat your own dog food - any visualisations in the platform should use the platform's public APIs to extract data, giving internal and external visualisations the same access. That gives the platform an incentive to keep the API working well and allows external partners to make good visualisations using different tools and dataset-specific knowledge.
- External resources for datasets - a way for people to register external visualisations or other resources that use a particular dataset. This:
- lets the publisher see who is using their data;
- lets the public find related resources;
- could let the publisher contact end-users of the data with updates or requests for feedback.
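The schema and validation ideas could fit together in a publisher tool. The sketch below standardises column headings, checks them against a list of required columns, and converts UK-style dd/mm/yyyy dates to ISO 8601. The column names in `REQUIRED_COLUMNS` are hypothetical (they are not the actual Business Rates standard), and the function names are illustrative assumptions rather than an existing Data Mill North feature.

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical schema: headings a standard might require (illustrative only).
REQUIRED_COLUMNS = ["property_reference", "occupier", "rateable_value", "start_date"]


def standardise_heading(heading):
    """Lower-case a heading and replace runs of other characters with '_'."""
    return re.sub(r"[^a-z0-9]+", "_", heading.strip().lower()).strip("_")


def to_iso_date(value):
    """Convert a UK-style dd/mm/yyyy date to ISO 8601 (yyyy-mm-dd)."""
    return datetime.strptime(value.strip(), "%d/%m/%Y").date().isoformat()


def validate(csv_text, date_columns=("start_date",)):
    """Return (standardised rows, list of problems) for a publisher's CSV."""
    reader = csv.reader(io.StringIO(csv_text))
    headings = [standardise_heading(h) for h in next(reader, [])]
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in headings]
    rows = []
    # Row 1 is the header, so data rows are numbered from 2.
    for row_no, values in enumerate(reader, start=2):
        row = dict(zip(headings, values))
        for column in date_columns:
            if row.get(column):
                try:
                    row[column] = to_iso_date(row[column])
                except ValueError:
                    problems.append(f"row {row_no}: unrecognised date {row[column]!r}")
        rows.append(row)
    return rows, problems


sample = (
    "Property Reference,Occupier,Rateable Value,Start Date\n"
    "ABC123,Example Ltd,15000,01/04/2017\n"
    "ABC124,Other Ltd,9800,2017-13-01\n"
)
rows, problems = validate(sample)
print(rows[0]["start_date"], problems)
```

Reporting problems rather than silently rejecting the file keeps the publisher in control: the tool suggests fixes, and the publisher decides whether to apply them before publishing.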