Using open data to calculate bus route distances

In August 2018 we completed a Discovery project with the Department for Transport's (DfT) Bus Service Operators Grant (BSOG) team. The overall brief can be found here but in summary, the DfT asked us to assess the hypothesis that the existing BSOG process can be improved with better use of technology.


BSOG is complicated, but in essence the DfT makes annual grant and incentive payments to bus operators based on the mileage they cover, the fuel they consume, and their achievement in certain incentive areas.

We delivered the project collaboratively with Deloitte, and within the wider project brief, we were testing specifically whether open data can be used to improve the existing way that the DfT validates and pays BSOG claims to operators.

First, we had to understand how the current process works. As you can see here, it's not straightforward, and there are lots of steps and hand-offs between the different organisations involved. What was clear from the analysis was how manual and paper-based the entire process is today. This increases the risk of errors, or even worse, fraud.

We ran an open process to engage extensively with the individuals and groups involved in the BSOG process. These included bus operators, community transport operators (CTAs), local and combined authorities, claim auditors, suppliers, technology providers, the Driver & Vehicle Standards Agency (DVSA), the DfT and many others.

This involved holding a series of one-to-one interviews, two half-day open workshops, and an online survey.

This helped us to build a picture of the key areas of opportunity: the most problematic parts of the current process, which looked most likely to be improved using open data.

BSOG submissions

In their annual submissions, operators must give the DfT information about the mileage of the services they operate and the fuel consumption of their fleet.


Bus operators are required to calculate their estimated total mileage for the year ahead (for all planned services), then calculate an accurate mileage at the end of the year (removing dead miles, cancelled services, diversions etc). Once the submission is received, the DfT checks the submitted data.

We identified in this project that calculating mileage is a particularly labour-intensive process that consumes significant resource from both operators and the DfT. Because so much of it is manual, there is also a higher likelihood of incorrect or inaccurate calculations and submissions.

We knew that open data could help

So we built a tool. The distance calculator uses open data to calculate the distance of individual planned services.

Click here to have a play around with the tool.

What we did

    The route and timetable information used by the tool was obtained from Traveline open data. The data is in TransXChange, an XML-based standard for UK transportation data. A .NET Core 2 script was used to convert the files from XML format to JSON format.
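The conversion step can be illustrated with a short sketch. The project used a .NET Core 2 script; the version below is written in Python purely for illustration and is not the original code. It is a generic XML-to-JSON converter that makes no assumptions about the TransXChange schema. The sample document is a heavily simplified, hypothetical fragment, and the function names are our own.

```python
import json
import xml.etree.ElementTree as ET


def strip_namespace(tag):
    """Drop a leading '{namespace}' prefix from an ElementTree tag name."""
    return tag.split("}", 1)[-1]


def element_to_dict(element):
    """Recursively convert an XML element into JSON-friendly Python values."""
    children = list(element)
    if not children and not element.attrib:
        # Leaf element: keep only its text content.
        return (element.text or "").strip()

    # Attributes are stored under '@name' keys so they cannot clash with children.
    node = {"@" + name: value for name, value in element.attrib.items()}
    for child in children:
        key = strip_namespace(child.tag)
        value = element_to_dict(child)
        if key in node:
            # Repeated child elements are collected into a list.
            if not isinstance(node[key], list):
                node[key] = [node[key]]
            node[key].append(value)
        else:
            node[key] = value

    text = (element.text or "").strip()
    if text:
        node["#text"] = text
    return node


def xml_to_json(xml_text):
    """Parse an XML string and return an indented JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps({strip_namespace(root.tag): element_to_dict(root)}, indent=2)


# Hypothetical, heavily simplified fragment for illustration only.
SAMPLE = """<TransXChange xmlns="http://www.transxchange.org.uk/">
  <StopPoints>
    <AnnotatedStopPointRef>
      <StopPointRef>stop-a</StopPointRef>
      <CommonName>First Stop</CommonName>
    </AnnotatedStopPointRef>
    <AnnotatedStopPointRef>
      <StopPointRef>stop-b</StopPointRef>
      <CommonName>Second Stop</CommonName>
    </AnnotatedStopPointRef>
  </StopPoints>
</TransXChange>"""

if __name__ == "__main__":
    print(xml_to_json(SAMPLE))
```

Collecting repeated elements into lists keeps the JSON close to the shape of the XML, which makes the later stop look-ups straightforward.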

    National Public Transport Access Nodes (NaPTAN) is the national system for uniquely identifying points of access to public transport. Each bus stop is assigned a unique NaPTAN identifier code. NaPTAN data for East Sussex was obtained and then clipped using the GIS software package QGIS to leave only stops located in the Hastings local authority area.

    We cross-referenced stop lists contained in the TransXChange files for South East England with the list of Hastings bus stops from the NaPTAN data. If a route did not contain any stops in Hastings it was removed from the tool.
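The cross-referencing step amounts to a set intersection. The sketch below is a minimal Python illustration, assuming the route stop lists and the Hastings stop codes have already been loaded into memory. The function name, route names and stop codes are all hypothetical.

```python
def routes_serving_area(route_stops, area_stops):
    """Keep only routes that call at one or more stops in the area.

    route_stops: dict mapping a route name to its list of stop codes.
    area_stops:  set of stop codes located inside the area of interest.
    """
    return {
        route: stops
        for route, stops in route_stops.items()
        if area_stops.intersection(stops)
    }


# Hypothetical stop codes and route names, for illustration only.
HASTINGS_STOPS = {"stop-a", "stop-b", "stop-c"}
ROUTE_STOPS = {
    "route-1": ["stop-a", "stop-b", "stop-x"],  # partly inside the area: kept
    "route-2": ["stop-x", "stop-y"],            # entirely outside: removed
    "route-3": ["stop-c"],                      # inside the area: kept
}

if __name__ == "__main__":
    print(sorted(routes_serving_area(ROUTE_STOPS, HASTINGS_STOPS)))
```

Holding the area's stops in a set keeps each membership check cheap, which matters when a regional dataset contains many more routes than the area itself.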

    To visualise the bus routes we used the Leaflet.js mapping library. Location information for stops was obtained from the NaPTAN data and each route was drawn as a polyline connecting adjacent stops. The distance was calculated using the geospatial analysis library Turf.js as the cumulative straight-line length of the segments between adjacent stops.
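The straight-line calculation can be sketched as a sum of great-circle distances between consecutive stops. The tool itself used Turf.js; the Python version below substitutes a hand-written haversine formula and is only an illustration. The mean Earth radius value and the function names are our own assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def polyline_length_km(stops):
    """Sum the straight-line distances between consecutive (lat, lon) stops."""
    return sum(
        haversine_km(lat1, lon1, lat2, lon2)
        for (lat1, lon1), (lat2, lon2) in zip(stops, stops[1:])
    )


if __name__ == "__main__":
    # Three illustrative points along the equator, one degree apart.
    print(polyline_length_km([(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]))
```

Because the polyline cuts straight between stops, this figure will generally understate the distance a bus actually drives along the road network.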

    For more accurate estimates of route distance we used Leaflet Routing Machine together with the Mapbox Directions API. This method calculates a driving route using the stops as waypoints for the journey, which gives a much more accurate total distance for each service. In some instances the difference in distance between the two calculation methods was up to 15.
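As a rough illustration of the routed approach, the sketch below builds a Mapbox Directions API request with the stops as waypoints and reads the total distance from a response. The tool did this in the browser through Leaflet Routing Machine; this Python version is our own stand-in. The function names and sample coordinates are hypothetical. The URL layout and the `routes[0].distance` field (in metres) reflect our understanding of version 5 of the API and should be checked against the current documentation, including its limit on waypoints per request.

```python
from urllib.parse import urlencode

# Assumed endpoint for the driving profile of the Directions API (v5).
MAPBOX_DIRECTIONS_URL = "https://api.mapbox.com/directions/v5/mapbox/driving"


def directions_url(stops, access_token):
    """Build a request URL that uses the stops, in order, as waypoints.

    stops: list of (lat, lon) pairs. The API expects 'lon,lat' pairs
    separated by semicolons, so the order is swapped here.
    """
    coordinates = ";".join(f"{lon},{lat}" for lat, lon in stops)
    query = urlencode({"access_token": access_token})
    return f"{MAPBOX_DIRECTIONS_URL}/{coordinates}?{query}"


def route_distance_km(response):
    """Read the first route's distance (assumed to be metres) from a parsed response."""
    return response["routes"][0]["distance"] / 1000.0


def relative_difference(straight_km, routed_km):
    """Fraction by which the routed distance exceeds the straight-line one."""
    return (routed_km - straight_km) / straight_km


if __name__ == "__main__":
    # Illustrative coordinates and a placeholder token; no request is sent.
    print(directions_url([(50.85, 0.57), (50.86, 0.58)], "YOUR_TOKEN"))
```

Comparing the routed figure with the straight-line one for each service is a quick way to spot routes where the polyline estimate is least reliable.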

    Our tool is currently limited to services in Hastings and comprises 41 routes run by four operators. However, as we have TransXChange schedule information and NaPTAN stop location data for the whole of England, Scotland and Wales, it would be possible to expand the tool to a national level relatively easily.

    By using existing data from the Traveline National Dataset (TNDS), the tool will:

    • Use data that operators have already submitted;
    • Reduce the operational burden on operators to re-submit data;
    • Improve the transparency of the BSOG process;
    • Reduce the time taken by the DfT in its validation;
    • Improve the accuracy of BSOG submissions;
    • Reinforce the importance and expand the usage of existing bus open datasets.

    What happens next?

    Using the tool, we have proved the concept that open data can be used to improve the existing BSOG process.

    By using existing open datasets we can improve the accuracy and transparency of the end-to-end BSOG process, and reduce the time required to create, submit and validate claims.

    The DfT is considering its next steps for the project. This is likely to involve moving into an Alpha project, where prototype tools will be developed for the full technology solution, including all BSOG functions such as workflow management, payments and registrations. It may also involve some kind of reform of the BSOG policy.

    We think there is a huge opportunity to use open data as part of this technology project to improve BSOG for all stakeholders.