Every bin in the world
Recently we've been exploring radically open ways for councils to store and publish their bin data. That includes where every public waste and recycling bin is, along with metadata such as collection days, bin type, allowed types etc. Currently, many councils will have an internal database (or several) that they maintain. These can quickly become incomplete or out of date when different parts of the council maintain their own subsets or when external bodies - e.g. transport authorities - move bins at bus stops without the council knowing. Keeping the data up to date is work. Then you have the extra work of publishing the data openly (if at all). Is there a better way? We think there is. Use Open Street Map as your database.
Using Open Street Map to store geographic data like this is brilliant because it's free, it already exists, and it lets anyone make improvements and fix problems. There is no chance that tens of thousands of people could form a team and actively collaborate to get all of the world's bins onto a single map. But if they all just agree to put the bins near them on Open Street Map then that is the end result. As of today there are 405,243 waste bins and 256,396 recycling bins recorded on Open Street Map. That's a lot of bins.
Mapping the bins
Different areas of the world have different coverage. For instance, our Data Apprentice Patrick recently added Leeds City Council's bin data to Open Street Map so it now contains 3,971 waste bins in the Leeds area. However, Bradford only has 32 waste bins recorded, Calderdale only has 9 bins recorded, and Harrogate only three!
The breakdowns for Leeds, Bradford, and Calderdale come from our existing project that extracts data layers from Open Street Map daily for West Yorkshire. To make that we use a PBF file created by GeoFabrik in Germany along with the GDAL command line tools. The Yorkshire extract comes in at 31 MB and is easily processed every day on our cloud hosting (also in Yorkshire!). Although we were creating a tool for Leeds City Council we wanted it to work far beyond the borders of Leeds. That doesn't just mean Bradford and Calderdale. Why couldn't it work for the entire world?
We first tried Overpass, which is an excellent service that you can query to extract specific tags/properties in various bounding boxes. When showing a map we could make a request for that particular area to an Overpass service. But, as soon as people go exploring around the map, that quickly ends up with lots of requests that each take time to process and you start overwhelming the Overpass server. A tweak to that idea was to split the world into "tiles" and ask Overpass for those. That limited us to one request per "tile" but we still ran into rate limits quite quickly.
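To make the "one request per tile" idea concrete, here is a minimal sketch of how a tile can be turned into an Overpass query. It assumes the standard slippy-map (Web Mercator) tile scheme; the function names, the timeout value and the exact query are our illustration rather than the code we actually ran.

```python
import math


def tile_bounds(x: int, y: int, zoom: int) -> tuple[float, float, float, float]:
    """Return (south, west, north, east) in degrees for a slippy-map tile."""
    n = 2 ** zoom

    def lat_of(row: int) -> float:
        # Inverse Web Mercator: tile row edge -> latitude in degrees.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    # Row y is the top (north) edge of the tile, row y + 1 the bottom (south).
    return lat_of(y + 1), west, lat_of(y), east


def overpass_query(x: int, y: int, zoom: int) -> str:
    """Build an Overpass QL query for waste and recycling bin nodes in one tile."""
    south, west, north, east = tile_bounds(x, y, zoom)
    # Overpass QL bounding boxes are ordered (south, west, north, east).
    bbox = f"{south:.6f},{west:.6f},{north:.6f},{east:.6f}"
    return (
        "[out:json][timeout:25];"
        f'node["amenity"~"^(waste_basket|recycling)$"]({bbox});'
        "out;"
    )
```

Each distinct tile needs only one query, so results can be cached by tile, but every uncached tile is still a live request to the Overpass server.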
Another option would be to get vector tiles from services such as OpenMapTiles. But that only solves the problem of showing the bins on a map. It doesn't let us easily make useful summaries, dashboards, tables or other tools for local authorities. We needed access to the raw map data.
We went back to our West Yorkshire method and extended this to Great Britain. We set up a process to download a daily Great Britain extract from GeoFabrik (1.1 GB), extract the nodes as GeoJSON using ogr2ogr, build a GeoJSON tile layer (at zoom level 12), and save all these files on GitHub along with a GB summary GeoJSON (5.7 MB). That method gives us smaller files and means the interactive map can request static files rather than endless database calls; much faster for the end user. The map would be up to a day behind but that's still a big improvement on the open data that currently exists.
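Building the tile layer comes down to putting each point into the zoom-12 tile that contains it. The sketch below shows one way to do that with the usual slippy-map formula. The function names and the `zoom/x/y.geojson` naming are assumptions for illustration, not necessarily the layout we publish.

```python
import math
from collections import defaultdict

ZOOM = 12  # zoom level used for the tile layer in the text


def lonlat_to_tile(lon: float, lat: float, zoom: int = ZOOM) -> tuple[int, int]:
    """Return the (x, y) slippy-map tile containing a longitude/latitude."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def bucket_features(collection: dict, zoom: int = ZOOM) -> dict:
    """Group the Point features of a GeoJSON FeatureCollection by tile."""
    tiles = defaultdict(list)
    for feature in collection["features"]:
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # lines and polygons would need more care
        lon, lat = geometry["coordinates"][:2]
        tiles[lonlat_to_tile(lon, lat, zoom)].append(feature)
    # One small FeatureCollection per tile, keyed by a hypothetical file path.
    return {
        f"{zoom}/{x}/{y}.geojson": {"type": "FeatureCollection", "features": feats}
        for (x, y), feats in tiles.items()
    }
```

Each value can then be written out with `json.dump` as a static file, and the map only needs to fetch the handful of tiles in view.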
Reduce, reuse, recycle
Could we show all the bins in the world? The biggest issue is size. There is a lot of data in Open Street Map; a compressed copy of the entire world's map data comes in at 52 GB. Extracting it and processing it requires another couple of hundred GB. How do we deal with that?
We got a Raspberry Pi 4 and a 512 GB external SSD. The next step was to learn how the OSM Planet Files work. A weekly extract is provided at planet.osm.org and daily, hourly and minutely change files are provided. Downloads are speed limited to 2048 KB/s so you don't want to download the whole planet a lot. Instead you get, say, the daily differences and apply these to your planet file using osmconvert. However, this is easier said than done. It took 3-4 hours to combine a daily difference on the Raspberry Pi 4 and a further few hours to extract all the bins (amenity=waste_basket and amenity=recycling). That's quite a big chunk of the day gone. So we had to think of ways to reduce the processing.
We use osmfilter to create an initial filtered version of the planet file that only contains bins. This filtered planet file is currently around 20 MB in size (.o5m). We should only need to do this once. Then, daily, we:
- Download the daily change file (100-130 MB), uncompress it (1.6 GB), and use osmfilter to extract only the bins. We save this as a cut-down change file. The cut-down files are around 100-200 kB in size (.osc).
- Use osmconvert to combine the cut-down daily change with the filtered planet. As these are both much smaller this is now pretty quick to do. We update the timestamp of the new .o5m file (because it seems to get lost) and save a .osm.pbf file for the next step.
- Create a GeoJSON file using
- Read each feature of the GeoJSON and work out which map tile it is part of (this is simple in this case because we are just dealing with points; you'd need to take more care with lines/polygons).
- Save all the map tiles and update GitHub.
This daily process takes 8-10 minutes on a Raspberry Pi 4 and gives us static tiles for the whole world.
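For anyone wanting to try something similar, the command-line part of the daily steps might look roughly like the sketch below. The file names are hypothetical, the download and uncompress step is left out, and the exact options we use may differ. It also assumes ogr2ogr for the GeoJSON step, as in our Great Britain process, and that the `--timestamp=` option can be set in the same osmconvert call as the merge.

```python
import subprocess

# Hypothetical file names, for illustration only.
FILTERED_PLANET = "bins.o5m"      # one-off osmfilter output for the whole planet
DAILY_CHANGE = "daily.osc"        # uncompressed daily change file
FILTERED_CHANGE = "daily-bins.osc"
UPDATED_PLANET = "bins-new.o5m"
PBF = "bins.osm.pbf"
GEOJSON = "bins.geojson"

# osmfilter keep-expression: amenity=waste_basket or amenity=recycling.
KEEP = "--keep=amenity=waste_basket =recycling"


def daily_commands(timestamp: str) -> list[list[str]]:
    """Return the argv lists for one daily update, in the order they run."""
    return [
        # 1. Cut the daily change file down to bins only.
        ["osmfilter", DAILY_CHANGE, KEEP, f"-o={FILTERED_CHANGE}"],
        # 2. Apply the cut-down change to the filtered planet and set the
        #    file timestamp explicitly (assumed to work in the same call).
        ["osmconvert", FILTERED_PLANET, FILTERED_CHANGE,
         f"--timestamp={timestamp}", f"-o={UPDATED_PLANET}"],
        # 3. Convert to .osm.pbf so GDAL can read it.
        ["osmconvert", UPDATED_PLANET, f"-o={PBF}"],
        # 4. Export the OSM driver's "points" layer as GeoJSON.
        ["ogr2ogr", "-f", "GeoJSON", GEOJSON, PBF, "points"],
    ]


def run_daily(timestamp: str) -> None:
    """Run each command in turn, stopping at the first failure."""
    for argv in daily_commands(timestamp):
        subprocess.run(argv, check=True)
```

Calling `run_daily("2024-01-01T00:00:00Z")` would run the four commands in order. Because every input is already filtered down to bins, each step only has to handle a few MB rather than the full planet.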
We've managed to create a daily extract of all the world's bins running on a Raspberry Pi 4. Given how fast the process is, we could even contemplate using the hourly differences in the future if we needed to. This same process could be used to extract other feature layers too.