#WaterData18 - part 1
Stating something on the internet, putting it out there in the open for all to see, is a great way to get things done. You are then accountable for your efforts. Those who are interested in your work can question you. And you can answer openly, even if that answer is 'we don't know.' Being out in the open shows a commitment to honesty and transparency, and also a willingness to admit that maybe you don't do things as well as you could and that you're humble enough to say 'take a look at what we do and let us work together to improve it.'
This is everything that Yorkshire Water signed up for when they came on board as a sponsor of ODI Leeds earlier this year. Their press release, and subsequent launch event in March (viewable on YouTube), laid out their ambitions for all to see: publish more open data, and aim for all operational data to be published openly by 2020. They have approached the collaboration with ODI Leeds with a willingness to listen to people with genuine feedback about their data (as you will see later, several suggestions for improvements and new datasets were made in the first 30 minutes of #WaterData18!), and with the self-awareness to recognise that their own teams can't meet all of the challenges ahead on their own.
#WaterData18 is the first innovation event of many to explore Yorkshire Water open data. Held over two days - Friday 18 May and Saturday 19 May - more than 50 people gathered in total to work together, coming up with solutions to the challenges posed on this blog post or just exploring the data to find better ways to present and use it.
The morning introductions and the overview of the data were all filmed and live-streamed for the benefit of people who couldn't be there. If you want to skip reading the rest of the blog (you'll miss some witty repartee) then you can watch the archived video on YouTube. Staying with us? Then carry on!
Day 1 - Friday 18 May
The day began with brief introductions from Paul Connell, founder of ODI Leeds, and Richard Emmott, Director of Communication at Yorkshire Water, who set the scene for the days ahead. Richard spoke very enthusiastically about the collaboration and was excited to see what would come out of the hackathon. A brief interlude followed for another cup of tea/coffee and some chatter, before everyone assembled again for the data overview.
Rob Krempic, Manager of Asset Analytics for Yorkshire Water, gave a walkthrough of the Leakage DMA Data (available on Data Mill North) which was released earlier this year. A Distribution Management Area (DMA) can contain up to 1000 domestic properties, up to 50 commercial properties, and many other things in between, such as schools, hospitals, etc. The flow of water through a DMA is measured in litres per second and collected every 15 minutes throughout the day. The way that Yorkshire Water currently use this data for leak detection is by analysing the 'nightline' data - the flow of water overnight, usually midnight to 6am. Why? Well, it's a fairly safe guess that most people are asleep during those hours and thus not using water. This can help establish a 'baseline' rate of flowing water, so that anomalies can be detected against it. But this is far from perfect. There are swathes of outside factors that can affect the flow of water - some of them human, some of them natural. Temperature, rainfall, extreme weather events (snow in April, looking at you!); school holidays, religious festivals, general demographics of people; housing type, topography, local industry. How would these influence the flow of water?
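To make the nightline idea a bit more concrete, here is a rough sketch in Python of how a baseline might be worked out from 15-minute flow readings. It is not Yorkshire Water's method. The function names, the use of a median as the baseline, the 1.5x threshold, and the data itself are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, median


def nightly_means(readings, start_hour=0, end_hour=6):
    """Average the flow (litres per second) in the night window for each date.

    `readings` is an iterable of (datetime, flow_l_per_s) pairs.
    The midnight-to-6am window follows the description above.
    """
    by_night = defaultdict(list)
    for timestamp, flow in readings:
        if start_hour <= timestamp.hour < end_hour:
            by_night[timestamp.date()].append(flow)
    return {night: mean(flows) for night, flows in sorted(by_night.items())}


def flag_unusual_nights(night_means, ratio=1.5):
    """Return (baseline, nights whose mean flow exceeds baseline * ratio).

    Assumption: the baseline is the median of the nightly means, and the
    ratio is an arbitrary illustrative threshold.
    """
    baseline = median(night_means.values())
    unusual = [night for night, value in night_means.items() if value > baseline * ratio]
    return baseline, unusual


def make_example_readings():
    # Synthetic data: five days of 15-minute readings for one imaginary DMA.
    start = datetime(2018, 5, 14)
    readings = []
    for step in range(5 * 96):  # 96 readings per day at 15-minute intervals
        timestamp = start + timedelta(minutes=15 * step)
        flow = 2.0 if timestamp.hour < 6 else 8.0
        if timestamp.date() == datetime(2018, 5, 17).date() and timestamp.hour < 6:
            flow = 4.0  # pretend the night flow doubles on the fourth night
        readings.append((timestamp, flow))
    return readings


example_means = nightly_means(make_example_readings())
example_baseline, example_unusual = flag_unusual_nights(example_means)
print(example_baseline, example_unusual)
```

A simple threshold like this ignores every one of the outside factors listed above (weather, holidays, housing type and so on), which is exactly why the teams were asked to look for something better.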
Jason Griffin, Leakage Technology Manager at Yorkshire Water, was next to talk about a dataset that went live literally the day before #WaterData18! Acoustic Logger Data is as straightforward as it sounds - the data collected by acoustic loggers. The loggers are a new piece of kit for Yorkshire Water and were only installed on some pipes, valves, and fittings back in February. They are currently only switched on for an hour each night to collect nightline data, taking 3000 audio samples in that time and then sending that data back to Yorkshire Water for analysis. The dataset contains the daily average sound level and average spread of noise for each acoustic logger in each DMA. What does sound have to do with water? For a long time, if a leak was suspected in an area, people were deployed with 'listening sticks' to try to locate the source of the leak. And yes, a listening stick is exactly what it sounds like. Over the years, the tech got more advanced but leaks were still difficult to pinpoint. The acoustic loggers are fitted in precise locations (obviously not included in the dataset as some loggers will be fitted to pipes that serve homes) so if they detect a problem, the location of the leak can be investigated quickly.
Time for a well-earned coffee break after all that!
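As a loose illustration of what 'average sound level' and 'spread of noise' could mean, the sketch below boils one night's samples from each logger down to two numbers. How Yorkshire Water actually calculate the spread isn't stated, so using the standard deviation here is an assumption, and the logger names and sample values are invented.

```python
import random
from statistics import mean, pstdev


def summarise_night(samples):
    """Reduce one night's sound samples to a (level, spread) pair.

    Assumption: level is the mean and spread is the population
    standard deviation of the samples.
    """
    return mean(samples), pstdev(samples)


random.seed(18)  # fixed seed so the synthetic example is repeatable
# 3000 made-up samples per logger, matching the nightly sample count above.
loggers = {
    "logger_a": [random.gauss(20, 5) for _ in range(3000)],  # quieter, more variable
    "logger_b": [random.gauss(45, 1) for _ in range(3000)],  # louder, steadier
}
summary = {name: summarise_night(samples) for name, samples in loggers.items()}
for name, (level, spread) in summary.items():
    print(f"{name}: level={level:.1f} spread={spread:.1f}")
```

Two numbers per logger per night is a lot less to send and store than 3000 samples, which may be part of the appeal of publishing the data in this form.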
Suitably refreshed and topped up on biscuits and pastries, everyone came back together for a brief session to form teams and themes. Earlier in the week, we had published a blog post with several suggested themes that Yorkshire Water wanted to see ideas for, but we always allow room for people to bring their own interpretations and themes. Seven teams were formed in total, covering various aspects of acoustic logging data, the leakage data, combining with other datasets, making processes better, etc. Then it was time for lunch.
A quick check-in with the teams after a hearty lunch to find any gaps in skills, etc, and then it was on to prototyping. Yorkshire Water had a support team to help people with data questions, technical problems, etc, and they were certainly kept busy throughout the afternoon.
Just before the afternoon update, Yorkshire Water managed to publish a new dataset! More accurately, it was an extension of the acoustic logger dataset: example recordings from the acoustic loggers, capturing the sounds of different water flows - stable, small leak, little leak, and more. Now this is why we love hackathons - ask and it could be delivered!
Summary of the teams and their status:
Digital listening stick - the team who asked Yorkshire Water for the water flow audio files. The seed of the idea was to combine expert-guided machine learning, training algorithms to identify leaks from the sound files, with a 'citizen-science' approach to gathering even more data: let Yorkshire Water customers collect sounds of water flow via an app on their phone, with incentives for those who take part.
Neural network - developing a neural network to, at first, supplement the analysis and detection of leaks via the acoustic logger data.
Regionally speaking - looking at the growth of population in an area and comparing that to the flow data, trying to spot trends and patterns in regional differences.
Digging the right size hole - improvements to leak detection in addition to methods already employed by Yorkshire Water. Ideas included the use of mobile acoustic sensors for better triangulation, using some form of submarine drone, using special additives in water that can be detected at ground level, and finally using Doppler radar.
Personal loggers and customer-driven data collection - this was an idea that combined the use of an acoustic logger fitted specifically to the water supply of a house and then asking members of the household to keep a diary of their water consumption.
The sounds of leaks - using machine learning to develop a better decision threshold for identifying leaks by sound and deciding whether they need a repair.
DMA flow - machine learning employed to analyse the nightline data and help improve the determination of a background flow rate.
Aggregation - using an API to make aggregates of the DMA data into series.
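For a flavour of what the aggregation idea involves, here is a minimal sketch that rolls 15-minute DMA readings up into hourly or daily series. It is not the team's code and it is not an API, just the core step such an API might perform. The `aggregate` function, the tuple layout, and the `DMA-001` identifier are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def aggregate(readings, period="hour"):
    """Average flow per DMA per period.

    `readings` is an iterable of (dma_id, datetime, flow_l_per_s) tuples.
    Returns {dma_id: [(period_start, mean_flow), ...]} ordered by time.
    """
    if period not in ("hour", "day"):
        raise ValueError("period must be 'hour' or 'day'")
    buckets = defaultdict(list)
    for dma_id, timestamp, flow in readings:
        # Truncate each timestamp to the start of its hour or day.
        if period == "hour":
            start = timestamp.replace(minute=0, second=0, microsecond=0)
        else:
            start = timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
        buckets[(dma_id, start)].append(flow)
    series = defaultdict(list)
    for (dma_id, start), flows in sorted(buckets.items()):
        series[dma_id].append((start, mean(flows)))
    return dict(series)


# Synthetic readings: two hours of 15-minute data for one imaginary DMA.
first = datetime(2018, 5, 18, 9, 0)
sample = [("DMA-001", first + timedelta(minutes=15 * i), float(i)) for i in range(8)]
hourly = aggregate(sample, period="hour")
daily = aggregate(sample, period="day")
print(hourly["DMA-001"])
print(daily["DMA-001"])
```

The mean is only one choice of aggregate; a minimum or a percentile may suit leak detection better, and swapping the function used in the last loop would be enough to try that.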
The end of the day was marked with one last opportunity for tea and cake. Then it was time to pack up and rest up, ready for day two.