Can Data Fix Poverty?
Working with JRF on Poverty and Data

Credit: Joseph Rowntree Foundation
Poverty is getting worse: just read the Joseph Rowntree Foundation’s (JRF) poverty report for 2023. In 2020/21, around one in five people in the UK (20%) were in poverty. It is shocking that a country as rich as the UK has such a high poverty rate. The JRF is working to change this, and we have joined forces with them to help understand poverty in the North.
Poverty is an outcome of all the social and economic inequality in the country. It is often described using financial data because money is easier to capture quantitatively. But this only tells part of the story. People experience social, emotional, and physical hardships. If we want to tell the full story using data, we are going to have to do something new, which means being open to the possibility of failure, although of course that is not our aim.
The Goal
Through open, democratised access to data, our intention is to experiment with creating information that is as local as possible and that tells a story about poverty in the North of England. We want it to be accessible to non-technical people, without stopping those with technical backgrounds from diving deeper if they want to.
Our Approach
I'm going to talk a lot about data pipelines and data visualisations, so I'll start by explaining what they are. A pipeline is just a bit of code that, much like a gas pipeline, takes data from a source, transforms or analyses it, and puts it into a store. A visualisation is a representation of some of that data, such as a graph or chart.
We are actively avoiding a website full of “100_page_report.pdf” download links. These types of documents are hard to update, and there's no guarantee you have the same version as everyone else. If a chart is split across two pages, you can't grab that chart and use it somewhere else without somehow sticking it back together. If you work on different devices, you have to re-download the document on each one.
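To make the idea concrete, here is a minimal sketch of a pipeline in Python. Everything in it is made up for illustration: the figures, the column names and the function names are my own assumptions, and a real pipeline would read from a live source and write to a file rather than working on strings.

```python
import csv
import io

# Made-up source data: in a real pipeline this would come from a file or an API.
SOURCE_CSV = """area,people,people_in_poverty
Area A,1000,220
Area B,2000,380
"""


def extract(text):
    """Read rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Work out a poverty rate (%) for each area."""
    return [
        {
            "area": row["area"],
            "poverty_rate": round(
                100 * int(row["people_in_poverty"]) / int(row["people"]), 1
            ),
        }
        for row in rows
    ]


def load(rows):
    """Write the result to a store (here, a CSV string; normally a file)."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["area", "poverty_rate"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def run_pipeline(text):
    """Source -> transform -> store, in one re-runnable step."""
    return load(transform(extract(text)))


if __name__ == "__main__":
    print(run_pipeline(SOURCE_CSV))
```

Because each stage is a small function, the same steps can be re-run whenever the source changes, and a chart only needs to read the stored output.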
Instead, our goal is to build a web page that both collects and visualises the most up-to-date, relevant data, automatically. We need to pull this data to a central repository (storage place) and build reproducible analytical pipelines (RAPs) to drive the visualisations on the site. We want to automate this process for a few reasons:
- To get data updates as soon as they are released.
- To save people from chores such as re-making graphs manually each time there is a data update, which frees up time to answer questions and inform policy-making.
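As a rough illustration of the automation described above, the sketch below rebuilds a chart only when the downloaded data has actually changed. It is one possible approach under my own assumptions, not a description of our final setup: the function names are hypothetical, and the "rebuild" step is just a stand-in for whatever regenerates the charts.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a stable fingerprint of the downloaded data."""
    return hashlib.sha256(data).hexdigest()


def update_if_changed(new_data: bytes, last_fingerprint, rebuild):
    """Call rebuild(new_data) only when the data differs from last time.

    Returns the fingerprint to remember for the next run.
    """
    current = fingerprint(new_data)
    if current != last_fingerprint:
        rebuild(new_data)
    return current


if __name__ == "__main__":
    rebuilt = []  # stand-in for "charts we regenerated"
    seen = update_if_changed(b"2021 figures", None, rebuilt.append)
    seen = update_if_changed(b"2021 figures", seen, rebuilt.append)  # unchanged: skipped
    seen = update_if_changed(b"2022 figures", seen, rebuilt.append)  # new release: rebuilt
    print(len(rebuilt), "rebuilds")
```

Run on a schedule, a check like this means nobody has to notice a new release or re-make a graph by hand.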
You'd be within your rights to ask what a RAP is. Giles wrote a great blog on RAPs, and summarised their advantages quite nicely:
By building a RAP, we are capturing the organisational knowledge in code. It is immediately possible to see how a dataset / report is created. If this code is built into a re-runnable pipeline, then a job that takes a number of hours could be significantly shortened, at least in terms of human attention. Take the next step to running this automatically, and suddenly people are freed up to focus on more important things like answering questions posed by data.
As a data scientist supporting this project, my first step was to understand what questions JRF is trying to answer, what data it currently uses or will need to answer them, and where that data is stored. The rest of this blog post introduces, in descriptive terms, how we automatically harvest and visualise data; part II will be a technical blog explaining the details.
Stat-Xplore
Looking at JRF’s UK poverty statistics page, it was clear that the majority of its data comes from the Department for Work and Pensions (DWP). There are other sources too, and there will be some that we won't know about until we use them, but this seemed like a good place to start! After a few conversations with colleagues and a bit of digging around, it seemed the way to access DWP data was through a service called Stat-Xplore.
Stat-Xplore is extremely flexible, allowing users to create their own datasets with almost any breakdown they could think of. The problem is that despite claiming to be openly available, the data is locked behind a login portal which requires the user to create an account, and it has the kind of clunky UI that induces brain fog. I have found myself getting frustrated trying to navigate the custom table builder and know that others have similar frustrations. This had the potential to be a big blocker when trying to get the data that we needed in a fully automated way.

Credit: https://stat-xplore.dwp.gov.uk/webapi/jsf/login.xhtml
Fortunately, Stat-Xplore has an open data API, and someone else has created a Python library to access it. This nicely wraps up all the complicated steps in a neat package, demonstrating the power of working in the open: it would have taken me a lot of time to write something like this myself.
Extract and Visualise
After setting up an account with Stat-Xplore, you can get something called an API key, which allows you to access the data in Stat-Xplore remotely. It’s a key that opens a door, and behind that door is data. We can use this key in a piece of code that specifies where to get the data from and what to pull out. We can then store this data (depending on the type of licence) locally, or in a GitHub repo, and complete further transformation or analysis to get that data into the format we want.
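For the curious, here is a rough sketch of what "using the key in a piece of code" can look like. It uses only Python's standard library rather than the Python library mentioned earlier, so it is not the method we will necessarily use. The endpoint URL and the "APIKey" header reflect my understanding of the Stat-Xplore Open Data API and should be checked against its documentation; the identifiers in the example query are placeholders rather than real Stat-Xplore IDs, and STATXPLORE_API_KEY is an environment variable name chosen just for this sketch.

```python
import json
import os
import urllib.request

# Assumed endpoint for table queries; check the Stat-Xplore API documentation.
TABLE_URL = "https://stat-xplore.dwp.gov.uk/webapi/rest/v1/table"

# Hypothetical query: these identifiers are placeholders, not real Stat-Xplore IDs.
EXAMPLE_QUERY = {
    "database": "str:database:EXAMPLE",
    "measures": ["str:count:EXAMPLE:EXAMPLE_MEASURE"],
    "dimensions": [["str:field:EXAMPLE:EXAMPLE_FIELD"]],
}


def build_table_request(api_key, query):
    """Build (but do not send) a request asking Stat-Xplore for a table."""
    return urllib.request.Request(
        TABLE_URL,
        data=json.dumps(query).encode("utf-8"),  # what to pull out
        headers={"APIKey": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def fetch_table(api_key, query):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_table_request(api_key, query)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Keep the key out of the code: read it from the environment instead.
    key = os.environ.get("STATXPLORE_API_KEY", "")
    request = build_table_request(key, EXAMPLE_QUERY)
    print(request.get_method(), request.full_url)
    # With a real key and real identifiers in the query, the data would come from:
    # table = fetch_table(key, EXAMPLE_QUERY)
```

Reading the key from the environment rather than writing it into the code matters here, because we want to share our code openly without sharing the key itself.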
The final step is visualisation. We will use our own charting library, OI Lume Viz, to build charts, graphs, maps and more, and put them on a new website. The goal is to have a continually updating, North of England-specific data hub that gives a clear and informative insight into poverty statistics. Openly sharing the data we use and our processing steps makes it easy for others to peer-review our work, and dive deeper into specific questions when they need to.
We’re looking forward to implementing everything we’ve learned from similar projects, and hopefully, creating a useful tool that helps JRF get closer to its goal of ending UK poverty. Between now and September, we’ll be writing blog posts to document our work on this project, collated on the project page, and all of our code is available on our GitHub repository. In part II of this blog, I’ll explain the technical process behind using the Stat-Xplore API and RAPs.
If you have any thoughts or comments and would like to get in touch, please email me at luke.strange@open-innovations.org.