This post is aimed at journalists and other potential users of OpenRefine who have little or no experience using it.
It is a frequent lament that governments and organisations release information intending it simply to be read, without much thought for making the underlying data accessible for analysis. To actually use this data, one first needs to get it into a usable format. In these cases, we often turn to OpenRefine to pull things into the format we need.
We are currently working on a data-driven feature about poverty and women-headed households, using a dataset from the South African government’s War on Poverty programme covering the years 2008 to 2014. We wanted to compare the number of cases identified in the 2011 census with the number identified in the War on Poverty project. First, however, we needed to wrestle the data into shape.
This is a step-by-step description of the process we followed. Hopefully sharing this with you will make your life easier in the future.
Using the tool Wazimap, we came across the data we wanted: households broken down by gender of the household head, by district and by income. As with life, things are never that straightforward, and when we downloaded the data, the format was not exactly what we needed. Time to put some trusty OpenRefine skills to work.
This is what we received. Note that every income bracket has three different columns: overall total, total male and total female.
In less than 10 minutes, we had our final format.
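For readers who would rather script this kind of reshaping, here is a minimal Python sketch of the same idea. It is an illustration only, not the OpenRefine process we followed. It assumes the target layout is one row per district, income bracket and gender group, and the column names, the "|" separator and the figures are all made up for the example.

```python
# Hypothetical wide layout: one row per district and three columns
# (total, male, female) for every income bracket. All values are invented.
wide_rows = [
    {"district": "District A",
     "No income|total": 30, "No income|male": 12, "No income|female": 18,
     "Low income|total": 50, "Low income|male": 20, "Low income|female": 30},
    {"district": "District B",
     "No income|total": 10, "No income|male": 4, "No income|female": 6,
     "Low income|total": 25, "Low income|male": 15, "Low income|female": 10},
]


def wide_to_long(rows, id_column="district", separator="|"):
    """Turn one column per bracket/group pair into one row per pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the district name is repeated on every output row
            # Assumed naming scheme: "<income bracket>|<group>"
            bracket, group = column.split(separator, 1)
            long_rows.append({
                id_column: row[id_column],
                "income_bracket": bracket,
                "group": group,
                "households": value,
            })
    return long_rows


long_rows = wide_to_long(wide_rows)
```

Each district in the made-up input contributes six rows to `long_rows`, one for every combination of income bracket and group, which makes it straightforward to filter for women-headed households only.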