As part of my Alteryx training, following on from my starting out with Alteryx, I decided to try and use it for a real world example to test it out. I recently had a baby so have to start thinking about schools. Luckily the UK government make school performance data public and the excellent Guardian Datablog have tidied it up for me (the data is for 2012, hence now out of date, but good enough for my training in Alteryx). The data contains school address details, local authority details, school size, school religion and multiple measures of performance.
I would like to know which schools are performing well – i.e. their pupils have high attainment in their exams – and then see which streets are within X miles to help guide me which streets I would need to live to get my son into a chosen school. Alteryx has an in built function called Trade Area where it can map show you a chose radius from a point, which is ideal for the schools approximated catchment area analysis.
Each purple dot is a school with the cirle around the dot being the approximate catchment area, although looking at this it appears the catchment area in London will, in reality, be less than 0.3 miles.
I began by downloading a couple of those cleaned up files from the Guardian data site, one containing the school performance data, another containing the relevant local authority details. I have a database background so could easily import all of the data into a database and join it up. Alternatively I could do it in Excel using some sort of lookup (vlookup, index & match, etc) although the volume of data may not make that practical. But the purpose of the exercise is to test out Alteryx so I didn’t want to prepare the data in advance, I wanted to try it all in Alteryx.
I began by importing the school data into Alteryx. This was from a csv so I used the Input tool, available in Favourites and In/Out on the toolbar, to import it. This was simple, the Input tool automatically configured the file for me. Next I imported the local authority data to give me the local authority name, this is to be used as a filter on the Tableau dashboard as I know which areas I prefer to live.
To cut a long story a little shorter, I joined the data in Alteryx, had a look at what I’d done using the Browse tool, available in Favourites and In/Out on the toolbar, and then realised I need to longitude and latitude of each school to use the Trade Area tool for getting the chosen radius from the school. In addition it would help to map the data in Tableau. Fortunately someone else has mapped every UK postcode to a longitude and latitude and kindly put that data in a csv on the internet for anyone to use. This meant I had to completely alter my Alteryx model to deal with this new data, adding extra joins, carrying out additional selects, all time consuming but good for my Alteryx training.
This is where my database background helps out as I understand how to join data. I expect Alteryx sell their product as usable by dummies but in reality I think a reasonable relational database understanding will help with modelling the data and knowledge of R will help with the predictive analytics side of Alteryx (which I hope to come to later in a future post).
The schools data came with UK postcode formatted as XXXX XXX, while the postcode to longitude and latitude mapping file has the postcode formatted as XXXXXXX. Therefore the datasets won’t join. Removing the space from the schools data postcode is the simple fix for this problem and Alteryx have a function to do that, the Formula tool, found in Favourties and Preparation on the toolbar. Using the Formula tool I have many different expressions I can use to alter the underlying data. In this case I wanted to replace the empty spaces with a zero length string, which is done by using the ReplaceChar expression: ReplaceChar([School postcode], ” “, “”).
The next step is to use the Select tool, also found in Favourties and Preparation on the toolbar, to only use the data from the schools data which is useful to me. That file contains a lot of data and I prefer to narrow it down early in this case.
Now the schools data is in a usable state I’m able to join the schools data to the UK postcode longitude/latitude data to bring the school longitude and latitude into the school performance dataset. To do this I use the Alteryx Join tool, found in the Join section on the toolbar, joining on postcode (from the postcode data) and School postcode (the formatted postcode from the schools data).
Bringing in the Local Authority data is the next step. Once again a Join tool is used, this time joining the LA number in the Local Authority data to the LA number in the schools data. I then use another Select tool to pull the LA Name into the main dataset, along with the selected schools and longitude/latitude data pulled in previously.
All that remains is to get Alteryx to create the trade area to display the radius around the school. First step is to tell Alteryx which points to use to enable the trade area. In the Spatial section of the toolbar the function Create Points is used – this is what creates the longitude and latitude points that the Trade Area function can translate. Ensure the school longitude and latitude fields are used in the Create Points X and Y Field area.
I now recommend using the Browse function and seeing the work using the Alteryx map view. Set the map to view using Maps Powered By CloudMade to view the data as in the image at the top of this post.
The final project in Alteryx is as follows, with the Tableau Data Extract created.
Thank Alteryx for making that more straightfoward than any other method, open up Tableau and use the newly created Tableau Data Extract to visualise the data. The Tableau dashboard I quickly put together using the Alteryx output Tableau Data Extract is here.