
Getting to grips with educational data sources in the UK – and the NSS mini-case study

Where do I even start?

Skipping ahead to week 2 of #DALMOOC and some data wrangling. In one of the assignments (assignment56) George Siemens challenged us to go forth and find some educational data…so I did.

I am not embedded within an institution at the moment (and even if I were, I’d imagine there would be ethical/legal issues in exposing institutional data like this, even with aggregation/anonymisation). So I had to go foraging for some publicly available sources.

There were some suggestions from George Siemens – covering the US and the world:

Of course, at that level the data really speaks more to the academic analytics level – looking at student demographics, institution or programme completion rates, fees, staff-student ratios etc. Not as exciting as getting to the nitty-gritty of learning but will have to do!

At that level the data is also heavily summarised and aggregated so the opportunities to drill down for your own analysis are limited.

Never one for making it easy for myself, I thought I’d have a wee look at the data sources available locally, in the UK.

For a novice like me, it was a bit of a challenge…

Thankfully, I came across some “data journalism” blogs with handy advice:

The UK Data Service provides a centralised Discover portal to a number of education-related data sets, including international ones (e.g. OECD or EU-based).
There is also the brand new, so far limited but rather ambitious data.ac.uk, aimed at aggregating openly available HE data (thanks to helpmeinvestigateeducation and Tony Hirst for pointing me to this source).
Rather than seeing what’s there, I decided to look for a particular data set – the National Student Survey. The survey started in 2005 or so and was aimed at documenting students’ satisfaction with their courses at graduation, using a very simple questionnaire (PDF). It started in England but now extends to some Scottish institutions in my own backyard. It seemed like a great idea to give university applicants some simple info on the quality of teaching. But it has been causing quite a controversy, especially when some of the more esteemed establishments within the Russell Group scored close to the bottom of the rankings on the quality of student feedback. There was much grumbling about methodology of course (some of it justified) but also institutional action to address the shortcomings in assessment strategies (or students’ perceptions of feedback).
I thought it would be fun to see how the Russell Group is doing these days :)
I found it in two places: on the HEFCE website, in simple, poorly documented Excel spreadsheets, and via HESA, in a more aggregated format as part of a wider, well-documented XML data set underlying the Unistats website (the latter includes data on student retention, salaries, careers, staff/support etc.).
I went for the HEFCE dataset from 2013 (2013 NSS results by teaching institution for all institutions). It was in a familiar Excel format, and much of it was in “human”, easily understandable language. The results were granular down to subject area/degree within each institution. Some clean-up/reformatting was required, but it was likely to be minimal (I think I will write about it in another post).
I thought it would be neat to use a map for some of the visualisations (who doesn’t love a map ;). Geolocation data for the institutions was missing from the NSS HEFCE data – but data.ac.uk came to the rescue here with their list of registered learning providers. They even threw in institutional groupings (e.g. Russell Group etc.) for good measure (see augmented data set). Both sets included the UKRLP code, which should make for an easy join.
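
For anyone curious what such a join might look like outside of Tableau, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the column names (`ukrlp`, `pct_agree`, `lat`, `lon`, `group`), the provider names and all the values are made up and do not come from the HEFCE or data.ac.uk files. The sketch also keeps track of codes that fail to match, since an "easy join" is only easy if both sides really share the same codes.

```python
# Hypothetical rows standing in for the two sources.
# Real column names and values in the HEFCE and data.ac.uk files will differ.
nss_rows = [
    {"ukrlp": "10000001", "question": "Q1", "pct_agree": 90},
    {"ukrlp": "10000002", "question": "Q1", "pct_agree": 85},
    {"ukrlp": "10000003", "question": "Q1", "pct_agree": 80},
]
providers = [
    {"ukrlp": "10000001", "name": "Example University A",
     "lat": 55.9, "lon": -3.2, "group": "Example Group"},
    {"ukrlp": "10000002", "name": "Example University B",
     "lat": 51.5, "lon": -0.1, "group": ""},
]


def join_on_ukrlp(nss_rows, providers):
    """Attach provider details to each NSS row, keyed on the UKRLP code.

    Returns the joined rows plus the set of codes with no provider match.
    """
    by_code = {p["ukrlp"]: p for p in providers}
    joined, unmatched = [], set()
    for row in nss_rows:
        provider = by_code.get(row["ukrlp"])
        if provider is None:
            # No location/grouping data for this code, so set it aside.
            unmatched.add(row["ukrlp"])
            continue
        joined.append({**provider, **row})
    return joined, unmatched


joined, unmatched = join_on_ukrlp(nss_rows, providers)
print(len(joined), "rows joined;", "unmatched codes:", sorted(unmatched))
```

Checking the unmatched set first is a cheap way to spot institutions that would silently drop off the map.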

The HEFCE set only contained question numbers, so I needed to create another table containing the question text as well as the evaluated course aspect – I used the exemplar questionnaire from the NSS website as an input here. I would use the question number to join it with the NSS set.
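
The question lookup can be sketched in a few lines of Python as well. This is only an illustration under stated assumptions: the question texts and aspects below are placeholders rather than the real NSS wording, and the field names (`question`, `pct_agree`, `question_text`, `aspect`) are hypothetical.

```python
# Hypothetical lookup table: question number -> (question text, course aspect).
# The strings are placeholders; the real wording would be typed in
# from the exemplar questionnaire.
questions = {
    1: ("<text of question 1>", "<aspect covering question 1>"),
    2: ("<text of question 2>", "<aspect covering question 2>"),
}

# Hypothetical results rows that carry only the question number.
results = [
    {"institution": "Example University A", "question": 1, "pct_agree": 90},
    {"institution": "Example University A", "question": 2, "pct_agree": 75},
]


def label_results(results, questions):
    """Add the question text and course aspect to every results row."""
    labelled = []
    for row in results:
        # A KeyError here means the lookup table is missing a question.
        text, aspect = questions[row["question"]]
        labelled.append({**row, "question_text": text, "aspect": aspect})
    return labelled


labelled = label_results(results, questions)
for row in labelled:
    print(row["question"], row["aspect"], row["pct_agree"])
```

Letting a missing question number raise an error, rather than skipping the row, makes gaps in the hand-built table obvious straight away.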

Phew – it was rather hard work, all this rummaging for data. And this is even before the clean-up and playing around with it in Tableau!

What I learned:
  • much of the data (especially in the less aggregated and more valuable format) is available only by subscription to members of educational institutions (e.g. HEIDI or Discover portal)
  • some of the databases have in-built interactive visualisation tools, e.g. Discover
  • each dataset has its own terms and conditions for use – you must read (or click through) a lot of bumph even before you get started, especially on portals aggregating datasets across sources!
  • data derived from the same data collection exercise can often be found via different sources varying in degree of aggregation, data integration and documentation – it looks like it is worth looking around for something that fits your needs
  • getting to know well documented and structured data can be hard work, especially for a database novice like me (labels etc. are rarely written in human-readable language and you have to digest a lot of definitions)
  • it is likely that you will have to find more than one data source to cover all the aspects you need for your analysis
  • even highly curated data sets may need some clean-up

Image source: Flickr by ttcopley under CC license