Skipping ahead to week 2 of #DALMOOC and some data wrangling. In one of the assignments (assignment56) George Siemens challenged us to go forth and find some educational data…so I did.
I am not embedded within an institution at the moment (and even if I were, I’d imagine there would be ethical/legal issues in exposing institutional data like this, even with aggregation/anonymisation). So I had to go foraging for some publicly available sources.
There were some suggestions from George Siemens – covering the US and the world:
- Integrated Postsecondary Education Data System (IPEDS)
- US Department of Education
- Organisation for Economic Co-operation and Development (OECD) – which has a very nice Data portal (beta) with interactive graphs and data download options
- The World Bank
Of course, data like this really speaks more to academic analytics – looking at student demographics, institution or programme completion rates, fees, staff-student ratios etc. Not as exciting as getting to the nitty-gritty of learning, but it will have to do!
At that level the data is also heavily summarised and aggregated so the opportunities to drill down for your own analysis are limited.
Never one for making it easy for myself, I thought I’d have a wee look at the data sources available locally, in the UK.
For a novice like me, it was a bit of a challenge…
Thankfully, I came across some “data journalism” blogs with handy advice:
- Help me investigate…Education (based on Help Me Investigate.com) has a nice summary of the main national Higher Education data sources, including:
  - Universities and Colleges Admissions Service (UCAS) – data on university applications and acceptances
  - Higher Education Statistics Agency (HESA) – data on performance and destinations, available via the HEIDI database
  - Regional Funding Councils – e.g. the Higher Education Funding Council for England (HEFCE)
- FullFact had a summary of sources with a broader education focus, but these were mostly published analyses and reports rather than raw data.
The HEFCE set only contained question numbers, so I needed to create another table containing the question text as well as the evaluated course aspect, and I used the exemplar questionnaire from the NSS website as an input here. I would use the question number as the join key with the NSS set.
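For anyone curious what that join looks like outside Tableau, here is a minimal sketch in plain Python. It is not what I actually did, and everything in it is made up for illustration: the column names (`question_number`, `percent_agree`, `question_text`, `course_aspect`), the institution and the placeholder question text are all assumptions, not real NSS or HEFCE values.

```python
# Hypothetical rows standing in for the results set,
# which only carries question numbers (values are invented).
results = [
    {"institution": "Example University", "question_number": 1, "percent_agree": 90},
    {"institution": "Example University", "question_number": 2, "percent_agree": 85},
    {"institution": "Example University", "question_number": 99, "percent_agree": 70},
]

# Hand-built lookup table: question number -> question text and course aspect.
# The text and aspects here are placeholders, not the real questionnaire wording.
questions = [
    {"question_number": 1, "question_text": "Placeholder text for Q1", "course_aspect": "Aspect A"},
    {"question_number": 2, "question_text": "Placeholder text for Q2", "course_aspect": "Aspect A"},
]


def join_on_question_number(results, questions):
    """Left join: keep every results row, add question details where a match exists."""
    lookup = {q["question_number"]: q for q in questions}
    joined = []
    for row in results:
        match = lookup.get(row["question_number"])
        joined.append({
            **row,
            # Unmatched question numbers get None rather than being dropped.
            "question_text": match["question_text"] if match else None,
            "course_aspect": match["course_aspect"] if match else None,
        })
    return joined


joined = join_on_question_number(results, questions)
for row in joined:
    print(row)
```

Keeping unmatched rows (a left join) rather than dropping them makes it easy to spot question numbers that are missing from the hand-built table.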
Phew – it was rather hard work all this rummaging for data. And this is even before the clean up and playing around with it in Tableau!
- much of the data (especially in the less aggregated and more valuable format) is available via subscription only to the members of educational institutions (e.g. HEIDI or Discover portal)
- some of the databases have in-built interactive visualisation tools, e.g. Discover
- each dataset has its own terms and conditions for use – you must read (or click through) a lot of bumph even before you get started, especially on portals aggregating datasets across sources!
- data derived from the same data collection exercise can often be found via different sources varying in degree of aggregation, data integration and documentation – it looks like it is worth looking around for something that fits your needs
- getting to know even well-documented and structured data can be hard work, especially for a database novice like me (labels etc. are rarely written in human-readable language and you have to digest a lot of definitions)
- it is likely that you will have to find more than one data source to cover all the aspects you need for your analysis
- even highly curated data sets may need some clean up
Image source: Flickr by ttcopley under CC license