Skipping ahead to week 2 of #DALMOOC and some data wrangling. In one of the assignments (assignment56) George Siemens challenged us to go forth and find some educational data…so I did.
I am not embedded within an institution at the moment (and even if I were, I’d imagine there would be ethical/legal issues in exposing institutional data like this, even with aggregation/anonymisation). So I had to go foraging for some publicly available sources.
There were some suggestions from George Siemens – covering the US and the world:
Of course, data at that level really speaks more to academic analytics – looking at student demographics, institution or programme completion rates, fees, staff-student ratios etc. Not as exciting as getting into the nitty-gritty of learning, but it will have to do!
At that level the data is also heavily summarised and aggregated, so the opportunities to drill down for your own analysis are limited.
Never one for making it easy for myself, I thought I’d have a wee look at the data sources available locally, in the UK.
For a novice like me, it was a bit of a challenge…
Thankfully, I came across some “data journalism” blogs with handy advice:
Rather than seeing what’s there, I decided to look for a particular data set – the National Student Survey. The survey started in 2005 or so and was aimed at documenting student satisfaction with their courses at graduation, using a very simple questionnaire (PDF). It started in England but now extends to some Scottish institutions in my own backyard. It seemed like a great idea to give university applicants some simple info on the quality of teaching. But it has been causing quite a controversy, especially when some of the more esteemed establishments within the Russell Group scored close to the bottom of the rankings on the quality of student feedback. There was much grumbling about the methodology, of course (some of it justified), but also institutional action to address the shortcomings in assessment strategies (or students’ perceptions of feedback).
I thought it would be fun to see how the Russell Group is doing these days:)
I found it in two places: on the HEFCE website, in simple, poorly documented Excel spreadsheets, and via HESA, in a more aggregated format as part of a wider and well documented XML data set underlying the Unistats website (the latter includes data on student retention, salaries, careers, staff/support etc.).
I went for the HEFCE dataset from 2013 (2013 NSS results by teaching institution for all institutions). It was in a familiar Excel format and much of it was in “human”, easily understandable language. The results were granular to subject area/degree within each institution. There was some clean-up/reformatting required but it was likely to be minimal (I think I will write about it in another post).
I thought it would be neat to use a map for some of the visualisations (who doesn’t love a map;). Geolocation data for the institutions was missing from the NSS HEFCE data – but data.ac.uk came to the rescue here with their list of registered learning providers. They even threw in institutional groupings (e.g. Russell Group etc.) for good measure (see augmented data set). Both sets included a UKRLP code, which should make for an easy join – a rough sketch of that join is below.
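If I were doing this join in code rather than in Tableau, it might look something like the minimal pandas sketch below. Note that the file names and column headings here are my assumptions for illustration, not the actual HEFCE/data.ac.uk headers, which would need checking against the real downloads:

```python
import pandas as pd

# Hypothetical file names - substitute the actual HEFCE and
# data.ac.uk downloads.
nss = pd.read_excel("nss_2013_by_institution.xlsx")
providers = pd.read_csv("learning-providers-plus.csv")

# Join on the shared UKRLP provider code (assumed column name "UKPRN")
# to attach geolocation and institutional groupings to each NSS row.
nss_geo = nss.merge(
    providers[["UKPRN", "LATITUDE", "LONGITUDE", "GROUPS"]],
    on="UKPRN",
    how="left",
)

# Rows left without coordinates flag providers missing from the register.
print(nss_geo["LATITUDE"].isna().sum(), "institutions without coordinates")
```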
The HEFCE set only contained question numbers, so I needed to create another table containing the question text as well as the evaluated course aspect – I used the exemplar questionnaire from the NSS website as the input here. I would use the question number as the join key with the NSS set – again, sketched below.
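Continuing the sketch above (with the same caveat: the column names, and the exact question numbering/wording, are illustrative rather than an authoritative copy of the survey), the lookup table and join might look like this:

```python
import pandas as pd

# A few rows of the question lookup table, transcribed from the
# exemplar NSS questionnaire - numbering and wording shown here are
# illustrative only.
questions = pd.DataFrame({
    "question": ["Q1", "Q2", "Q3"],
    "text": [
        "Staff are good at explaining things.",
        "Staff have made the subject interesting.",
        "Staff are enthusiastic about what they are teaching.",
    ],
    "aspect": ["The teaching on my course"] * 3,
})

# Join onto the NSS results from the previous sketch (assumed
# question-number column "question"), so each row carries the
# human-readable question text and course aspect.
nss_labelled = nss_geo.merge(questions, on="question", how="left")
```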
Phew – it was rather hard work, all this rummaging for data. And this is even before the clean-up and playing around with it in Tableau!
What I learned:
much of the data (especially in the less aggregated and more valuable formats) is available via subscription only, to members of educational institutions (e.g. the HEIDI or Discover portals)
some of the databases have in-built interactive visualisation tools, e.g. Discover
each dataset has its own terms and conditions for use – you must read (or click through) a lot of bumph even before you get started, especially on portals aggregating datasets across sources!
data derived from the same data collection exercise can often be found via different sources varying in degree of aggregation, data integration and documentation – it looks like it is worth looking around for something that fits your needs
getting to know well documented and structured data can be hard work, especially for a database novice like me (labels etc. are rarely written in human-readable language and you have to digest a lot of definitions)
it is likely that you will have to find more than one data source to cover all the aspects you need for your analysis
even highly curated data sets may need some clean-up
#DALMOOC’s week 1 competency 1.2 gave me an excuse to explore some definitions.
As a scientist, I found the insistence on using the term “analytics” as opposed to “analysis” intriguing… The trusty Wikipedia explained that analytics is “the discovery and communication of meaningful patterns in data” and has its roots in business. It is something wider (but not necessarily deeper) than the data analysis/statistics I am used to. Much of it is focused on visualisation to support decision-making based on large and dynamic datasets – I imagine producing visually appealing and convincing PowerPoint slides for your executive meeting would be one potential application…
I was glad to discover that there are some voices out there which share my concern over the seductive powers of attractive and simple images (and metrics) – here is a discussion of LA validity on the EU-funded LACE project website; and who has not heard about the issues with Purdue’s Course Signals retention predictions? Yet the makers of tools such as Tableau (used in week 2 of this course) emphasise how little expertise one needs to look at the data via their “visual windows”… The old statistical adage still holds – “garbage in, garbage out” (even though some evangelists might claim that in the era of big data statistics itself might be dead;). That’s enough of the precautionary rant…;)
I liked McNeill and co.’s choice of Cooper’s definition of analytics in their 2014 learning analytics paper (much of it based on the CETIS LA review series):
Analytics is the process of developing actionable insights through problem definition and the application of statistical models and analysis against existing and/or simulated future data (my emphasis)
It includes the crucial step in looking at any data in applied contexts – simply asking yourself what you want to find out and change as a result of looking at it (the “problem definition”). And the “actionable insights” – rather offensive management-speak to my ears – but nonetheless doing something about what you find seems an essential step in closing any learning loop.
The current official definition of Learning Analytics came out of an open online course in Learning and Knowledge Analytics in 2011 and was presented at the 1st LAK conference (Clow, 2013):
“LA is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimising learning and the environments in which it occurs.”
This is the definition used in the course – the definition we are encouraged to examine and redefine as this very freshly minted field is dynamically changing its shape.
Instantly I liked how the definition is open on two fronts (although that openness seems to lie more in the realm of aspiration than IRL practice – not surprising, given the definition’s origins):
1. It does not restrict data collection to the digital traces left by the student within Learning Management Systems/Virtual Learning Environments, so it remains open to data from entirely non-digital contexts. Although in reality the field grew out of, and from what I can see largely remains in, the realm of big data generated by ‘clicks’ (whether in VLEs, games or intelligent tutoring systems). The whole point really is that it relies on data collected effortlessly (or economically) – compared to traditional sources of educational data, such as exams. What really sent a chill down my spine is the idea fleetingly mentioned by George Siemens in his intro lecture for this week – extending the reach outside of the virtual spaces via wearable tech… So as I participate in the course I will be looking out for examples of marrying data from different sources. I will also look out for the dangers/side effects of focusing on what can be measured rather than what should be measured. I feel that the latter may be amplified by limiting the range of perspectives involved in the development and application of LA (to LA experts and institutional administrators). So keeping an eye out for collaborative work on defining metrics/useful data between LA and educational experts/practitioners, and maybe even LDs, is also on the list. (I found one neat example already via the LACE SoLAR Flare UK meet, which nicely coincided with week 1 – Patricia Charlton speaks for 3 min about the mc2 project, starting at 4.40 min. The project helps educators to articulate the many implicit measures used to judge a student’s mathematical creativity. Wow – on so many levels!)
2. It is open as to who is in control of data collection (or ownership) and use – the institution vs the educator and the learner. I was going to say something clever here as well, but it’s been so long since I started this post that it’s all gone out of my head (or just moved on). I found another quote from McNeill and co. which is relevant here:
Different institutional stakeholders may have very different motivations for employing analytics and, in some instances, the needs of one group of stakeholders, e.g. individual learners, may be in conflict with the needs of other stakeholders, e.g. managers and administrators.
It sort of touches on what I said under point 1 – the need for collaboration within an institution when applying LA. But it also highlights the students as voices of importance in LA metric development and application. It is their data after all, so they should be able to see it (and not just give permission for others to use it), and they are supposed to learn how to self-regulate their learning and all… Will I be able to see many IRL examples of such tools made available to students and individual lecturers (beyond the simple warning systems for failing students such as Course Signals)? There was a glimmer of hope for this from a couple of LACE SoLAR Flare presentations. Geoffrey Bonin talked about the Pericles project and how it is working to provide a recommendation system for open resources in students’ personal learning environments (at 1 min here). Rather more radically, Marcus Elliot (Uni of Lincoln) is working on the Weapons of Mass Education project to develop a student organiser app that goes beyond the institution, giving students access to the digested data, with a research project around student perceptions of learning analytics and which institutional and student-collected data they find most useful – data analytics with students, not for students (at 25 min here).
(I found Doug Clow’s analysis of LA in UK HE re: institutional politics and power play in learning very helpful here, and it was such a pleasant surprise to hear him speak in person at the LACE SoLAR Flare event!)
The team’s perspective on the LA definition was presented in the weekly hangout (not surprisingly, everybody had their own flavour to add) – apologies for any transcription/interpretation errors:
Carolyn (the text miner of forums): defined LA as different from other forms of Data Mining in its focus on the learning process and learners’ experiences. She highlighted the importance of correct representation of the data/type of data captured for the analysis to be meaningful in this context vs e.g. general social behaviours.
Dragan (social interaction/learning explorer): LA is something that helps us understand and optimise learning, and is an extension (or perhaps replacement) of the existing things that are done in education and research – e.g. end-of-semester questionnaires are no longer necessary as one can see it all ‘on the go’. Prediction of student success is one of the main focuses of LA, but it is more subtle than that – aimed at personalising learning support for the success of each individual.
Ryan (the hard-core data miner who came to the DM table waaay ahead of the first SoLAR meet in 2011; his seminal 2009 paper on EDM is here): LA is figuring out what we can learn about learners, learning and the settings they are learning in, to try to make it better. LA goes beyond providing automated responses to students; it also focuses on including stakeholders (students, teachers and others) by communicating the findings to them.
So – a lot of insistence on the focus on learners and learning… implying that there are in fact some other analytics foci in education. I just HAD TO have a little peek at the literature around the history of this field to better understand the context, and hence the focus, of LA itself (master of procrastination reporting for duty!).
Since I have gone beyond the word count that any sensible co-learner may be expected to read, I will use a couple of images which do illustrate key points rather more concisely.
I appreciate the importance of such “territorial” subject definitions, especially in such an emerging field, with the potential of being kidnapped by the educational economic-efficiency agenda prevailing these days. However, having had the experience of running courses within HE institutions, I feel that student experience and learning can be equally impacted by BOTH the institutional administration processes/policy decisions AND the quality of teaching, course design and content. So I believe that joined-up thinking across analytics “solutions” at all the scales should really be the holy grail here (however unrealistic;). After all, there is much overlap in how the same data can be used at the different scales already. For that reason I like the idea of a unified concept of Educational Data Sciences, with 4 subfields, as proposed by Piety, Hickey and Bishop in Educational data sciences – framing emergent practices for analytics of learning organisations and systems (PDF). With one proviso – it is heavily US-focused, especially above the institution level. (NOTE that the authors consider Learning Analytics and Educational Data Mining to belong in a single bucket. My own grip on the distinction between the two is still quite shaky – perhaps a discussion for another post.)
I would not like to commit myself to a revised LA definition yet – I shall return to it at the end of the course (should I survive that long) to try to integrate all the tasty tidbits I collect on the way.
Hmm – what was the assignment for week 1 again? Ah – the LA software spreadsheet….oops better get onto adding some bits to that:)
Week 3 and 4 in Exploring PLN Seminar (and maybe even week 5…)
This post is meant as a quick wrap-up of my #xplrpln artifact (you can find it on Prezi), which we were invited to create in response to a set scenario and share with others this week.
Yes, it is covering two weeks of the seminar (and I am posting it in week 5!). This is not because I have become disengaged or too busy. It is because the few readings on PLNs and organisations provided by Jeff and Kimberley in week 3, along with the seminar participants’ contributions, needed some solitary rumination before I could spit the chewed cud back into the communal fermentation vat (help! – I seem to be losing control over my metaphors…).
Among all the rumination around the topic I was also struggling with the idea of the CEO pitch. This is not the first time that my allergy to the corporate/managerial context has surfaced. One of the reasons I quit the Open University’s #H817 Openness and Innovation in eLearning course earlier this year was the large chunk of the assessment based on presenting business cases. I understand why – it’s important to make such courses relevant to practitioners via authentic/applied assessment. Perhaps it is something about the executive language? Perhaps it is the difficulty of putting myself in the shoes of CEOs of large organisations (I keep thinking that the bottom line for them is really just financial gain – even in educational institutions these days)? Perhaps it is the disenchantment with such organisations and their cultures? Or simply the lack of a sense of play and fun in learning from such an artifact?
On the other hand, perhaps I have started to expect inspiration to push beyond institutional/established mindsets in my ‘learning experiences’. To be encouraged to explore different ways of representing and applying my understanding. This is one of the things that cMOOCs taught me (although in fact it was probably seeded a long time ago, when I heard about the learning-in-the-open and digital artifact-based assessment approaches taken by the University of Edinburgh’s MSc in Online Education).
I did try to take on the challenge of getting it done, finding a “professional” voice. But I simply couldn’t force myself to go there. Thankfully, this learning experience, unlike the caged OU course, was not prescriptive. I enjoyed crystallizing my ideas around the potential institutional horrors of ‘implementing’ PLN approaches at universities – large, complex and culturally diverse organisations. I did try to entertain the audience as well – the imaginary HE leaders and the very tangible fellow #xplrpln-ers alike. While making us laugh, I was hoping to make us all more thoughtful before we rush into implementation of the new PLN and related ideas at a massive scale, at institutional and Profersonal(TM) levels.
My ruminatory state sharpened my attention to examples of organisational PLN horrors in my recent PLN data stream – I tried to include those alongside the insights from the course readings. These anecdotes turned it from a theoretical into a very much real-life tale… and also illustrated the ongoing/dynamic nature of the beast. Changes in technology and in terms and conditions will keep coming, and we have to, personally and institutionally, keep reconsidering the cost/benefit equation for PLNs, and the specific technical solutions which may enhance or detract from it.
Why the horror angle? I thought a lot of the PLN-related hype was coming from businesses who have much to gain from organisations and individuals engaging with the services they offer – either as paid-for SNA/social intranet/social enterprise solutions for organisations, or by getting hold of our very much monetisable data, including our personal or professional network interactions, via their ‘free’ social media services (we have all heard the now well-worn warning “when you are not a customer you are a product”). Organisations implementing social learning solutions may also have less than altruistic ideas at heart. I thought an antidote to the seductive murmur of the Sirens was in order:) Oh – and it *was* Halloween…
Just in case I don’t find time for more reflection around #xplrpln here, I would like to say now that I am extremely grateful to Kimberley and Jeff for putting this seminar together, and to the co-participants for diving in (or even just watching). It has been a great adventure!