We are on to Social Network Analysis in week 3, and now we are actually doing it rather than just talking about it (I am running behind, of course, as it is very much the start of week 4 now. Yikes!)
So, we were given two datasets collected during the Connectivism and Connected Knowledge MOOC in 2011, covering exchanges between participants on Twitter and on blogs (via comments). Each dataset had two versions, one collected in week 6 and one in week 12 of the course. The data had been pre-processed into a format directly importable into Gephi (a lovely open-source and free SNA visualisation tool).
For Twitter: “graphs included all authors and mentions as nodes of the network, and the edges between them were created if an author or an account were tagged within the tweet. For example, if a course participant @Learner1 mentioned @Learner2 and @Learner3 in a tweet, then the course Twitter network would contain @Learner1, @Learner2, and @Learner3 with the following edges: @Learner1 – @Learner2, and @Learner1 – @Learner3.”
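To make the quoted rule concrete, here is a minimal Python sketch. It assumes each tweet is available as a hypothetical (author, mentions) pair; the course data arrived already pre-processed for Gephi, so this only illustrates the rule and is not how that data was actually built. Skipping self-mentions is my own assumption.

```python
def twitter_edges(tweets):
    """Build undirected author-mention edges from hypothetical (author, mentions) pairs.

    Each edge is a frozenset, so A-B and B-A count as the same edge.
    """
    edges = set()
    for author, mentions in tweets:
        for mentioned in mentions:
            if mentioned != author:  # assumption: self-mentions are ignored
                edges.add(frozenset((author, mentioned)))
    return edges


# The example from the quote: @Learner1 mentions @Learner2 and @Learner3.
tweets = [("@Learner1", ["@Learner2", "@Learner3"])]
edges = twitter_edges(tweets)
nodes = set().union(*edges)
print(sorted(sorted(e) for e in edges))
# [['@Learner1', '@Learner2'], ['@Learner1', '@Learner3']]
```

Note that the mentioned accounts are not linked to each other: there is no @Learner2 – @Learner3 edge, just as in the quoted example.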
For blogs: “[graphs] included authors of the blog posts (i.e. blog owners) and the authors of the comments to individual blog posts. If a learner A1 created a blog post, and then learners B1 and C1 added comments to that post, then the corresponding network would contain nodes A1, B1, and C1 with the following edges: A1-B1, and A1-C1. All the four networks are undirected.”
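The blog rule works the same way, with blog owners in place of tweet authors. Below is a minimal sketch, assuming hypothetical (owner, commenters) pairs. Collapsing repeated comments into one unweighted edge and ignoring owners who comment on their own posts are my assumptions, since the quote does not say how those cases were handled.

```python
def blog_adjacency(posts):
    """Undirected adjacency map from hypothetical (owner, commenters) pairs.

    Every commenter is linked to the blog owner; commenters are not linked
    to one another. Repeated comments collapse into a single edge.
    """
    adj = {}
    for owner, commenters in posts:
        owner_nbrs = adj.setdefault(owner, set())  # owners are nodes even without comments
        for commenter in commenters:
            if commenter != owner:  # assumption: no self-loops for owner replies
                owner_nbrs.add(commenter)
                adj.setdefault(commenter, set()).add(owner)
    return adj


posts = [
    ("A1", ["B1", "C1", "B1"]),  # the quoted example, plus a repeated comment by B1
    ("B1", ["A1"]),              # A1 comments back on B1's blog
]
adj = blog_adjacency(posts)
n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
print(n_edges)  # 2
```

Because the network is undirected, A1 commenting back on B1's blog adds nothing new: it is the same A1-B1 edge.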
The pre-processed data is available only to course participants, and its use is restricted to completing our assignments, so I cannot share it here.
In this analysis step I imported each set into Gephi without any glitches and performed basic analyses and filtering on the week 12 version of each. As per the instructions, this included computing network density and the centrality measures introduced in the course (betweenness and degree), then applying the Giant Component filter to keep only the largest connected component (dropping all nodes disconnected from it), and finally identifying communities using the modularity algorithm. Dagan’s walkthroughs in Gephi were very useful here (introduction + modularity analysis).
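For my own understanding, here is a rough plain-Python sketch of what these measures and the filter compute, on a hypothetical six-node toy network rather than the course data. Assumptions: density is 2m/(n(n-1)) for a simple undirected graph; degree centrality is normalised by n - 1 (tools differ on whether and how they normalise); betweenness uses Brandes' algorithm, a standard approach that may differ in detail from Gephi's implementation; and modularity here only scores a partition I supply, whereas Gephi's modularity tool searches for a high-scoring partition itself.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency map from (u, v) pairs; assumes no self-loops."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def density(adj):
    """Share of possible undirected edges that are present: 2m / (n(n - 1))."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0

def degree_centrality(adj):
    """Degree divided by the maximum possible degree, n - 1."""
    n = len(adj)
    return {v: len(nbrs) / max(n - 1, 1) for v, nbrs in adj.items()}

def betweenness(adj):
    """Unnormalised betweenness via Brandes' algorithm (unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS from s, counting shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted from both ends

def giant_component(adj):
    """Node set of the largest connected component (what the filter keeps)."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            for w in adj[queue.popleft()]:
                if w not in comp:
                    comp.add(w)
                    queue.append(w)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

def modularity(adj, communities):
    """Newman modularity Q of a given partition (a list of node sets)."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    q = 0.0
    for c in communities:
        internal = sum(1 for v in c for w in adj[v] if w in c) / 2
        degree_sum = sum(len(adj[v]) for v in c)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q

# Hypothetical toy network: a path a-b-c-d plus a separate pair e-f.
adj = build_adjacency([("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")])
print(round(density(adj), 3))        # 0.267
print(degree_centrality(adj)["b"])   # 0.4
print(betweenness(adj)["b"])         # 2.0
print(sorted(giant_component(adj)))  # ['a', 'b', 'c', 'd']
print(modularity(adj, [{"a", "b"}, {"c", "d"}, {"e", "f"}]))  # 0.40625
```

In this toy case, the Giant Component filter corresponds to keeping only the nodes returned by giant_component, which drops the separate e-f pair.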
I report the key numbers in the table above.
It was nice to see some numbers :) And some differences between Twitter and blogs were immediately obvious, e.g. the Twitter network had more nodes and edges, and twice as many “communities”.
But instantly I was concerned – what do these numbers actually mean?
Is a network density of 0.003 good or bad? What does it actually mean in terms of, e.g., the speed of information flow in minutes or days? Has this been quantified? Or is it just a matter of developing a feel for it with experience?
Or perhaps it only works for comparisons? If so – how do I tell whether the difference in network density between Twitter and blogs (0.003 vs 0.01, respectively) is statistically significant? And would a significant difference in a network measure have any meaningful effect on the ease of information flow within each network?
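One thing that may make raw density comparisons tricky: for an undirected network with n nodes and m edges, density can be rewritten in terms of the average degree, which shows that it shrinks as a network grows even if each participant keeps the same number of connections. The numbers below are purely hypothetical, picked only to reproduce 0.003 and 0.01; they are not the course networks.

```latex
\begin{aligned}
D &= \frac{2m}{n(n-1)} = \frac{\bar{k}}{n-1}, \qquad \bar{k} = \frac{2m}{n} \\
\bar{k} = 3,\ n = 1001 &\;\Rightarrow\; D = \tfrac{3}{1000} = 0.003 \\
\bar{k} = 3,\ n = 301 &\;\Rightarrow\; D = \tfrac{3}{300} = 0.01
\end{aligned}
```

So, in this made-up case, the same average number of connections per person gives a density more than three times lower in the larger network; a density gap between networks of very different sizes may say as much about size as about connectedness.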
Clearly, there is still a lot to learn. On to making some pretty pictures (oh – sorry – visualisations ;) with the said data for part 2 of this task. It won’t be long, I hope :)