#dalmooc wk3 homework: Twitter and blog networks in CCK11 – SNA in Gephi part 2

Feels a bit like learning finger painting…

Hey – and I am back from fiddling with pretty pictures (had a bit of a pause to consider meanings of “social capital” on the way – there will be a post about that, don’t you worry!)

So Part 2 of this activity/homework was to get some visualizations going on the Twitter and blog data sets from the CCK11 (Part 1 – prelim analysis here).

Exploring layouts

I used the larger Twitter set to have a play around with the layouts.

The Yifan Hu (YH) algorithm seemed to produce a vis which pulls the outer nodes away from the centre so that they form a jagged circle. Fruchterman Reingold (FR) laid the network out as a smooth circle, with less differentiation between sub-clusters.

YH seems to have more properties to tweak but I had no idea what some of them meant, e.g. Quadtree max level or theta (despite a handy definition tip appearing at the bottom when you click on any of them). In fact, changing the values of most of them did not seem to have any substantial effect on the overall shape of the visualisation – at first glance anyway. The two which had the most effect were relative strength and optimal distance.

  • Relative strength – the relative strength between the electrical force (repulsion) and the spring force (attraction). Smaller values produce a tighter central cluster. If you want to see inside the central cluster – make it larger!
  • Optimal distance – the natural length of the springs. Bigger values mean nodes will be further apart. Again – if you want to see individual nodes – increase the value. To tighten individual clusters/make communities more visible, make it smaller.
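To make those two knobs a bit more concrete, here is a toy sketch with just two connected nodes. It assumes the force formulas from Yifan Hu's paper (attraction d²/K along an edge, repulsion C·K²/d between nodes), with K standing in for optimal distance and C for relative strength. Gephi's actual implementation may differ in detail, and the function names are made up for this illustration.

```python
def resting_distance(optimal_distance, relative_strength,
                     step=0.1, iterations=500):
    """Relax two connected nodes until spring and electrical forces balance."""
    K, C = optimal_distance, relative_strength
    d = K  # start the two nodes one "natural spring length" apart
    for _ in range(iterations):
        repulsion = C * K * K / d   # electrical force pushes the nodes apart
        attraction = d * d / K      # spring force pulls them together
        d += step * (repulsion - attraction)
    return d


def closed_form(optimal_distance, relative_strength):
    """Distance where C*K^2/d == d^2/K, i.e. d = K * C**(1/3)."""
    return optimal_distance * relative_strength ** (1 / 3)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the trend.
    for K, C in [(100, 0.2), (100, 1.0), (200, 0.2)]:
        print(K, C, round(resting_distance(K, C), 2))
```

In this toy model the resting distance grows with both parameters, which fits what I saw on screen: larger relative strength or optimal distance spreads the nodes out, smaller values tighten the cluster.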

FR had fewer properties (area, gravity and speed) so was easier to understand at first glance. In FR, lower values of gravity (the force attracting nodes to the centre) made the cluster less tight, making it easier to distinguish individual nodes. FR took much longer to run than YH and needed to be stopped manually.

Without the use of colour to highlight modularity I found it difficult to see any structure within the FR layout, so I opted for the YH method for the analysis. It seems that OpenOrd (a modification of FR) would be best for detecting distinct clusters – but this must be an add-on that has to be imported, as I do not see it in the default layout list. Something to explore at another time :)

I also found Gephi Tutorials on Slideshare re:layout helpful:

The choice of method depends on the topology you want to emphasise (and also on the size of your network).

Explanation of FR method

  • Area = graph size area
  • Gravity = increasing gravity reduces dispersion of disconnected components by pulling them into the centre

IMPORTANT: when the algorithm does not converge (unstable node positions/unstable graph), reduce the speed to gain precision.
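A toy sketch of why lowering the speed helps. It assumes the force formulas from the original Fruchterman Reingold paper (ideal edge length k = sqrt(area / number of nodes), attraction d²/k, repulsion k²/d) and treats speed as a cap on how far a node may move per iteration. That cap is a simplification, loosely like the paper's temperature, and Gephi's own code may well work differently. The function names are made up.

```python
import math


def ideal_edge_length(area, node_count):
    """Ideal distance between nodes in the FR paper: k = sqrt(area / n)."""
    return math.sqrt(area / node_count)


def wobble(k, speed, start=None, iterations=1000, tail=50):
    """Largest distance from equilibrium over the last `tail` iterations.

    Two connected nodes are relaxed in one dimension; the equilibrium
    separation is d == k, where attraction and repulsion cancel out.
    """
    d = 1.2 * k if start is None else start
    errors = []
    for _ in range(iterations):
        force = k * k / d - d * d / k          # repulsion minus attraction
        move = max(-speed, min(speed, force))  # assumed: speed caps the move
        d = max(d + move, 1e-9)                # keep the separation positive
        errors.append(abs(d - k))
    return max(errors[-tail:])


if __name__ == "__main__":
    k = ideal_edge_length(area=10000, node_count=100)
    for speed in (5, 1, 0.1, 0.01):  # hypothetical speed settings
        print(speed, wobble(k, speed))
```

With a large cap the two nodes keep jumping back and forth across the equilibrium distance; with a small cap they still jitter, but only by about the size of the cap, so the layout looks settled.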

Explanation of YH method:
Including demystification of some of the more obscure parameters:

Sizing the network nodes based on centrality measures

Playing around with sizing the nodes based on their centrality measures gave a quick overview of the visual overlap between the nodes ranking highest on the different measures. For example, in the Twitter network, the nodes with the highest values of betweenness centrality also looked like the ones with the highest values of degree centrality (see below).

Size=degree centrality
Size=betweenness centrality

Ultimately I sized the nodes based on betweenness centrality and inserted the degree centrality score as a label (purely a matter of convenience as degree centrality values were just too large to fit into the node circles!).
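For anyone curious what those two measures actually count, here is a small pure-Python sketch on a made-up six-node network (the node names and edges are invented, not taken from the CCK11 data). Degree is just the number of neighbours; betweenness is computed with Brandes' algorithm and left unnormalised, so the numbers may be scaled differently from what Gephi reports.

```python
from collections import deque


def degree_centrality(adj):
    """Number of neighbours of each node."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def betweenness_centrality(adj):
    """Unnormalised betweenness for an undirected, unweighted graph (Brandes)."""
    score = {node: 0.0 for node in adj}
    for source in adj:
        order = []                                # nodes in BFS visiting order
        preds = {node: [] for node in adj}        # predecessors on shortest paths
        paths = {node: 0 for node in adj}         # number of shortest paths
        dist = {node: -1 for node in adj}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:                   # first time we reach w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:        # v lies on a shortest path to w
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {node: 0.0 for node in adj}
        for w in reversed(order):                 # farthest nodes first
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each pair of nodes was counted from both ends, so halve the totals.
    return {node: value / 2 for node, value in score.items()}


# Invented toy network: a small star around "hub" plus a chain c - x - y.
toy = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "x"],
    "x": ["c", "y"],
    "y": ["x"],
}

if __name__ == "__main__":
    print(degree_centrality(toy))
    print(betweenness_centrality(toy))
```

In this toy network "hub" comes out on top for both measures, while "c" and "x" have the same degree but different betweenness, which is the kind of overlap and difference the node sizing makes visible.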

Visualising communities identified in the networks

The tutorial by Jen Golbeck, suggested by Dragan, was useful here, although I have still not worked out how to highlight the selected nodes in the Data lab spreadsheet from the right click.

I fiddled with the size range for the nodes, increasing the minimum size so that the circles are large enough to show the colours.
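A minimal sketch of what a size range does, assuming a simple linear mapping between the smallest and the largest score (which is how I understand Gephi's default ranking to behave, but treat that as an assumption). The function name and the scores are made up.

```python
def scale_sizes(scores, min_size, max_size):
    """Map each score linearly onto the range [min_size, max_size]."""
    lowest, highest = min(scores.values()), max(scores.values())
    span = highest - lowest
    sizes = {}
    for node, score in scores.items():
        # If every score is identical, fall back to the minimum size.
        fraction = (score - lowest) / span if span else 0.0
        sizes[node] = min_size + fraction * (max_size - min_size)
    return sizes


if __name__ == "__main__":
    betweenness = {"a": 0, "b": 5, "c": 10}  # invented scores
    print(scale_sizes(betweenness, min_size=10, max_size=50))
    print(scale_sizes(betweenness, min_size=20, max_size=50))
```

Raising the minimum only lifts the small nodes; the largest node keeps its size, so the overall picture stays comparable.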

I also played around with the modularity factor (the resolution setting), increasing it to 2 for Twitter and to 1.8 for blogs in order to decrease the number of communities for clarity (from 12 to 8 and from 6 to 4 respectively).
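A rough sketch of why a bigger factor gives fewer communities. It assumes the factor multiplies the share of edges that fall inside a community before the usual "expected by chance" term is subtracted; I believe that is how Gephi's modularity works but have not checked it against the source, and other tools define resolution the other way round. The toy graph (two triangles joined by one edge) is invented.

```python
def modularity(edges, communities, resolution=1.0):
    """Sum over communities of resolution * (inside / m) - (degrees / 2m)^2."""
    m = len(edges)
    member_of = {node: i for i, group in enumerate(communities) for node in group}
    inside = [0] * len(communities)       # edges with both ends in the community
    degree_sum = [0] * len(communities)   # total degree of the community's nodes
    for u, v in edges:
        degree_sum[member_of[u]] += 1
        degree_sum[member_of[v]] += 1
        if member_of[u] == member_of[v]:
            inside[member_of[u]] += 1
    return sum(
        resolution * inside[i] / m - (degree_sum[i] / (2 * m)) ** 2
        for i in range(len(communities))
    )


# Invented toy graph: triangle a-b-c, triangle d-e-f, joined by the edge c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]
two_groups = [{"a", "b", "c"}, {"d", "e", "f"}]
one_group = [{"a", "b", "c", "d", "e", "f"}]

if __name__ == "__main__":
    for r in (1.0, 2.0, 4.0):
        print(r, modularity(toy_edges, two_groups, r),
              modularity(toy_edges, one_group, r))
```

At a factor of 1 the two-triangle split scores higher; once the factor is large enough the single merged community wins, so an algorithm chasing the highest score ends up with fewer, bigger communities.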

I also changed some of the colours to provide better contrast between the different communities.

NOTE: any overlap in colours for Twitter and blog visualisation is accidental and does not indicate overlap in communities across these two environments.

And voilà – two pretty pictures!

Blog network in CCK11 course. Node size=betweenness, node label=degree.
Twitter network in CCK11 course. Node size=betweenness, node label=degree.

What does it all mean? I think I may have to cover that in the next post – the actual Assignment for week 3 demonstrating my competencies… It will be nice to finally have some questions to answer :)

Top image source: Flickr by Maegan under CC license

