Hey – and I am back from fiddling with pretty pictures (had a bit of a pause to consider meanings of “social capital” on the way – there will be a post about that, don’t you worry!)
So Part 2 of this activity/homework was to get some visualisations going on the Twitter and blog data sets from CCK11 (Part 1 – prelim analysis here).
I used the larger Twitter set to have a play around with the layouts.
YH (Yifan Hu) seems to have more properties to tweak, but I had no idea what some of them meant, e.g. Quadtree max level or theta (despite a handy definition tip appearing at the bottom when you click on any of them). In fact, changing the values of most of them did not seem to have any substantial effect on the overall shape of the visualisation – at first glance anyway. The two which had the most effect were relative strength and optimal distance.
- Relative strength – the relative strength between the electrical force (repulsion) and the spring force (attraction). Smaller values produce a tighter central cluster. If you want to see inside the central cluster – make it larger!
- Optimal distance – the natural length of the springs. Bigger values mean nodes will be further apart. Again – if you want to see individual nodes – increase the value. To tighten individual clusters/make communities more visible, make it smaller.
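To get a feel for what these two knobs do, here is a toy force-directed layout in plain Python. It is only a sketch under my own assumptions, not Gephi's implementation: repulsion is taken as relative_strength × k² / d and attraction as d² / k (with k the optimal distance), and there is no quadtree approximation. The names force_layout, mean_edge_length and the tiny example graph are made up for illustration.

```python
import math
import random


def force_layout(nodes, edges, optimal_distance=1.0, relative_strength=1.0,
                 iterations=300, seed=0):
    """Toy force-directed layout (illustrative only, not Gephi's code)."""
    rng = random.Random(seed)
    k = optimal_distance
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    max_step = k  # largest move allowed per iteration; shrinks over time
    for _ in range(iterations):
        move = {n: [0.0, 0.0] for n in nodes}
        # Repulsion ("electrical force") between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = relative_strength * k * k / d
                move[a][0] += dx / d * f
                move[a][1] += dy / d * f
                move[b][0] -= dx / d * f
                move[b][1] -= dy / d * f
        # Attraction ("spring force") along every edge.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            move[a][0] -= dx / d * f
            move[a][1] -= dy / d * f
            move[b][0] += dx / d * f
            move[b][1] += dy / d * f
        # Apply the combined force, capped by the current step limit.
        for n in nodes:
            length = math.hypot(move[n][0], move[n][1]) or 1e-9
            step = min(length, max_step)
            pos[n][0] += move[n][0] / length * step
            pos[n][1] += move[n][1] / length * step
        max_step *= 0.97  # smaller steps later on, so the layout settles
    return pos


def mean_edge_length(pos, edges):
    """Average distance between the two ends of each edge."""
    total = sum(math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])
                for a, b in edges)
    return total / len(edges)


# Hypothetical example graph: two triangles joined by one bridging edge.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
for k in (1.0, 3.0):
    layout = force_layout(nodes, edges, optimal_distance=k)
    print("optimal distance", k, "-> mean edge length",
          round(mean_edge_length(layout, edges), 2))
```

In this sketch, a bigger optimal distance stretches every spring, and a smaller relative strength weakens repulsion so connected nodes sit closer together, which matches what I saw when tweaking the sliders.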
FR (Fruchterman Reingold) had fewer properties (area, gravity and speed), so it was easier to understand at first glance. In FR, lower gravity values (gravity being the force attracting nodes to the centre) made the cluster less tight, making it easier to distinguish individual nodes. It took much longer to run than YH and needed to be stopped manually.
Without the use of colour to highlight modularity I found it difficult to see any structure within the FR layout, so I opted for the YH method for the analysis. It seems that OpenOrd (a modification of FR) would be best for detecting distinct clusters – but this must be an imported add-on, as I do not see it in the default layout list. Something to explore at another time :)
I also found the Gephi Tutorials on Slideshare about layout helpful:
The choice of method depends on the topology you want to emphasise (and also on the size of your network).
- Area = graph size area
- Gravity = increasing gravity reduces dispersion of disconnected components by pulling them into the centre
IMPORTANT: when the algorithm does not converge (unstable node positions/unstable graph), reduce the speed to gain precision.
Playing around with sizing the nodes based on their centrality measures gave a quick overview of the visual overlap between the nodes ranking highest on the different measures. For example, in the Twitter network, the nodes with the highest values of betweenness centrality also looked like the ones with the highest values of degree centrality (see below).
Ultimately I sized the nodes based on betweenness centrality and inserted the degree centrality score as a label (purely a matter of convenience as degree centrality values were just too large to fit into the node circles!).
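For anyone wanting to check the two measures outside Gephi, here is a minimal plain-Python sketch. It assumes an unweighted, undirected network and reports raw (unnormalised) scores; betweenness uses Brandes' algorithm. The little edge list and the function names are made up for illustration and are not the CCK11 data.

```python
from collections import deque


def build_adjacency(edges):
    """Turn a list of undirected edges into a dict of neighbour sets."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj):
    """Raw degree: the number of neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Brandes' algorithm, unweighted and undirected, unnormalised."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every pair was counted from both ends, so halve the totals.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical toy network: a hub with four contacts, one of which
# also links to an otherwise isolated node.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("d", "e")]
adj = build_adjacency(edges)
degree = degree_centrality(adj)
between = betweenness_centrality(adj)
for node in sorted(adj, key=between.get, reverse=True):
    print(node, "degree:", degree[node], "betweenness:", between[node])
```

In this toy case the same node tops both rankings, as in my Twitter picture, but that is not guaranteed: a low-degree node bridging two groups can score high on betweenness.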
Visualising communities identified in the networks
The tutorial by Jen Golbeck, suggested by Dagan, was useful here, although I have still not worked out how to highlight the selected nodes in the Data Laboratory spreadsheet from the right click.
I fiddled with the size range for the nodes, increasing the minimum size so that the circles are large enough to show the colours.
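Here is roughly what a size range does, as a small sketch. I am assuming a simple linear mapping from the lowest score to the minimum size and from the highest score to the maximum size; the function name and the numbers are invented for illustration, and Gephi's own interpolation may differ.

```python
def scale_sizes(scores, min_size=10.0, max_size=50.0):
    """Map each score linearly onto the range [min_size, max_size]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: give every node the minimum size.
        return {node: float(min_size) for node in scores}
    span = max_size - min_size
    return {node: min_size + (score - lo) / (hi - lo) * span
            for node, score in scores.items()}


# Hypothetical betweenness scores for three nodes.
scores = {"hub": 9.0, "d": 4.0, "a": 0.0}
small_min = scale_sizes(scores, min_size=1, max_size=50)
large_min = scale_sizes(scores, min_size=15, max_size=50)
print(small_min)
print(large_min)
```

Raising the minimum only lifts the low-scoring nodes: the top node stays at the maximum size, while the smallest circles become big enough for their colour to be visible.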
I also played around with the modularity factor, increasing it to 2 for Twitter and to 1.8 for blogs in order to decrease the number of communities for clarity (from 12 to 8 and from 6 to 4 respectively).
I also changed some of the colours to provide better contrast between the different communities.
NOTE: any overlap in colours between the Twitter and blog visualisations is accidental and does not indicate overlap in communities across these two environments.
And voilà – two pretty pictures!
What does it all mean? I think I may have to cover that in the next post – the actual Assignment for week 3 demonstrating my competencies… It will be nice to finally have some questions to answer :)