Posted by: Michael Atkisson | June 18, 2011

Visualizing my Twitter Network with NodeXL

Introduction:

As part of the Semantic Web, Linked Data, and Intelligent Curriculum topics for week three of the LAK11 Open Course, I made NodeXL visualizations of my Twitter network and of a Twitter search on “Learning Analytics.” NodeXL is a Microsoft-developed, open source Excel add-on built for entry-level social network analysis (SNA) researchers who are not experienced in data mining. So far, I have found it very easy to get started. You can download data sets directly from Twitter, Facebook, YouTube, and Flickr, as well as import data sets that you create from other sources.

I just downloaded the software and connected it to my Twitter account. There was just one hiccup: an authentication error appeared, apparently because I was requesting too much data. Once I scaled the request back a bit, it downloaded my Twitter data with ease. On my third analysis within the hour, Twitter placed a rate limit on me, and I had to wait about an hour for that request to be processed. Overall it was pretty slick, since I didn’t have to spend time worrying about how to structure the data or which visualizations to build. It is all right there. There is a learning curve to organizing the data and adjusting the visualization properties, but within two hours I was able to read the manual and manipulate the data in ways that would let me answer research questions. I’m no pro yet, but I felt more encouraged than I expected, given that my only other SNA excursion was with SNAPP (an easier program to use, but with much less flexibility). There are some limitations, though, which I will discuss through the examples.

Visualization 1: SNA of Twitter Accounts by Number Followed

First, I queried all the Twitter accounts that I follow and all the accounts that follow me. Then I decided to look at how much the number of accounts each account follows varies. So I scaled the size of the nodes to that number across all the accounts that I follow. “hootsuite” is the big outlier node in my network, following over 520,000 Twitter accounts.

This data set was of the Twitter accounts that I follow. The lines between nodes are directional, representing the connections among the accounts that I follow. The nodes are scaled by how many Twitter accounts each node is following.

This data set was of the Twitter accounts that either I follow or follow me. The lines between nodes are directional, representing the connections among the accounts that I follow. The nodes are scaled by how many Twitter accounts each node is following. The nodes towards the center of the graph have more connections among themselves than the nodes around the periphery.
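Here is a minimal Python sketch of what "scaling the nodes" amounts to. It assumes a simple linear mapping from each account's count onto a range of node sizes. NodeXL handles this through its own settings. The function name `scale_sizes`, the 1 to 10 size range, and the optional log transform are my own illustrative assumptions, not NodeXL's code. The log option shows why one outlier like hootsuite squeezes every other node down to the minimum size under a linear mapping.

```python
import math


def scale_sizes(values, min_size=1.0, max_size=10.0, log=False):
    """Map each raw count (e.g. accounts followed) onto [min_size, max_size].

    Hypothetical helper for illustration. With log=True the counts are passed
    through log10(1 + v) first, so a single huge outlier no longer flattens
    every other node to min_size.
    """
    transformed = [math.log10(1 + v) if log else float(v) for v in values]
    lo, hi = min(transformed), max(transformed)
    if hi == lo:
        # All counts are equal: give every node the midpoint size.
        return [(min_size + max_size) / 2] * len(values)
    span = max_size - min_size
    return [min_size + span * (t - lo) / (hi - lo) for t in transformed]


if __name__ == "__main__":
    # Hypothetical following counts; only the 520,000 outlier echoes the post.
    counts = [520_000, 300, 1_200, 45]
    print(scale_sizes(counts))            # linear: everyone but the outlier near 1
    print(scale_sizes(counts, log=True))  # log: sizes spread out
```

With these made-up counts, the linear mapping leaves every account except the outlier at about size 1. The log mapping spreads them across the range.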

Visualization 2: SNA of Twitter Accounts by Number of Followers

Next I looked at the same data set but scaled the nodes by how many followers each account has. Here there was more variance at the high end. About 15 accounts in the data set had significant numbers of followers. In the previous visualization, by contrast, only one account was following others on a large scale.


This is a NodeXL SNA visualization of Twitter accounts that either I follow or that follow me. I scaled the nodes by how many followers each account has.

Visualization 3: Scaling Nodes by Number of Tweets

On the third try, I wanted to see how the network varied in number of tweets in two ways. I wanted to see which accounts were tweeting more than others, but I also wanted to draw a line in the sand: who in the network has tweeted 1,000 or more times? So in this graph I also had the node shape change to a square when the account had 1,000 or more tweets. Even some of the small squares have passed 1,000 tweets, so the larger ones represent a significant number of tweets. The largest, for example, is daveyp, at 74,550 total tweets on the day I did the analysis.


This is an SNA of my Twitter network with the nodes scaled by number of tweets. The node shape also changes from a circle to a square as the number of tweets exceeds 1,000.
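The square-versus-circle rule is a threshold test. The sketch below is a minimal version of it. It assumes tweet counts are already in a dictionary keyed by account name. The helper names are hypothetical, and so are the sample accounts other than daveyp. It follows the text above and treats exactly 1,000 tweets as a square.

```python
TWEET_THRESHOLD = 1_000  # the "line in the sand" from this visualization


def node_shape(tweet_count, threshold=TWEET_THRESHOLD):
    """Marker shape for one account: square at or above the threshold."""
    return "square" if tweet_count >= threshold else "circle"


def shapes_for(tweet_counts, threshold=TWEET_THRESHOLD):
    """Map every (hypothetical) account name to its marker shape."""
    return {name: node_shape(count, threshold) for name, count in tweet_counts.items()}


def count_squares(tweet_counts, threshold=TWEET_THRESHOLD):
    """How many accounts cross the threshold."""
    return sum(1 for count in tweet_counts.values() if count >= threshold)


if __name__ == "__main__":
    # daveyp's count is from the post; the other two accounts are made up.
    sample = {"daveyp": 74_550, "hypothetical_a": 999, "hypothetical_b": 1_000}
    print(shapes_for(sample))
    print(count_squares(sample))
```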

Visualization 4: Tweet-Scaled Nodes Grouped by ????

Next I wanted to test the clustering feature. It did a great job of breaking my data set into visually distinct groups, but I could not find anywhere in the documentation what the groupings were based on. Maybe this is because I am new to data mining and SNA, but it is generally good practice to state explicitly which data the clusters are derived from. The algorithm used to make the calculation is cited, but what use is that to the audience of the tool? Auto clustering is cool, but this feature falls short of the immediate usefulness of the other features in NodeXL.


SNA of My Twitter Network Using NodeXL's Auto Grouping.
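Marc's reply below names the three algorithms behind NodeXL's grouping. As far as I understand them, all three group vertices using only the pattern of edges. They do not use any column like tweet or follower counts, which may be why I couldn't find a data field the groups were based on. Clauset-Newman-Moore, for example, is a fast version of greedy modularity optimization.

Below is a minimal and deliberately slow Python sketch of that greedy idea. It assumes an undirected edge list; ignoring the direction of follows is my simplification. The procedure starts with every account in its own group. It then keeps merging whichever pair of groups most increases modularity, and stops when no merge helps. This illustrates the technique, not NodeXL's implementation. The function names are hypothetical.

```python
from itertools import combinations


def modularity(edges, communities):
    """Q = sum over groups of (edges inside / m) - (total degree / 2m) ** 2."""
    m = len(edges)
    if m == 0:
        return 0.0
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    group_of = {node: i for i, group in enumerate(communities) for node in group}
    inside = [0] * len(communities)
    for u, v in edges:
        if group_of[u] == group_of[v]:
            inside[group_of[u]] += 1
    q = 0.0
    for i, group in enumerate(communities):
        group_degree = sum(degree.get(node, 0) for node in group)
        q += inside[i] / m - (group_degree / (2 * m)) ** 2
    return q


def greedy_modularity(edges):
    """Naive greedy agglomeration: merge the best pair until Q stops rising."""
    nodes = sorted({node for edge in edges for node in edge})
    communities = [{node} for node in nodes]
    best_q = modularity(edges, communities)
    while len(communities) > 1:
        best_partition = None
        for i, j in combinations(range(len(communities)), 2):
            # Try merging groups i and j, keeping all other groups unchanged.
            trial = [c for k, c in enumerate(communities) if k not in (i, j)]
            trial.append(communities[i] | communities[j])
            q = modularity(edges, trial)
            if q > best_q + 1e-12:  # tolerance so floating-point ties don't count
                best_q, best_partition = q, trial
        if best_partition is None:
            break  # no merge increases modularity
        communities = best_partition
    return communities, best_q


if __name__ == "__main__":
    # Two made-up triangles of accounts joined by a single edge.
    edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
    print(greedy_modularity(edges))
```

On this toy network the sketch splits the accounts into the two triangles, because the single bridging edge is the only connection between them.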

 

Visualization 5: Twitter Search for “Learning Analytics”

Next I ran a Twitter search for “Learning Analytics.” It returned recent tweets and the connections among them, if any. Edges (the lines between nodes) were color-coded by connection type. Dark blue lines mean that one node follows the other, and sky blue lines are mentions. I scaled the nodes by the total number of tweets each account has made. I also set the labels to display the hashtags used in any of the account's tweets within this query. Auto grouping is also turned on here, but again I am not sure at this point what properties the nodes are being grouped by. Frequent tags in this query were #Calrg11, #sakai11, #mupple, #edchat, and #kmiou. I was surprised to see only one LAK12 tag.


I did a Twitter search for "Learning Analytics" and this network resulted.
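The labels and edge colors in this visualization come down to two small steps. The first is pulling hashtags out of each account's tweets. The second is mapping each relationship type to a color. Below is a minimal Python sketch of both, assuming the search results are available as (account, tweet text) pairs. Some details are my own assumptions:

- The regular expression is a simplification of Twitter's real hashtag rules.
- Lowercasing every tag, so that #Calrg11 and #calrg11 count as one, is my choice.
- All function names and sample tweets are hypothetical.

```python
import re
from collections import Counter

# Simplified hashtag pattern: "#" followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")

# Edge colors as described for this visualization; gray for any other type.
EDGE_COLORS = {"follows": "darkblue", "mentions": "skyblue"}


def tags_in(text):
    """Lowercased set of hashtags in one tweet."""
    return {tag.lower() for tag in HASHTAG.findall(text)}


def hashtags_by_account(tweets):
    """Union of the hashtags over each account's tweets in the query."""
    tags = {}
    for account, text in tweets:
        tags.setdefault(account, set()).update(tags_in(text))
    return tags


def node_label(tags):
    """Label text for one vertex, e.g. '#edchat, #mupple'."""
    return ", ".join("#" + tag for tag in sorted(tags))


def tag_frequency(tweets):
    """Number of tweets in the query that use each hashtag."""
    return Counter(tag for _, text in tweets for tag in tags_in(text))


def edge_color(kind):
    return EDGE_COLORS.get(kind, "gray")


if __name__ == "__main__":
    # Made-up tweets, for illustration only.
    tweets = [("user_a", "Notes from #CALRG11 on learning analytics #edchat"),
              ("user_b", "Learning analytics session at #calrg11"),
              ("user_a", "More on #mupple")]
    for account, tags in hashtags_by_account(tweets).items():
        print(account, node_label(tags))
    print(tag_frequency(tweets).most_common(3))
```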

 

Visualization 6: Circle Diagram of Twitter Search

The circle diagram of the same data from visualization 5 sheds a different light on it. Visualization 5 showed the centrality of connections through the nodes' x and y coordinates. This layout instead spaces all the nodes evenly around a circle, so you can see the density and direction of the edges. One interesting thing here is that an account's overall number of tweets does not seem to indicate how many connections it has with other tweeters on this topic.

 


Circle SNA of a Twitter Query for "Learning Analytics" Labeled by Present Hash Tags.
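A circle layout places vertices without regard to how connected they are. They are spaced evenly around the circumference, and the network's structure shows up only in the edges drawn across the circle. Here is a minimal Python sketch of that placement. It assumes vertices are drawn in the order they are listed. The function name and unit radius are my own choices.

```python
import math


def circle_layout(vertices, radius=1.0):
    """Place vertices at evenly spaced angles on a circle centered at (0, 0)."""
    n = len(vertices)
    positions = {}
    for i, vertex in enumerate(vertices):
        angle = 2 * math.pi * i / n
        positions[vertex] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


if __name__ == "__main__":
    layout = circle_layout(["a", "b", "c", "d"])
    print(math.dist(layout["a"], layout["b"]))  # neighbors on the circle
    print(math.dist(layout["a"], layout["c"]))  # opposite sides of the circle
```

Only neighbors on the circle are the same distance apart. With four vertices, adjacent ones are √2 apart and opposite ones are 2 apart.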

 

Conclusion:

It seems that this tool is more versatile than SNAPP. I may be mistaken, but I was under the impression that SNAPP only visualizes discussion forums. With NodeXL, a wider variety of data is made instantly available. Instructors of online or blended courses could require students to tweet with a class hashtag and save the search for the class in their Twitter account or aggregator. A daily snapshot of the discussion could then easily be monitored outside of an LMS. Getting discussion forum data into NodeXL would take more steps than SNAPP, where you push one button in your browser. In exchange, NodeXL offers more flexible analysis, and it is also fairly easy to use. One challenge may arise with large classes, where skill in filtering and sorting data will be needed to decide how to display the visualizations meaningfully.

At a superficial level, I have had a good experience with NodeXL. However, some noticeable challenges kept me from doing serious analysis of my data in a short amount of time. The variables on which the suggested clusters were based were not made known. Also, the worksheets that supposedly contained the totals and averages were not displaying the statistics that the manual said they would. This made it difficult to get a descriptive feel for the data in terms of quantities rather than visual cues. Nevertheless, with a little more time exploring, I’m sure I’ll figure it out. The bottom line is that, for the beginner, NodeXL is great for small and medium-sized SNA data sets.
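Until I get those worksheets working, simple totals and averages can be computed straight from an edge list. Below is a minimal Python sketch, assuming each edge is a directed (source, target) pair such as (follower, followed account). The function name and the choice of statistics are my own and are not the metrics NodeXL reports. Accounts with no edges at all must be passed in separately through `nodes`, or they will be missed.

```python
from collections import Counter
from statistics import mean, median


def degree_summary(edges, nodes=()):
    """Totals and averages of in- and out-degree for a directed edge list.

    `nodes` may list extra vertices (e.g. isolated accounts) that appear in
    no edge. At least one vertex is required.
    """
    out_degree = Counter(source for source, _ in edges)
    in_degree = Counter(target for _, target in edges)
    vertices = set(nodes) | set(out_degree) | set(in_degree)
    if not vertices:
        raise ValueError("need at least one vertex")
    # Counter returns 0 for vertices that never appear on that side of an edge.
    in_values = [in_degree[v] for v in vertices]
    out_values = [out_degree[v] for v in vertices]
    return {
        "vertices": len(vertices),
        "edges": len(edges),
        "mean_in_degree": mean(in_values),
        "median_in_degree": median(in_values),
        "max_in_degree": max(in_values),
        "mean_out_degree": mean(out_values),
        "max_out_degree": max(out_values),
    }


if __name__ == "__main__":
    # Made-up follow relationships, for illustration only.
    follows = [("me", "a"), ("a", "me"), ("b", "me"), ("me", "c")]
    print(degree_summary(follows))
```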


Responses

  1. Hello!

    Thank you for the interest in NodeXL!

    You can find information about the three algorithms used for clustering in the “Groups” menu of NodeXL.

    We have implemented three clustering algorithms: Girvan-Newman (http://en.wikipedia.org/wiki/Girvan%E2%80%93Newman_algorithm), Clauset Newman Moore, and Wakita and Tsurumi. See: http://en.wikipedia.org/wiki/Clustering_algorithm for an overview.

    Each applies a different approach for dividing a network into sub-regions.

    Regards,

    Marc (for Team NodeXL)

    • Marc,

      Thanks for the direction. I’ll take a look.

      Michael


