Feb 05 2012

TwiStatistic – Twitter Account Analysis

Over the Christmas holidays (when I should have been revising) I wrote a small website that lets you look up Twitter users and generate statistics about their tweeting habits, their followers and the accounts they follow. I found the results can be quite interesting. For instance, did you know that 55% of my tweets are retweets, that my most commonly tweeted word (excluding hashtags and words shorter than five letters) is “Windows”, and that of all the people who follow me, Ben Nunney (@bennuk) tweets the most?

If you haven’t tried this yet – enter your Twitter username here and give it a go:

I thought I’d use this post to explain what data I retrieve, how it is processed, and how the graphs are generated (it’s a really cool API from Google).

Data Acquisition

First, the user’s information must be retrieved from Twitter using the Twitter API. Calls can be made in either authenticated or unauthenticated mode. The main difference, as far as TwiStatistic is concerned, is how the calls are rate limited: in unauthenticated mode the limit is lower and is shared between all users, whereas in authenticated mode the limit is higher (so more calls can be made) and is private to the user (i.e. API calls from one user will not affect another user’s limit).

Once TwiStatistic has determined how it will make requests, it first attempts to get the last 200 tweets (the most that can be retrieved in a single API call). If any tweets are found, the user information is extracted from them as well (saving an API call); if not, an additional request is made for the user’s basic information. Once the tweet and user information has been retrieved, the follower and following lists are retrieved. This is all the information that TwiStatistic requires. Now it just needs to be processed!
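The retrieval order described above can be sketched roughly as follows. This is only an illustration, not TwiStatistic’s actual code: the `fetch` callable, its resource names ("timeline", "user", "followers", "following") and the idea that each tweet carries its author under a "user" key are all assumptions made for the example.

```python
from typing import Any, Callable

# Hypothetical transport: fetch(resource, **params) returns decoded JSON.
# It stands in for whichever authenticated or unauthenticated client is used.
Fetch = Callable[..., Any]


def gather_account_data(fetch: Fetch, username: str) -> dict:
    """Collect tweets, user info, followers and following for one account."""
    # One call for the most recent tweets; 200 is the per-call maximum.
    tweets = fetch("timeline", screen_name=username, count=200)

    if tweets:
        # Assumption: every tweet embeds its author's profile under "user",
        # so no separate profile request is needed.
        user = tweets[0]["user"]
    else:
        # Nothing to borrow the profile from, so spend one more call.
        user = fetch("user", screen_name=username)

    return {
        "tweets": tweets,
        "user": user,
        "followers": fetch("followers", screen_name=username),
        "following": fetch("following", screen_name=username),
    }
```

Passing the transport in as a parameter keeps the fallback logic separate from the choice between authenticated and unauthenticated requests.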

Processing

TwiStatistic currently generates the following statistics, grouped by the data they are derived from:

Using tweet data:

  • Original Tweets vs. Retweets.
  • Retweet frequency of “Original Tweets”.
  • Users mentioned in tweets.
  • Weekly tweeting frequency.
  • Monthly tweeting frequency (useful to see the tweet sample data range).
  • Hashtags used.
  • Tweets per day (assuming the user tweets at least once).
  • Most common words (excluding short words).
  • General user information (as seen on each profile on Twitter).
  • User account creation date.
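
A few of the tweet statistics above could be computed along these lines. It is a minimal sketch under stated assumptions: tweets are plain strings, a retweet is recognised by an "RT @" prefix, and the function name and return keys are made up for the example rather than taken from TwiStatistic.

```python
from collections import Counter
from typing import Sequence


def tweet_statistics(tweets: Sequence[str], min_word_length: int = 5) -> dict:
    """Count retweets, hashtags, mentions and common words in tweet texts."""
    retweets = 0
    words, hashtags, mentions = Counter(), Counter(), Counter()

    for text in tweets:
        # Assumption: retweets start with the classic "RT @user" prefix.
        if text.startswith("RT @"):
            retweets += 1
        for token in text.split():
            cleaned = token.strip(".,!?:;\"'()").lower()
            if cleaned.startswith("#") and len(cleaned) > 1:
                hashtags[cleaned] += 1
            elif cleaned.startswith("@") and len(cleaned) > 1:
                mentions[cleaned] += 1
            elif cleaned.isalpha() and len(cleaned) >= min_word_length:
                # Hashtags and short words never reach the word count.
                words[cleaned] += 1

    total = len(tweets)
    return {
        "retweet_percent": round(100 * retweets / total) if total else 0,
        "common_words": words.most_common(10),
        "hashtags": hashtags.most_common(),
        "mentions": mentions.most_common(),
    }
```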

Using Follower and Following data:

  • Language.
  • Time zone.
  • Year of account creation.
  • “Top tweeter”, user with most tweets.
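
The follower and following statistics are simple tallies, which might look something like the sketch below. The field names ("lang", "time_zone", "created_year", "statuses_count", "screen_name") are assumptions chosen for illustration, not a statement of what the Twitter API returns or what TwiStatistic stores.

```python
from collections import Counter
from typing import Iterable, Mapping


def follower_statistics(accounts: Iterable[Mapping]) -> dict:
    """Tally language, time zone and creation year, and find the top tweeter."""
    accounts = list(accounts)
    # The "top tweeter" is the account with the highest total tweet count.
    top = max(accounts, key=lambda a: a["statuses_count"], default=None)
    return {
        "languages": Counter(a["lang"] for a in accounts),
        "time_zones": Counter(a["time_zone"] for a in accounts),
        "creation_years": Counter(a["created_year"] for a in accounts),
        "top_tweeter": top["screen_name"] if top else None,
    }
```

Each `Counter` maps a value to the number of accounts that share it, which is the shape of data a pie chart needs.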

Graph Generation via Google

Once the data has been processed, it needs to be visualised in a meaningful way so that the user can interpret it easily. I decided to display most of the results graphically, using the Google Chart API. The easiest way to explain how it works is by example.

Here is a pie chart showing the years in which the accounts that follow me on Twitter were created:

[Image: account creation chart]

The URL for this is (with a few newlines added):

https://chart.googleapis.com/chart?

cht=p3&chf=bg,s,cccccc&chco=333333&chs=400x100&
chd=t:26,4,35,10,7,4&chds=a&chl=2009%20(26)|2012%20
(4)|2011%20(35)|2008%20(10)|2010%20(7)|2007%20(4)

Below is a list explaining what each part of the URL is doing:

  • https://chart.googleapis.com/chart

This is where the API is located.

  • ?cht=p3

This tells the API that we wish to create a 3D pie chart – there are several other codes which dictate the type of chart you wish to draw.

  • &chf=bg,s,cccccc

The background should be a solid colour of the hex value #CCCCCC.

  • &chco=333333

The chart should use the colour #333333.

  • &chs=400x100

The dimensions of the chart (width x height, in pixels). These are limited on both the x and y axes (1,000 pixels each) as well as in the total number of pixels (300,000).

  • &chd=t:26,4,35,10,7,4

The chart data – in this case the total accounts created in each year.

  • &chds=a

How we wish to scale the chart – in this case we want to do it automatically.

  • &chl=2009%20(26)|2012%20(4)|2011%20(35)|2008%20(10)|2010%20(7)|2007%20(4)

The chart labels for each data point (URL encoded).

That’s all there is to it!
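
Putting the parameters together in code is mostly string joining. Here is a minimal sketch that only builds the URL string and makes no request; the function name `pie_chart_url` and its default arguments are invented for the example.

```python
from urllib.parse import quote

CHART_ENDPOINT = "https://chart.googleapis.com/chart"


def pie_chart_url(counts, width=400, height=100,
                  background="cccccc", colour="333333"):
    """Build a 3D pie chart URL from a mapping of label -> count."""
    # Each label shows its count in brackets, e.g. "2009 (26)".
    labels = [f"{name} ({value})" for name, value in counts.items()]
    params = [
        ("cht", "p3"),                      # chart type: 3D pie
        ("chf", f"bg,s,{background}"),      # solid background fill
        ("chco", colour),                   # chart colour
        ("chs", f"{width}x{height}"),       # size, with a plain letter "x"
        ("chd", "t:" + ",".join(str(v) for v in counts.values())),
        ("chds", "a"),                      # automatic scaling
        # URL-encode each label but keep the brackets readable.
        ("chl", "|".join(quote(label, safe="()") for label in labels)),
    ]
    return CHART_ENDPOINT + "?" + "&".join(f"{k}={v}" for k, v in params)


year_counts = {"2009": 26, "2012": 4, "2011": 35,
               "2008": 10, "2010": 7, "2007": 4}
print(pie_chart_url(year_counts))
```

With the year counts from the example, this prints the same URL as the one dissected above, on a single line.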

The above example shows the basics, which is all you really need to create generic charts. If something a little more advanced is required, I’d highly recommend the following resources:
  • http://code.google.com/apis/chart/image is the API homepage and covers everything!
  • http://code.google.com/apis/chart/image/docs/chart_wizard.html is a tool that lets you very easily create a mock-up of the chart to use in your project.

 
