Apr
24
2012

The Cost Of The Cloud

I’ve been following the developments in cloud storage this week with interest: Google announced Google Drive, and SkyDrive customers were upgraded to 25GB of free storage. I thought I would take a look at the three big players (Dropbox, Google Drive, and SkyDrive) and compare the free storage they provide and their pricing for additional storage.

Free Storage

Dropbox – 2GB on account creation and an additional 500MB per referral (capped at 18GB).
Google Drive – 5GB (more specifically, 5 GB in Google Drive, 1 GB in Picasa, and 10 GB for Gmail).
SkyDrive – 25GB (for a limited time, then it will be 7GB).

Pricing
Google offers at most 16TB of storage, so I’ve used this as the comparison point.

Purchasing any additional storage on Google Drive will increase your data limit to 25GB on Gmail.
Purchasing additional storage on SkyDrive does not affect the size of your Hotmail inbox; however, Hotmail has its own larger data limit of 500GB.

Google Drive – $9,599.88 (£5,949.36) per year.
Dropbox – $31,840.00 (£19,734.72) per year.
Dropbox Teams – $10,170.00 (£6,302.67) per year and with a team consisting of 80 people.
SkyDrive – £5,120.00 ($8,260.60) per year.

8 x 2TB hard disks – approx. £560.00 ($903.50), a one-off fee that lasts until a disk breaks.

Here is how these costs were calculated:
- Google Drive: the 16TB monthly plan * 12.
- Dropbox: the 100GB yearly plan * 160 (to get to 16TB). Note that referrals alone would give you at most 5.12 TB of storage, and only if you had 5,120 referrals.
- Dropbox Teams: the initial price for 5 team members (giving a maximum allowance of 1TB) + 75 * the cost of an additional team member (each additional member adds 200GB).
- SkyDrive: the 100GB yearly plan * 160.
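Those steps can be written up as a quick calculation. This is a Python sketch: the per-unit prices are assumptions reverse-engineered from the yearly totals above, and 1TB is treated as 1,000GB, as in the figures here.

```python
# Rough cost-per-year comparison for 16 TB of cloud storage (2012 prices).
# The per-unit prices below are reverse-engineered from the totals in the
# post and may not match the providers' exact price lists.

TARGET_GB = 16 * 1000  # 16 TB, treating 1 TB as 1,000 GB

# Google Drive: 16 TB plan billed monthly (assumed $799.99/month).
google_drive = 799.99 * 12

# Dropbox: 100 GB plan (assumed $199/year), so 160 plans to reach 16 TB.
dropbox = 199.00 * (TARGET_GB // 100)

# Dropbox Teams: assumed $795/year for 5 members (1 TB), then $125/year
# per additional member (each adds 200 GB). 75 extra members reach 16 TB.
extra_members = (TARGET_GB - 1000) // 200  # 75
dropbox_teams = 795.00 + extra_members * 125.00

# SkyDrive: 100 GB add-on (assumed £32/year), so 160 add-ons for 16 TB.
skydrive_gbp = 32.00 * (TARGET_GB // 100)

print(f"Google Drive:  ${google_drive:,.2f}/year")
print(f"Dropbox:       ${dropbox:,.2f}/year")
print(f"Dropbox Teams: ${dropbox_teams:,.2f}/year")
print(f"SkyDrive:      £{skydrive_gbp:,.2f}/year")
```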

Ignoring the option of just buying your own hard disks (and forgoing any data redundancy), SkyDrive appears to be the cheapest option, with Dropbox almost four times more expensive.

I was actually very surprised by these costs, especially by how similar SkyDrive and Google Drive are in price, and by how expensive Dropbox, the current market leader, is.

It looks like for the time being I will be sticking with the free storage options!

Feb
05
2012

TwiStatistic – Twitter Account Analysis

Over the Christmas holidays (when I should have been revising) I wrote a small website which allows users to look up Twitter users and generate some interesting statistics about their tweeting habits, their followers and who they follow. The statistics, I found, can be quite interesting – for instance, did you know that 55% of my tweets are retweets, that my most commonly tweeted word (excluding hashtags and words shorter than 5 letters) is Windows, and that of all the people who follow me, Ben Nunney (@bennuk) tweets the most?

If you haven’t tried this yet – enter your Twitter username here and give it a go:

I thought I’d use this post to explain what data I retrieve, how it is processed, and how the graphs are generated (it’s a really cool API from Google).

Data Acquisition

Firstly, the user’s information must be retrieved from Twitter; this is done using the Twitter API. These calls can be made in either authenticated or unauthenticated mode. The main difference (as far as TwiStatistic is concerned) is how the API calls are rate limited: in unauthenticated mode the limit is smaller and shared between all users, whereas in authenticated mode the limit is larger (so more calls can be made) and private to each user (i.e. API calls from one user will not affect another user’s limit).

Once TwiStatistic has determined its method of making requests, the first thing it will do is attempt to get the last 200 tweets (the most that can be retrieved in a single API call). If any tweets are found, the user’s information is also extracted from them (saving an API call); if this fails, an additional request for the user’s basic information is made. Once the tweet and user information has been retrieved, the follower and following lists are fetched. This is all the information that TwiStatistic requires – now it just needs to be processed!
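As a rough illustration of that first request, here is how the timeline call might be built. This is a Python sketch, not the site’s actual server-side code; the endpoint reflects the Twitter REST API of the time, and the exact URL and parameter names should be treated as assumptions.

```python
from urllib.parse import urlencode

# Sketch of the first request TwiStatistic makes: the user's last 200
# tweets in one call. The endpoint and parameter names are assumptions
# based on the Twitter REST API of the time.
API_BASE = "https://api.twitter.com/1/statuses/user_timeline.json"

def timeline_url(screen_name, count=200):
    """Build the request URL for a user's most recent tweets.

    200 is the most tweets a single call can return, so we always ask
    for the maximum to save API calls.
    """
    params = {
        "screen_name": screen_name,
        "count": count,
        "include_rts": "true",  # retweets are needed for the stats
    }
    return API_BASE + "?" + urlencode(params)

print(timeline_url("bennuk"))
```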

Processing

TwiStatistic currently generates the following statistics from the following data:

Using tweet data:

  • Original Tweets vs. Retweets.
  • Retweet frequency of “Original Tweets”.
  • Users mentioned in tweets.
  • Weekly tweeting frequency.
  • Monthly tweeting frequency (useful to see the tweet sample data range).
  • Hashtags used.
  • Tweets per day (assuming the user tweets at least once).
  • Most common words (excluding short words).
  • General user information (as seen on each profile on Twitter).
  • User account creation date.

Using Follower and Following data:

  • Language.
  • Time zone.
  • Year of account creation.
  • “Top tweeter”, user with most tweets.

Graph Generation via Google

Once the data has been processed, it needs to be visualised in a meaningful way so that the user can interpret it easily. For this I decided to display most of the results in graphical form, using the Google Chart API. The easiest way to explain how it works is by example.

Here is a pie chart showing the years in which the accounts that follow me on Twitter were created:

account creation chart
The URL for this is (split over a few lines for readability):

https://chart.googleapis.com/chart?

cht=p3&chf=bg,s,cccccc&chco=333333&chs=400x100&
chd=t:26,4,35,10,7,4&chds=a&chl=2009%20(26)|2012%20
(4)|2011%20(35)|2008%20(10)|2010%20(7)|2007%20(4)

Below is a list explaining what each part of the URL is doing:

  • https://chart.googleapis.com/chart

This is where the API is located.

  • ?cht=p3

This tells the API that we wish to create a 3D pie chart – there are several other codes which dictate the type of chart you wish to draw.

  • &chf=bg,s,cccccc

The background should be a solid colour of the hex value #CCCCCC.

  • &chco=333333

The chart should use the colour #333333.

  • &chs=400x100

The dimensions of the chart – these are limited in both the x and y axes, as well as in the total number of pixels.

  • &chd=t:26,4,35,10,7,4

The chart data – in this case the total accounts created in each year.

  • &chds=a

How we wish to scale the chart – in this case we want to do it automatically.

  • &chl=2009%20(26)|2012%20(4)|2011%20(35)|2008%20(10)|2010%20(7)|2007%20(4)

The chart labels for each data point (URL encoded).

That’s all there is to it!
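The whole URL above can also be assembled programmatically. Here is a short sketch (in Python, rather than the site’s own server-side code) that rebuilds it parameter by parameter from the data:

```python
from urllib.parse import quote

# Rebuilds the example chart URL from its pieces. A Python sketch for
# illustration only; TwiStatistic itself assembles the URL server-side.
def pie_chart_url(counts_by_year):
    data = ",".join(str(n) for n in counts_by_year.values())
    # URL-encode each label, keeping parentheses literal as in the example
    labels = "|".join(
        quote(f"{year} ({n})", safe="()") for year, n in counts_by_year.items()
    )
    return (
        "https://chart.googleapis.com/chart"
        "?cht=p3"                # 3D pie chart
        "&chf=bg,s,cccccc"       # solid #CCCCCC background
        "&chco=333333"           # chart colour #333333
        "&chs=400x100"           # 400x100 pixels
        f"&chd=t:{data}"         # the data points
        "&chds=a"                # scale automatically
        f"&chl={labels}"         # URL-encoded labels
    )

followers = {"2009": 26, "2012": 4, "2011": 35, "2008": 10, "2010": 7, "2007": 4}
print(pie_chart_url(followers))
```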

The above example shows you the basics – all you really need to create generic charts. If something a little more advanced is required, I’d highly recommend the following resources:
- http://code.google.com/apis/chart/image, this is the API homepage and covers everything!
- http://code.google.com/apis/chart/image/docs/chart_wizard.html, this is a tool which allows you to very easily create a mock-up of the chart to use in your project.

 

Jan
31
2012

Radio Silence – No More

Firstly I’d like to apologise for the lack of posts as of late – I have essentially been offline for the past couple of months getting ready for my first semester exams and whatnot, but I’m pleased to say that they are all over! Well… Until the end of the second semester anyways!

I’ve got a few posts coming up soon with some things I’ve been working on such as TwiStatistic - a Twitter account analysis tool, the progress I’ve made on my Windows System API overriding project, progress on a new Oscilloscope feature and a few more things!

Nov
26
2011

Whereabouts are you?

If you’re one of the people using my uni pinging script, you may have noticed that it has sped up slightly. This is because it has now been partially parallelised: it sends out a maximum of 10 pings to 10 different hosts every couple of seconds!
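A minimal sketch of that parallelisation, in Python rather than the script’s actual language: the `ping_host` function is pluggable here (the real script shells out to `ping` with a short timeout), so the batching logic stands on its own.

```python
from concurrent.futures import ThreadPoolExecutor

# Pings hosts up to 10 at a time, mirroring the partial parallelisation
# described above. `ping_host` is pluggable: in practice it would shell
# out to the system `ping` command with a short timeout.
def check_hosts(hosts, ping_host, batch_size=10):
    results = {}
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        # pool.map preserves input order, so hosts and answers line up
        for host, alive in zip(hosts, pool.map(ping_host, hosts)):
            results[host] = alive
    return results

# Example with a stand-in ping function (hypothetical host names):
fake_up = {"lab-pc-01", "lab-pc-03"}
results = check_hosts(
    ["lab-pc-01", "lab-pc-02", "lab-pc-03"],
    ping_host=lambda h: h in fake_up,
)
print(results)
```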

Another feature I’ve been wanting to implement is the ability to find out which computers your friends are currently at, so you can meet up with them, find what lab they are in, or whatever. I’ve now written a little script which users can opt in to, and which notifies the server whenever you log into a machine.
This seems like a simple concept; however, I wanted setup and use to be as simple as possible (which increases the complexity) while maintaining a secure protocol (basically, making it difficult for someone else to claim that another user is logged in at machine ‘x’). I thought I’d do a quick write-up of how it works. All in all, it functions well for a few hours’ work!

Essentially the script is broken down into three parts: the user registration and installation component (I wanted this to be as simple as executing a program); the notifier, the script which tells the server that you have logged in; and lastly the lookup client, which allows people to query the server for where a user last logged in. I’ll explain each in a paragraph of its own.

User Registration
The user first has to call a script on my share, which can be read and executed by all. Essentially, all this script does is send my server the machine they are coming from and their user name! The server then validates their host name and checks whether the user has already registered; if all is fine, the server produces a secret key for the user and sends it back. The script checks that all went well and then sets up the appropriate paths (aliases and a script to run on login) – the secret key is used for the login. If I had simply passed the user name, it would have been possible to “check in” another user simply by passing this information.
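A sketch of the server side of that exchange, in Python. The key-derivation scheme here is an assumption, not the script’s actual implementation – any scheme works so long as the key is secret and tied to the user.

```python
import hashlib
import hmac

# Server-side sketch of registration. SERVER_SECRET never leaves the
# server; each user's key is derived from it, so the server can verify
# a later check-in without storing every key. This derivation scheme is
# an assumption for illustration, not the real script's implementation.
SERVER_SECRET = b"replace-with-a-real-random-secret"

def user_key(username):
    """Derive the secret key handed back to a newly registered user."""
    return hmac.new(SERVER_SECRET, username.encode(), hashlib.sha256).hexdigest()

def verify_checkin(username, presented_key):
    """A check-in is valid only if the key matches the claimed user name."""
    return hmac.compare_digest(user_key(username), presented_key)
```

Because the key depends on the user name, presenting someone else’s name alongside your own key fails verification – which is exactly the spoofing the protocol is meant to prevent.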

Checking In
This script is called when the user logs in; since it calls a web page, it is backgrounded so as not to impact performance. It sends the server the secret key, the user name, the current host name, and whether the user is logging in or out. This information is validated, and the server then updates its records.

Look Ups
Finally, the main component: looking up where a user is. This takes your user name and the requested user. After validating that your user name has not been spoofed (think about what other information is provided when connecting to a web server, and what we can assume to know about the current user), we search the database for the requested user. If they are found, the known information is returned; otherwise, a message is returned asking you to register them with the service!

If you want more info, leave a comment below!

Nov
13
2011

Calculating Noise Levels From Uncalibrated Microphones

I’ve recently started creating more features for my WP7 app, Oscilloscope.

Currently all it does is take input from the phone’s microphone and display a waveform on screen, which, aside from looking quite cool, is fairly gimmicky. I want it to actually be useful in some way, and naturally the first useful thing would be knowing the loudness of the noise (in decibels).

The problem with this is that the microphone returns values ranging between 0 and 255 (i.e. a byte), so the decibel level is not given directly and I would have to calculate it from this data somehow. After a search on the Internet to see how other developers have done it, I discovered that it is generally not possible without specialised hardware, or that the value was simply not required by the developer (so they stopped looking for a solution). I guess doing it the easy way was out of the question!

I next looked up the actual definition of a decibel, and this was when I stumbled upon the 10 log rule. Basically, it says that if you have two different power levels, the 10 log rule gives you the difference (in dB) between the two: difference in dB = 10 log(power₂ / power₁). Note that it is important to remember that a decibel is just a measurement of scale!
This rule is fantastic, as it means that if I know the “power” of one waveform, I can calculate the difference to another and simply add the known level on – thus achieving exactly what I wanted!

The only remaining problem is getting this reference waveform. One of the issues mentioned in my initial searches was that this is impossible because the microphones have not been calibrated – so I guess I’ll need to perform some sort of software calibration!
After a few more searches I discovered that typical silence sits at about 30dB, so what I can do is sample some “silence” from the phone and assume this to be my 30dB reference. Although this will not give me 100% correct values (since my reference is probably slightly off), I now have all the basic components I need to calculate the noise level in dB.

The final step is producing a single value from the waveform to use as the “power”. If you haven’t done much signal processing before, you might be tempted to calculate the mean of all the values. Although this seems like a sensible idea, consider the AC voltage from the mains: it is effectively a sinusoidal wave, so its mean is about zero – not very useful! Instead, what is calculated is the RMS (Root Mean Square) value. The name basically describes what you do: calculate the mean of all the values squared, then take the square root (in the case of a sine wave this simplifies to x/√2, where x is the peak value). This is done once with the reference waveform (as that is all the information required), and then again with each new waveform received from the microphone. The units of the two waveforms don’t matter so long as they are the same; what is important is the ratio between the two (as I said earlier, dB is simply a scale of sorts).

I can then throw these values into the 10 log rule and finally get a decibel reading from the microphone!
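Putting the whole pipeline together – RMS, the 10 log rule, and the assumed 30dB silence reference – a sketch looks like this (in Python rather than the app’s own code; the centring of the mic bytes around 128 is an assumption about the sample format):

```python
import math

SILENCE_DB = 30.0  # assumed level of "typical silence"

def rms(samples):
    """Root Mean Square of a waveform: mean of the squares, then sqrt."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def noise_level_db(samples, reference_rms):
    """Noise level via the 10 log rule, relative to the silence reference.

    Power is proportional to amplitude squared, so the power ratio is
    (rms / reference_rms) ** 2; 10 * log10 of that is the dB difference,
    which is added to the assumed 30 dB silence level.
    """
    ratio = (rms(samples) / reference_rms) ** 2
    return SILENCE_DB + 10 * math.log10(ratio)

def centre(raw_bytes):
    """Mic bytes are 0-255; centre them on zero before processing
    (an assumption about the sample format)."""
    return [b - 128 for b in raw_bytes]

# Sanity check: a sine wave's RMS is peak / sqrt(2).
peak = 100
sine = [peak * math.sin(2 * math.pi * i / 64) for i in range(64)]
print(rms(sine))  # close to 100 / sqrt(2) ≈ 70.7
```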

Oct
16
2011

The Website Is Down

So our network at uni went down earlier this week, for the first time since I arrived last year. I quickly wrote a PHP script to check whether hosts are online or not (basically, whether they are responding to pings – you can see it in action here). It’s quite an easy task; however, I wasn’t really aware of how to do it!

Essentially it boils down to a fantastic little package from PEAR (PHP Extension and Application Repository) called Net_Ping – only a few lines of code are required after installing this and then you are ready to go!

<?php
require_once 'Net/Ping.php'; // include the Net_Ping package
$ping = Net_Ping::factory(); // create the object
// check for any errors in the creation
if (PEAR::isError($ping))
{
  echo $ping->getMessage();
  exit();
}
// Now perform the ping!
$result = $ping->ping('www.example.com');
print_r($result);
?>

If all goes well, the output should be similar to the following:

Net_Ping_Result Object
(
[_icmp_sequence] => Array
(
[1] => 17
[2] => 17
[3] => 17.1
)

[_target_ip] => 192.0.43.10
[_bytes_per_request] => 64
[_bytes_total] => 192
[_ttl] => 244
[_raw_data] => Array
(
[0] => PING www.example.com (192.0.43.10) 56(84) bytes of data.
[1] => 64 bytes from 43-10.any.icann.org (192.0.43.10): icmp_seq=1 ttl=244 time=17.0 ms
[2] => 64 bytes from 43-10.any.icann.org (192.0.43.10): icmp_seq=2 ttl=244 time=17.0 ms
[3] => 64 bytes from 43-10.any.icann.org (192.0.43.10): icmp_seq=3 ttl=244 time=17.1 ms
[4] =>
[5] => --- www.example.com ping statistics ---
[6] => 3 packets transmitted, 3 received, 0% packet loss, time 2003ms
[7] => rtt min/avg/max/mdev = 17.000/17.076/17.163/0.125 ms
)

[_sysname] => linux
[_round_trip] => Array
(
[min] => 17
[avg] => 17.076
[max] => 17.163
[stddev] => 0.125
)

[_transmitted] => 3
[_received] => 3
[_loss] => 0
)

So as you can see, you can now easily pull out all the information you may need! However, I found that this is very slow – these 3 pings took about 2 seconds to process (see the raw data), and it would take even longer if the host were offline (waiting for timeouts)! So you may want to set some arguments. Fortunately this is very simple and only requires adding one line (excluding my comments):

// set the arguments for the ping
/* There are various arguments which can be set, these include:
 * count - amount of pings to send.
 * quiet - "verboseness" of output.
 * iface - the network interface to ping from.
 * ttl - the time to live of the ping.
 * timeout - how long before we timeout (seconds).
 * size - size of the ping.
 *
 * Note: some arguments don't work on some OSes.
 */
// In my case I am pinging several hosts, and don't mind the
// occasional false negative so only send one ping, and
// timeout after one sec.
$ping->setArgs(array('count' => 1, 'timeout' => 1));

The performance gain from this was massive: I no longer have to wait a long time for each host that is down (as the timeout is now 1 second), and since only a single ping is sent, I move on to the next host very quickly. This benefits both online and offline hosts – when multiple pings are sent in one ping command, there is a delay between each ping to avoid flooding the host, and this delay in turn makes the script take a long time to execute.

Oct
09
2011

Under the hood

Personally I find the really low-level things that computers do very interesting, and although this is covered a bit in my uni course, I feel the need to “poke about” in a real system. So I’ve come up with a little humorous (well, for me anyway) project which should allow me to learn about this area.

Effectively, my plan is to produce a program which hooks into all the active processes and “overrides” some form of system API, such as DrawText, and then flips the string – which should hopefully achieve the effect of inverting all the text on the screen!

I’ve actually been wanting to do this for a while, but put it on the back burner since I wasn’t familiar with the programming language I was hoping to write it in: C++. However, I reckon now is a good time, as I’ll also be learning C at the same time, and I understand that C++ is basically an OOP version of C (although it should be possible to do it in C). The reason it needs to be done in C or C++ is that in a managed language such as C# or Java, the programmer doesn’t have enough access to the low-level areas of the computer (or that access is at least restricted) – which has both its advantages and disadvantages – whereas C and C++ are unmanaged languages and let you mess about with these areas. This gives me the access required to get at the DrawText method more or less directly.

I’ll post more about this after I’ve made some headway with it.

Oct
03
2011

namespace Blog

I’ve been meaning to set up a blog for a while now, and have finally got round to doing it. The main reason for wanting to do so is that it should hopefully allow me to collate my thoughts, etc. I’ve not had much experience of blogging, but from the little I’ve had, I’ve found it quite enjoyable (you can see my attempts at it here).

Since this is my first post, a little bit of an intro seems appropriate. I am currently in my second year studying Computer Science at the University of Manchester (which I really enjoy!) and I am interested in essentially anything technology related – which is what I will probably be blogging about. And yes, I am aware there are thousands of other blogs like this, but oh well. :)

Not too sure how often I’ll be posting – we’ll see what happens, although I hope to eventually settle into something quite regular.