Sunday, 23 January 2011

Data and there's a lot of it?

http://www.flickr.com/photos/chelseagirlphotos/


Over recent years the amount of data has exploded, driven especially by the real-time web and media. With this come a number of problems for many businesses and people: tracking it, monitoring it, understanding it, engaging with it and filtering out the noise. Having Google Analytics or some other on-site monitoring tool installed, and ensuring you have registered with the various search engines and their webmaster tools, goes some way towards understanding what's happening on your site and identifying problems. But this does not tackle what is being said on the wider web: keeping an eye on what your competitors are up to, dealing with outbursts by clients or customers, or any of the many other things I am sure you can think of that happen every day on the web.

Background
Trying to solve these problems is not straightforward, especially as the data keeps on growing. One company offering an abstracted layer on top of this to help simplify the problem is MediaSift. I first became aware of MediaSift, or fav.or.it as it was called then, about 3 years ago while at a British Computer Society event on Pro-Blogging. One of the attendees was a man called Nick Halstead, and during the talk he mentioned his company fav.or.it, which curated channels based around RSS feeds from blogs, making it easy for anyone to find content they were interested in, follow it and comment on it.

This was a great idea as it solved the "what is RSS?" question that most people without a technical understanding of the web have, as well as explaining that odd orange icon in the address bar that my parents ask about. So I signed up as soon as it came out, and it provided me with some new resources to follow and helped me discover new content quickly. However, around the same time another service was gaining traction on the internet: Twitter (I think Nick was even using it during the talk!).

So out of Twitter and their great API came TweetMeme. As far as I can tell this was a great success and still is, with sites such as Mashable using their retweet button. The following is taken straight from the TweetMeme site but explains their offering much better than I could:

"TweetMeme is a service which aggregates all the popular links on Twitter to determine which links are popular. TweetMeme categorises these links into Categories, Subcategories and Channels, making it easy to filter out the noise to find what you're interested in."

DataSift
So this leads nicely on to DataSift, a new web service (SaaS) going through alpha testing at the moment. I was lucky enough to get an alpha invite and before Christmas was playing with the service, which is fantastic! So the problem I opened with (all that data, many different APIs to learn, and it being difficult to get started) suddenly starts to get a little easier.

DataSift pulls in data from all over the web: Twitter, MySpace, Digg, WordPress, Buzz, Six Apart and Facebook were listed last time I looked. As you can see, they cover some of the top social destinations on the web, and I am sure the number of sources will continue to grow. The key thing to think about here is that DataSift now provide a one-stop shop for all that public data, so that's one API to learn and integrate with (a real time saver).

One concern I had was that it was going to be difficult to get started; this turned out to be misjudged. If you are an Excel "guru" or have a basic understanding of SQL, using Curated Stream Definition Language (CSDL) is nice and straightforward (it was FSDL last time I logged in, so things are moving along quite quickly). I had a great stream up and running in less than an hour doing something basic: pulling job references from multiple sources.

Once you have something simple in place it's time to read the documentation, as they allow you to do all sorts of great things with the data, such as play with geo information and look at influence metrics provided by sources such as PeerIndex and Klout. Streams can also be plugged together (using a unique ID called a Definition); this means you can build one stream, plug another on to it, and quickly iterate on ideas. The software also has a published list of Streams people have shared, which anyone can build on or use.

Here is a quick example so you get a small idea of what it's like to create a Stream:

((twitter.text contains_word "SEO" or twitter.text contains_word "SEM") and not twitter.text contains "guru") and language.tag == "en"

That would produce a list of all the English-language tweets containing either SEO or SEM but not the word guru. Now that you have that, how about only those people classed as influencers? Easy:

(((twitter.text contains_word "SEO" or twitter.text contains_word "SEM") and not twitter.text contains "guru") and language.tag == "en") and (peerindex.score >60 or klout.score >60)
As you can see, it's really easy to build streams that are focused around a subject you're interested in, as well as to filter out all the stuff you don't want to see.
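
To make the logic of those two filters concrete, here is a rough Python sketch that applies the same conditions to a handful of made-up records. This is not DataSift's API or how their service works internally: the helper names, the sample data, and the matching rules (whole-word and case-insensitive for contains_word, case-insensitive substring for contains) are all my own assumptions for illustration.

```python
import re


def contains_word(text, word):
    # Assumed semantics: whole-word, case-insensitive match.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def contains(text, fragment):
    # Assumed semantics: case-insensitive substring match.
    return fragment.lower() in text.lower()


def matches_topic(item):
    # Mirrors the first filter: (SEO or SEM) and not "guru" and English.
    text = item["twitter.text"]
    return (
        (contains_word(text, "SEO") or contains_word(text, "SEM"))
        and not contains(text, "guru")
        and item.get("language.tag") == "en"
    )


def matches_influencers(item):
    # Mirrors the second filter: the topic filter plus an influence threshold.
    return matches_topic(item) and (
        item.get("peerindex.score", 0) > 60 or item.get("klout.score", 0) > 60
    )


# Hypothetical sample records, keyed by the field names used in the filters.
sample = [
    {"twitter.text": "New SEO tips for 2011", "language.tag": "en", "klout.score": 72},
    {"twitter.text": "I am an SEO guru", "language.tag": "en", "klout.score": 90},
    {"twitter.text": "SEM budgets are growing", "language.tag": "en", "klout.score": 35},
    {"twitter.text": "Conseils SEO", "language.tag": "fr", "peerindex.score": 80},
]

topic = [i["twitter.text"] for i in sample if matches_topic(i)]
influencers = [i["twitter.text"] for i in sample if matches_influencers(i)]

print(topic)
print(influencers)
```

The first list keeps the two English tweets that mention SEO or SEM without "guru"; the second narrows that down to the one whose author scores above 60. With DataSift itself you never write this code, of course; the one-line filter does the work for you across the whole stream.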

Some ideas I think DataSift is going to be good for:

  • Research - using their API you can slice and dice data a whole host of ways, meaning you can quickly build up data sets, get a snapshot of the public's perception, understand what's going on in real time and identify trends from multiple sources.
  • Dashboards - I expect to see lots of people use this service to add meaning to existing data. Imagine adding a client to Salesforce and being able to pull in tweets, identify influential people on the web associated with the business, gather the public's view of the business and identify blog posts mentioning them or their products.
  • Vertical / Niche content services.
  • Mashups - See here for an example: http://hootware.com/app/checkins/
Benefits of the service:

  • Easy to get going. 
  • They handle all the masses of data; you only get the stuff you are interested in.
  • Real time data. 
  • Multiple input sources. 
  • API to integrate with your software
  • Great customer service!
As you can see, I am pretty excited about the service and the opportunities it opens up. Now all I need to do is come up with something great.

Links to find out more:

www.youtube.com/user/datasift
twitter.com/datasift
www.datasift.com
blog.datasift.com/