Sunday 23 January 2011

Data and there's a lot of it?

http://www.flickr.com/photos/chelseagirlphotos/


Over recent years the amount of data has exploded, driven especially by the real-time web and media. With this comes a number of problems for many businesses and people: tracking it, monitoring it, understanding it, engaging with it and filtering out the noise. Having Google Analytics or some other on-site monitoring tool installed, and ensuring you have registered with the various search engines and their webmaster tools, goes some way towards understanding what's happening on your site and identifying problems. But this does not tackle what is being said on the wider web: keeping an eye on what your competitors are up to, tackling outbursts by clients or customers, or any of the many other things I am sure you can think of that happen every day on the web.

Background
Trying to solve these problems is not straightforward, especially as the data keeps on growing. One company offering an abstracted layer on top of this to help simplify the problem is Mediasift. I first became aware of Mediasift, or fav.or.it as it was called then, about 3 years ago while at a British Computer Society event on Pro-Blogging. One of the attendees was a man called Nick Halstead, and during the talk he mentioned his company fav.or.it, which curated channels based around RSS feeds from blogs, making it easy for anyone to find content they were interested in, follow it and comment on it.

This was a great idea as it answered the "what is RSS?" question that most people without a technical understanding of the web have, as well as explaining that odd orange icon in the address bar that my parents ask about. I signed up to use it as soon as it came out, and it provided me with some new resources to follow and helped me to discover new content quickly. However, around the same time another service was gaining traction on the internet: Twitter (I think Nick was even using it during the talk!).

So out of Twitter and their great API came TweetMeme. As far as I can tell this was a great success, and still is, with sites such as Mashable using their retweet button. The following is taken straight from the TweetMeme site but explains their offering much better than I could:

"TweetMeme is a service which aggregates all the popular links on Twitter to determine which links are popular. TweetMeme categorises these links into Categories, Subcategories and Channels, making it easy to filter out the noise to find what you're interested in."

DataSift
So this leads nicely on to Datasift, a new web service (SaaS) going through Alpha testing at the moment. I was lucky enough to get an Alpha invite and before Christmas was playing with the service, which is fantastic! So the problem I opened with (all that data, many different APIs to learn, and it being difficult to get started) suddenly starts to get a little easier.

Datasift pulls in data from all over the web: Twitter, Myspace, Digg, Wordpress, Buzz, Six Apart and Facebook were listed last time I looked. As you can see, they cover many of the top social destinations on the web, and I am sure the number of sources will continue to grow. The key thing to think about here is that Datasift now provide a one-stop shop for all that public data, so that's one API to learn and integrate with (*time saver).

One concern I had was that it was going to be difficult to get started, but this turned out to be misjudged. If you are an Excel "guru" or have a basic understanding of SQL, using Curated Stream Definition Language (CSDL) is nice and straightforward (it was FSDL last time I logged in, so things are moving along quite quickly). I had a great stream up and running in less than an hour doing something basic: pulling job references from multiple sources.

Once you have something simple in place it's time to read the documentation, as they allow you to do all sorts of great things with the data, such as play with geo information and look at influence metrics provided by sources such as PeerIndex and Klout. Streams can also be plugged together (using a unique ID called a Definition), which means you can build one stream and plug another on to it to quickly iterate on ideas. The software also has a published list of Streams people have shared, which anyone can build on or use.

Here is a quick example so you get a small idea of what it's like to create a Stream:

((twitter.text contains_word "SEO" or twitter.text contains_word "SEM") and not twitter.text contains "guru") and language.tag == "en"

That would produce a list of all the English-language tweets containing either SEO or SEM but not the word guru. Now you have that, how about only those from people classed as influencers? Easy:

(((twitter.text contains_word "SEO" or twitter.text contains_word "SEM") and not twitter.text contains "guru") and language.tag == "en") and (peerindex.score >60 or klout.score >60)
As you can see, it's really easy to build streams that are focused on a subject you're interested in, as well as to filter out all the stuff you don't want to see.
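
To make the logic of that second filter a bit more concrete, here is a rough Python sketch of the same boolean test applied to some made-up items. This is not the Datasift API or how CSDL is evaluated: the dictionary keys, the helper names and the sample data are all assumptions of mine, and I have assumed case-insensitive matching purely for illustration.

```python
import re


def contains_word(text, word):
    """True if `word` appears in `text` as a whole word (case-insensitive in this sketch)."""
    return re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE) is not None


def contains(text, fragment):
    """True if `fragment` appears anywhere in `text`, even inside another word."""
    return fragment.lower() in text.lower()


def matches(item):
    """Mirror the influencer filter: on topic, no "guru", English, and a score above 60."""
    text = item.get("text", "")
    on_topic = contains_word(text, "SEO") or contains_word(text, "SEM")
    no_guru = not contains(text, "guru")
    english = item.get("language") == "en"
    influential = item.get("peerindex", 0) > 60 or item.get("klout", 0) > 60
    return on_topic and no_guru and english and influential


# Hypothetical items; the field names are invented for this example.
sample = [
    {"text": "Five SEO tips for small sites", "language": "en", "peerindex": 72, "klout": 40},
    {"text": "Ask our SEM guru anything", "language": "en", "peerindex": 80, "klout": 80},
    {"text": "SEO basics", "language": "en", "peerindex": 10, "klout": 20},
    {"text": "Semester plans", "language": "en", "peerindex": 90, "klout": 90},
    {"text": "Conseils SEO", "language": "fr", "peerindex": 90, "klout": 90},
]

kept = [item["text"] for item in sample if matches(item)]
print(kept)
```

Only the first item gets through: the second mentions a guru, the third has low scores, the fourth only contains "Sem" inside a longer word, and the fifth is not tagged as English.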

Some ideas I think Datasift is going to be good for:

  • Research - using their API you can slice and dice data a whole host of ways, meaning you can quickly build up data sets, get a snapshot of the public's perception, understand what's going on in real time and identify trends from multiple sources.
  • Dashboards - I expect to see lots of people use this service to add meaning to existing data. Imagine adding a client to Salesforce and being able to pull in tweets, identify influential people on the web associated with the business, gather the public's view of the business and identify blog posts mentioning them or their products.
  • Vertical / Niche content services.
  • Mashups - see here for an example: http://hootware.com/app/checkins/
Benefits of the service:

  • Easy to get going. 
  • They handle all the masses of data; you only get the stuff you are interested in.
  • Real time data. 
  • Multiple input sources. 
  • API to integrate with your software
  • Great customer service!
As you can see, I am pretty excited about the service and the opportunities it opens up. Now all I need to do is come up with something great.

Links to find out more:

www.youtube.com/user/datasift
twitter.com/datasift
www.datasift.com
blog.datasift.com/

Tuesday 11 January 2011

Location Location and ... Jobs

Thanks To:

Ramkarthik

  

I recently started a conversation on LinkedIn about location-based search and jobs, in particular a new feature Total jobs have added (see here for an example: http://goo.gl/zQyXN). There were lots of positive comments and some good ideas suggested, mainly around all the additional information that including mapping gives. This got me thinking about how powerful mapping information can be, as the visual representation goes so much further than just having the location name displayed, providing the end user with a richer, more engaging experience.

Some of the ideas highlighted were:
  • Travel details such as train stations, underground stations, buses etc.
  • Additional location-specific details such as highlighting parking or local amenities.
Recruitment sites showing this rich information to candidates set themselves apart from other sites because they not only potentially help you find a job but also provide a depth of knowledge that you would otherwise have to research yourself. It also helps to speed up the decision-making process, meaning companies should get more relevant or engaged candidates applying.

The types of business I see this benefiting the most are direct employers, as these are the ones most likely to disclose highly targeted location information, and this could reduce the cost of external employment agencies. This was hinted at in the video posted on the Guardian website (http://www.guardian.co.uk/advertising/video/2), where a number of employers commented that they would be looking to hire directly more this year.

Something else corporate or direct employers can already take advantage of when thinking about location-based information is rich snippets and structured data. The key area to think about here is search engines' support for structured business data such as address details; using the right type of markup within a web page is what makes the difference.

If, for instance, you were to encode your business's address details using the hCard microformat markup, you could include geo data such as the longitude and latitude of the location. When search spiders crawl the web page and index the data, they can use this information to pin the business to a point on a map or target search results within local listings. Relating this back to the above, you can now post business address details with each vacancy (on your site), including the Rich Snippet information, giving your vacancy the maximum opportunity to appear in location-based searches by candidates. Not only that, your vacancy could also appear in map results.
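
As a rough illustration of what that markup might look like, here is a small Python sketch that builds an hCard snippet with an address and geo data for a vacancy page. The class names (vcard, fn, org, adr, street-address, locality, postal-code, country-name, geo, latitude, longitude) come from the hCard microformat; the function name, the employer and the address are made up for the example, and the coordinates are only approximate ones for central London.

```python
from html import escape


def build_hcard(org, street, locality, postal_code, country, latitude, longitude):
    """Return an hCard HTML fragment with address and geo data (illustrative helper)."""
    lines = [
        '<div class="vcard">',
        f'  <span class="fn org">{escape(org)}</span>',
        '  <div class="adr">',
        f'    <span class="street-address">{escape(street)}</span>,',
        f'    <span class="locality">{escape(locality)}</span>,',
        f'    <span class="postal-code">{escape(postal_code)}</span>,',
        f'    <span class="country-name">{escape(country)}</span>',
        '  </div>',
        '  <span class="geo">',
        f'    <span class="latitude">{latitude:.6f}</span>,',
        f'    <span class="longitude">{longitude:.6f}</span>',
        '  </span>',
        '</div>',
    ]
    return "\n".join(lines)


# Hypothetical employer and address, for illustration only.
print(build_hcard(
    org="Example Recruitment Ltd",
    street="1 Example Street",
    locality="London",
    postal_code="EC1A 1AA",
    country="United Kingdom",
    latitude=51.5074,
    longitude=-0.1278,
))
```

The resulting fragment could sit alongside each vacancy on your own site, so the address and coordinates are there in the page for anything that reads the markup.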

Location information could also be posted in Tweets to add great context to them. It also means that when the information is processed by third-party software using the Twitter stream, your message is more highly targeted, and when combined with a Twitter user's location you can only start to imagine what the possibilities are, not only for the recruitment sector but for any business.

Links: 
http://goo.gl/Miqli - Google webmaster guide to Rich Snippets and hCard