Monday, 22 October 2012

Parkrun - Community 5k weekly running. #analysis


In 2012 I started running. This was inspired by a brief chat with one of my best friends and a notice in the local paper about a free weekly 5 km timed run each Saturday at 9am.

Now before 2012, 9am on a Saturday was rarely a time I would even consider waking up by, let alone think about running 5 km (3.1 miles). Way back in March I got up extra early, jumped in the car with my best mate Simon and headed off to the local common. Neither of us thought we would make it round the course without walking, but on that first attempt I managed 26 minutes 46 seconds and finished 76th. I was hooked!

I used to enjoy running as a child, being in the school's athletics club and taking part in cross country, but that's where it stopped; university was more a training ground in drinking than keeping fit. Suffice to say, in the last 15 or so years I had not looked after myself.

You can find out more about parkrun by visiting the website; events take place all over the UK and throughout the world. They are not meant to be a race and are marketed as a timed run. However, you do get a position (I'm a little competitive), points, and there is a results table. With my initial run done, finishing 76th and a good 10 minutes behind the quickest runners, I knew I had work to do.

Since my initial run I have sliced 6 minutes 40 seconds off my time and now regularly finish within the top 25 runners. To do this I have had to train (something I did not initially think I would need to do) and run about 6-12 extra miles each week. I have also started to enter official races to see how I compete against club runners, with my next race being the Great South Run. All in all, parkrun has been a great experience for me.

Once I had the running under control and was making progress, I thought I should keep an eye on my stats. parkrun provide some data on their site, so you get a graph of your age grade results, which looks a little like this:

Almost everything else on the website is in table format. This got me thinking about Google Docs and Excel, and the fact that you can import HTML into spreadsheets!

So without further ado, I will run you through my Google Docs spreadsheet.

First things first, my spreadsheet has a config page. This is basically to make it easy for other parkrun runners to make a copy, plug in some data, and have everything else just work. Secondly, I am by no means a spreadsheet guru; I am very happy with how far I have taken my document and may add more in future. If you do use it, make improvements or spot mistakes, let me know in the comments (and spread the word to other parkrun fans!).

Link to the Google Doc parkrun Spreadsheet

So why would you want to use a spreadsheet? 

Perhaps you want a fancy graph of your age grade stats:
Above you can see where I hurt my back and took four weeks to get back to PB form, and how I took it easy on the first run.

You might want to know how you are doing in the points table: 

Anyone looking at the graph and wondering how someone can have more points but be lower down the table should read the parkrun FAQ (I know I had to). The reason, which this graph highlights, is that either pure running or a mix of volunteering and running can get you to the top!

While I'm talking about volunteering, here is a graph showing the percentage of volunteers by sex:

Interestingly, my initial thought was that about 10% of people would volunteer at least once, and the above chart sort of backs that up (albeit just for males at Newbury). I think I need to improve the calculation to only include people who have run more than once, but that will have to wait for now.

Last but by no means least, how about the PB count for the latest results:

Looks like a pretty good week!

Obviously you should check the calculations to make sure you agree with my formulas, but I think it works pretty well, and once you get started I am sure there is more analysis you can do.

To get started, use the link to the spreadsheet above and save a copy. On the config worksheet, add in the right values for your run and the rest should update for you.

A couple of caveats now: 

  1. If parkrun change the format of the website, the spreadsheet will stop working!
  2. Some of the tables need to be manually updated each week to account for changes (I think Excel would work better for this, using a data table, as that would update automatically):
    1. volunteered, on the points sheet
    2. PB, on the weekly results
  3. I like to make a copy of the weekly results into a separate worksheet each week to preserve the data, in case I want to check something.

If you want to have a go yourself, here is a quick example of the formula I used in Google Docs:


What does it mean? Basically: grab the content from the page listed, passing in the athlete number from the Config worksheet, look for HTML elements of type table, and select table number 4.
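For reference, that formula looks something like the sketch below. The event URL and the Config cell holding the athlete number are assumptions based on my own setup, so adjust both for your event:

```
=ImportHtml("http://www.parkrun.org.uk/newbury/results/athletehistory?athleteNumber=" & Config!B2, "table", 4)
```

The second argument tells Google Docs to look for tables (rather than lists), and the final number picks which table on the page to import.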

Monday, 5 September 2011

Playing with Python: The hunt for email addresses

Working in both customer services and as an SEO often presents a complex mix of tasks and challenges.

One such challenge was to extract the To: addresses from over 30,000 emails. Not wanting to interrupt the dev team, and it being an interesting task, I decided to tackle it.

I haven't programmed properly in a few years, well, nearly 10, so I turned to Google and a language I have an interest in: Python. After a bit of searching I hit upon the following code by Tuomas Rasila, which provided a great starting point as it covers the basics really well: reading a file and extracting email addresses.

#!/usr/bin/env python
# coding: utf-8

import os
import re
import sys

def grab_email(file):
    """Try and grab all email addresses found within a given file."""
    email_pattern = re.compile(r'\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b', re.IGNORECASE)
    found = set()
    if os.path.isfile(file):
        # Add every address matched on each line to the set
        for line in open(file, 'r'):
            found.update(email_pattern.findall(line))
    for email_address in found:
        print email_address

if __name__ == '__main__':
    grab_email(sys.argv[1])

The trouble was I did not really understand what it was doing, and it was not quite what I needed: I had 30,000+ files, not just one! So this is where the real work began.

Step 1 - Write the email addresses out to a file instead of to the screen. This, it turns out, is fairly simple using the built-in open() function and the file object (here called FILE) it returns:

  • FILE = open(filename, 'w') opens / creates a file named by the variable "filename" in write mode.
  • FILE.write() - writes data to the file.
  • FILE.close() - does what it says and closes the file so the data is written to disk.
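Putting those three together, a minimal sketch (the filename and address are just examples):

```python
# Open (or create) emails.txt in write mode
FILE = open("emails.txt", "w")
# Write one line of data to it
FILE.write("someone@example.com\n")
# Close the file so the data is flushed to disk
FILE.close()
```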
Step 2 - Select only the To: field in each email:
  • The original regex was nearly spot on; I just made the following tweak: re.compile(r'To:\s+\b([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b', re.IGNORECASE). The addition of To:\s+ looks for "To:" in the document (\s is shorthand for "any whitespace"), and the brackets capture just the address that follows it, so only the email itself is returned.
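To see the To:\s idea in action, here is a quick sketch (the sample line is made up; note I have put the capturing brackets around just the address so findall returns only the email, not the "To: " prefix):

```python
import re

# Anchor the match to the To: field and capture only the address itself
email_pattern = re.compile(r'To:\s+\b([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b', re.IGNORECASE)

line = "To: jane.doe@example.co.uk"
matches = email_pattern.findall(line)
# matches is now ['jane.doe@example.co.uk']
```

A From: line, for example, would produce no matches at all, which is exactly what we want here.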
The next bit caused the most headaches. I had to take in a directory as a parameter, loop through its contents, check each entry to make sure it was a file and, if it was, read the contents, then spit out any email addresses. Job done. So with only Google as my friend I set to work.

Step 3 - Grab a folder
  • Python has a handy function within the os module, listdir(), so I was able to pass a "folder" rather than a "file" into my program: os.listdir(dirname)
Step 4 - Check to see if each entry is a file.
  • Again fairly straightforward: os.path.isfile()
With the above, things were looking good, but I had come across a couple of issues. For some reason my code was not passing the isfile() check. This, I found, was because the file path was not being passed in correctly, so with a quick update - os.path.isfile(os.path.join(dirname, files)) - I could now check each file (turns out the Python os module is really quite useful). The next issue was that my program kept going on and on and on. I had a looping issue; it was so bad that the file I was creating just kept getting bigger, slowly eating all my disk space. Not good.
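Steps 3 and 4, plus the os.path.join() fix, boil down to something like this (the helper name is my own invention):

```python
import os

def list_real_files(dirname):
    """Return the full paths of the plain files (not sub-folders) in dirname."""
    files = []
    for name in os.listdir(dirname):
        # join() builds the full path so isfile() checks the right location
        full_path = os.path.join(dirname, name)
        if os.path.isfile(full_path):
            files.append(full_path)
    return files
```

Without the join(), isfile() is handed a bare filename relative to wherever you ran the script from, which is why my original check kept failing.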

After doing a lot of reading I suddenly found out that the set() I was writing all the data to was amazing. A set is an unordered collection with no duplicates! (Wow, no duplicates; that solved an issue I had not even thought of!) The looping issue was caused because I had indented the write loop in the wrong place: I was writing out all the email addresses on each pass through each file, so as the set got bigger, more data was being written out, over and over again. Turns out indentation in Python is very important.
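A quick sketch of why the set saved me (the addresses are made up):

```python
# A set silently drops duplicates, so each address is only stored once
found = set()
found.update(["alice@example.com", "bob@example.com"])
found.update(["alice@example.com"])  # already present, so nothing changes
# found now holds exactly two addresses
```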

So putting it all together I ended up with the following:

=================== PYTHON SCRIPT EMAILS ==================
#!/usr/bin/env python
# coding: utf-8
# June 13th, 2009 by Tuomas Rasila - with updates Matthew Brookes 2011

import os
import re
import sys

def grab_email(dirname):
    # Name of the output file
    filename = "emails.txt"
    # Match the To: line and capture just the address that follows it
    email_pattern = re.compile(r'To:\s+\b([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b', re.IGNORECASE)
    # A set is an unordered collection with no duplicate elements
    found = set()
    # Open the output file in write mode
    FILE = open(filename, 'w')
    # Get a directory listing
    for Xfiles in os.listdir(dirname):
        # Check it is a file, not a sub-directory
        if os.path.isfile(os.path.join(dirname, Xfiles)):
            # Build the full path to the file so it can be read
            emails = os.path.join(dirname, Xfiles)
            # Loop through each line and add any matched addresses to the set
            for line in open(emails, 'r'):
                found.update(email_pattern.findall(line))
    # Write each unique email address out as a line of SQL
    for email_address in found:
        FILE.write("update [table] set [column] = 0 where [column_value] like '" + email_address + "'\n")
    # Close the file so the data is written to disk
    FILE.close()

if __name__ == '__main__':
    grab_email(sys.argv[1])

As you can see, I even managed to write out the SQL I needed with each row in the file! The other thing is that this is reusable and I can adapt it in the future, so a bit of up-front work has hopefully saved me hours down the line.

Hopefully the above will help someone else out in the future as well.

Useful resources I used were:
Extracting email addresses from any text file with python
Python Documentation
An SEO's Guide To Regex
Not forgetting Google!

Wednesday, 18 May 2011

Social and Search?

This was originally written as a guest post but it never made it live, so here it is:

I have been using the web since about 1995, and since then a lot has changed. I created my first website in 1999 while at university, and it was found by about two people apart from those I told, as I didn't understand search engines. Between 2000 and 2007 search engines played the biggest part in the discovery of new content, that and email. But since 2007 social has been building up and up, until last year it exploded. I remember searching Google on the first day of the F1 championship, only to see a Twitter stream right there in the centre of the results, providing real-time updates from around the web on what was happening. Or organising a holiday over Facebook with friends from across the country. The social web allows interaction and discovery online in ways not possible 10 or even 5 years ago.

So what does this mean, and what are the possible relationships between search and social? Should you think of them as being married, walking hand in hand?

Well, to start with, having the best social strategy in the world and a poor site is going to get you nowhere, as you need to be findable via multiple channels, and the best way to ensure independence is to have your own site. Search engines are not going to disappear overnight and still provide the majority of traffic to most sites. If your analytics package shows people are not engaged with your website content, are people going to flock to your Twitter / Facebook / LinkedIn page? Consider doing some basic Search Engine Optimisation (SEO) or Search Engine Marketing (SEM) and try to get an understanding of who visits your site, and where and when they come from. Once armed with some insight you can begin your social campaign!

Along with social comes brand. This is becoming more important in search, and a lot of the recent changes in Google's algorithms have favoured strong brands. How do search engines get a good idea of brand awareness? Social signals: both Bing and Google have stated that they use services such as Twitter updates and Facebook likes / shares to gain understanding. Recently Google has furthered its attempt at social by introducing the +1 button, which at present lets someone +1 a search result and will eventually let them +1 any web page via a button, in the same way as a like / tweet. This will further enhance the social data Google presents to people and make "gaming" the rankings a little harder.

So do I think search and social are closely linked? The short answer is yes. The sites you like and share, both online and off, plant seeds in other people's minds, who then search for them. Social networks allow viral content to spread, as well as providing targeted discovery of content through the interests of friends, business contacts and acquaintances. This in turn is used by search engines to help build a better picture of the web and of the content to return for a search query.

Do you need to be on every social platform? No. You need to pick the right ones for your audience and ensure the content you share adds to your brand and existing online portfolio. You also need to be prepared to communicate with the people who follow you. Social is about understanding your audience and working with them to enhance knowledge and experience on and offline.

Having the right social strategy will provide you with a great opportunity, not just in the social networks but also in the search results, and is something I would make sure was on my digital marketing list of activities.

Further Reading:

+1 Info :

Search and Social signals :

Understanding social networks :