Monday, 5 September 2011

Playing with Python: The hunt for email addresses


Working both in customer services and as an SEO often presents a complex mix of tasks and challenges.

One such challenge was to extract the To: addresses from over 30,000 emails. Not wanting to interrupt the dev team, and it being an interesting task, I decided to tackle it myself.


I haven't programmed properly in nearly 10 years, so I turned to Google and to a language I have an interest in: Python. After a bit of searching I hit upon the following code by Tuomas Rasila. It provided a great starting point as it covers the basics really well: reading a file and extracting email addresses.


http://rasilagarage.com/2009/06/extracting-email-addresses-from-any-text-file-with-python/


#!/usr/bin/env python
# coding: utf-8

import os
import re
import sys

def grab_email(file):
    """Try and grab all emails addresses found within a given file."""
    email_pattern = re.compile(r'\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b',re.IGNORECASE)
    found = set()
    if os.path.isfile(file):
        for line in open(file, 'r'):
            found.update(email_pattern.findall(line))
    for email_address in found:
        print(email_address)

if __name__ == '__main__':
    grab_email(sys.argv[1])

The trouble was I did not really understand what it was doing, and it was not quite what I needed: I had 30,000+ files, not just one! So this is where the real work began.


Step 1 - Write the email addresses out to a file instead of the screen. This, it turns out, is fairly simple using Python's built-in open() function (FILE below is just the variable name I gave the file object):

  • FILE = open(filename,'w') - opens / creates a file named by the variable "filename" in write mode
  • FILE.write() - writes data to the file
  • FILE.close() - does what it says and closes the file so the data is written to disk
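
The three calls above can be sketched as follows. This is a minimal illustration only: the helper name write_addresses, the sample addresses and the temporary folder are made up for the example and are not part of the final script.

```python
import os
import tempfile

def write_addresses(filename, addresses):
    """Write each address on its own line and return how many were written."""
    FILE = open(filename, 'w')      # open / create the file in write mode
    for address in addresses:
        FILE.write(address + '\n')  # write() does not add a newline for you
    FILE.close()                    # close the file so the data reaches the disk
    return len(addresses)

# Illustrative usage: a temporary folder, so nothing real is overwritten
out_path = os.path.join(tempfile.mkdtemp(), 'emails.txt')
count = write_addresses(out_path, ['alice@example.com', 'bob@example.com'])
```

Opening the file with a "with" block would close it automatically, but the explicit open / write / close sequence matches the steps listed above.
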
Step 2 - Select only the To: field in each email:
  • The original regex was nearly spot on, I just made the following tweak: re.compile(r'(To:\s+\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)',re.IGNORECASE). The addition of To:\s+ looks for To: in the document followed by one or more whitespace characters (\s is shorthand for "any white space").
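
To see what the tweaked pattern returns, here is a small sketch run against some made-up header lines. Because the brackets wrap the whole pattern, findall() hands back the "To: " prefix along with the address. The second pattern, address_only, is a hypothetical variation (not the one used in the script) that moves the brackets so only the address comes back.

```python
import re

# The pattern from Step 2: the brackets wrap everything, "To:" included
with_prefix = re.compile(
    r'(To:\s+\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)', re.IGNORECASE)

# Hypothetical variation: brackets around the address only
address_only = re.compile(
    r'To:\s+\b([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b', re.IGNORECASE)

# Made-up header lines for illustration
headers = [
    'From: sender@example.org',
    'To: alice@example.com',
    'Subject: hello',
]

prefixed = []
bare = []
for line in headers:
    prefixed.extend(with_prefix.findall(line))
    bare.extend(address_only.findall(line))
```

Here prefixed ends up holding 'To: alice@example.com' while bare holds just 'alice@example.com'. Note that neither pattern is anchored to the start of the line, so a header such as Reply-To: would match as well.
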
The next bit caused the most headaches: I had to take in a directory as a parameter, loop through its contents, check each item to make sure it was a file and, if it was, read it and spit out any email addresses. Job done. So with only Google as my friend I set to work.

Step 3 - Grab a folder
  • Python has a handy function within the os module, listdir(), so I was able to pass a "folder" instead of a "file" into my program: os.listdir(dirname)
Step 4 - Check to see if each item is a file.
  • Again fairly straightforward: os.path.isfile()
With the above things were looking good, but I had come across a couple of issues. For some reason my code was not getting past the isfile() check. This, I found, was because os.listdir() returns bare file names, so the full path was not being passed in. A quick update fixed it: os.path.isfile(os.path.join(dirname, files)). I could now check each file (turns out the Python os module is really quite useful). The next issue was that my programme was going on and on and on. I had a looping issue so bad that the file I was creating just kept getting bigger, slowly eating all my disk space. Not good.
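
Steps 3 and 4 together can be sketched like this. The helper name list_files and the demo folder contents are invented for the example. The point is that each name from os.listdir() is joined back onto the folder before being handed to os.path.isfile().

```python
import os
import tempfile

def list_files(dirname):
    """Return the full paths of the plain files directly inside dirname."""
    paths = []
    for name in os.listdir(dirname):             # listdir() gives bare names, not paths
        full_path = os.path.join(dirname, name)  # rebuild the full path
        if os.path.isfile(full_path):            # sub-folders are skipped
            paths.append(full_path)
    return sorted(paths)

# Illustrative folder: one file and one sub-folder (names are made up)
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, 'mail1.eml'), 'w').close()
os.mkdir(os.path.join(demo_dir, 'archive'))
files = list_files(demo_dir)
```

Without the os.path.join() call, isfile() would look for the bare name in the current working directory, which is why the check kept failing.
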

After doing a lot of reading I found out that set(), which I was writing all the data to, was amazing. A set is an unordered collection with no duplicates! (Wow, no duplicates: that solved an issue I had not even thought of!) The looping issue was caused because I had indented the write loop in the wrong place. I was writing all the email addresses out on each pass through each file, so as the set got bigger more data was being written out each time, over and over again. Turns out indents in Python are very important.

So putting it all together I ended up with the following:
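
The difference the indent makes can be shown with a small sketch. It uses lists of made-up addresses in place of real files and collects the "written" lines in a list rather than on disk. Both function names are invented for the example.

```python
def write_inside_loop(files):
    """The mistake: the write loop is indented under the file loop."""
    found, written = set(), []
    for addresses in files:
        found.update(addresses)       # duplicates are silently dropped
        for email_address in found:   # runs again for every file
            written.append(email_address)
    return written

def write_after_loop(files):
    """The fix: the write loop is dedented, so it runs once at the end."""
    found, written = set(), []
    for addresses in files:
        found.update(addresses)
    for email_address in found:
        written.append(email_address)
    return written

# Three pretend files, with one address repeated across two of them
files = [['a@example.com'],
         ['a@example.com', 'b@example.com'],
         ['c@example.com']]
buggy = write_inside_loop(files)
fixed = write_after_loop(files)
```

With these three pretend files the first version writes 6 lines and the second writes 3. With thousands of files the first version rewrites the whole growing set on every pass.
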


=================== PYTHON SCRIPT EMAILS ==================
#!/usr/bin/env python
# coding: utf-8
# June 13th, 2009 by Tuomas Rasila - with updates Matthew Brookes 2011

import os
import re
import sys

def grab_email(dirname):
    # Name of the output file to create
    filename = "emails.txt"
    # Try and grab all email addresses found within a given file.
    email_pattern = re.compile(r'(To:\s+\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)', re.IGNORECASE)
    # A set is an unordered collection with no duplicate elements
    found = set()
    # Open the output file in write mode
    FILE = open(filename, 'w')
    # Get a directory listing
    for Xfiles in os.listdir(dirname):
        # Check if it's a file
        if os.path.isfile(os.path.join(dirname, Xfiles)):
            # Build the path to the file so it can be read
            emails = os.path.join(dirname, Xfiles)
            # Loop through each line of the file, match email addresses and add them to the set.
            for line in open(emails, 'r'):
                found.update(email_pattern.findall(line))
    # Read the set of email addresses and write these out to the file created earlier.
    for email_address in found:
        FILE.write("update [table] set [column] = 0 where [column_value] like '"+email_address+"'\n")
    # Close the file so the data is written to disk
    FILE.close()

if __name__ == '__main__':
    grab_email(sys.argv[1])


As you can see I even managed to write out the SQL I needed with each row in the file! The other thing is this is reusable and I can adapt it in the future, so a bit of up-front work has hopefully saved me hours later on.


Hopefully the above will help someone else out in the future as well.


Useful resources I used were:
Extracting email addresses from any text file with python
Python Documentation
An SEO's Guide To Regex
Not forgetting Google!

Wednesday, 18 May 2011

Social and Search?


This was originally written as a guest post for http://www.narvi.co.uk/ but it never made it live, so here it is:

I have been using the web since about 1995, and a lot has changed since then. I created my first website in 1999 while at university, and as I didn't understand search engines it was found by about 2 people other than those I told. Between 2000 and 2007 search engines, along with email, played the biggest part in the discovery of new content. But since 2007 social has slowly been building up and up, until last year it exploded. I remember searching Google on the first day of the F1 championship, only to see a Twitter stream right there in the centre of the results providing real-time updates from around the web on what was happening. Or organising a holiday over Facebook with friends from across the country. The social web allows interaction and discovery on-line in ways not possible 10 or even 5 years ago.

So what does this mean, and what are the possible relationships between search and social? Should you think of them as being married, walking hand in hand?

Well, to start with, having the best social strategy in the world and a poor site is going to get you nowhere: you need to be found via multiple channels, and the best way to ensure independence is to have your own site. Search engines are not going to disappear overnight and still provide the majority of traffic to most sites. If your analytics package shows people are not engaged with your website content, are they going to be flocking to your Twitter / Facebook / LinkedIn page? Consider doing some basic Search Engine Optimisation (SEO) or Search Engine Marketing (SEM) and try to get an understanding of who your visitors are, and where and when they come to your site. Once armed with some insight you can begin your social campaign!

Along with social comes brand. This is becoming more important in search, and a lot of the recent changes in Google’s algorithms have favoured strong brands. How do search engines get a good idea of brand awareness? Social signals: both Bing and Google have stated that they use signals such as Twitter updates and Facebook likes / shares to gain understanding. Recently Google has furthered its attempt at social by introducing the +1 button, which at present allows someone to +1 a search result and will eventually allow them to +1 a web page via a button in the same way as a like / tweet. This will further enhance the social data Google presents to people and make “gaming” the rankings a little harder.

So do I think search and social are closely linked? The short answer is yes. The sites you like and share, both on-line and off, plant seeds in other people's minds, and they then search for them. Social networks allow viral content to spread as well as providing targeted discovery of content through the interests of friends, business contacts and acquaintances. This in turn is used by search engines to help build a better picture of the web and of the content to return for a search query.

Do you need to be on every social platform? No. You need to pick the right ones for your audience and ensure the content you share adds to your brand and existing on-line portfolio. You also need to be prepared to communicate with the people that follow you. Social is about understanding your audience and working with them to enhance knowledge and experience on and offline.

Having the right social strategy will provide you with a great opportunity, not just in the social networks but also in the search results, and is something I would make sure was on my list of digital marketing activities.

Further Reading:

+1 Info :
http://googleblog.blogspot.com/2011/03/1s-right-recommendations-right-when-you.html
http://www.seomoz.org/blog/google-1-and-the-rise-of-social-seo

Search and Social signals :
http://searchengineland.com/what-social-signals-do-google-bing-really-count-55389

Understanding social networks :
http://www.slideshare.net/padday/the-real-life-social-network-v2

Monday, 18 April 2011

What's the Waze? Social Navigation

With the advent of social networking and smartphones with Global Positioning System (GPS) capabilities, a unique opportunity opened up in the rather boring world of mapping and navigation.

Which_way by Matthew Brookes

I have always enjoyed hiking and camping, so from a very early age I have been able to read an OS or road map - this is a very useful skill as technology does not always have the answer. Still, for day to day stuff life should be easier.

Back in 2007 my then phone, a Nokia N95, not only had a great camera for a phone but also a GPS which, when combined with the rather poor mapping software, could sort of tell me where I was heading. It also had some software for tracking me when I went out walking or cycling. This was the first time I had used any personal GPS software, other than when my Dad used his TomTom in the car, and I thought it was great even if a bit clunky.

The appearance of on-line tools like Google Maps also meant you no longer had to think about how long it should take to get from A to B or which route to take. A quick search and there was the result (now with traffic info), which could be printed out and used on your next journey. I sort of feel sorry for the AA route planner, which up until Google Maps gave the best directions you could get (some would argue it still does).

Fast forward a couple of years and, armed with a nice new iPhone 3G with a proper web browser and Apps!, a whole new ball game was under way. First I used the built-in Google Maps software. This was fun and a massive improvement on the Nokia: I could even plan a route and the GPS could track me along the road, ace!

The real problem I had though was the lack of voice commands. You are not allowed to fiddle with your phone while driving, so voice was a major missing feature for me. It was also around this time I found out about the Open Street Map project. This is a crowd sourced mapping project which, looking back today, is fantastic, providing an easy way to include mapping information in your apps or teach people about mapping. The project introduced me to crowd sourcing and the social aspect of mapping: here were hundreds if not thousands of people around the world contributing on a daily basis to improve everyone's understanding of the places they lived in, as well as providing useful mapping information.

After using Open Street Map for a while and realising their API could be used for a mapping application, I stumbled across Skobbler. This app did exactly what I was after: provide routes to destinations with voice commands, bingo! After using the app for a while I started thinking: with all this mapping data at Open Street Map, and Skobbler using it to provide me with directions, wouldn't it be good if while using the App I could provide data back to them so as to improve things?

Well, as luck would have it, one of my friends (@AlexBuchta) brought Waze to my attention.

Waze is the social way to do user-generated navigation. Unlike Skobbler, when using Waze you are building the map as you travel! In some ways this is a little scary, as you would think a navigation app was supposed to already know the way. However the more you travel the better the maps get, the more people that use the app the better the app gets, and the best bit is the gaming aspect.

Waze allows you to set up an account with them and link it to Facebook and / or Twitter, and if your friends are using it you can see how many points they have achieved. Points are awarded for all sorts of things: mapping new roads, using the app on multiple days, fixing mapping issues and so on. However what I liked best were the cupcakes, yum yum. As you drive using the app you unlock certain achievements:


As you build up more points you get to customise your character a little as well, so you can pick a different type of vehicle or select a mood based on how you are feeling that day (generally hungry I find). If you start using the App in an area not known by Waze, no problem: you can plough the roads as you drive. It's fun when you suddenly find out someone else is using the App and all of a sudden two parts of the map join up. It's at this point the App gets more useful, as the work you have been doing helps all those other Wazers out there.

As you can see from the screen shots Waze is international as well, so you can take it on your travels, but don't forget your international phone tariff!

So once you have your local area planned out and a few long journeys under your belt, what else does the app do?

It can give you voice directions on your journey. If you travel certain routes regularly it learns this and offers up guidance on your journey; the first time the app did this was a little scary, and I was a bit disappointed at being so predictable! As it's a community-based App you have the ability to report various things on your travels, be this traffic jams, accidents, police etc. It even allows you to take a photo and upload it. All this info goes back to some super type computer I guess, and with the help of an algorithm it means Waze can alert other travellers to potential problems, which gets you from A to B in the quickest time!

They provide different goodies depending on the time of year and on certain special occasions such as Easter, Valentine's Day or Christmas, which keeps things interesting as you travel around. It's not too often you spot an Easter egg or present sat in your path, and these give you bonus points.

If you live in an area where there are lots of people using Waze you can join groups. This means you can get updates from the people around you, which also means the traffic info is really relevant and up to date.

So far Waze has been the best Free App I have found for car navigation. It might not have all the roads pre-mapped, but that's part of the fun. The simple points system keeps things interesting, but the best bit is the community and the way the more people use it the better the maps and directions get.

A couple of things I would like to see in the App are:
  • Points of interest - by this I mean services on the road; it would be great to get a heads up on petrol stations or the next place I can find a toilet.
  • In-App ads - a little odd, but often when travelling it would be good to be able to get a deal on something, not too many mind you!!
I would also like to see a time-lapse video of the UK Waze map forming. I think this would be really interesting as you could try and spot which parts of the country were early adopters.

Have fun Wazing and check out the Waze site: http://www.waze.com