A Code Dojo is a group exercise for learning new techniques and improving your old ones. I have neither, so I should gain a great deal from attending. The format can vary, although I believe traditional Dojos are meant to practise one particular technique over and over again.
This Dojo was organised by Nicholas Tollervey and others (sorry, I can't remember everyone's names and I don't know who did what) and took the format of writing a Tic-Tac-Toe game. Only one person would write code at a time, the pilot, and a co-pilot could sit next to them and make suggestions. However, there was a further co-pilot: everyone else in the room, who could see the code you were writing on a massive projector. Hence, geeks being geeks, what you wrote was at once wrong, correct, absurd and brilliant. Each person had 10 minutes to write code or pass a particular test, whichever came first.
There were about 20-30 people attending, and first off we all got stuck into the beer and pizza, which was delicious. Once the social formalities were out of the way we were beckoned to the Dojo area to begin.
A lot of group discussion was involved on very developer-oriented topics: when to write tests, what tests to write, what game we were playing, how we should check for this, that or the other. Honestly, a lot went over my head, but I got the overall theme of the discussions and managed to learn one or two things about testing code and Test Driven Development.
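For anyone who, like me, is new to the test-first idea, here is a minimal sketch of what it can look like for Tic-Tac-Toe. It is not the code we wrote on the night: the board layout (a list of nine cells) and the names winner and WINNING_LINES are my own assumptions, made up for illustration. In the test-first style the tests in WinnerTest would be written before winner() existed.

```python
import unittest

# Hypothetical board layout: a list of 9 cells, each "X", "O" or None,
# numbered 0-8 from the top-left, row by row.
WINNING_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return "X" or "O" if that player holds a full line, else None."""
    for a, b, c in WINNING_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


class WinnerTest(unittest.TestCase):
    """Each test describes one small piece of behaviour."""

    def test_empty_board_has_no_winner(self):
        self.assertIsNone(winner([None] * 9))

    def test_top_row_wins(self):
        board = ["X", "X", "X", None, "O", "O", None, None, None]
        self.assertEqual(winner(board), "X")

    def test_diagonal_wins(self):
        board = ["O", "X", "X", None, "O", None, "X", None, "O"]
        self.assertEqual(winner(board), "O")
```

Saved to a file, the tests can be run with "python -m unittest" followed by the file name.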
My Python-fu is not as developed as that of most of the attendees; however, I know a geek-gag when I see one. My one and only contribution to the code base worth mentioning was the print statement that would be displayed when the code ran.
"Do you want to play a game?"
Which got enough laughs for me to be happy. I can't honestly remember what other code I wrote, but it didn't work!
During my turn the suggestions from the other attendees came at a phenomenal pace; however, I had the calm Marcus at my side making clearer suggestions than the "wisdom of crowds" in front of me.
To encourage participation and prevent "wall flowers" there was a prize draw for every participant who coded at the front. Your name was put in a hat and you got the chance of winning your choice of an O'Reilly book (kindly donated by Josette Garcia from O'Reilly, of international conference fame), IronPython in Action (signed by the authors) or some management twaddle book.
I got picked out of the hat first, so I am the proud owner of the Python Cookbook!
After the Dojo quite a few of us went to the Doggett's Coat and Badge for a drink, which left me feeling the worse for wear this morning, so much so that I was a little late for work. Getting to know the other participants made me feel like I've arrived in London and am starting to integrate.
Nicholas Tollervey and company did a tremendous job organising the Dojo, and it would not have gone so smoothly without their prior effort. Nicholas explained how they are going to adapt the Dojo as they host more of them, and that they have made a raft of changes since the last one. I can only see the events improving as the experience of the crew grows.
I'm certainly looking forward to the next event which will be in a few weeks.
The OpenStreetMap API lets you do lots of things with the OSM data, like uploading and downloading GPX traces. Unfortunately, when you download GPX traces through the API the date/time stamp has been removed. If you download GPS traces individually from the public GPS list, you get the raw original (I think) GPS data with date/time stamps.
Now that Birmingham is complete, I wanted to generate the party render for the entire city over the last 2 years. I needed a way to download 170+ traces, and I certainly wasn't doing this by hand. What follows is my most hacky Python script yet: it parses a page, looks for the GPS trace IDs, then constructs a URL for each one and downloads it. You have to change the page to scrape manually, but hey, I wrote it in 30 minutes, what do you expect?
#! /usr/bin/env python
'''Python GPS downloader for OSM'''
import sgmllib
import urllib

class MyParser(sgmllib.SGMLParser):
    "A simple parser class."
    def parse(self, s):
        "Parse the given string 's'."
        self.feed(s)
        self.close()
    def __init__(self, verbose=0):
        "Initialise an object, passing 'verbose' to the superclass."
        sgmllib.SGMLParser.__init__(self, verbose)
        self.hyperlinks = []
    def start_a(self, attributes):
        "Process a hyperlink and its 'attributes'."
        for name, value in attributes:
            if name == "href":
                self.hyperlinks.append(value)
    def get_hyperlinks(self):
        "Return the list of hyperlinks."
        return self.hyperlinks

# Get something to work with.
f = urllib.urlopen("http://www.openstreetmap.org/traces/tag/Birmingham/page/9")
s = f.read()
# Try and process the page.
# The class should have been defined first, remember.
myparser = MyParser()
myparser.parse(s)
# Get the hyperlinks.
links = myparser.get_hyperlinks()
working = []
final = []
traces = []
nonduplicates = []
# Keep only the links that mention a user...
for i in links:
    b = i.find("user")
    print i, b
    if b > 0:
        working.append(i)
# ...and, of those, only the ones that point at a trace.
for i in working:
    b = i.find("traces")
    print i, b
    if b > 0:
        final.append(i)
# The trace ID is the last part of the link.
for i in final:
    parts = i.split("/")
    lastitem = len(parts)-1
    traces.append(parts[lastitem])
# Each trace is linked more than once, so drop the repeats.
for i in traces:
    if i not in nonduplicates:
        nonduplicates.append(i)
# Download each trace into a local file named after its ID.
for i in nonduplicates:
    url = "http://www.openstreetmap.org/trace/" + i + "/data"
    print url
    trace = urllib.urlopen(url)
    localfile = open(i, "wb")  # binary mode, the data is written exactly as received
    localfile.write(trace.read())
    trace.close()
    localfile.close()
Credit - the HTML parsing class was written by Boddie.
I plan on tidying this up (a lot), as it is extremely useful, until the OSM API catches up anyway.
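As a first step towards that tidy-up, here is a rough sketch of the link-filtering part in Python 3, using html.parser from the standard library in place of sgmllib. The names LinkCollector, trace_ids and data_url are made up for this sketch, the sample markup is invented rather than copied from a real OSM page, and the check that a trace ID is all digits is my own assumption. It does no downloading; it only builds the URLs.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hyperlinks = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hyperlinks.append(value)


def trace_ids(html):
    """Return the unique trace IDs linked from a page, in first-seen order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    ids = []
    for link in parser.hyperlinks:
        parts = link.rstrip("/").split("/")
        # Same idea as the script: keep links mentioning both "user" and
        # "traces". Assumption: a trace ID is made up of digits only.
        if "user" in parts and "traces" in parts and parts[-1].isdigit():
            if parts[-1] not in ids:
                ids.append(parts[-1])
    return ids


def data_url(trace_id):
    """Build the download URL the same way the script does."""
    return "http://www.openstreetmap.org/trace/" + trace_id + "/data"


# Invented sample markup; the real page layout may differ.
sample = '''
<a href="/user/alice/traces/123">a.gpx</a>
<a href="/user/alice/traces/123">view</a>
<a href="/user/bob/traces/456">b.gpx</a>
<a href="/traces/tag/Birmingham">Birmingham</a>
<a href="/user/alice">alice</a>
'''
urls = [data_url(i) for i in trace_ids(sample)]
print(urls)
```

Doing the filtering, ID extraction and de-duplication in one function replaces the four separate loops, and it can be tried out on a saved page without touching the network.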
I noticed a few problems with the script, firstly the fact that I hard-coded the maximum file size into it. It's now an option on the command line.
I also added some extra file detection. If a file is zero bytes big, the script no longer bothers adding it to the hash list, but it is recorded in the output file. The same goes for files bigger than the maximum.
#! /usr/bin/env python
''' A python program that walks a given directory to find files that are
duplicated. It then outputs the results to console (simply printing a
dictionary), and an output file.
command line parameters
./directoryhash_1.4.py [root directory] [outputfile] [max file size in bytes]
'''
import os
import sys
import md5

hashes = {}     # The "working" hashes dictionary
final = {}      # The final dictionary with all the duplicated files,
                # with their hashes as keys.
zerobytes = []  # A list of files with zero bytes.
toobig = []     # Files that were too big.
rootpath = sys.argv[1]
outputfile = open(sys.argv[2], "w")
maxfile = long(sys.argv[3])

def hashfunction(filetohash):
    ''' Takes a filetohash, hashes it with md5 checksum thingy, then checks to see if
    that hash already exists. If not it adds it to a dictionary of files, where their
    hash is the key value
    '''
    try:
        openedfile = open(filetohash, "rb")
        filehash = md5.new(openedfile.read()).hexdigest()
        openedfile.close()
        if filehash not in hashes:
            hashes[filehash] = [filetohash]
        else:
            hashes[filehash].append(filetohash)
    except IOError:
        print "\n"
        print filetohash
        print "Probably a directory. Ignoring"

# The following section walks the directory from the rootpath.
# It then calls hashfunction() to do the checking etc.
for dirpath, directories, files in os.walk(rootpath):
    for i in files:
        filepath = dirpath + "/" + i
        print filepath
        try:
            size = os.path.getsize(filepath)
            if size > maxfile:
                print filepath + "\n" + "Too big!"
                toobig.append(filepath)
            elif size > 0:
                # Includes files of exactly maxfile bytes, which the
                # earlier "<" comparison silently skipped.
                hashfunction(filepath)
            else:
                zerobytes.append(filepath)
        except OSError:
            # Handles errors with the filenames, usually seems to be because
            # of file locking etc. Not sure. Don't care.
            print "BORK!"

# Checks the dictionary of hashes and discards all entries where
# there is only one file per hash. (ie the file is unique)
for j in hashes:
    if len(hashes[j]) >= 2:
        final[j] = hashes[j]
print "\n"

# Takes the final dictionary, and writes the output to a text
# file so it's useful.
if len(final) > 0:
    print "Duplicates found \nCheck output file \n" + "-" * 20
    for l in final:
        outputfile.write("hash: " + l + "\n")
        for i in final[l]:
            outputfile.write(i + "\n")
        outputfile.write("-" * 20 + "\n\n")
else:
    print "No duplicates found! \n" + "-" * 20
    outputfile.write("No Duplicates found!\n\n")
if len(zerobytes) > 0:
    outputfile.write("Empty files \n" + "-" * 20 + "\n")
    for m in zerobytes:
        outputfile.write(m + "\n")
    outputfile.write("-" * 20 + "\n\n")
if len(toobig) > 0:
    outputfile.write("Files bigger than " + str(maxfile) + " bytes" + "\n" + "-" * 20 + "\n")
    for m in toobig:
        outputfile.write(m + "\n")
    outputfile.write("-" * 20 + "\n\n")
outputfile.close()
Enjoy!
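If the maximum size should become truly optional, something along these lines might do. This is only a sketch in Python 3 using argparse; the names parse_args, classify and DEFAULT_MAX_BYTES are invented here, and the fallback value is simply the limit that was hard-coded in the previous version of the script.

```python
import argparse

DEFAULT_MAX_BYTES = 157286400  # the limit that was hard-coded in version 1.3


def parse_args(argv=None):
    """Read the root directory, output file and optional size limit."""
    parser = argparse.ArgumentParser(description="Find duplicated files.")
    parser.add_argument("rootpath", help="directory to walk")
    parser.add_argument("outputfile", help="report file to write")
    parser.add_argument("maxsize", nargs="?", type=int,
                        default=DEFAULT_MAX_BYTES,
                        help="largest file size, in bytes, that will be hashed")
    return parser.parse_args(argv)


def classify(size, maxsize):
    """Sort a file size into exactly one of the three buckets."""
    if size == 0:
        return "empty"
    if size > maxsize:
        return "toobig"
    return "hash"


# Example: no limit given, so the default applies.
args = parse_args(["/some/dir", "report.txt"])
print(args.maxsize, classify(1024, args.maxsize))
```

Keeping the size decision in one small function means every file lands in exactly one list, including a file that is exactly the maximum size.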
I am now at a point where I can start writing useful Python scripts, but I find it hard to think of things for the scripts to do.
After speaking with Martin Hellwig and Alex Wilmer after a Python WM meeting, I found out there is an interesting set of applications that tell you if you have duplicated files on your computer. They are nothing exceptional, just a program that indexes your file system, hashes all the files, then compares the hashes. Companies can charge quite a lot for these applications, but I thought "I reckon I know enough Python to do that myself!"
So here is the result of that.
#! /usr/bin/env python
''' A python program that walks a given directory to find files that are
duplicated. It then outputs the results to console (simply printing a
dictionary), and an output file.
command line parameters ./directoryhash_1.3.py [root directory] [outputfile]
'''
import os
import sys
import md5

hashes = {}  # The "working" hashes dictionary
final = {}   # The final dictionary with all the duplicated files,
             # with their hashes as keys
rootpath = sys.argv[1]
outputfile = open(sys.argv[2], "w")

def hashfunction(filetohash):
    ''' Takes a filetohash, hashes it with md5 checksum thingy, then checks to see if
    that hash already exists. If not it adds it to a dictionary of files, where their
    hash is the key value
    '''
    try:
        openedfile = open(filetohash, "rb")
        # print openedfile
        filehash = md5.new(openedfile.read()).hexdigest()
        # print filehash
        if filehash not in hashes:
            hashes[filehash] = [filetohash]
        else:
            hashes[filehash].append(filetohash)
    except IOError:
        print "\n"
        print filetohash
        print "Probably a directory. Ignoring"

# The following section walks the directory from the rootpath.
# It then calls hashfunction() to do the checking etc.
for dirpath, directories, files in os.walk(rootpath):
    for i in files:
        filepath = dirpath + "/" + i
        print filepath
        try:
            if os.path.getsize(filepath) < 157286400:
                hashfunction(filepath)
            else:
                print filepath + "\n" + "Too big!"
                continue
        except OSError:
            # Handles errors with the filenames, usually seems to be because
            # of file locking etc. Not sure. Don't care.
            print "BORK!"

# Checks the dictionary of hashes and discards all entries where
# there is only one file per hash. (ie the file is unique)
for j in hashes:
    if len(hashes[j]) >= 2:
        final[j] = hashes[j]
print 20 * "-"
print "All Files and Hashes"
print hashes
print "\n"
print 20 * "-"
print "Duplicated Files"
for k in final:
    print final[k]
print "\n"

# Takes the final dictionary, and writes the output to a text
# file so it's useful.
for l in final:
    outputfile.write("hash: " + l + "\n")
    for i in final[l]:
        outputfile.write(i + "\n")
    outputfile.write("-" * 20 + "\n\n")
outputfile.close()
It's crude looking and could probably do with some clean up and extra error catching, but it works. It outputs a text file with all the duplicate files that it found. Simple.
I find that the 'openedfile = open(filetohash, "rb")' bit has the annoying habit of printing out what it has opened, and I am unsure how to change this. Any suggestions welcome.
It was suggested that I use the OpenSSL Python library to do the hashing, but I couldn't get my head around it quickly enough, so I bottled it and went with the standard library's md5 module.
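For comparison, here is a rough sketch of the same hash-and-compare idea in Python 3, where hashlib from the standard library takes the place of the md5 module. It reads each file in chunks, so a big file does not have to fit in memory all at once. The names file_md5 and find_duplicates and the chunk size are my own choices for illustration, and the little demo at the end uses throwaway files in a temporary directory.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=65536):
    """Hash a file chunk by chunk so large files never sit wholly in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(rootpath):
    """Map each hash to the files sharing it, keeping hashes seen twice or more."""
    hashes = {}
    for dirpath, _directories, files in os.walk(rootpath):
        for name in files:
            filepath = os.path.join(dirpath, name)
            try:
                hashes.setdefault(file_md5(filepath), []).append(filepath)
            except OSError:
                # Unreadable or locked file: skip it and carry on.
                continue
    return {h: paths for h, paths in hashes.items() if len(paths) >= 2}


# Demo on made-up files: a.txt and b.txt share their content, c.txt does not.
with tempfile.TemporaryDirectory() as root:
    for name, content in [("a.txt", b"same"), ("b.txt", b"same"), ("c.txt", b"other")]:
        with open(os.path.join(root, name), "wb") as handle:
            handle.write(content)
    duplicates = find_duplicates(root)
    duplicate_names = sorted(
        os.path.basename(p) for paths in duplicates.values() for p in paths
    )
print(duplicate_names)
```

The with statement closes each file as soon as it has been hashed, which the scripts above leave to the interpreter.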
I've tried doing graph plotting in Linux before and found it to be a bit of a nightmare.
All I ever wanted to do was get data from a comma separated value (.csv) file and plot it.
I used Octave + Gnuplot; as Octave was designed for mathematicians, I thought it was the best tool for the job. It turns out that being able to add numbers together does not give you the ability to use Octave.
Using Octave together with Gnuplot I was able to plot a graph in about 4 days. (Do not take this as a criticism of Octave, but of me.)
Thankfully, in the meantime I began to learn Python, and now the time has come again for me to try my hand at plotting graphs myself.
Looking in Ubuntu's repositories I found there was a Gnuplot package for Python ready to install. After installing it and trying the demo that came with it, I managed to plot some graphs. Even better, there is a csv module built into Python itself.
All in all it took about an hour to get some graphs out. Whoop! Thanks Python you saved me 4 days of pulling my hair out.
Here's the code
#! /usr/bin/env python
import csv
import Gnuplot
import sys

data = sys.argv[1]
title = sys.argv[2]
results = []
f = open(data, "r")
reader = csv.reader(f)
g = Gnuplot.Gnuplot(debug=1)
g.title(title)
g('set data style lines')
for i in reader:
    a = []
    for j in i:
        a.append(float(j))
    results.append(a)
g.plot(results)
g.hardcopy(title+".ps", enhanced=1, color=1)
I plan on extending the file to do some data analysis for me (integration, linear regression etc), but that will require me learning to count a few more numbers.
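As a note to self, both of those can be sketched in plain Python without any extra libraries. The function names trapezoid and linear_fit are made up, the numbers are invented test data rather than anything from my csv files, and I am assuming the first column holds x and the second holds y.

```python
def trapezoid(xs, ys):
    """Approximate the integral of y over x with the trapezium rule."""
    total = 0.0
    for i in range(1, len(xs)):
        total += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return total


def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
area = trapezoid(xs, ys)
slope, intercept = linear_fit(xs, ys)
print(area, slope, intercept)
```

For points on a straight line the trapezium rule is exact, so this data makes a handy sanity check: the area should come out as 12, the slope as 2 and the intercept as 1.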
Enjoy.