Taking hash(es) for good causes 
I am now at a point where I can start writing useful Python scripts, but I find it hard to come up with things for them to do.

After speaking with Martin Hellwig and Alex Wilmer after a Python WM meeting, I found out there is an interesting set of applications that tell you if you have duplicated files on your computer. They are nothing exceptional: just a program that indexes your file system, hashes all the files, then compares the hashes. Companies can charge quite a lot for these applications, but I thought "I reckon I know enough Python to do that myself!"

So here is the result of that.


#! /usr/bin/env python

''' A python program that walks a given directory to find files that are
duplicated. It then outputs the results to console (simply printing a
dictionary), and an output file.

command line parameters ./directoryhash_1.3.py [root directory] [outputfile]
'''

import os
import sys
import md5

hashes = {} # The "working" hashes dictionary
final = {} # The final dictionary with all the duplicated files,
# with their hashes as keys

rootpath = sys.argv[1]
outputfile = open(sys.argv[2], "w")

def hashfunction(filetohash):
    ''' Takes a filetohash, hashes it with md5 checksum thingy, then checks to see if
    that hash already exists. If not, it adds it to a dictionary of files, where the
    hash is the key. If it does, the file is appended to the list for that hash.
    '''
    try:
        openedfile = open(filetohash, "rb")
        # print openedfile
        filehash = md5.new(openedfile.read()).hexdigest()
        openedfile.close()
        # print filehash
        if filehash not in hashes:
            hashes[filehash] = [filetohash]
        else:
            hashes[filehash].append(filetohash)

    except IOError:
        print "\n"
        print filetohash
        print "Probably a directory. Ignoring"

# The following section walks the directory from the rootpath.
# It then calls hashfunction() to do the checking etc.

for dirpath, directories, files in os.walk(rootpath):
    for i in files:
        filepath = os.path.join(dirpath, i)
        print filepath
        try:
            # 157286400 bytes = 150 MB; anything that size or bigger is skipped
            if os.path.getsize(filepath) < 157286400:
                hashfunction(filepath)
            else:
                print filepath + "\n" + "Too big!"
                continue
        except OSError:
            # Handles errors with the filenames, usually seems to be because
            # of file locking etc. Not sure. Don't care.

            print "BORK!"


# Checks the dictionary of hashes and discards all entries where
# there is only one file per hash. (ie the file is unique)

for j in hashes:
    if len(hashes[j]) >= 2:
        final[j] = hashes[j]


print 20 * "-"
print "All Files and Hashes"
print hashes

print "\n"

print 20 * "-"
print "Duplicated Files"
for k in final:
    print final[k]

print "\n"

# Takes the final dictionary, and writes the output to a text
# file so its useful.

for l in final:
    outputfile.write("hash: " + l + "\n")
    for i in final[l]:
        outputfile.write(i + "\n")

    outputfile.write("-" * 20 + "\n\n")

outputfile.close()



It's crude looking and could probably do with some clean up and extra error catching, but it works. It outputs a text file with all the duplicate files that it found. Simple.

I find that the "openedfile = open(filetohash, "rb")" bit has the annoying habit of printing out what it has opened, and I am unsure of how to change this. Any suggestions welcome.

It was suggested that I use the OpenSSL Python library to do the hashing, but I couldn't get my head around it quickly enough, so I bottled it and went with the standard library's md5 module.
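For anyone wanting to tidy the script up, here is a minimal sketch of the same hash-and-compare idea. It is not the script above: it assumes Python 3 and the standard library's hashlib module rather than md5, and the names file_md5, find_duplicates and chunk_size are made up for illustration. Reading each file in chunks is a swapped-in technique that avoids loading a whole file into memory, so a size cut-off is not strictly needed.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it one chunk at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(rootpath):
    """Return {hash: [paths]} for every hash shared by two or more files."""
    by_hash = defaultdict(list)
    for dirpath, _directories, files in os.walk(rootpath):
        for name in files:
            filepath = os.path.join(dirpath, name)
            try:
                by_hash[file_md5(filepath)].append(filepath)
            except OSError:
                # Unreadable or locked file: skip it and carry on
                continue
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) >= 2}
```

Calling find_duplicates on a root directory gives back a dictionary shaped like the "final" one in the script, which can then be printed or written to a file in the same way.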



Making Movies Maw! 
Stop frame animation - Linux style

As part of my masters project I will need a way of constructing movies from single images.

I have been having some issues with ffmpeg, and totem. Totem has been screwing up files somehow....

Here is the command I am using.

ffmpeg -f image2 -r 10 -qscale 1 -i %03d.jpg i.avi

-r = frame rate

-qscale = VBR encoding quality (1-31)

This outputs a nice little movie. Play with the -qscale option to get the output quality you want.
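If you want to try several -qscale values without retyping the command, a small Python wrapper can help. This is only a sketch, assuming Python 3 and that ffmpeg is on your PATH. The function names build_ffmpeg_command and render_movie are made up for illustration, and the flags are exactly the ones from the command above, in the same order.

```python
import subprocess


def build_ffmpeg_command(pattern="%03d.jpg", output="i.avi", frame_rate=10, qscale=1):
    """Build the argument list for the ffmpeg command shown above."""
    if not 1 <= qscale <= 31:
        raise ValueError("qscale must be between 1 and 31")
    return [
        "ffmpeg",
        "-f", "image2",           # read a numbered image sequence
        "-r", str(frame_rate),    # frame rate
        "-qscale", str(qscale),   # VBR encoding quality (1-31)
        "-i", pattern,            # e.g. 001.jpg, 002.jpg, ...
        output,
    ]


def render_movie(**options):
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(build_ffmpeg_command(**options), check=True)
```

For example, render_movie(qscale=5, output="test.avi") would run the same command with a different quality setting and output name.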

This has taken me a week to do, thanks Totem and your bizarre treatment of media files.



Micromapping Results 
The best bit about OSM is seeing the work you did magically appear on the OpenStreetMap website.

Here are two images, before and after, of what I did on Saturday morning.





What's also great to see is how collaboration between individuals gets a lot more done.

Here is a video that I rendered from all the participants' GPS tracks.



OSM MicroMapping 
Andy Robinson organised an impromptu MicroMapping party of South Birmingham this morning. We were trying to make some progress on completing the blank areas that exist south west of Birmingham. The plan is to get Birmingham completed soon, and then make a big song and dance to get publicity.

To this end, Andy Robinson (blackadder) and BrianBoru have started moving down from north and east Birmingham to my neck of the woods. So I need to get a move on with mapping my little area of Harborne & Edgbaston, as Andy, BrianBoru and Xoff are encroaching into my territory! All progress will be posted here.

Only a few of us attended, but it was a serious mapping affair: we met up at a McDonald's and chose the areas we were going to map. Putting faces to people, namely Xoff, that you know have done work nearby was incredibly fun, as was talking about the nooks and crannies of your stomping ground and making plans about what to do in the future.

I was on my bicycle and, with help from Alex Wilmer and his car with bike rack, was dropped off somewhere near here

I had a few run-ins with locals who were suspicious of me taking photographs and generally wandering around in a strange manner. Talking to them placated and confused them, but they seemed happy to let me go on my way. Quite a few less aggressive people thought I was doing an art project, which made me smile. I can barely draw a car, though maybe the OSM data will be used in an artistic manner; then in some small way I would have helped with an art project.

One of the joys of mapping for OSM is the discovery of places that you would otherwise never have come across. Whilst recording the location of some footpaths I came across a large open park. Now the locals obviously know it's there, but I was very excited about finding such a beautiful park.





It's hard to think you're in a city at times in Birmingham, with these little parks hidden all over the place.

Andy has charged me with doing the "party renders" of the mapping party. This means making a static image of all the different traces recorded by the participants, and a movie that shows how the traces were collected over time. Both will be posted here in due course, once I have got all the traces.


Python + Gnuplot = 4 Days saved 
I've tried doing graph plotting in Linux before and found it to be a bit of a nightmare.

All I ever wanted to do was get data from a comma separated value (.csv) file and plot it.

I used Octave + Gnuplot; as Octave was designed for mathematicians, I thought it was the best tool for the job. Turns out being able to add numbers together does not give you the ability to use Octave.

Using Octave together with Gnuplot I was able to plot a graph in about 4 days. (Do not take this as a criticism of Octave, but of me.)

Thankfully, in between times I began to learn Python, and now the time has come again for me to try my hand at plotting graphs myself.

Looking in Ubuntu's repositories I found there was a Gnuplot package for Python ready to install. After installing it and trying the demo that came with it, I managed to plot some graphs. Even better, there is a csv module built into Python itself.

All in all it took about an hour to get some graphs out. Whoop! Thanks Python you saved me 4 days of pulling my hair out.

Here's the code


#! /usr/bin/env python

import csv
import Gnuplot
import sys

data = sys.argv[1]
title = sys.argv[2]

results = []

f = open(data, "r")
reader = csv.reader(f)

g = Gnuplot.Gnuplot(debug=1)
g.title(title)
g('set data style lines')

for i in reader:
    a = []
    for j in i:
        a.append(float(j))
    results.append(a)

g.plot(results)
g.hardcopy(title+".ps", enhanced=1, color=1)



I plan on extending the file to do some data analysis for me (integration, linear regression etc), but that will require me learning to count a few more numbers.
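As a rough idea of what that extension might look like, here is a minimal sketch of a least-squares straight-line fit and trapezoidal integration in plain Python 3. It is not part of the script above. The function names linear_regression and trapezoid_area are made up for illustration, and it assumes the data has already been split into a list of x values and a list of y values (for example, the two columns of each row in "results").

```python
def linear_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more points, with equal numbers of x and y values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross deviations in x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trapezoid_area(xs, ys):
    """Approximate the area under the points using the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )
```

With a two-column CSV, the inputs could be built with xs = [row[0] for row in results] and ys = [row[1] for row in results].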

Enjoy.


