Scraping OSM with Python 
The OpenStreetMap API lets you do lots of things with OSM data, like uploading and downloading GPX traces. Unfortunately, when you download GPX traces this way, the date/time stamps have been removed. If you download GPS traces individually from the public GPS trace list, you get the raw original (I think) GPS data with date/time stamps.

Now that Birmingham is complete, I wanted to generate the party render for the entire city over the last 2 years. I needed a way to download 170+ traces, and I certainly wasn't doing this by hand. What follows is my most hacky Python script yet: it parses a page, looks for the GPS trace IDs, then constructs a URL to download each one. You have to change the page to scrape manually, but hey, I wrote it in 30 minutes, what do you expect?


#! /usr/bin/env python

'''Python GPS downloader for OSM'''

import sgmllib

class MyParser(sgmllib.SGMLParser):
    "A simple parser class."

    def parse(self, s):
        "Parse the given string 's'."
        self.feed(s)
        self.close()

    def __init__(self, verbose=0):
        "Initialise an object, passing 'verbose' to the superclass."

        sgmllib.SGMLParser.__init__(self, verbose)
        self.hyperlinks = []

    def start_a(self, attributes):
        "Process a hyperlink and its 'attributes'."

        for name, value in attributes:
            if name == "href":
                self.hyperlinks.append(value)

    def get_hyperlinks(self):
        "Return the list of hyperlinks."

        return self.hyperlinks

import urllib, sgmllib

# Get something to work with.
f = urllib.urlopen("http://www.openstreetmap.org/traces/tag/Birmingham/page/9")
s = f.read()

# Try and process the page.
# The class should have been defined first, remember.
myparser = MyParser()
myparser.parse(s)

# Get the hyperlinks.
links = myparser.get_hyperlinks()
working = []
final = []
traces = []
nonduplicates = []

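# Keep only links that mention "user".
for i in links:
    b = i.find("user")
    print i, b
    if b > 0:
        working.append(i)

# Of those, keep only links that also mention "traces".
for i in working:
    b = i.find("traces")
    print i, b
    if b > 0:
        final.append(i)

# The trace ID is the last path component of each link.
for i in final:
    parts = i.split("/")
    lastitem = len(parts)-1
    traces.append(parts[lastitem])

# Drop duplicate IDs, keeping the original order.
for i in traces:
    if i not in nonduplicates:
        nonduplicates.append(i)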

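# Download each trace and save it under its ID in the current directory.
for i in nonduplicates:
    url = "http://www.openstreetmap.org/trace/" + i + "/data"
    print url
    trace = urllib.urlopen(url)
    # Binary mode, so the downloaded bytes are written unmodified.
    localfile = open(i, "wb")
    localfile.write(trace.read())
    trace.close()
    localfile.close()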
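
A note for anyone trying this on Python 3: the script is Python 2 only, since sgmllib was removed in Python 3, urllib.urlopen moved to urllib.request.urlopen, and print became a function. As a rough sketch of how the link-collecting and ID-filtering steps might look there, here is a version using html.parser from the standard library. The names LinkCollector and extract_trace_ids are made up for illustration, and the sample HTML is invented to resemble the kind of link the script filters for (hrefs containing both "user" and "traces"); the real page markup may well differ.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hyperlinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hyperlinks.append(value)


def extract_trace_ids(html):
    """Return unique trace IDs, in page order, taken from links that
    mention both "user" and "traces" (roughly the script's filter)."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()

    ids = []
    for link in parser.hyperlinks:
        if "user" in link and "traces" in link:
            # The ID is assumed to be the last path component.
            trace_id = link.split("/")[-1]
            if trace_id not in ids:
                ids.append(trace_id)
    return ids


# Invented sample markup, only to exercise the filter.
sample = """
<a href="/user/someone/traces/12345">trace</a>
<a href="/user/someone/traces/12345">same trace again</a>
<a href="/user/other/traces/67890">another</a>
<a href="/traces/tag/Birmingham">tag page</a>
<a href="/user/someone">profile</a>
"""

print(extract_trace_ids(sample))  # ['12345', '67890']
```

The download step would then be the same loop as in the script, with urllib.request.urlopen in place of urllib.urlopen.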



Credit - the HTML parsing class was written by Boddie.

I plan on tidying this up (a lot), as it is extremely useful, at least until the OSM API catches up.
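
One obvious bit of tidying is to stop editing the page number by hand. A minimal sketch, assuming the tag listing keeps the /traces/tag/<tag>/page/<n> URL shape hard-coded in the script and that pages are numbered from 1 (both are assumptions, not checked against the live site). The helper names page_urls and trace_data_url are hypothetical:

```python
BASE = "http://www.openstreetmap.org"


def page_urls(tag, last_page, first_page=1):
    """Yield listing-page URLs for a tag, following the URL shape
    used in the script (assumed, not verified)."""
    for n in range(first_page, last_page + 1):
        yield "%s/traces/tag/%s/page/%d" % (BASE, tag, n)


def trace_data_url(trace_id):
    """Build the download URL the script uses for one trace ID."""
    return "%s/trace/%s/data" % (BASE, trace_id)


# Print the listing pages to visit, then one example download URL.
for url in page_urls("Birmingham", 3):
    print(url)
print(trace_data_url("12345"))
```

Each listing URL would then be fetched and scraped for trace IDs exactly as the script does for its single page, with every ID passed through trace_data_url before downloading.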


