Geocoding – How to Scrape Addresses from a Web Page for Geocoding Purposes

geocoding

I am trying to get addresses from a web page for mapping purposes. I know I could copy and paste the addresses into an Excel file and geocode them, but is there a faster way to get the addresses from a web page and create a point location file?

Best Answer

This should get you started: Python and the BeautifulSoup module to the rescue. The code below prints the 26 addresses on that web page. I used Firebug in Firefox to inspect the page source, which showed that the address cells had a width of 37%. I gambled that those were the only cells with that width, and was right. You should be able to feed the resulting list of addresses into an online geocoder to get point locations.

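# Python 2 code using the BeautifulSoup 3 module
import BeautifulSoup as bs
import urllib2

url = 'http://www.phillypal.com/pal_locations.php'

# Download the page and parse the HTML
response = urllib2.urlopen(url)
html = response.read()
soup = bs.BeautifulSoup(html)

# The address cells are the <td> elements with width="37%"
addresses = soup.findAll('td', {'width':'37%'})

# Number of address cells found
print len(addresses)

# Print the first text node of each address cell
for address in addresses:
    print address.find(text=True)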
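
The code above is written for Python 2 and BeautifulSoup 3. As a rough sketch of the same idea in Python 3 using only the standard library, the example below picks out the first non-empty piece of text in each 37%-wide cell and writes the results as a one-column CSV, which most batch geocoders should accept. The sample HTML, the class name AddressCellParser, and the helper functions are made up for illustration; the real page's markup may differ, and you would fetch it with something like urllib.request.urlopen instead of using an inline string.

```python
import csv
import io
from html.parser import HTMLParser


class AddressCellParser(HTMLParser):
    """Collect the first non-empty text fragment of each matching <td>."""

    def __init__(self, width="37%"):
        super().__init__()
        self.width = width
        self.in_cell = False
        self.got_text = False
        self.addresses = []

    def handle_starttag(self, tag, attrs):
        # Start collecting when we enter a <td> with the target width.
        if tag == "td" and dict(attrs).get("width") == self.width:
            self.in_cell = True
            self.got_text = False

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        text = data.strip()
        # Keep only the first non-empty text inside the cell.
        if self.in_cell and text and not self.got_text:
            self.addresses.append(text)
            self.got_text = True


def extract_addresses(html, width="37%"):
    parser = AddressCellParser(width)
    parser.feed(html)
    parser.close()
    return parser.addresses


def addresses_to_csv(addresses):
    # One "address" column; csv handles quoting of embedded commas.
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["address"])
    for address in addresses:
        writer.writerow([address])
    return buffer.getvalue()


# Hypothetical stand-in for the downloaded page.
SAMPLE_HTML = """
<table>
  <tr><td width="20%">Center A</td><td width="37%">123 Example St<br>Philadelphia, PA</td></tr>
  <tr><td width="20%">Center B</td><td width="37%">456 Sample Ave</td></tr>
</table>
"""

addresses = extract_addresses(SAMPLE_HTML)
print(len(addresses))
print(addresses_to_csv(addresses))
```

Like the original, this keeps only the first line of text in each cell, so if the city and state sit after a line break you would need to collect all of the cell's text before geocoding.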