Presumably you'd need to calculate the length of the road, divide by the number of addresses, and place each address at the corresponding offset along the line.
This might be useful for some tasks, but to my mind this doesn't really help address the issue of randomness vs reality, since what you're creating is not a reliable representation.
For example, there might be a single high-rise building with 100 units. You could end up spreading these along the road segment, when they are a single cluster. Also, if you have a segment of road that isn't straight, then this is going to throw things out (as well as make the task of calculating the offset more difficult).
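For what it's worth, the even-spacing approach described above can be sketched in a few lines. This is a minimal, illustrative version: the road geometry and address count are made up, and a real workflow would use a GIS library's linear referencing tools rather than hand-rolled interpolation.

```python
# Sketch: place N address points at even intervals along a road polyline.
# The polyline and address count below are hypothetical.
from math import hypot

def interpolate_along(polyline, n_points):
    """Place n_points at even intervals along a polyline of (x, y) vertices."""
    # Lengths of each straight segment of the polyline
    seg_lens = [hypot(x2 - x1, y2 - y1)
                for (x1, y1), (x2, y2) in zip(polyline, polyline[1:])]
    total = sum(seg_lens)
    # Midpoint-style spacing: first point half an interval in,
    # roughly like lot frontages rather than lot corners
    interval = total / n_points
    targets = [interval * (i + 0.5) for i in range(n_points)]

    points, dist_so_far, seg = [], 0.0, 0
    for t in targets:
        # Advance to the segment that contains distance t
        while seg < len(seg_lens) - 1 and dist_so_far + seg_lens[seg] < t:
            dist_so_far += seg_lens[seg]
            seg += 1
        (x1, y1), (x2, y2) = polyline[seg], polyline[seg + 1]
        frac = (t - dist_so_far) / seg_lens[seg]
        points.append((x1 + frac * (x2 - x1), y1 + frac * (y2 - y1)))
    return points

road = [(0, 0), (100, 0)]          # a straight 100-unit segment
print(interpolate_along(road, 4))  # four evenly spaced address points
```

Note that this illustrates exactly the weakness described above: the points are spread uniformly regardless of whether the addresses actually cluster in a single building.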
Perhaps the best way is to get commercial geocoded data - like post office address data. I believe that there is such data for both the UK and Canada (not sure where you are) and as an academic study the access to this info should be free of charge (which is good because the prices to purchase are astronomical). I would think this would be more reliable than geocoding via an online service.
As @awesomo alluded to, what you really want to query against is a layer containing the parcel boundaries. I don't know of an online service that can do batch queries like this for Oakland. Both the city of Oakland and Alameda County have map viewers where you may query individual parcels and retrieve their information.
Here are the respective map viewers:

- City of Oakland: Map Viewer
- Alameda County: Assessor Parcel Maps
If, on the other hand, your intent is to do more large scale geocoding and further analysis, the situation is a bit more complicated. Ordinarily, I would say to simply download the parcel boundary layer from the appropriate agency, in this case, Alameda County's Geospatial Data Files.
You would then use that layer to set up a geocoding service in your favorite GIS software and geocode based on addresses.
In this case, you are going to run into the question of what constitutes data that should be freely accessible to the public versus what is considered value-added and thus can be provided at an additional charge. This is a can of worms I will avoid opening, though I touch on it briefly at the end. Suffice it to say that Alameda County provides the parcel boundaries with only a few attributes, namely the parcel number and some modified dates for tracking, but no address or other property characteristics.
That sort of information is available from the county Assessor's office for an additional fee. Here is their Fee Schedule. You will want to look for the Compact Disc option, and either the Entire Assessment Roll (Secured & Unsecured) for $20, or the Entire Property Characteristic File, which is $20,000. Please note the additional zeros on the second option. Hopefully they would include the addresses in the Assessment Roll, but that is by no means guaranteed.
Since this may turn into an expensive option, a second and completely free, though potentially less accurate option would be to instead use the street centerline file as the geocoding basis. This is available from the same download site at Alameda county.
Generally, geocoders give you the option of including an offset from the line to the proper side based on the address number. Doing this, with some adjustment to get the correct distance, should drop the geocoded point inside the parcel boundary, thus letting you do a spatial join between the point and the appropriate parcel boundary. If a point doesn't fall inside, you could likely do a query for the nearest point or parcel, depending on which way you go, and get a fairly high percentage match.
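The point-in-parcel join with a nearest-parcel fallback can be sketched as below. Everything here is illustrative: the parcel shapes and APN identifiers are made up, and in practice you would use a GIS library (for example, GeoPandas's `sjoin` and `sjoin_nearest`) against the county's actual parcel layer rather than this hand-rolled geometry.

```python
# Sketch: assign a geocoded point to the parcel containing it,
# falling back to the nearest parcel (by centroid) when it falls outside.
# Parcel polygons and IDs below are hypothetical.

def point_in_polygon(pt, poly):
    """Ray-casting test: is point (x, y) inside the polygon (vertex list)?"""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def centroid(poly):
    xs, ys = zip(*poly)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def join_to_parcel(pt, parcels):
    """Return the ID of the parcel containing pt, else the nearest parcel."""
    for pid, poly in parcels.items():
        if point_in_polygon(pt, poly):
            return pid
    # Fallback: nearest parcel by squared centroid distance
    return min(parcels,
               key=lambda pid: (centroid(parcels[pid])[0] - pt[0]) ** 2
                             + (centroid(parcels[pid])[1] - pt[1]) ** 2)

parcels = {"APN-001": [(0, 0), (10, 0), (10, 10), (0, 10)],
           "APN-002": [(20, 0), (30, 0), (30, 10), (20, 10)]}
print(join_to_parcel((5, 5), parcels))   # falls inside APN-001
print(join_to_parcel((12, 5), parcels))  # outside both; nearest is APN-001
```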
To briefly address the discussion of what is happening with GIS data and its place in the public domain, I point you to this article: Does Government GIS Data Belong to the People. This is but a brief discussion, but there is much more information out there, with opinions on both sides of the issue. That discussion, however, is not appropriate for this space.
Best Answer
Parsing of an address is a complicated process, as I'm sure you are well aware.
Using ZIP+4 data from the USPS, you can determine if a street exists within a given city/state/zip code. You can even verify that a primary number (house number) falls within the correct ZIP+4 range. Adjusting city names and street names to correct for spelling issues is also possible using spelling lists as well as "sounds like" matching. Taking an address and parsing it into the individual components and then comparing it against a database of known addresses is the only way to know that the parsing has been done correctly.
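A toy version of that parse-then-verify workflow is sketched below. The regex and the sample range record are purely illustrative; the real USPS ZIP+4 product has a far richer format, and production parsers also handle units, directionals, spelling correction, and "sounds like" matching as described above.

```python
# Sketch: parse an address into components, then check the primary
# (house) number against a known range for that street and ZIP.
# The regex and the sample range data are illustrative, not the USPS format.
import re

# Hypothetical ZIP+4-style record: a street with a valid primary-number range
# (even/odd parity checks are omitted for brevity)
ZIP4_RANGES = {("MAIN ST", "94601"): (100, 198)}

ADDRESS_RE = re.compile(
    r"^(?P<number>\d+)\s+(?P<street>.+?)\s*,\s*"
    r"(?P<city>[^,]+)\s*,\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})$"
)

def parse_and_check(address):
    """Return (components, in_range) for a simple 'num street, city, ST zip'."""
    m = ADDRESS_RE.match(address.strip())
    if not m:
        return None, False
    parts = {k: v.strip().upper() for k, v in m.groupdict().items()}
    rng = ZIP4_RANGES.get((parts["street"], parts["zip"]))
    ok = rng is not None and rng[0] <= int(parts["number"]) <= rng[1]
    return parts, ok

print(parse_and_check("120 Main St, Oakland, CA 94601"))
print(parse_and_check("500 Main St, Oakland, CA 94601"))  # outside the range
```

Note that the second address parses cleanly but fails the range check, which is exactly the "parses fine but isn't deliverable" distinction drawn in the next paragraph.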
Knowing that an address fits within the assigned area and knowing that it is a real and deliverable address are distinct objectives. The first, address approximation, is something that Google Maps does very well. However, it is just that: approximation. Google Maps doesn't tell you whether the address is actually deliverable; it shows you where the address would lie on the map if it were real. This is immensely valuable from a mapping standpoint, and the results have varying degrees of accuracy.
Certainly the USPS database has flaws (many of them), but it is also certainly more accurate and correct than any other single database of US addresses, and it is that degree of accuracy (and the fact that it is constantly updated) that we rely on.
I also work at SmartyStreets and wanted to add to the conversation. If you need to validate fewer than 250 addresses per month, the API is free. If your organization is a nonprofit group, the service is completely free with no limits.