[GIS] Extracting all address locations from street centerline file

arcgis-desktop geocoding mapinfo

I have a project in which I want to "extract" all of the geocoded addresses from a street centerline file (with my defined offset and whatever other user-defined input is needed). If it is helpful, here is a description of the format a street centerline file is in.

Currently, the only way I know to do this is a brute-force geocode (i.e. generate every possible address and geocode those points). That isn't impossible, but I would like to see whether a more efficient approach exists.

I mostly use ESRI software (I have access to machines running 9.2 and 9.3), but if an easier solution exists in some freeware package I'm open to anything. I also have access to MapInfo through my university, which I would be willing to use if a simple solution exists. All of the street centerline files are in .shp format, so any other program will have to either read shapefiles directly or let me convert the shapefile to a usable format (I assume a trivial step).

I have a little programming experience in Python (and more extensive experience with statistical software, but that won't really help here). So if there are code options, I suppose an ArcScript with a GUI or Python would be preferred.

The "universe" of locations I am creating will also include street intersections, so if anyone can point to a way to extract all of the street intersections in the street centerline file (and record both intersecting streets), I would appreciate that as well. I assume this is more likely to already exist in some ArcScript.
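To make the intersection request concrete, here is a minimal Python sketch. It assumes the centerline segments are already split at every intersection (common for geocoding-ready files, but worth checking) and that each segment has been read into a `(street_name, coordinates)` pair; the function name, the input layout, and the rounding `precision` are all hypothetical choices for illustration.

```python
from collections import defaultdict

def find_intersections(segments, precision=6):
    """Return {(x, y): [street names]} for nodes shared by 2+ distinct streets.

    segments: iterable of (street_name, [(x, y), ...]) pairs (assumed layout).
    Only segment endpoints are examined, so streets that cross without a
    shared endpoint (e.g. overpasses, unsplit lines) are not detected.
    """
    names_at_node = defaultdict(set)
    for name, coords in segments:
        for x, y in (coords[0], coords[-1]):
            # Round so endpoints differing only by float noise share a key.
            node = (round(x, precision), round(y, precision))
            names_at_node[node].add(name)
    return {
        node: sorted(names)
        for node, names in names_at_node.items()
        if len(names) >= 2
    }

streets = [
    ("Main St", [(0.0, 0.0), (1.0, 0.0)]),
    ("Main St", [(1.0, 0.0), (2.0, 0.0)]),
    ("Oak Ave", [(1.0, -1.0), (1.0, 0.0)]),
    ("Oak Ave", [(1.0, 0.0), (1.0, 1.0)]),
]
crossings = find_intersections(streets)
```

Nodes where only one street name meets (dead ends, or a street continuing onto its next segment) are dropped, which is why the result is keyed on distinct names rather than on segment counts.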

Edit:
To address DavidF's concerns, I know what information is contained in a street centerline file, and I know they are not actual and verified street addresses. For this project I am not making my own geocoder, I am simply defining the universe of address locations according to the street centerline file itself (I also know this universe will be dependent on how far I decide to offset the address locations).

The immediate goal of this question is simply to define a coordinate for all potential addresses. I'm assuming all potential addresses are integers between the minimum and maximum values for the centerline. For this project it is ok that they are not real or verified.
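As an illustration of that universe, the sketch below enumerates the candidate house numbers for one side of one segment. It assumes each segment carries a from/to address value per side, as centerline files typically do; the function name and the `respect_parity` switch are hypothetical. With `respect_parity=False` it returns every integer in the range, as described above; with the default it steps by two when both range ends share a parity, which is the usual odd/even side convention.

```python
def candidate_numbers(from_addr, to_addr, respect_parity=True):
    """List the candidate house numbers for one side of a segment.

    Numbers are returned in the direction of the range (from -> to),
    including both ends.
    """
    low, high = min(from_addr, to_addr), max(from_addr, to_addr)
    same_parity = from_addr % 2 == to_addr % 2
    # Step by 2 only when both ends agree on odd/even and parity is wanted.
    step = 2 if (respect_parity and same_parity) else 1
    numbers = list(range(low, high + 1, step))
    return numbers if from_addr <= to_addr else numbers[::-1]

odd_side = candidate_numbers(101, 199)          # odd numbers only
every_integer = candidate_numbers(101, 199, respect_parity=False)
```

Comparing the two lists gives a quick sense of how much the "every integer" assumption inflates the number of candidate points on a block.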

For a little more insight into the project and its goals: in inferential spatial statistics, sampling distributions are often built by generating random locations within the study boundary and calculating the statistic from those random samples. Often the universe of potential locations is dictated by residential addresses (i.e. a potential location has to be one that your geocoder can produce; it can't be anywhere within the study boundary). Since these residential addresses are likely clustered in space, I want to see whether functions like Ripley's K tend to be biased when complete spatial randomness is used to define the sampling distribution, as opposed to the actual locations of addresses (defined a priori by whatever you are using to geocode them).

I know that if I use a parcel-based geocoder, pretty much all of these concerns are circumvented. I am still interested in extending the analysis to street centerline files, though, because as far as I'm aware they are still regularly used to geocode addresses, so the question remains pertinent. I plan on conducting the analysis with a parcel-based geocoding system as well.

As for DavidF's and Mark Ireland's concern about the "realness" of the addresses if I use every integer value, it is a legitimate concern given the nature of the project. If I use more addresses than really exist, it will induce artificial clustering of addresses. I would likely conduct post hoc specificity tests using both David's suggestion (addresses only every 40 feet by a sampling strategy) and Mark's suggestion (verifying real addresses from some outside source). In either case I still want to be able to extract all the address points from the street centerline file. I am not verifying the accuracy of the geocoder from the outset; I am seeing whether complete spatial randomness conforms to the distribution of points that a geocoding process could potentially give. I understand the errors associated with street centerline files, but I want to mimic how geocoding is actually conducted with a street centerline file.

Also, if people can point to some literature they think would be pertinent, I would appreciate it. I know of a bit of work on the accuracy of geocoding, but I have not seen any work addressing this particular issue.

Best Answer

Presumably you'd need to calculate the length of the road segment, divide it by the number of addresses in the segment's range, and create an offset point for each address at the resulting spacing.

This might be useful for some tasks, but to my mind this doesn't really help address the issue of randomness vs reality, since what you're creating is not a reliable representation.

For example, there might be a single high-rise building with 100 units. You could end up spreading these along the road segment when they are really a single cluster. Also, if a road segment isn't straight, the spacing has to be measured along the line rather than between its end points, which makes calculating the offset positions more difficult.
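The length-based placement described above, including segments with bends, might be sketched in plain Python as follows. This is only an illustration under stated assumptions: coordinates are in a projected (planar) system, each house number is placed proportionally to its position in the address range, and a positive `offset` means left of the digitized direction. The function names and the default `step` of 2 are hypothetical, and real geocoders may add refinements (such as insetting from the segment ends) that are not modelled here.

```python
import math

def point_along(polyline, fraction, offset=0.0):
    """Point at `fraction` (0..1) of the polyline's length, moved `offset`
    units perpendicular to the local segment (positive = left of travel)."""
    segments = []
    for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]):
        length = math.hypot(x2 - x1, y2 - y1)
        if length > 0:  # skip duplicate vertices
            segments.append((x1, y1, x2, y2, length))
    if not segments:
        raise ValueError("polyline has no length")
    remaining = fraction * sum(seg[4] for seg in segments)
    for i, (x1, y1, x2, y2, length) in enumerate(segments):
        is_last = i == len(segments) - 1
        if remaining <= length or is_last:
            t = min(remaining / length, 1.0)
            x, y = x1 + t * (x2 - x1), y1 + t * (y2 - y1)
            # Unit normal pointing to the left of the direction of travel.
            nx, ny = -(y2 - y1) / length, (x2 - x1) / length
            return (x + offset * nx, y + offset * ny)
        remaining -= length

def address_points(polyline, from_addr, to_addr, offset, step=2):
    """Map each house number in the range to an interpolated, offset point."""
    span = to_addr - from_addr
    direction = 1 if span >= 0 else -1
    points = {}
    for number in range(from_addr, to_addr + direction, direction * step):
        # A single-address range is placed at the segment midpoint.
        fraction = 0.5 if span == 0 else (number - from_addr) / span
        points[number] = point_along(polyline, fraction, offset)
    return points

straight = address_points([(0, 0), (100, 0)], 100, 108, offset=10)
bent = point_along([(0, 0), (100, 0), (100, 100)], 0.75, offset=10)
```

Because distance is accumulated vertex by vertex, a bent segment is handled the same way as a straight one; the offset direction simply follows whichever piece of the line the point lands on. It does nothing, of course, about the high-rise problem, where many addresses share one real location.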

Perhaps the best way is to get commercial geocoded data - like post office address data. I believe that there is such data for both the UK and Canada (not sure where you are) and as an academic study the access to this info should be free of charge (which is good because the prices to purchase are astronomical). I would think this would be more reliable than geocoding via an online service.