A raster outline occupies a rectangle bounded on the lower left by the origin, having coordinates (say) (x0, y0), and on the upper right by (x1, y1): these are properties easily discovered of any raster. You can use this information to move any points outside the raster's extent onto the nearest point on the raster's boundary and extract the raster values at the new locations.
Tasks like this are often readily worked out by considering lower dimensions. A one-dimensional raster's extent is just an interval of numbers, say from x0 to x1 > x0, conventionally written [x0,x1]. Suppose you want to find the nearest point in this interval to a given number x. There are three cases: x < x0 (x is to the left), x0 <= x <= x1 (x is in the interval), and x1 < x (x is to the right). Clearly the nearest points are x0, x, and x1, respectively. A formula for this result is
x --> min(max(x,x0),x1),
as you can see by contemplating each of the three cases separately.
Because the extent of a raster is the Cartesian product of two intervals, [x0,x1] * [y0,y1], the same formula works in two dimensions. Just apply it to each interval. Thus the projected point's coordinates can be computed (in a Field Calculator, say) as
(x,y) --> ( min(max(x,x0),x1), min(max(y,y0),y1 ).
Here is this formula in action with 100 points randomly placed around and within a grid's extent:
The dashed lines visually associate points outside the extent to the locations where they will be moved.
Create a point layer from these computed coordinates and apply the Sample
tool to extract the raster values. Join the results back to the original point layer.
As a practical matter, to avoid problems with floating point roundoff that might keep the new points from lying within the grid's extent, it would be wise to make x0 and y0 slightly bigger than the true origin (you might add half the cellsize to them) and, similarly, make x1 and y1 slightly smaller.
In summary, the workflow consists of calculating two fields (the new coordinates) in the original point layer's table, creating a point event layer from those new coordinates, running the Sample
tool, and performing a database join (not a spatial join).
When you have many more vector points than there are cells in the raster, you should prefer the solution given by @celenius (who proposes converting the raster to points and using a spatial join). Normally, though, rasters have lots of cells--millions to billions--and the conversion to points is so time-consuming and disk-consuming that it should be approached with caution.
Alternatively, you could expedite Celenius' solution by using a raster calculation to make an integer grid with values only along its boundary. Converting that to a point layer is quick and easy because it will have few points. Let's call this the "boundary layer." Sample the grid at the boundary layer points, thereby copying the grid values into the boundary layer's attribute table. After running Sample
with the original point layer, remove all points where a sample was successfully obtained. Spatially join the boundary layer to the remaining points to complete the sampling process.
(One way to make a grid with values only along its boundary is to apply a focal neighborhood operation that fails along the boundary: examples include finding slopes, hillshading, 3 x 3 neighborhood focal stats. This creates a grid with a one-cell-thick ring of NoData around its edge. Using IsNull and SetNull to detect NoData cells and convert between NoData and data cells produces a grid full of NoData values except around the boundary.)
Following @Dango's idea I created and tested (on small rasters with the same extent and cell size) the following code:
import arcpy, numpy
inRaster = r"C:\tmp\RastersArray.gdb\InRaster"
inRaster2 = r"C:\tmp\RastersArray.gdb\InRaster2"
##Get properties of the input raster
inRasterDesc = arcpy.Describe(inRaster)
#coordinates of the lower left corner
rasXmin = inRasterDesc.Extent.Xmin
rasYmin = inRasterDesc.Extent.Ymin
# Cell size, raster size
rasMeanCellHeight = inRasterDesc.MeanCellHeight
rasMeanCellWidth = inRasterDesc.MeanCellWidth
rasHeight = inRasterDesc.Height
rasWidth = inRasterDesc.Width
##Calculate coordinates basing on raster properties
#create numpy array of coordinates of cell centroids
def rasCentrX(rasHeight, rasWidth):
coordX = rasXmin + (0.5*rasMeanCellWidth + rasWidth)
return coordX
inRasterCoordX = numpy.fromfunction(rasCentrX, (rasHeight,rasWidth)) #numpy array of X coord
def rasCentrY(rasHeight, rasWidth):
coordY = rasYmin + (0.5*rasMeanCellHeight + rasHeight)
return coordY
inRasterCoordY = numpy.fromfunction(rasCentrY, (rasHeight,rasWidth)) #numpy array of Y coord
#combine arrays of coordinates (although array for Y is before X, dstack produces [X, Y] pairs)
inRasterCoordinates = numpy.dstack((inRasterCoordY,inRasterCoordX))
##Raster conversion to NumPy Array
#create NumPy array from input rasters
inRasterArrayTopLeft = arcpy.RasterToNumPyArray(inRaster)
inRasterArrayTopLeft2 = arcpy.RasterToNumPyArray(inRaster2)
#flip array upside down - then lower left corner cells has the same index as cells in coordinates array
inRasterArray = numpy.flipud(inRasterArrayTopLeft)
inRasterArray2 = numpy.flipud(inRasterArrayTopLeft2)
# combine coordinates and value
inRasterFullArray = numpy.dstack((inRasterCoordinates, inRasterArray.T))
#add values from second raster
rasterValuesArray = numpy.dstack((inRasterFullArray, inRasterArray2.T))
Based on @hmfly code, you can have access to desired values:
(height, width, dim )=rasterValuesArray.shape
for row in range(0,height):
for col in range(0,width):
#now you have access to single array of values for one cell location
Unfortunately there's one 'but' - the code is right for NumPy arrays which can be handled by system memory. For my system (8GB), the largest array was about 9000,9000.
As my experience doesn't let me provide more help, you can consider some suggestions about dealing wiht large arrays:
https://stackoverflow.com/questions/1053928/python-numpy-very-large-matrices
arcpy.RasterToNumPyArray
method allows to specify the subset of raster converted to NumPy array (ArcGIS10 help page) what can be useful when chunking large dataset into submatrices.
Best Answer
Using any of the tools you mentioned you can preprocess the points so that one feature (with a count) occurs in each raster cell. Converting that to a raster using the standard built-in method completes the task.
The advantage of this method is that it provides complete control over the conversion process, so you know exactly what is happening. Moreover, it easily generalizes: with only trivial modifications (at the summarize step) you can produce rasters whose values are any statistical summary of the points falling within each cell (of which the count is a simple example).
The preprocessing is relatively simple: you need to identify the cell in which any point (x,y) falls. The raster's extent is given by an origin (Ox, Oy) and corresponding cell sizes (h, k) (usually h=k). The (column, row) coordinates of (x,y)'s cell are
That requires two quick calculations to create new shapefile attributes i and j. Summarize the attribute table on the concatenation of i and j (which will identify the cells), retaining the total count (or any other summary) for each unique (i,j) pair, along with the values of i and j themselves. Convert the (i,j) attributes in the summary table back to coordinates of cell centers as
This produces a table with three attributes: x, y, and the summary value for the points corresponding to (x,y)'s cell. Proceed in any way you prefer to convert this into a raster.