Assuming that presencias
and variables
share the same projection, this should be an easy task. I recommend you to add these lines of code after your read.table()
statement in order to convert presencias
dataframe to a SpatialPointsDataFrame object (just refine the names of the columns containing x and y coordinates if they differ from my example).
coordinates(presencias) <- c("x", "y")
To provide a reproducible example, I try to open up the scope of my answer a little more.
First of all, download and unzip this ESRI shapefile with more or less important locations in Germany. These will serve as point data later on. You will also need packages dismo
, rgdal
and raster
for this short example, so make sure that these libraries (and all their dependencies) are installed on your local hard drive.
Let's start with loading the required packages.
library(dismo)
library(rgdal)
library(raster)
Next, you should generate a sample RasterLayer. In our case, we will make use of the gmap()
function from the dismo
package in order to obtain a physical map of Germany.
germany.mrc <- gmap("Germany")
You can now import your point shapefile via readOGR
from R's rgdal
package. Make sure to adjust the data source name (dsn = ...). The whole projection stuff is obsolete in your particular case. However, it has to be done in our example in order to successfully overlay our point data with the Germany RasterLayer.
# Import SpatialPointsDataFrame
germany.places <- readOGR(dsn = "/path/to/shapefile", layer = "places")
# Define shapefile's current CRS
projection(germany.places) <- CRS("+proj=lonlat +ellps=WGS84")
# Reproject to RasterLayer's CRS
germany.places.mrc <- spTransform(germany.places, CRS(projection(germany.mrc)))
To reduce the huge size of our point data, we will draw a random sample of ten locations in Germany. This should suffice for our purposes.
set.seed(35)
germany.places.mrc.sample <- germany.places.mrc[sample(nrow(germany.places.mrc), 10), ]
Now that the preparation stuff is finished, we could just start to extract the values of those particular pixels our ten randomly sampled points lie within.
data <- data.frame(coordinates(germany.places.mrc.sample),
germany.places.mrc.sample$name,
extract(germany.mrc, germany.places.mrc.sample))
names(data) <- c("x", "y", "name", "value")
In order to merge the point coordinates with the extracted pixel values, we just need to set up a dataframe containing the coordinates of our SpatialPointsDataFrame. That's it!
data
x y name value
1 1073490.3 6513446 Veitsteinbach 208
2 1269100.8 6156690 Assenhausen 231
3 1336757.5 6246284 Frauenwahl 195
4 828579.9 6634122 Altenhof 189
5 1571418.1 6662558 Wohla 151
6 1192299.4 6864087 Flechtorf 170
7 976270.0 6362050 Hilsenhain 208
8 1117416.4 6092146 Nestbaum 175
9 1274192.0 6344490 Wappeltshofen 236
10 878488.2 6839843 Leeden 208
If I understand you correctly I think you can solve it like this (their are comments in the code that explain what is going on):
import numpy, arcpy, random
#Establish the extent which your random samples can be within
rangeX = (100, 2500000) # Enter the actual range in x values of your rasters * 100 in order to get coordinates with decimals
rangeY = (100, 2500000) # Enter the actual range in y values of your rasters * 100 in order to get coordinates with decimals
qty = 1000 # Enter in the number greater than random points you need
#Generate random x,y coordinates
randPoints = []
while len(randPoints) < qty:
x = random.randrange(*rangeX)/100.0 # divide by 100.0 to be able to get coordinates with decimal values
y = random.randrange(*rangeY)/100.0 # divide by 100.0 to be able to get coordinates with decimal values
randPoints.append((x,y))
#Create dictionary of key and lists, list will house tuples of (x,y,z)
#Enter in actual classified values for dictionary keys
valueDict = {'Class1' : [],
'Class2' : [],
'Class3' : [],
'Class4' : []}
######Get Rasters bands as well as cell height, width, origin info to be able to get
######index of x,y location in the numpy array
arcpy.env.workspace = inPath + '\\aster.img'
bands = arcpy.ListRasters()
Ras = arcpy.Raster(inPath + '\\aster.img')
originX = Ras.extent.upperLeft.X
originY = Ras.extent.upperLeft.Y
pixelWidth = Ras.meanCellWidth
pixelHeight = Ras.meanCellHeight
#Create a list that houses each raster array
bandsList = []
for i in bands:
bandsList.append(arcpy.RasterToNumPyArray(i).astype(numpy.float32))
#loop over all of the random point locations and collect raster values at their
#locations if the dictionary entry for that value is not full populate it
#with a tuple of (x,y,z), keep going until each class is full
for i in randPoints:
X = i[0]
Y = i[1]
xOffset = int((X-originX)/pixelWidth)
yOffset = int(abs(Y-originY)/pixelHeight)
for j in range(0,len(bands)):
sampleValue = bandsList[j][yOffset, xOffset]
for key in valueDict.keys():
if sampleValue == key:
if len(valueDict[key]) < 10:
valueDict[key].append((X, Y, sampleValue))
break
else:
continue
This is a variation of a script that I have used to extract raster values at random x,y locations, so it may need some tweaking but I think the major elements are their to get the job done for you.
Best Answer
You could try the SAGA GIS feature 'Export Grid to XYZ'. Despite the name this also supports multiband images (e.g. you could export a 'rendered GeoTiff' to get XYZRGBA, as well as getting XYZ from a 1-band DEM). You can also choose whether or not to drop NODATA rows.
I don't have any massive rasters to hand to test this, but you could try that to see if the memory usage and performance is more acceptable than gdal2xyz.
Failing that, if you need to write your own code, a python library to look at is RasterIO. This is a wrapper around GDAL/Numpy and provides a higher level interface. It also allows the windowed-reading you refer to.