If I understand you correctly, you want to test sample variation; that is to say, how well your sample distribution matches your population (raster) distribution. This is not a correlative relationship, and it is commonly assessed by comparing the sample mean and variance against those of the population.
Here is an example where I calculate mean, variance and quantiles for a random sample and the raster. I also plot the sample and population distributions.
First, for an example, we create a raster and take a random sample. I am simulating a very skewed distribution for the raster to add potential bias to the sample variation.
# Simulate a skewed raster to act as the population
require(raster)
sigma = 0.6; mu = 2
r <- raster(ncol=500, nrow=500)
r[] <- dlnorm(seq(-10, 60, length.out = ncell(r)), mu, sigma) * 100
# Take a random sample of n points and attach the raster values
n <- 20
s <- sampleRandom(r, n, na.rm=TRUE, sp=TRUE)
s@data <- data.frame(s@data, r=extract(r, s))
head(s@data)
# Plot the raster with the sample locations
plot(r)
plot(s, pch=20, add=TRUE, col="black")
We can now calculate distributional moments for the sample and population.
# raster (population) mean, variance and quantiles
( rmean <- cellStats(r, stat="mean", na.rm=TRUE, asSample=FALSE) )
( rvar <- cellStats(r, stat="sd", na.rm=TRUE, asSample=FALSE)^2 )
( rquant <- quantile(r, probs = c(0.25, 0.50, 0.75)) )
# random points (sample) mean, variance and quantiles
( smean <- mean(s@data[,"r"]) )
( svar <- var(s@data[,"r"]) )
( squant <- quantile(s@data[,"r"], probs = c(0.25, 0.50, 0.75)) )
We can also plot the population and sample distributions. To test the null hypothesis that the sample and the population were drawn from the same continuous distribution, we could perform a nonparametric two-sample Kolmogorov-Smirnov test using the ks.test function.
# Plot sample and population distributions
par(mfrow=c(2,1))
density(r, xlim=c(cellStats(r, "min"),cellStats(r, "max")),
main="Population distribution")
plot(density(s@data[,"r"]), xlim=c(cellStats(r, "min"),cellStats(r, "max")),
main="sample distribution")
A two-sample Student's t-test can easily be calculated by coercing the raster into a vector; however, this is not memory safe and can be dangerous with large rasters. I would recommend taking a very large random sample to represent the population instead.
# Coerce the raster to a vector (only advisable for small rasters)
r <- as.vector(as.matrix(r))
r <- na.omit(r)
t.test(s@data[,"r"], y = r)
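The same in-memory population vector can be passed to the Kolmogorov-Smirnov test mentioned above; a minimal sketch:
# Two-sample KS test of the sample values against the population values
ks.test(s@data[,"r"], y = r)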
Because it sounds like you have a binomial response, you really should test the sample variation in an ANOVA or MANOVA framework so you can make sure that you have not only captured the population variation but also that the problem is balanced (variation equally split between [0,1]).
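As a rough sketch, assuming a hypothetical 0/1 column presence attached to the sample points (simulated here purely for illustration), a one-way ANOVA tests whether the sampled raster values differ between the two classes:
# Hypothetical 0/1 response on the sample points, simulated for illustration only
s@data$presence <- rbinom(nrow(s@data), 1, 0.5)
summary(aov(r ~ factor(presence), data = s@data))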
You may want to seriously consider a bootstrap or Monte Carlo approach to identify bias. Since you are using a density raster, the bandwidth and kernel distribution can radically affect the estimates, and you can quantify this uncertainty with these two methods.
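A minimal Monte Carlo sketch along those lines (run while r is still a RasterLayer, i.e., before the coercion to a vector above): draw many samples of size n and look at how the sample means scatter around the population mean.
# Repeatedly sample the raster and compare sample means to the population mean
reps <- 1000
mc.means <- replicate(reps, mean(sampleRandom(r, n, na.rm=TRUE)))
hist(mc.means, main="Monte Carlo distribution of sample means")
abline(v=cellStats(r, "mean"), col="red")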
For reference, here is an approach, in memory, that uses mapply. This is one of the lesser-known apply functions in R that lets you apply a function across objects. Here, when I create objects from the raster stacks, I wrap the stack function call in values and list. This results in list objects containing matrices that hold the raster values, which makes the data ready for mapply, since it expects list objects. The use of sapply allows me to use a numeric row index, 1:nrow(s1[[1]]), to aggregate the row-by-row correlations. This yields a vector of correlations matching the rows of each list's matrix; the vector is ordered to the cells in the source rasters and can be assigned directly back into a source raster.
library(raster)
r <- raster(system.file("external/test.grd", package="raster"))
# Wrap the stack values in list() so mapply receives list objects
s1 <- list(values( stack(r, r*0.15, r/1805.78) ))
s2 <- list(values( stack(r^2, r/0.87*10, r/sum(values(r),na.rm=T)) ))
# Row-by-row (cell-by-cell) correlation between the two stacks,
# written straight back into the source raster
r[] <- as.vector(mapply(function(X,Y) { sapply(1:nrow(s1[[1]]), function(row)
cor(X[row,], Y[row,]))}, X=s1, Y=s2))
plot(r)
Best Answer
Firstly, what you have is a regression problem - how (if at all) does X depend on Y.
Secondly, you have a Poisson regression problem - X is a count, and a fairly small count at that (0, 1, 2, 3).
Thirdly, you have two covariates or explanatory variables - population density and distance to commercial area.
Fourthly, your data are spatial, so you cannot assume that the observations are independent. This affects the inference on the parameter estimates (generally the standard errors will be smaller than they should be).
So, initially you should plug all those into a generalised linear model using the glm function. Then map the residuals at the data points and see if they look spatially correlated.
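A minimal sketch of that first step, assuming a hypothetical data.frame dat holding the counts, the two covariates and the point coordinates (all column names here are placeholders):
# Hypothetical columns: count (the 0-3 response), popdens, dist.comm, x, y
fit <- glm(count ~ popdens + dist.comm, data = dat, family = poisson)
summary(fit)
# Map the deviance residuals at the data points to eyeball spatial structure;
# point size scaled by the absolute residual, colour by its sign
res <- residuals(fit, type = "deviance")
plot(dat$x, dat$y, cex = abs(res) + 0.5, pch = 20, col = ifelse(res > 0, "red", "blue"),
main = "Deviance residuals at data points")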