Firstly, what you have is a regression problem: how (if at all) does X depend on Y.
Secondly, you have a Poisson regression problem: X is a count, and a fairly small count at that (0, 1, 2, 3).
Thirdly, you have two covariates or explanatory variables: population density and distance to commercial area.
Fourthly, your data are spatial, so you cannot assume the observations are independent. This affects the inference from the parameter estimates (generally the standard errors will be smaller than they should be).
So, initially you should plug all of this into a generalised linear model using the glm function, then map the residuals at the data points and see if they look spatially correlated.
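A minimal sketch of that workflow, using simulated data and made-up column names (counts, pop.density, dist.commercial) standing in for your own:

```r
# Hypothetical data: a small count response, two covariates, and point coordinates.
# Column names are placeholders for your own data.
set.seed(1)
d <- data.frame(x = runif(100), y = runif(100),
                pop.density = rlnorm(100), dist.commercial = runif(100, 0, 5))
d$counts <- rpois(100, exp(-1 + 0.3 * log(d$pop.density) - 0.2 * d$dist.commercial))

# Poisson GLM: count response, two explanatory variables
fit <- glm(counts ~ pop.density + dist.commercial, data = d, family = poisson)
summary(fit)

# Map the deviance residuals at the data points and look for spatial pattern
res <- residuals(fit, type = "deviance")
plot(d$x, d$y, cex = abs(res), col = ifelse(res > 0, "red", "blue"), pch = 19)
```

If the positive and negative residuals cluster on the map, the independence assumption is suspect and a spatial model would be warranted.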
If I understand you correctly, you want to test sample variation. That is to say, how well your sample distribution matches your population (raster) distribution. This is not a correlative relationship and is commonly done by comparing the sample mean and variance against the population.
Here is an example where I calculate mean, variance and quantiles for a random sample and the raster. I also plot the sample and population distributions.
First, for an example, we create a raster and take a random sample. I am simulating a very skewed distribution for the raster to add potential bias to the sample variation.
library(raster)

# simulate a very skewed (log-normal) raster
sigma <- 0.6; mu <- 2
r <- raster(ncol=500, nrow=500)
r[] <- dlnorm(seq(-10, 60, length.out = ncell(r)), mu, sigma) * 100

# draw a small random sample of cells as points
n <- 20
s <- sampleRandom(r, n, na.rm=TRUE, sp=TRUE)
s@data <- data.frame(s@data, r=extract(r, s))
head(s@data)

plot(r)
plot(s, pch=20, add=TRUE, col="black")
We can now calculate distributional moments for the sample and population.
# raster (population) mean, variance and quantiles
( rmean <- cellStats(r, stat="mean", na.rm=TRUE, asSample=FALSE) )
( rvar <- cellStats(r, stat="sd", na.rm=TRUE, asSample=FALSE)^2 )
( rquant <- quantile(r, probs = c(0.25, 0.50, 0.75)) )
# random points (sample) mean, variance and quantiles
( smean <- mean(s@data[,"r"]) )
( svar <- var(s@data[,"r"]) )
( squant <- quantile(s@data[,"r"], probs = c(0.25, 0.50, 0.75)) )
We can also plot the population and sample distributions. To test the null hypothesis that the sample and population were drawn from the same continuous distribution, we could perform a nonparametric two-sample Kolmogorov-Smirnov test using the ks.test function.
# Plot sample and population distributions
par(mfrow=c(2,1))
density(r, xlim=c(cellStats(r, "min"),cellStats(r, "max")),
main="Population distribution")
plot(density(s@data[,"r"]), xlim=c(cellStats(r, "min"),cellStats(r, "max")),
main="Sample distribution")
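The KS test itself is one line. Here is a self-contained sketch where 'pop' stands in for the raster values and 'samp' for the sampled points (with real data, substitute the raster's cell values, e.g. from sampleRandom, and the extracted sample column):

```r
# Stand-ins for the raster (population) values and the point sample
set.seed(1)
pop  <- rlnorm(250000, meanlog = 2, sdlog = 0.6)
samp <- rlnorm(20, meanlog = 2, sdlog = 0.6)

# Two-sample Kolmogorov-Smirnov test: were samp and pop drawn
# from the same continuous distribution?
ks.test(samp, pop)
```

A large p-value fails to reject the null hypothesis that the sample follows the population distribution; a small one suggests the sample is not representative.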
A two-sample Student's t-test can easily be calculated by coercing the raster into a vector; however, this is not memory safe and can be dangerous on large rasters. In that case I would recommend taking a very large random sample to represent the population.
# coerce the raster to a vector (memory permitting) and drop NAs
r <- as.vector(as.matrix(r))
r <- na.omit(r)
t.test(s@data[,"r"], y = r)
Because it sounds like you have a binomial response, you really should test the sample variation in an ANOVA or MANOVA framework so you can make sure that you have not only captured the population variation but also that the problem is balanced (variation equally split between [0,1]).
You may want to seriously consider a bootstrap or Monte Carlo approach to identify bias. Since you are using a density raster, the bandwidth and kernel distribution can radically affect the estimates; you can quantify this uncertainty with these two methods.
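A rough illustration of the bootstrap idea in base R, again using 'pop' and 'samp' as stand-ins for the raster values and the point sample (one of many ways to set this up):

```r
# Stand-ins for the raster (population) values and the point sample
set.seed(42)
pop  <- rlnorm(250000, meanlog = 2, sdlog = 0.6)
samp <- rlnorm(20, meanlog = 2, sdlog = 0.6)

# Bootstrap the sample mean: resample with replacement many times
boot.means <- replicate(1000, mean(sample(samp, replace = TRUE)))

quantile(boot.means, c(0.025, 0.975))  # bootstrap 95% CI for the sample mean
mean(boot.means) - mean(pop)           # rough indication of sample bias
```

If the population mean falls outside the bootstrap interval, or the bias term is large relative to the sample variance, the sample is likely not capturing the population.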
For your point data on asthma instances and rainfall (as long as it is not interpolated raster data) you can look at the spatial cross-correlation following Chen (2015) and Anselin (1995). Here is a simple example.
Add libraries and data
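The original listing is not shown here; a minimal sketch, assuming the spatialEco package's crossCorrelation function and the meuse example data from sp (check argument names against the package's help page), might look like:

```r
library(sp)
library(spatialEco)

# Example point data shipped with sp
data(meuse)
coordinates(meuse) <- ~x+y

# Bivariate spatial cross-correlation (LISA) between two variables,
# using the default "inverse power" spatial weights built from planar coordinates
I <- crossCorrelation(meuse$zinc, y = meuse$copper,
                      coords = coordinates(meuse), k = 99)
I  # inspect the global statistic and the local (LISA) values
```

Dropping the y argument here would give the univariate local and global autocorrelation instead.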
In the crossCorrelation function, as a starting point I would recommend using the default spatial weights matrix method, passing planar coordinates to the coords argument, as it tracks well with the equations presented in Chen (2015). The default spatial weights matrix (Wij) method is "inverse power", but "negative exponent" is available as an alternative weights function. If you omit the y argument, the result is the univariate local and global autocorrelation; if y is specified, the result represents the cross-correlation. Anselin (1995) defined these statistics as local indicators of spatial association (LISA) in both the univariate and bivariate cases.
I would recommend caution when interpreting bivariate cross-correlations (defined as the correlation between one variable and the spatial lag of another variable), as too much emphasis can be put on the spatial relationship while ignoring the inherent correlation structure of the processes (Lee 2001).
References
Anselin, L. (1995) Local indicators of spatial association. Geographical Analysis. 27:93–115.
Chen, Y. (2015) A New Methodology of Spatial Cross-Correlation Analysis. PLoS One 10(5):e0126158. doi:10.1371/journal.pone.0126158.
Lee, S. (2001) Developing a Bivariate Spatial Association Measure: An Integration of Pearson’s r and Moran’s I. Journal of Geographical Systems 3:369–85.