Firstly, what you have is a regression problem - how (if at all) does X depend on Y.
Secondly, you have a Poisson regression problem - X is a count, and fairly small count at that (0,1,2,3).
Thirdly you have two covariates or explanatory variables - population density and distance to commercial area.
Fourthly your data is spatial, so you might not assume that they are independent. This affects the inference from the parameter estimates (generally the standard errors will be smaller than they should be).
So, initially you should plug all those into a generalised linear model using the glm
function. Then map the residuals at the data points and see if they look spatially correlated.
The autocorrelation of covariates is not the problem in itself. What may be a problem is if there is correlation of covariates at the position of data observations. In that case, there will be identifiability issues to estimate the parameters of your SDM model.
What people usually do is to test for correlation between covariates at observation points. When two covariates are correlated, you can combine with PCA like you plan to do or choose one of the two covariates (with biological a priori sense).
What I do is to test a model with the first covariate, then the second one and then both together. But the test is a k-fold cross-validation so that useless additional covariates will not be retained, hence if the correlation between covariates is too high, only the best one is retained. But if there is a little information coming from both of them, you may want to keep both of them. Even if there is correlation. Environmental covariates always shows correlation because temperature is linked to altitude, because rain is linked to wind, etc...
Even if you have high resolution covariates, I would recommend to disaggregate them and test combination of all covariates, both with high resolution and lower resolution. Resolution will capture different scale effects and you may be surprised by the outputs.
By the way, the coincidence (but no correlation...) is that I just released a R-package on github that may help you to do that. I present it on my website and the "SDM_Selection" vignette will show you my own way of doing SDM and covariates selection:
https://statnmap.com/sdmselect-package-species-distribution-modelling/
Edit
The model will be built on your point dataset. If there are identifiability or multicollinearity issues, it will be because of your dataset, not the external data you do not use for modeling.
You can see it the other way: There may be some environmental data completely not related/correlated, but your sampling plan is biased. e.g. imagine each time you observed in the forest it was a rainy month and each time you went for observations in the swamp it was a sunny month, hence there will be correlation between cover type and rain monthly rate. At the same latitude and altitude, let's say there is few possibility of general correlation between these two covariates, but in your dataset, these two will be correlated. Similarly, you may find correlation between global covariates but not in your dataset because your sampling plan counterbalanced the correlation. Because the correlation is not 100%, this means there are chances for a combination of sampling position for which there is no correlation in the covariates.
Thus, you need to verify this correlation inside your dataset.
There is a recent blog post about multicollinearity:
https://datascienceplus.com/multicollinearity-in-r/
Best Answer
Eventually, with the help of Serge Rey's answer and this link in the documentation, I ended up using pysal's implementation as follows:
In the last line I reshape the array of pseudo p_values (which indicate the significance of the local indicators of spatial association (LISAs) ) for visualization purposes:
I just leave it here as an answer in case someone finds it useful in the future.