Assuming that presencias and variables share the same projection, this should be an easy task. I recommend adding the following line after your read.table() statement to convert the presencias data frame to a SpatialPointsDataFrame object (adjust the names of the columns containing the x and y coordinates if they differ from my example).
coordinates(presencias) <- c("x", "y")
To provide a reproducible example, I will broaden the scope of my answer a little.
First of all, download and unzip this ESRI shapefile with more or less important locations in Germany. These will serve as point data later on. You will also need the packages dismo, rgdal and raster for this short example, so make sure that they (and all their dependencies) are installed.
Let's start with loading the required packages.
library(dismo)
library(rgdal)
library(raster)
Next, you should generate a sample RasterLayer. In our case, we will use the gmap() function from the dismo package to obtain a physical map of Germany.
germany.mrc <- gmap("Germany")
You can now import your point shapefile via readOGR() from the rgdal package. Make sure to adjust the data source name (dsn = ...). The projection steps are unnecessary in your particular case. However, they are required in our example in order to overlay our point data with the Germany RasterLayer.
# Import SpatialPointsDataFrame
germany.places <- readOGR(dsn = "/path/to/shapefile", layer = "places")
# Define shapefile's current CRS
projection(germany.places) <- CRS("+proj=lonlat +ellps=WGS84")
# Reproject to RasterLayer's CRS
germany.places.mrc <- spTransform(germany.places, CRS(projection(germany.mrc)))
To reduce the huge size of our point data, we will draw a random sample of ten locations in Germany. This should suffice for our purposes.
set.seed(35)
germany.places.mrc.sample <- germany.places.mrc[sample(nrow(germany.places.mrc), 10), ]
Now that the preparation is finished, we can extract the values of the pixels in which our ten randomly sampled points lie.
data <- data.frame(coordinates(germany.places.mrc.sample),
                   germany.places.mrc.sample$name,
                   extract(germany.mrc, germany.places.mrc.sample))
names(data) <- c("x", "y", "name", "value")
To merge the point coordinates with the extracted pixel values, the code above simply builds a data frame from the coordinates of our SpatialPointsDataFrame, the place names and the extracted values. That's it!
data
           x       y          name value
1  1073490.3 6513446 Veitsteinbach   208
2  1269100.8 6156690   Assenhausen   231
3  1336757.5 6246284    Frauenwahl   195
4   828579.9 6634122      Altenhof   189
5  1571418.1 6662558         Wohla   151
6  1192299.4 6864087     Flechtorf   170
7   976270.0 6362050    Hilsenhain   208
8  1117416.4 6092146      Nestbaum   175
9  1274192.0 6344490 Wappeltshofen   236
10  878488.2 6839843        Leeden   208
I would go for the stringdist package, which implements many algorithms for calculating the partial similarity (distance) of strings, including Jaro-Winkler.
Here is a quick solution:
#df to be joined
id <- c(100:111)
name <- c("Aragatsotn", "Ararat", "Armavir", "Gaghark'unik'", "Kotayk", "Lorri",
          "Shirak", "Syunik'", "Tavush", "Vayots' Dzor", "Yerevan City", "Aragatsotn")
value <- runif(12, 0.0, 1.0)
df <- data.frame(id, name, value)
#create shape data df
shpNames <- c("Aragatsotn",
"Ararat",
"Armavir",
"Erevan",
"Gegharkunik",
"Kotayk",
"Lori",
"Shirak",
"Syunik",
"Tavush",
"VayotsDzor")
arm.data <- data.frame(ID_1=1:11,NAME_1=shpNames)
#simple match (only testing)
match(df$name,arm.data$NAME_1)
#simple merge (testing)
merge(arm.data,df,by.x="NAME_1",by.y="name",all.x=TRUE)
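#partial match using stringdist package
library("stringdist")
#for each shapefile name, index of the closest df name within maxDist (NA if none)
am <- amatch(arm.data$NAME_1, df$name, maxDist = 3)
#bind each shapefile row to its matched df row
b <- data.frame()
for (i in seq_len(nrow(arm.data))) {
  b <- rbind(b, data.frame(arm.data[i, ], df[am[i], ]))
}
b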
It outputs:
   ID_1      NAME_1  id          name     value
1     1  Aragatsotn 100    Aragatsotn 0.8510984
2     2      Ararat 101        Ararat 0.3004329
3     3     Armavir 102       Armavir 0.9258740
4     4      Erevan  NA          <NA>        NA
5     5 Gegharkunik 103 Gaghark'unik' 0.9935353
6     6      Kotayk 104        Kotayk 0.6025050
7     7        Lori 105         Lorri 0.9577662
8     8      Shirak 106        Shirak 0.6346550
9     9      Syunik 107       Syunik' 0.6531175
10   10      Tavush 108        Tavush 0.9726032
11   11  VayotsDzor 109  Vayots' Dzor 0.3457315
You can experiment with the maxDist parameter of amatch(); a value of 3 works best with your sample data.
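To get a feeling for how the cut-off behaves, it can help to look at the edit distances directly. The following is only a rough sketch: it uses base R's adist() (generalized Levenshtein distance) instead of stringdist, whereas amatch() defaults to the closely related optimal string alignment distance, so the numbers are indicative rather than exact. The names shp.names, df.names and min.dist are made up for this illustration.

```r
# Names from the shapefile attribute table and from the data frame to be joined
shp.names <- c("Aragatsotn", "Ararat", "Armavir", "Erevan", "Gegharkunik",
               "Kotayk", "Lori", "Shirak", "Syunik", "Tavush", "VayotsDzor")
df.names  <- c("Aragatsotn", "Ararat", "Armavir", "Gaghark'unik'", "Kotayk",
               "Lorri", "Shirak", "Syunik'", "Tavush", "Vayots' Dzor",
               "Yerevan City")

# Pairwise Levenshtein distances: rows = shapefile names, columns = df names
d <- adist(shp.names, df.names)

# Smallest distance from each shapefile name to any df name
min.dist <- setNames(apply(d, 1, min), shp.names)
print(min.dist)

# Shapefile names with no candidate within a cut-off of 3
print(names(min.dist)[min.dist > 3])
```

In this sample only Erevan has no candidate within three edits, which is consistent with the NA row in the output. Raising maxDist far enough to catch it risks pairing it with an unrelated name, so a manual fix for that one entry is probably safer.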
Best Answer
Save the SpatialPolygons as a dataframe:
Make a variable that contains the row numbers (these correspond to the .id variable in country_and_individuals.df in the opening post):
Merge on the variable .id:
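The three steps above could look roughly like the following sketch. It is not the original code: polys.df is a hypothetical stand-in for the attribute table of the polygons object (for a SpatialPolygonsDataFrame called polys, that table would be polys@data), and the contents of country_and_individuals.df are invented for illustration.

```r
# Step 1: attribute table of the polygons as a plain data frame.
# Stand-in data; with sp this would come from the polygons object itself.
polys.df <- data.frame(NAME = c("Country A", "Country B", "Country C"),
                       stringsAsFactors = FALSE)

# Step 2: row numbers become the join key
polys.df$.id <- seq_len(nrow(polys.df))

# Hypothetical table of individuals, keyed by the polygon row number
country_and_individuals.df <- data.frame(.id = c(1L, 1L, 3L),
                                         individual = c("i1", "i2", "i3"),
                                         stringsAsFactors = FALSE)

# Step 3: merge on .id
merged <- merge(country_and_individuals.df, polys.df, by = ".id")
print(merged)
```

Note that merge() performs an inner join by default, so polygons without any individuals are dropped unless all.y = TRUE is set.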