[GIS] Extracting data attributes from a SpatialPolygonsDataFrame and putting them into a data frame using R

overlay, r

I want to extract all the data attributes from my SpatialPolygonsDataFrame, including the ID of every polygon.

The reason I want to do this is the following. I have a SpatialPolygonsDataFrame (which I shall name countryPolygons) that consists of regions within a country, with the names of the regions as data attributes. I have another data frame that consists of individual IDs with their coordinates within that country (which I shall call individuals.df). My goal is to expand the data frame of individuals with a column containing the name of the region each one comes from. So far, I did this:

library(sp)    # over()
library(plyr)  # ldply()
country_and_individuals.over <- over(countryPolygons, individuals.df, returnList = TRUE)
country_and_individuals.df <- ldply(country_and_individuals.over, data.frame)

This gave me the data frame country_and_individuals.df, which consists of the columns .id (which I assume is the ID of the polygons) and individual_ID, the IDs of the individuals taken from the individuals.df data frame:

> head(country_and_individuals.df)
    .id  individual_ID
 1   0   5473277
 2   0   3054526
 3   0   3476794
 4   0   4456345
 5   0   1930378
 6   0   1345628

I reckon that if I had the .id together with the region names, which are stored in the SpatialPolygonsDataFrame countryPolygons under the name "NAME_2", I could merge that with country_and_individuals.df and my goal would be achieved. How do I extract a data frame with the columns .id and NAME_2 from the SpatialPolygonsDataFrame countryPolygons?

Perhaps there is an easier way to achieve this goal (a dataframe with the individual IDs as one column and the region that they are from based on their coordinates as another column).

Best Answer

Convert the attribute table of the SpatialPolygonsDataFrame to a data frame:

countryPolygons.df <- as.data.frame(countryPolygons)

Add a column holding the row names, which are the polygon IDs and correspond to the .id column of country_and_individuals.df in the question (this assumes the IDs are numeric, as they are here):

countryPolygons.df$.id <- as.numeric(rownames(countryPolygons.df))

Merge on the variable .id:

individuals_and_regions.df <- merge(country_and_individuals.df, countryPolygons.df, by.x = ".id", by.y = ".id")
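As for the "easier way" the question asks about, here is a minimal sketch, assuming the individuals' coordinates sit in columns named x and y (hypothetical names, as are the two toy regions built below). It runs over() in the other direction: with the points as the first argument and the polygons as the second, sp's over() returns one row of polygon attributes per point, so no merge is needed.

```r
library(sp)

# Two hypothetical unit squares standing in for countryPolygons
sq <- function(x0, id) {
  Polygons(list(Polygon(cbind(c(x0, x0 + 1, x0 + 1, x0, x0),
                              c(0, 0, 1, 1, 0)))), ID = id)
}
polys <- SpatialPolygons(list(sq(0, "0"), sq(1, "1")))
countryPolygons <- SpatialPolygonsDataFrame(
  polys,
  data.frame(NAME_2 = c("West", "East"), row.names = c("0", "1"))
)

# Hypothetical individuals; the column names x and y are assumptions
individuals.df <- data.frame(individual_ID = c(11, 22, 33),
                             x = c(0.5, 1.5, 0.2),
                             y = c(0.5, 0.5, 0.8))

# Promote a copy to a SpatialPointsDataFrame.
# With real data both layers must share the same CRS before calling over().
individuals.sp <- individuals.df
coordinates(individuals.sp) <- c("x", "y")

# One row of polygon attributes per point; keep only the region name
individuals.df$region <- over(individuals.sp, countryPolygons)$NAME_2
individuals.df
```

Points that fall outside every polygon would get NA in the region column.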

Related Solutions

[GIS] Extracting values from rasters at location of points using R

Assuming that presencias and variables share the same projection, this should be an easy task. I recommend adding the following line after your read.table() statement to convert the presencias data frame to a SpatialPointsDataFrame object (adjust the names of the columns containing the x and y coordinates if they differ from my example).

coordinates(presencias) <- c("x", "y")

To provide a reproducible example, I will widen the scope of my answer a little. First, download and unzip this ESRI shapefile of more or less important locations in Germany; these will serve as point data later on. You will also need the packages dismo, rgdal and raster for this short example, so make sure they (and all their dependencies) are installed.

Let's start with loading the required packages.

library(dismo)
library(rgdal)
library(raster)

Next, you should generate a sample RasterLayer. In our case, we will make use of the gmap() function from the dismo package in order to obtain a physical map of Germany.

germany.mrc <- gmap("Germany")

You can now import your point shapefile via readOGR() from the rgdal package. Make sure to adjust the data source name (dsn = ...). The projection steps are unnecessary in your particular case, but they are needed in this example to overlay the point data on the Germany RasterLayer.

# Import SpatialPointsDataFrame
germany.places <- readOGR(dsn = "/path/to/shapefile", layer = "places")
# Define shapefile's current CRS
projection(germany.places) <- CRS("+proj=lonlat +ellps=WGS84")
# Reproject to RasterLayer's CRS
germany.places.mrc <- spTransform(germany.places, CRS(projection(germany.mrc)))

To reduce the huge size of our point data, we will draw a random sample of ten locations in Germany. This should suffice for our purposes.

set.seed(35)
germany.places.mrc.sample <- germany.places.mrc[sample(nrow(germany.places.mrc), 10), ]

With the preparation finished, we can extract the values of the pixels that our ten randomly sampled points fall within.

data <- data.frame(coordinates(germany.places.mrc.sample),
                   germany.places.mrc.sample$name, 
                   extract(germany.mrc, germany.places.mrc.sample))
names(data) <- c("x", "y", "name", "value")

To combine the point coordinates with the extracted pixel values, we simply build a data frame from the coordinates of our SpatialPointsDataFrame, the place names and the extracted values. That's it!

data
           x       y          name value
1  1073490.3 6513446 Veitsteinbach   208
2  1269100.8 6156690   Assenhausen   231
3  1336757.5 6246284    Frauenwahl   195
4   828579.9 6634122      Altenhof   189
5  1571418.1 6662558         Wohla   151
6  1192299.4 6864087     Flechtorf   170
7   976270.0 6362050    Hilsenhain   208
8  1117416.4 6092146      Nestbaum   175
9  1274192.0 6344490 Wappeltshofen   236
10  878488.2 6839843        Leeden   208
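The example above depends on a map download and an external shapefile. The same extraction step can be sketched in a self-contained way, assuming a synthetic 10 x 10 raster whose cell values equal the cell numbers and three made-up points (all names and values below are hypothetical):

```r
library(raster)

# Synthetic raster: 10 x 10 cells over [0, 10] x [0, 10],
# filled row by row from the top-left with the cell numbers 1..100
r <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
values(r) <- 1:ncell(r)

# Made-up point locations in the raster's coordinate system
pts <- data.frame(x = c(0.5, 9.5, 4.5),
                  y = c(9.5, 0.5, 5.5))

# extract() accepts a two-column data frame (or matrix) of coordinates
# and returns the value of the cell each point falls in
pts$value <- raster::extract(r, pts[, c("x", "y")])
pts
```

The call is written as raster::extract() because other packages that are often loaded alongside raster (tidyr, for instance) also export a function named extract.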

Table Joins – How to Join a Table to a Shapefile with Non-Matching IDs and Names (Similar Strings)?

I would go for the stringdist package, which implements many algorithms for calculating the distance (partial similarity) between strings, including Jaro-Winkler. Here is a quick solution for you:

  #df to be joined
  id <- c(100:111)
  name <- c("Aragatsotn", "Ararat", "Armavir", "Gaghark'unik'", "Kotayk", "Lorri", 
            "Shirak", "Syunik'", "Tavush", "Vayots' Dzor", "Yerevan City","Aragatsotn")
  value <- runif(12, 0.0, 1.0)
  df <- data.frame(id, name, value)

  #create shape data df
  shpNames <- c("Aragatsotn",
               "Ararat",
               "Armavir",
               "Erevan",
               "Gegharkunik",
               "Kotayk",
               "Lori",
               "Shirak",
               "Syunik",
               "Tavush",
               "VayotsDzor")
  arm.data  <- data.frame(ID_1=1:11,NAME_1=shpNames)

  #simple match (only testing)
  match(df$name,arm.data$NAME_1)
  #simple merge (testing)
  merge(arm.data,df,by.x="NAME_1",by.y="name",all.x=TRUE)

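  #partial match using stringdist package
  library("stringdist")
  #for each shapefile name, amatch returns the index of the closest name in df
  #(default method is "osa"), or NA if no candidate is within maxDist
  am <- amatch(arm.data$NAME_1, df$name, maxDist = 3)
  #bind each shapefile row to its matched df row (a row of NAs if unmatched)
  b <- data.frame()
  for (i in seq_len(nrow(arm.data))) {
      b <- rbind(b, data.frame(arm.data[i, ], df[am[i], ]))
  }
  b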

It outputs:

   ID_1      NAME_1  id          name     value
1     1  Aragatsotn 100    Aragatsotn 0.8510984
2     2      Ararat 101        Ararat 0.3004329
3     3     Armavir 102       Armavir 0.9258740
4     4      Erevan  NA          <NA>        NA
5     5 Gegharkunik 103 Gaghark'unik' 0.9935353
6     6      Kotayk 104        Kotayk 0.6025050
7     7        Lori 105         Lorri 0.9577662
8     8      Shirak 106        Shirak 0.6346550
9     9      Syunik 107       Syunik' 0.6531175
10   10      Tavush 108        Tavush 0.9726032
11   11  VayotsDzor 109  Vayots' Dzor 0.3457315

You can experiment with the maxDist parameter of amatch(); a value of 3 works best with your sample data.
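The answer mentions Jaro-Winkler, while the code above relies on amatch()'s default method. A minimal sketch of the Jaro-Winkler variant follows, using a handful of the names from the example; the cutoff maxDist = 0.2 and the prefix weight p = 0.1 are assumed starting values to tune, not recommendations. It also indexes the table by the match vector rather than growing a data frame in a loop.

```r
library(stringdist)

# A few names as they appear in the shapefile and in the table
shp_names <- c("Aragatsotn", "Ararat", "Lori", "Syunik", "VayotsDzor")
tbl <- data.frame(id = 100:104,
                  name = c("Aragatsotn", "Ararat", "Lorri", "Syunik'", "Vayots' Dzor"))

# Jaro-Winkler distances lie in [0, 1], so maxDist is a fraction here,
# not an edit count; 0.2 and p = 0.1 are assumptions
idx <- amatch(shp_names, tbl$name, method = "jw", p = 0.1, maxDist = 0.2)

# Index the table by the match vector; unmatched names yield a row of NAs
matched <- tbl[idx, ]
rownames(matched) <- NULL
joined <- cbind(NAME_1 = shp_names, matched)
joined
```

Whichever method is used, it is worth inspecting the joined rows by eye, since a loose cutoff can pair names that merely look alike.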
