LiDAR Point Clouds in R – Visualizing Vertical Profiles

forest-ecologylidarprofilervisualisation

I would like to visualize the vertical profiles of different LiDAR point clouds in different forest types. I have extracted the z-values from the LiDAR files with lidR and put them in a dataframe with 20 columns for each plot and ca. 200.000 rows.

I would like to create a jitter histogram to illustrate the differences between different forest types and their structure. However, I feel like I could do with less than 200.000 points per column.

Any suggestions how to reduce the points in a way that it preserves the overall trend?

I am not sure if I am free to share the data I am working on publicly, but since laz files are the same in their architecture, I guess it is still understandable if you have one laz file at hand.

Sample of my code:

# Import the required libraries
library(lidR)
library(data.table)
library(tidyverse)
library(rlas)

> #create list                             
las.list<-list.files(path = "~/LAZ", pattern ="*.laz", full.names=T,
> recursive=FALSE,  include.dirs = FALSE)

zlist=list()
##loop to extract z values from each LAZ file#########
for(i in las.list){
  lidar<-readLAS(i)
  lidarsingles <- lasfilterduplicates(lidar)
  z<-lidarsingles@data$Z
  zlist <- cbind(zlist,z)
 }

#add plot names
zdf<-as.data.frame(zlist)
plotnames<-c("AEW02", "AEW03", "AEW11", "AEW12", "AEW14", "AEW16", "AEW21" ,"AEW32", "AEW36" ,"AEW44" ,"HEW08" ,"HEW20" ,"HEW22", "HEW48", "SEW18" ,"SEW21", "SEW31" ,"SEW35",
             "SEW40", "SEW43")
id<-rownames(zdf)
colnames(zdf)<-plotnames
czdf<-cbind(id, zdf)

Up to this point, this gives me a data.frame where each column represents one plot and the data in each column are the height values of each point in the plot.

I would like to come up with a plot like this for each plot:

By now, I know that the amount of points might not be ultimately hindering, however, I struggling with arranging the data in a way where I have two variables that I can plot against like so:

# Boxplots of mpg by number of gears
# observations (points) are overlayed and jittered
qplot(plot, height, data=zdf, geom=c("boxplot", "jitter"),
   fill=plot, main="Height Distribution of Returns",
   xlab="Plotname", ylab="Height")

Best Answer

Instead of organizing data where each plot is in a different column, do it as a dataframe with all Z values in one column and an additional index column with the plot ID. For example:

plot_id  Z
1        10.3
1        11.4
2        12.8
2        7.6
3        24.5
...      ...

One will probably have trouble loading all data in R memory, and some options to try are:

Thin the point cloud. See: Thinning large LiDAR point cloud?. This answers your question about "reducing points in a way that preserves the overall trend". And/Or
Use the filter option when loading data in R with readLAS function. With this option one can also random thin the point cloud (for example, -keep_every_nth 2, -keep_random_fraction 0.1, etc.); but also keep or drop specific parts of the point cloud (for example, drop ground points -drop_class 2, keep vegetation points -keep_class 3 4 5, etc.). See rlas:::lasfilterusage() for all filter options available. And/Or
Take advantage of catalog from lidR package. It will allow processing files sequentially (in chunks), using multiple cores and directly outputting/writing results to files without returning them into R's memory (among other features).

Assuming those are overcome, let's focus on the following objective:

I would like to visualize the vertical profiles of different LiDAR point clouds in different forest types.

I don't think jittered points (plotted with or without boxplots) is a good option for visualizing distribution of highly dense data, even using some transparency among points. Usually, the point cloud will look like a solid geometry block in the chart. Here is a post in Stack Overflow about jittering points to deal with overlapping points.

Boxplots alone are ok, but I prefer using histograms and probability distributions (aka probability density functions - pdf) for representing vertical profiles.

Here is a reproducible example using LiDAR datasets from the lidR package. In this example, every dataset is considered to be a plot.

First let's prepare the dataset as suggested in the beginning of this post:

#-------------------------------------
#Build object of class data.frame with two columns:
   # - plot_id, and 
   # - normalized Z values from LiDAR LAS files.

library(lidR)

las.list <- list.files(path = "...\\R\\win-library\\3.5\\lidR\\extdata",
                       pattern = "*.laz",
                       full.names = TRUE)

> las.list
[1] "...\\R\\win-library\\3.5\\lidR\\extdata/Megaplot.laz"    
[2] "...\\R\\win-library\\3.5\\lidR\\extdata/MixedConifer.laz"
[3] "...\\R\\win-library\\3.5\\lidR\\extdata/Topography.laz"  

df <- data.frame(plot_id = NA,
                 Z = NA)

for (i in las.list) {
  las <- readLAS(i)
  las <- lasnormalize(las, tin()) #normalize (changes elevations to heights)
  Z <- data.frame(Z = las@data$Z)
  plot_id <- data.frame(plot_id = rep.int(which(las.list == i), nrow(Z)))
  las_frame <- cbind(plot_id, Z)
  df <- rbind(df, c(las_frame))
}

df <- df[-1,]
rm(i, las.list, las, Z, plot_id, las_frame)
head(df)

Now, let's take a look at histograms showing the distribution of normalized Z values (heights):

#-------------------------------------
#Exploratory analysis of heights distribution.

library(ggplot2)

min(df$Z) #-0.661
max(df$Z) #32.017

ggplot(df, aes(x = Z)) +
  geom_vline(xintercept = 1, color = 'red', linetype = 2) +
  geom_histogram(breaks = seq(-1, 33, 2),
                 binwidth = 2,
                 aes(y = ..count..),
                 colour = "black",
                 fill = "white") +
  scale_y_continuous("Count of returns", limits = c(0, 20000)) + 
  scale_x_continuous("Height (m)", limits = c(-1, 33)) +
  facet_wrap(~plot_id, ncol = 3, nrow = 1) +
  theme_bw() + 
  theme(strip.background = element_blank(),
        panel.border = element_rect(colour = "black"))

As our goal is to visualize and model vertical profile from forests, we should remove 'ground points' from the analysis (those are represented in histograms by the bin -1 to 1 m; cutting up from one meter will remove other types of noise as well, such as debris).

#Filter out ground points (Z < 1m), because the objective is to visualize profiles from forest types.
df = subset(df, df$Z > 1)

In our example, after removing 'ground points', we got only unimodal distributions (has one peak) which is good because it is easier to model and use than multimodal distributions.

Before going for the vertical profiles, let's visualize our data in a boxplot as suggested in question.

#-------------------------------------
#Build boxplots.

ggplot(df, aes(x = factor(plot_id), y = Z)) + 
  geom_boxplot(outlier.size = 1) +
  xlab("Plot ID") + 
  ylab("Height (m)") +
  scale_y_continuous(limits = c(1, 33), breaks = c(seq(1, 33, 4))) +
  theme_bw() +
  theme(panel.border = element_rect(colour = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

Finally, we are going to fit Weibull pdfs as a model to describe point's height distribution. Here, we are going to fit distributions and analyse them visually. On real analysis you will want checking the fit with QQ-plots and other statistical tests.

#-------------------------------------
#Get vertical profiles based on Weibull pdfs.

require(MASS)

weibull_parameters <- data.frame(plot_id = NA,
                      scale = NA, #Weibull scale parameter
                      shape = NA) #Weibull shape parameter


for (i in c(1:3)) {
  temp <- subset(df, df$plot_id == i)

  #Fit Weibull pdf through Maximum Likelihood Estimation (MLE)
  weibull <- fitdistr(na.omit(temp$Z),
                      densfun = dweibull,
                      start = list(scale = 1,shape = 5))

  weibull_parameters <- rbind(weibull_parameters, 
                              c(i, weibull$estimate[1:2]))
}

weibull_parameters <- weibull_parameters[-1,]
rm(i, temp, weibull)

# Calculate Weibull densities to Z values.
df <- merge(df, weibull_parameters)
df <- data.frame(cbind(plot_id = df$plot_id,
                       Z = df$Z,
                       weib_dens = mapply(dweibull, x = df$Z, shape = df$shape, scale = df$scale)))

rm(weibull_parameters)

 ggplot(df, aes(x = Z)) +
   geom_histogram(breaks = seq(1, 33, 4),
                  binwidth = 4,
                  aes(y = ..density..),
                  colour = "black",
                  fill = "white") +
   scale_y_continuous("Density") + 
   scale_x_continuous("Height (m)", limits = c(1, 33)) +
   geom_line(data = df,
             aes(x = Z, y = weib_dens),
             colour = "red",
             size = 0.4) +
   facet_wrap(~plot_id, ncol = 3, nrow = 1) +
   theme_bw() + 
   theme(strip.background = element_blank(),
         panel.border = element_rect(colour = "black")) + 
   coord_flip()

The advantage of capturing a vertical profile in a distribution is that we summarized height values from ~200k points in a theoretical function with two parameters. For example, in the left panel, the scale parameter is equal 17.1 and the shape parameter is equal 3. If one is not familiar with the Weibull parameters, I explain how to interpret them here.

Related Solutions

LiDAR – Retrieving Metrics from Relative Vertical Layers of Point Cloud

It is possible to run CloudMetrics as a first round to get the maximum height, and then, run it again specifying strata with the desired intervals. The trick is to automate such process.

I did it using the software R, but you can do it using other software (I guess even directly from the command prompt window).

I assume you already have one cylinder per lidar file, and also that the point clouds are normalized. If not, you will find help in:

First, let's exemplify how it would be considering just one lidar file. Here is the code, properly commented.

#Call Cloudmetrics .exe file from R. Change file paths and file names accordingly.
system("c:\\fusion\\cloudmetrics c:\\input\\file1.las c:\\output\\file1_temp.csv")

#Import CloudMetrics output in R.
cloudmetrics1 = read.csv("c:\\output\\file1_temp.csv",sep=",")

#Get maximum height from CloudMetrics output in 15th column. Perhaps you'll need to tweak the column number. 
max_height = cloudmetrics1[1,15]
#Set the vertical layers range.
layer_range = max_height/5
#Get strata values to be used in the second CloudMetrics run.
strata_value = c(layer_range,2*layer_range,3*layer_range,4*layer_range)

#Call Cloudmetrics again, but now using the switch 
#'strata' to specify which breaks for the vertical layers.
system(paste("c:\\fusion\\cloudmetrics /strata:[0,1,",
             strata_value[1],",",strata_value[2],",",strata_value[3],",",strata_value[4],
             "] c:\\input\\file1.las c:\\output\\file1.csv",sep=""))
#Done.

Now, the same code, but with a for loop that will automate the process. Note that the name of the input files need to have some kind of ID to be identified inside the loop.

#Loop through the las files with names 'file#.las' where # is equal some integer number.
#The example below loop through two las files named file1.las and file2.las, respectively.

for (i in c(1:2)){

  system(paste("c:\\fusion\\cloudmetrics c:\\input\\file",i,".las c:\\output\\file",i,"_temp.csv", sep=""))

  cloudmetrics1 = read.csv(paste("c:\\output\\file",i,"_temp.csv",sep=""),sep=",")
  max_height = cloudmetrics1[1,15]
  layer_range = max_height/5
  strata_value = c(layer_range,2*layer_range,3*layer_range,4*layer_range)

  system(paste("c:\\fusion\\cloudmetrics /strata:[0,1,",
               strata_value[1],",",strata_value[2],",",strata_value[3],",",strata_value[4],
               "] c:\\input\\file",i,".las c:\\output\\file",i,".csv",sep=""))
}

The csv outputs will contain the number and proportion of all returns in each layer specified.

You may continue using R to organize the generated statistics in one table, and continue the processing workflow from there.

R LiDAR – How to Filter Above Ground LiDAR Points Using Catalog in lidR

In your case the filter you are using is a simple one: Classification != 2. And you don't need the ground points at all. You are better to use a streaming filter and a streamed processing.

ctg <- catalog("/...2015TestGroup")
opt_chunk_size(ctg) <- 0
opt_chunk_buffer(ctg) <- 0
opt_output_files(ctg) <- ".../Outputs/2015nonground/{XLEFT}_{YBOTTOM}_2015"
opt_filter(ctg) <- "-drop_class 2"
new_ctg <- catalog_retile(ctg)

catalog_retile is a streamed function. It means that not a single point is loaded at the R level. Each point is read, filtered and written immediately without calling any R code and without loading any memory. It is thus way faster than using the R function lasfilter.

That being said, you are better to use las2las from LAStools for this kind of job. It does exactly the same actually but is more efficient and simple than calling more or less equivalent code from R.

Best Answer

Related Solutions

LiDAR – Retrieving Metrics from Relative Vertical Layers of Point Cloud

R LiDAR – How to Filter Above Ground LiDAR Points Using Catalog in lidR

Related Question