You should check what the scale and offset are for your file. This can be done as follows:
van_taken.header.scale
van_taken.header.offset
This almost looks like an overflow error to me. The lower-case x, y, and z properties need to re-scale and re-offset the coordinates in order to store them as integers (which is how LAS files store them). To be honest, setting the coordinates via the scaled values is a bit of an anti-pattern, because you lose control over how precision is lost.
Here's the code that handles your set operation, located in base.py:
def set_x(self, X, scale=False):
    '''Wrapper for set_dimension("X", new_dimension)'''
    if not scale:
        self.set_dimension("X", X)
        return
    self.set_dimension("X", np.round((X - self.header.offset[0]) / self.header.scale[0]))
If you can figure out what the unscaled integer representation of your converted data should be, that's probably a better way to store it (e.g., using the capital letter X, Y, and Z properties of the file).
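A minimal sketch of that approach (the scale, offset, and sample coordinate below are made up for illustration; in practice you'd pick them yourself and write them into your header):

```python
import numpy as np

# Hypothetical scale/offset and a sample coordinate, for illustration only
scale = 0.01
offset = 0.0
x_float = np.array([269873.21570411])

# Compute the unscaled integer representation yourself...
X_int = np.round((x_float - offset) / scale).astype(np.int32)

# ...then assign it to the capital-X property, e.g.:
# van_taken.X = X_int
```

Doing the rounding yourself means you decide exactly where precision is lost, instead of trusting whatever happens to be in the header.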
If you're fine with the above approach to converting between the integer representation and the floating point representation, then I'd consider adjusting your scale to ensure that you don't end up with integers greater than four bytes in size.
If this isn't explicable via integer overflow due to scaling issues, then we definitely need to figure out what's going on. If it's an overflow issue, I'd be open to trying to guard against this case so long as it doesn't have too terrible a performance penalty.
Edit:
It looks like overflow is definitely the issue.
When you're assigning a scaled coordinate value into a LAS file, laspy needs to find some way of representing it as a four byte integer. Currently, it faithfully believes the information in the header. That is, it will subtract the appropriate offset (for the X, Y, or Z dimension) and divide by the appropriate scale (for the X, Y, or Z dimension). The result is then rounded to produce an integer.
Your file has an X scale of 1e-7, and an X offset of -83.11. Thus, to convert any new scaled value of x to its integer representation (which is what happens when you assign into the lower case 'x' property of your file), you need to add 83.11 and divide by 1e-7. For your first value, 269873.21570411, this results in a value of 2.699563e+12. The largest number you can store in four bytes is 2.14e9 for signed integers and 4.29e9 for unsigned.
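To see the problem concretely, here's the arithmetic from above, using the scale, offset, and first value quoted:

```python
import numpy as np

scale = 1e-7
offset = -83.11
x = 269873.21570411

# The integer laspy would try to store: subtract the offset, divide by the
# scale, round. This lands at roughly 2.6996e12.
X = np.round((x - offset) / scale)

# Far beyond what four bytes can hold
int32_max = np.iinfo(np.int32).max    # 2147483647 (~2.14e9)
uint32_max = np.iinfo(np.uint32).max  # 4294967295 (~4.29e9)
```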
Currently, laspy doesn't check for this mistake, resulting in an integer overflow. As I mentioned above, it's probably best to assign the integer values (to the capital X, Y, and Z properties) yourself to avoid any ambiguity.
As a quick fix, however, you can simply change the offset. The following ought to work:
van_taken.header.scale = [0.01,0.01,0.01]
van_taken.header.offset = [0,0,0]
You can increase the precision of your conversion by using a larger offset and a smaller scale. For example, if all of your scaled X coordinates are greater than 200000, you could use an X offset of 200000. Then, when a small scale like 1e-7 is used, the numbers it will be inflating will be smaller. That's something to play around with, keeping in mind the four byte limit.
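One way to reason about the trade-off: with four-byte signed integers, a given scale can only represent coordinates within roughly scale * 2**31 of the offset. A quick sketch:

```python
# Maximum coordinate span representable on either side of the offset
# for a given scale, using 4-byte signed integers
def max_span(scale):
    return scale * (2**31 - 1)

max_span(1e-7)  # ~214.7 units
max_span(0.01)  # ~21.5 million units
```

So a scale of 1e-7 only works if your coordinates all sit within a couple hundred units of the offset; coarser scales buy you range at the cost of precision.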
In a lot of problems, and in a lot of computing environments, it's easy to gloss over the fact that floating point arithmetic is fundamentally not like real number arithmetic. Unfortunately, working with LAS files is not one of those cases.
Edit 2:
So can I change the scale value? Will this affect the data in any other ways?
Yes, the reason you can change the scale in this case is that you're supplying the scaled value. If you tell the LAS file that the scale is some particular value, that's the value it will use when re-scaling data. You wouldn't want to mess with the scale if you were reading existing LAS data.
Edit 3:
Last question. So it seems that you were right about the scales. But I tried the .01 and it wasn't giving me the number. I then tried 1.000001 and it seemed closer, but they still aren't the same. Any tips for selecting the correct value, other than trial and error?
I don't think there's really a 'correct' value per se; it's a trade-off. No matter what you do, you're trying to store a (probably 8-byte) floating point vector as a vector of 4-byte integers, multiplied by an 8-byte (double precision) floating point scale term and added to an 8-byte floating point offset term. That's not a lossless conversion in general, but it's what you have to do to store data in a standards-compliant LAS file.
I would consider trying something like this (not tested):
import numpy as np

# Pull the coordinate columns out of the converted CSV array
converted_x = converted_csv[:, 1]
converted_y = converted_csv[:, 2]
converted_z = converted_csv[:, 3]

# Bounding box of the data
xmin = np.floor(np.min(converted_x))
ymin = np.floor(np.min(converted_y))
zmin = np.floor(np.min(converted_z))
xmax = np.ceil(np.max(converted_x))
ymax = np.ceil(np.max(converted_y))
zmax = np.ceil(np.max(converted_z))

# Coordinate ranges
xrg = xmax - xmin
yrg = ymax - ymin
zrg = zmax - zmin

# Offset to the minimum corner so the integer values start near zero
van_taken.header.offset = [xmin, ymin, zmin]

# Choose scales so the largest stored integer stays well under the
# four-byte signed limit (~2.1e9); safety_factor = 2 leaves headroom
safety_factor = 2
maxval = 2e9
van_taken.header.scale = [safety_factor * xrg / maxval,
                          safety_factor * yrg / maxval,
                          safety_factor * zrg / maxval]

# Assigning to the lower-case properties triggers the re-scaling
van_taken.x = converted_x
van_taken.y = converted_y
van_taken.z = converted_z
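As a sanity check on that scheme, you can verify that the resulting integers fit in four bytes and that the round-trip error is bounded by half the scale. A sketch with made-up sample coordinates standing in for one converted column:

```python
import numpy as np

# Made-up sample coordinates, for illustration only
converted_x = np.array([269873.21570411, 269900.5, 270120.75])

xmin = np.floor(np.min(converted_x))
xrg = np.ceil(np.max(converted_x)) - xmin

safety_factor = 2
maxval = 2e9
scale = safety_factor * xrg / maxval

# Integer representation that would be stored in the file
X = np.round((converted_x - xmin) / scale)
assert X.max() <= np.iinfo(np.int32).max  # fits in four bytes

# Round-trip back to floating point; the error is at most scale / 2
roundtrip = X * scale + xmin
max_err = np.abs(roundtrip - converted_x).max()
```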
Best Answer
Run the example code in the help:
Converting that to an sf object gives one point per LAS input point (37,000 of them), with tree IDs for 269 trees. (Note that I can't find a direct las-to-sf function, so I'm using lidR::as.spatial to make an sp object, then st_as_sf to convert to sf.)

To save that as CSV, convert to a data frame with the coordinates, and that can be saved using write.csv or write.table or whatever.

If you want one record per tree, use tree_metrics to summarize over trees, then convert to sf: that gives 269 rows, one per tree. Convert to a data frame as before and save as before.