You should check what the scale and offset are for your file. This can be done as follows:
van_taken.header.scale
van_taken.header.offset
This almost looks like an overflow error to me. The lower-case x, y, and z properties need to re-scale and re-offset the coordinates to store them as integers (which is how LAS files store them). To be honest, setting the coordinates as scaled values is a bit of an anti-pattern, because you lose control over how precision is lost.
Here's the code that handles your set operation, located in base.py:
def set_x(self, X, scale = False):
    '''Wrapper for set_dimension("X", new_dimension)'''
    if not scale:
        self.set_dimension("X", X)
        return
    self.set_dimension("X", np.round((X - self.header.offset[0])/self.header.scale[0]))
    return
If you can figure out what the unscaled integer representation of your converted data should be, that's probably a better way to store it (e.g., using the capital letter X, Y, and Z properties of the file).
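As a sketch of computing those unscaled integers yourself with NumPy (the scale, offset, and coordinate values here are made up for illustration):

```python
import numpy as np

# Assumed header values and coordinates, for illustration only
scale, offset = 0.01, 0.0
converted_x = np.array([100.004, 250.126, 399.999])

# Unscaled integer representation, suitable for assigning to the
# capital-letter X property (you control exactly how rounding happens)
X = np.round((converted_x - offset) / scale).astype(np.int64)
print(X)  # [10000 25013 40000]
```

The point is that you perform the rounding explicitly, rather than letting the assignment do it behind your back.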
If you're fine with the above approach to converting between the integer representation and the floating point representation, then I'd consider adjusting your scale to ensure that you don't end up with integers greater than four bytes in size.
If this isn't explicable via integer overflow due to scaling issues, then we definitely need to figure out what's going on. If it's an overflow issue, I'd be open to trying to guard against this case so long as it doesn't have too terrible a performance penalty.
Edit:
It looks like overflow is definitely the issue.
When you're assigning a scaled coordinate value into a LAS file, laspy needs to find some way of representing it as a four-byte integer. Currently, it faithfully believes the information in the header. That is, it will subtract the appropriate offset (for the X, Y, or Z dimension) and divide by the appropriate scale. The result is then rounded to produce an integer.
Your file has an X scale of 1e-7, and an X offset of -83.11. Thus, to convert any new scaled value of x to its integer representation (which is what happens when you assign into the lower case 'x' property of your file), you need to add 83.11 and divide by 1e-7. For your first value, 269873.21570411, this results in a value of 2.699563e+12. The largest number you can store in four bytes is 2.14e9 for signed integers and 4.29e9 for unsigned.
Currently, laspy doesn't check for this mistake, resulting in an integer overflow. As I mentioned above, it's probably best to assign the integer values (to the capital X, Y, and Z properties) yourself to avoid any ambiguity.
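To see the overflow concretely, here's a quick check using the numbers above (the scale and offset are the ones reported for this file):

```python
import numpy as np

scale, offset = 1e-7, -83.11      # X scale/offset from this file's header
x = 269873.21570411               # first converted X value

# The integer laspy would try to store: subtract offset, divide by scale
unscaled = round((x - offset) / scale)
int32_max = np.iinfo(np.int32).max   # 2147483647

print(unscaled > int32_max)       # True: far too large for a four-byte integer
```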
As a quick fix, however, you can simply change the scale and offset. The following ought to work:
van_taken.header.scale = [0.01, 0.01, 0.01]
van_taken.header.offset = [0, 0, 0]
You can increase the precision of your conversion by pairing a large offset with a small scale. For example, if all of your scaled X coordinates are greater than 200000, you could use an X offset of 200000. Then only the residual above the offset has to be divided by a small scale like 1e-7, so the resulting integers are much smaller. That's something to play around with, keeping in mind the four-byte limit.
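A sketch of that idea with assumed numbers: once the minimum coordinate is used as the offset, only the residual has to fit in four bytes, so a much finer scale works.

```python
# Assumed coordinate range, for illustration only
xmin, xmax = 200000.0, 200950.0

# Without an offset, even a 1e-4 scale already needs ~2e9 integers
no_offset_max = xmax / 1e-4            # ~2.0e9, close to the int32 limit

# With offset = xmin, only the residual is rescaled,
# leaving room for a 100x finer scale
offset = xmin
with_offset_max = (xmax - offset) / 1e-6   # ~9.5e8, comfortably within range

print(no_offset_max, with_offset_max)
```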
In a lot of problems, and in a lot of computing environments, it's easy to gloss over the fact that floating point arithmetic is fundamentally not like real number arithmetic. Unfortunately, working with LAS files is not one of those cases.
Edit 2:
So can I change the scale value? Will this affect the data in any other ways?
Yes, the reason you can change the scale in this case is that you're supplying the scaled value. If you tell the LAS file that the scale is some particular value, that's the value it will use when re-scaling data. You wouldn't want to mess with the scale if you were reading existing LAS data.
Edit 3:
Last question. So it seems that you were right about the scales. But I tried the .01 and it wasn't giving me the number. I then tried 1.000001 and it seemed closer, but they still aren't the same. Any tips for selecting the correct value, other than trial and error?
I don't think there's really a 'correct' value per se; it's a trade-off. No matter what you do, you're trying to store a (probably 8-byte) floating point vector as a vector of 4-byte integers, multiplied by an 8-byte (double precision) floating point scale term and added to an 8-byte floating point offset term. That's not a lossless conversion in general, but it's what you have to do to store data in a standards-compliant LAS file.
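A minimal round-trip sketch of that loss (the scale, offset, and coordinate are made up):

```python
scale, offset = 0.01, 0.0
x = 269873.21570411

X = round((x - offset) / scale)    # the stored four-byte integer
x_back = X * scale + offset        # what a reader reconstructs

print(x_back)          # ~269873.22: everything finer than the scale is gone
print(x_back == x)     # False
```

Whatever scale you pick, detail finer than that scale cannot survive the round trip, so the goal is picking a scale fine enough for your application while keeping the integers inside four bytes.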
I would consider trying something like this (not tested):
converted_x = converted_csv[:, 1]
converted_y = converted_csv[:, 2]
converted_z = converted_csv[:, 3]

# Floor/ceiling of each dimension's extremes
xmin = np.floor(np.min(converted_x))
ymin = np.floor(np.min(converted_y))
zmin = np.floor(np.min(converted_z))
xmax = np.ceil(np.max(converted_x))
ymax = np.ceil(np.max(converted_y))
zmax = np.ceil(np.max(converted_z))

# Range of each dimension
xrg = xmax - xmin
yrg = ymax - ymin
zrg = zmax - zmin

# Offset at each minimum, scale sized so the largest rescaled
# integer stays well inside the signed four-byte limit (~2.1e9)
van_taken.header.offset = [xmin, ymin, zmin]
safety_factor = 2
maxval = 2e9
van_taken.header.scale = [safety_factor*xrg/maxval,
                          safety_factor*yrg/maxval,
                          safety_factor*zrg/maxval]

van_taken.x = converted_x
van_taken.y = converted_y
van_taken.z = converted_z
OK, I think I've solved my problem.
The issue is noted at the github page for laspy here:
https://github.com/grantbrown/laspy/issues/51
I needed to adjust the "start_first_evlr" property to be the full length of the LAS file, as shown below. I'm not sure this is the best way, but it seems to work. I needed to read the mmap size of the input LAS file (before the EVLR was appended) to get the index where the EVLR should be placed.
# Update the "start_first_evlr" property if this is the first EVLR.
# Otherwise the laspy code will start with an offset of 0...
if len(inFile_v14.header.evlrs) == 0:
    theReader = laspy.base.Reader(inFileName, "r")
    theDataProvider = laspy.base.DataProvider(inFileName, theReader)
    theDataProvider.open("r")
    theDataProvider.map()
    endofthefile = theDataProvider._mmap.size()
    outFile_v14.header.start_first_evlr = endofthefile
    theDataProvider.close()
    theReader.close()
I don't think I should have had to do this, though. The laspy code in base.py seems to reset the mmap buffer to the size of the first EVLR, disregarding the point data content, when the new EVLR is added. This takes place around line 904 in base.py, in the "set_evlrs" method (snippet shown below). The "dat_part_1" object should contain all the LAS point info up to the end of the input file; however, "old_offset" is set to 0 without the code I added to my script above (where I set the "start_first_evlr" property).
self.data_provider.fileref.seek(0, 0)
dat_part_1 = self.data_provider.fileref.read(old_offset)
# Manually Close:
self.data_provider.close(flush=False)
self.data_provider.open("w+b")
self.data_provider.fileref.write(dat_part_1)
total_evlrs = sum([len(x) for x in value])
self.data_provider.fileref.write("\x00"*total_evlrs)
self.data_provider.fileref.close()
self.data_provider.open("r+b")
self.data_provider.map()
self.seek(old_offset, rel = False)
for evlr in value:
    self.data_provider._mmap.write(evlr.to_byte_string())
if self.has_point_records:
    self.data_provider.point_map()
self.populate_evlrs()
The full attached code seems to work:
import datetime
import shutil

import laspy

# Set input / output filenames
evlrContentFile = "./LAS_DATA/evlrContentFile.txt"
inFileName = './LAS_DATA/infile.las'
outFileName = './LAS_DATA/outfile.las'

# EVLR user/record IDs (example values; substitute your own)
EVLR_userid = "MyUserID"
EVLR_recordid = 100

print('Update started: {:%Y-%m-%d %H:%M:%S}'.format(datetime.datetime.now()))

# Get EVLR string content
with open(evlrContentFile, "r") as evlrFile:
    evlrString = evlrFile.read()

# Open input LAS file
inFile_v14 = laspy.file.File(inFileName, mode="r")

# Make a copy of the input file
shutil.copy(inFileName, outFileName)

# Modify ('rw') the copied file
outFile_v14 = laspy.file.File(outFileName, mode="rw")

# Create new EVLR record
new_evlr = laspy.header.EVLR(user_id=EVLR_userid, record_id=EVLR_recordid,
                             VLR_body=evlrString)

# Update the "start_first_evlr" property if this is the first EVLR.
# Otherwise the laspy code will start with an offset of 0...
if len(inFile_v14.header.evlrs) == 0:
    theReader = laspy.base.Reader(inFileName, "r")
    theDataProvider = laspy.base.DataProvider(inFileName, theReader)
    theDataProvider.open("r")
    theDataProvider.map()
    endofthefile = theDataProvider._mmap.size()
    outFile_v14.header.start_first_evlr = endofthefile
    theDataProvider.close()
    theReader.close()

# outFile_v14 has the same, empty EVLR list as inFile
old_evlrs = inFile_v14.header.evlrs
old_evlrs.append(new_evlr)

# Update the EVLR property
outFile_v14.header.evlrs = old_evlrs

# Close and exit
outFile_v14.close()
inFile_v14.close()

print('Update Completed: {:%Y-%m-%d %H:%M:%S}'.format(datetime.datetime.now()))
Best Answer
Rather than setting the entire points array in one go, try setting each dimension in turn:
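The idea can be sketched with a small stand-in class that mimics laspy's scaled setters (FakeLas and all values here are illustrative, not laspy API):

```python
import numpy as np

class FakeLas:
    """Stand-in for an open laspy File: stores int32, rescales on assignment."""
    def __init__(self, scale, offset):
        self.scale, self.offset = scale, offset
        self._X = None

    @property
    def x(self):
        # Reading reconstructs floats from the stored integers
        return self._X * self.scale + self.offset

    @x.setter
    def x(self, values):
        # Assignment rescales and rounds to the integer representation
        self._X = np.round((np.asarray(values) - self.offset) / self.scale).astype(np.int32)

outfile = FakeLas(scale=0.01, offset=0.0)

# Set each dimension in turn (shown here for x only),
# instead of assigning one big points array
outfile.x = [100.004, 250.126]
print(outfile.x)    # values rounded to the 0.01 scale
```

Setting one dimension at a time keeps each conversion small and makes it obvious which dimension's scale or offset is at fault if something overflows.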