You should check what the scale and offset are for your file. This can be done as follows:
van_taken.header.scale
van_taken.header.offset
This almost looks like an overflow error to me. The lower-case x, y, and z properties need to re-scale and re-offset the coordinates to store them as integers (which is how LAS files store them). To be honest, setting the coordinates as scaled values is a bit of an anti-pattern, because you lose control over how precision is lost.
Here's the code that handles your set operation, located in base.py:
def set_x(self, X, scale = False):
    '''Wrapper for set_dimension("X", new_dimension)'''
    if not scale:
        self.set_dimension("X", X)
        return
    self.set_dimension("X", np.round((X - self.header.offset[0])/self.header.scale[0]))
    return
If you can figure out what the unscaled integer representation of your converted data should be, that's probably a better way to store it (e.g., using the capital letter X, Y, and Z properties of the file).
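As a sketch of computing those unscaled integers yourself with NumPy (the scale, offset, and coordinate values here are made up for illustration):

```python
import numpy as np

# Assumed header values and coordinates, for illustration only
scale, offset = 0.01, 0.0
converted_x = np.array([100.004, 250.126, 399.999])

# Unscaled integer representation, suitable for assigning to the
# capital-letter X property (you control exactly how rounding happens)
X = np.round((converted_x - offset) / scale).astype(np.int64)
print(X)  # [10000 25013 40000]
```

The point is that you perform the rounding explicitly, rather than letting the assignment do it behind your back.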
If you're fine with the above approach to converting between the integer representation and the floating point representation, then I'd consider adjusting your scale to ensure that you don't end up with integers greater than four bytes in size.
If this isn't explicable via integer overflow due to scaling issues, then we definitely need to figure out what's going on. If it's an overflow issue, I'd be open to trying to guard against this case so long as it doesn't have too terrible a performance penalty.
Edit:
It looks like overflow is definitely the issue.
When you're assigning a scaled coordinate value into a LAS file, laspy needs to find some way of representing it as a four-byte integer. Currently, it faithfully believes the information in the header. That is, it will subtract the appropriate offset (for the X, Y, or Z dimension) and divide by the appropriate scale. The result is then rounded to produce an integer.
Your file has an X scale of 1e-7, and an X offset of -83.11. Thus, to convert any new scaled value of x to its integer representation (which is what happens when you assign into the lower case 'x' property of your file), you need to add 83.11 and divide by 1e-7. For your first value, 269873.21570411, this results in a value of 2.699563e+12. The largest number you can store in four bytes is 2.14e9 for signed integers and 4.29e9 for unsigned.
Currently, laspy doesn't check for this mistake, resulting in an integer overflow. As I mentioned above, it's probably best to assign the integer values (to the capital X, Y, and Z properties) yourself to avoid any ambiguity.
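To see the overflow concretely, here's a quick check using the numbers above (the scale and offset are the ones reported for this file):

```python
import numpy as np

scale, offset = 1e-7, -83.11      # X scale/offset from this file's header
x = 269873.21570411               # first converted X value

# The integer laspy would try to store: subtract offset, divide by scale
unscaled = round((x - offset) / scale)
int32_max = np.iinfo(np.int32).max   # 2147483647

print(unscaled > int32_max)       # True: far too large for a four-byte integer
```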
As a quick fix, however, you can simply change the scale and offset. The following ought to work:
van_taken.header.scale = [0.01, 0.01, 0.01]
van_taken.header.offset = [0, 0, 0]
You can increase the precision of your conversion by pairing a large offset with a small scale. For example, if all of your scaled X coordinates are greater than 200000, you could use an X offset of 200000. Then only the residual above the offset has to be divided by a small scale like 1e-7, so the resulting integers are much smaller. That's something to play around with, keeping in mind the four-byte limit.
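A sketch of that idea with assumed numbers: once the minimum coordinate is used as the offset, only the residual has to fit in four bytes, so a much finer scale works.

```python
# Assumed coordinate range, for illustration only
xmin, xmax = 200000.0, 200950.0

# Without an offset, even a 1e-4 scale already needs ~2e9 integers
no_offset_max = xmax / 1e-4            # ~2.0e9, close to the int32 limit

# With offset = xmin, only the residual is rescaled,
# leaving room for a 100x finer scale
offset = xmin
with_offset_max = (xmax - offset) / 1e-6   # ~9.5e8, comfortably within range

print(no_offset_max, with_offset_max)
```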
In a lot of problems, and in a lot of computing environments, it's easy to gloss over the fact that floating point arithmetic is fundamentally not like real number arithmetic. Unfortunately, working with LAS files is not one of those cases.
Edit 2:
So can I change the scale value? Will this affect the data in any other ways?
Yes, the reason you can change the scale in this case is that you're supplying the scaled value. If you tell the LAS file that the scale is some particular value, that's the value it will use when re-scaling data. You wouldn't want to mess with the scale if you were reading existing LAS data.
Edit 3:
Last question. So it seems that you were right about the scales. But I tried the .01 and it wasn't giving me the number. I then tried 1.000001 and it seemed closer, but they still aren't the same. Any tips for selecting the correct value, other than trial and error?
I don't think there's really a 'correct' value per se; it's a trade-off. No matter what you do, you're trying to store a (probably 8-byte) floating point vector as a vector of 4-byte integers, multiplied by an 8-byte (double precision) floating point scale term and added to an 8-byte floating point offset term. That's not a lossless conversion in general, but it's what you have to do to store data in a standards-compliant LAS file.
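A minimal round-trip sketch of that loss (the scale, offset, and coordinate are made up):

```python
scale, offset = 0.01, 0.0
x = 269873.21570411

X = round((x - offset) / scale)    # the stored four-byte integer
x_back = X * scale + offset        # what a reader reconstructs

print(x_back)          # ~269873.22: everything finer than the scale is gone
print(x_back == x)     # False
```

Whatever scale you pick, detail finer than that scale cannot survive the round trip, so the goal is picking a scale fine enough for your application while keeping the integers inside four bytes.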
I would consider trying something like this (not tested):
converted_x = converted_csv[:, 1]
converted_y = converted_csv[:, 2]
converted_z = converted_csv[:, 3]

# Floor/ceiling of each dimension's extremes
xmin = np.floor(np.min(converted_x))
ymin = np.floor(np.min(converted_y))
zmin = np.floor(np.min(converted_z))
xmax = np.ceil(np.max(converted_x))
ymax = np.ceil(np.max(converted_y))
zmax = np.ceil(np.max(converted_z))

# Range of each dimension
xrg = xmax - xmin
yrg = ymax - ymin
zrg = zmax - zmin

# Offset at each minimum, scale sized so the largest rescaled
# integer stays well inside the signed four-byte limit (~2.1e9)
van_taken.header.offset = [xmin, ymin, zmin]
safety_factor = 2
maxval = 2e9
van_taken.header.scale = [safety_factor*xrg/maxval,
                          safety_factor*yrg/maxval,
                          safety_factor*zrg/maxval]

van_taken.x = converted_x
van_taken.y = converted_y
van_taken.z = converted_z
OK, I think I've solved my problem.
The issue is noted at the github page for laspy here:
https://github.com/grantbrown/laspy/issues/51
I needed to adjust the "start_first_evlr" property to be the full length of the LAS file, as shown below. I'm not sure this is the best way, but it seems to work. I needed to read the mmap size of the input LAS file (before the EVLR was appended) to get the index where the EVLR should be placed.
# Update the "start_first_evlr" property if this is the first EVLR.
# Otherwise the laspy code will start with an offset of 0...
if len(inFile_v14.header.evlrs) == 0:
    theReader = laspy.base.Reader(inFileName, "r")
    theDataProvider = laspy.base.DataProvider(inFileName, theReader)
    theDataProvider.open("r")
    theDataProvider.map()
    endofthefile = theDataProvider._mmap.size()
    outFile_v14.header.start_first_evlr = endofthefile
    theDataProvider.close()
    theReader.close()
I don't think I should have had to do this, though. The laspy code in base.py seems to reset the mmap buffer to the size of the first EVLR, disregarding the point data content, when the new EVLR is added. This takes place around line 904 in base.py, in the "set_evlrs" method (snippet shown below). The "dat_part_1" object should contain all the LAS point info up to the end of the input file; however, "old_offset" is set to 0 without the code I added to my script above (where I set the "start_first_evlr" property).
self.data_provider.fileref.seek(0, 0)
dat_part_1 = self.data_provider.fileref.read(old_offset)
# Manually Close:
self.data_provider.close(flush=False)
self.data_provider.open("w+b")
self.data_provider.fileref.write(dat_part_1)
total_evlrs = sum([len(x) for x in value])
self.data_provider.fileref.write("\x00"*total_evlrs)
self.data_provider.fileref.close()
self.data_provider.open("r+b")
self.data_provider.map()
self.seek(old_offset, rel = False)
for evlr in value:
    self.data_provider._mmap.write(evlr.to_byte_string())
if self.has_point_records:
    self.data_provider.point_map()
self.populate_evlrs()
The full attached code seems to work:
import datetime
import shutil

import laspy

# Set input / output filenames
evlrContentFile = "./LAS_DATA/evlrContentFile.txt"
inFileName = './LAS_DATA/infile.las'
outFileName = './LAS_DATA/outfile.las'

# EVLR user/record IDs (example values; substitute your own)
EVLR_userid = "MyUserID"
EVLR_recordid = 100

print('Update started: {:%Y-%m-%d %H:%M:%S}'.format(datetime.datetime.now()))

# Get EVLR string content
with open(evlrContentFile, "r") as evlrFile:
    evlrString = evlrFile.read()

# Open input LAS file
inFile_v14 = laspy.file.File(inFileName, mode="r")

# Make a copy of the input file
shutil.copy(inFileName, outFileName)

# Modify ('rw') the copied file
outFile_v14 = laspy.file.File(outFileName, mode="rw")

# Create new EVLR record
new_evlr = laspy.header.EVLR(user_id=EVLR_userid, record_id=EVLR_recordid,
                             VLR_body=evlrString)

# Update the "start_first_evlr" property if this is the first EVLR.
# Otherwise the laspy code will start with an offset of 0...
if len(inFile_v14.header.evlrs) == 0:
    theReader = laspy.base.Reader(inFileName, "r")
    theDataProvider = laspy.base.DataProvider(inFileName, theReader)
    theDataProvider.open("r")
    theDataProvider.map()
    endofthefile = theDataProvider._mmap.size()
    outFile_v14.header.start_first_evlr = endofthefile
    theDataProvider.close()
    theReader.close()

# outFile_v14 has the same, empty EVLR list as inFile
old_evlrs = inFile_v14.header.evlrs
old_evlrs.append(new_evlr)

# Update the EVLR property
outFile_v14.header.evlrs = old_evlrs

# Close and exit
outFile_v14.close()
inFile_v14.close()

print('Update Completed: {:%Y-%m-%d %H:%M:%S}'.format(datetime.datetime.now()))
Best Answer
Rather than setting the entire points array in one go, try setting each dimension in turn:
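The idea can be sketched with a small stand-in class that mimics laspy's scaled setters (FakeLas and all values here are illustrative, not laspy API):

```python
import numpy as np

class FakeLas:
    """Stand-in for an open laspy File: stores int32, rescales on assignment."""
    def __init__(self, scale, offset):
        self.scale, self.offset = scale, offset
        self._X = None

    @property
    def x(self):
        # Reading reconstructs floats from the stored integers
        return self._X * self.scale + self.offset

    @x.setter
    def x(self, values):
        # Assignment rescales and rounds to the integer representation
        self._X = np.round((np.asarray(values) - self.offset) / self.scale).astype(np.int32)

outfile = FakeLas(scale=0.01, offset=0.0)

# Set each dimension in turn (shown here for x only),
# instead of assigning one big points array
outfile.x = [100.004, 250.126]
print(outfile.x)    # values rounded to the 0.01 scale
```

Setting one dimension at a time keeps each conversion small and makes it obvious which dimension's scale or offset is at fault if something overflows.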