[GIS] How do the offset values of LAS files relate to their coordinate system

coordinate system, las, lidar

I have a conundrum concerning LiDAR data, the stored point offset, and the coordinate system.

I process large amounts of LAS data in TerraScan (TS), which sits on top of MicroStation (MS). I receive LAS files that are created from raw data by CloudPro. Traditionally, when setting up a project and before importing points, I set up the coordinate system in TS, including any false easting/northing values. I typically set these to the values of whatever coordinate system I happen to be working in (e.g., for NAD83 Virginia South State Plane US_Foot, false_easting: 11482916.667, false_northing: 3280833.333).

I just started working on a dataset in NAD83 UTM 15N meters (false easting: 500000). I set the TS false easting and began setting up the project. Unfortunately, after some investigation, I found that the LAS files have an x,y,z offset (230247.299, 3550963.700, 0) that will not allow them to fit inside the bounds defined by the 500000 m false easting. I understand this is due to MS's coordinate range of +/-2147484 around the defined coordinate origin x,y,z. I also understand that a LAS file's coordinates take the scale and offset into consideration. From the ASPRS specification:

X, Y, and Z offset: The offset fields should be used to set the
overall offset for the point records. In general these numbers will
be zero, but for certain cases the resolution of the point data may
not be large enough for a given projection system. However, it should
always be assumed that these numbers are used. So to scale a given X
from the point record, take the point record X multiplied by the X
scale factor, and then add the X offset.

This got me thinking: how (if at all) does the offset of a LAS file relate to the coordinate system that it is projected to?

Best Answer

This got me thinking: how (if at all) does the offset of a LAS file relate to the coordinate system that it is projected to?

They don't really relate to the coordinate system. The offset (in combination with the scale) is used to allow the XYZ values to be stored, with enough precision, as 32-bit integers that fit entirely within the range [−2,147,483,648, 2,147,483,647]. It is common to set the offset to the minimum value of each dimension for the file/tile/dataset, and sometimes I see the midpoint of each dimension used instead. Many times the offset for each dimension will be the minimal value for a range of tiles. Any value is legitimate as long as it doesn't cause the data to overflow that range: each coordinate, after the offset is subtracted and the result is divided by the scale factor, must still be a valid 32-bit integer.
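As a minimal sketch of the rule above (using only NumPy, not any LAS library), the conversion in both directions can be written as follows. The X offset is the one from the question; the scale of 0.01, the sample eastings, and the helper names `encode` and `decode` are assumptions made for illustration.

```python
import numpy as np

INT32_MIN = np.iinfo(np.int32).min
INT32_MAX = np.iinfo(np.int32).max

def encode(coords, scale, offset):
    """Convert real-world coordinates to the integers stored in point records."""
    raw = np.round((np.asarray(coords, dtype=np.float64) - offset) / scale)
    if raw.min() < INT32_MIN or raw.max() > INT32_MAX:
        raise OverflowError("scale/offset do not fit these coordinates into int32")
    return raw.astype(np.int32)

def decode(stored, scale, offset):
    """Apply the LAS rule: stored integer * scale + offset."""
    return stored.astype(np.float64) * scale + offset

# X offset from the question; the scale and sample eastings are made up.
x_offset = 230247.299
x_scale = 0.01
x = np.array([230247.30, 231000.12, 245678.99])

stored = encode(x, x_scale, x_offset)
restored = decode(stored, x_scale, x_offset)
print(stored)
print(restored)
```

Nothing in either function refers to a projection, datum, or false easting; the offset is only a number subtracted before the integer is stored and added back when it is read.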

When you are creating LAS files, you should work to ensure that the scale factors represent the measurement scale of the data (say 0.01 or 0.025) and not the highest possible precision available given the range of the box (very tiny values such as 1.264142e-7). Overdriving the precision has no real cost for uncompressed LAS, but it is very detrimental to compression formats like LAZ, where it essentially ends up adding incompressible noise. Martin Isenburg has written extensively on this topic in numerous venues.
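A rough way to see the effect is to encode the same centimetre-resolution coordinates with both scales and compress the stored integers. The sketch below uses `zlib` from the Python standard library purely as a stand-in; it is not the LAZ algorithm, and the synthetic coordinates and the offset of 230000 are assumptions, so only the direction of the difference is meaningful.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
offset = 230000.0
# Synthetic eastings measured to the nearest centimetre over a 100 m range.
coords = offset + rng.integers(0, 10000, size=20000) * 0.01

def stored_bytes(scale):
    """Integer point records for the given scale, as raw little-endian bytes."""
    ints = np.round((coords - offset) / scale).astype("<i4")
    return ints.tobytes()

# Same data, two scales: measurement resolution vs. an overdriven scale.
coarse_size = len(zlib.compress(stored_bytes(0.01)))
fine_size = len(zlib.compress(stored_bytes(1.264142e-7)))
print("compressed bytes at scale 0.01:       ", coarse_size)
print("compressed bytes at scale 1.264142e-7:", fine_size)
```

Both encodings carry the same information, but with the tiny scale the low-order bytes of every integer look random to the compressor, so the compressed output is larger.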

Related Solutions

Coordinate System LAS Data – When to Reproject LAS Data?

Yes, building a DEM from data in geographic coordinates is a bad idea, depending upon your software of course; several Esri functions don't work properly in geographic coordinates. Your point spacing becomes tiny and so does your cell size.

Based on its readme, I believe las2las will reproject LiDAR data. This data is possibly supplied in geographic coordinates because it needs to cover a very large area, and no single projected coordinate system with an SRID would cover that.

I definitely recommend projecting geographic LAS to projected LAS; otherwise tools like las2dem probably won't produce the intended results, possibly due to small-number rounding.

Another case for reprojecting LiDAR data is when you are supplied LAS files in different coordinate systems: pick one and project the others.

[GIS] Converting between coordinate systems for LAS file using LASPY

You should check what the scale and offset are for your file. This can be done as follows:

van_taken.header.scale
van_taken.header.offset

This almost looks like an overflow error to me. The lower case x, y, and z properties need to re-scale and re-offset the coordinates to store them as integers (which is how LAS files store them). To be honest, setting the values of the coordinates as scaled values is a bit of an anti-pattern, because you lose control of how precision is lost.

Here's the code that handles your set operation, located in base.py:

    def set_x(self, X, scale = False):
        '''Wrapper for set_dimension("X", new_dimension)'''
        if not scale:
            self.set_dimension("X", X)
            return
        self.set_dimension("X", np.round((X - self.header.offset[0])/self.header.scale[0]))
        return

If you can figure out what the unscaled integer representation of your converted data should be, that's probably a better way to store it (e.g., using the capital letter X, Y, and Z properties of the file).

If you're fine with the above approach to converting between the integer representation and the floating point representation, then I'd consider adjusting your scale to ensure that you don't end up with integers greater than four bytes in size.

If this isn't explicable via integer overflow due to scaling issues, then we definitely need to figure out what's going on. If it's an overflow issue, I'd be open to trying to guard against this case so long as it doesn't have too terrible a performance penalty.

Edit:

It looks like overflow is definitely the issue.

When you're assigning a scaled coordinate value into a LAS file, laspy needs to find some way of representing it as a four byte integer. Currently, it faithfully believes the information in the header. That is, it will subtract the appropriate offset (for the X, Y, or Z dimension) and divide by the appropriate scale (for the X, Y, or Z dimension). The result is then rounded to produce an integer.

Your file has an X scale of 1e-7, and an X offset of -83.11. Thus, to convert any new scaled value of x to its integer representation (which is what happens when you assign into the lower case 'x' property of your file), you need to add 83.11 and divide by 1e-7. For your first value, 269873.21570411, this results in a value of 2.699563e+12. The largest number you can store in four bytes is 2.14e9 for signed integers and 4.29e9 for unsigned.
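The arithmetic above can be checked with a few lines of plain Python. The three numbers are the ones quoted in this answer; the variable names are only for illustration.

```python
x = 269873.21570411      # first converted X value from the question
x_scale = 1e-7           # X scale in the file header
x_offset = -83.11        # X offset in the file header

# Integer that would have to be stored: (x - offset) / scale, rounded.
raw = round((x - x_offset) / x_scale)

int32_max = 2**31 - 1    # largest signed four byte integer, 2147483647
uint32_max = 2**32 - 1   # largest unsigned four byte integer, 4294967295

print(raw)               # roughly 2.7e12
print(raw > int32_max)   # True
print(raw > uint32_max)  # True
```

The required integer is about three orders of magnitude too large for a four byte field, signed or unsigned.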

Currently, laspy doesn't check for this mistake, resulting in an integer overflow. As I mentioned above, it's probably best to assign the integer values (to the capital X, Y, and Z properties) yourself to avoid any ambiguity.

As a quick fix, however, you can simply change the scale and offset. The following ought to work:

van_taken.header.scale = [0.01,0.01,0.01]
van_taken.header.offset = [0,0,0]

You can increase the precision of your conversion by using a large offset together with a small scale. For example, if all of your scaled X coordinates are greater than 200000, you could use an X offset of 200000. Then, when a small scale like 1e-7 is used, the numbers it will be inflating will be smaller. That's something to play around with, keeping in mind the four byte limit.
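Instead of trial and error, the finest scale that still fits can be estimated from the data and the chosen offset. This is a sketch using only NumPy; `smallest_safe_scale` is a hypothetical helper, and every X value other than the first is made up.

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max

def smallest_safe_scale(coords, offset):
    """Smallest scale for which every (coord - offset) / scale fits in int32."""
    span = np.max(np.abs(np.asarray(coords, dtype=np.float64) - offset))
    return span / INT32_MAX

# First value is from the question; the other two are made up.
x = np.array([269873.21570411, 270412.5, 271950.75])

print(smallest_safe_scale(x, 0.0))        # limit with an offset of 0
print(smallest_safe_scale(x, 200000.0))   # a larger offset allows a finer scale
```

For these values the limit with an offset of 200000 is still on the order of 1e-5, so a scale of 1e-7 would keep overflowing unless the offset is moved much closer to the data. In practice it is safer to round the result up to a convenient value such as 0.0001 or 0.001 than to use the limit itself.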

In a lot of problems, and in a lot of computing environments, it's easy to gloss over the fact that floating point arithmetic is fundamentally not like real number arithmetic. Unfortunately, working with LAS files is not one of those cases.

Edit 2:

So can I change the scale value? Will this affect the data in any other ways?

Yes, the reason you can change the scale in this case is that you're supplying the scaled value. If you tell the LAS file that the scale is some particular value, that's the value it will use when re-scaling data. You wouldn't want to mess with the scale if you were reading existing LAS data.

Edit 3:

Last question. So it seems that you were right about the scales. But I tried the .01 and it wasn't giving me the number. I then tried 1.000001 and it seemed closer, but they still aren't the same. Any tips for selecting the correct value, other than trial and error?

I don't think there's really a 'correct' value per se; it's a trade-off. No matter what you do, you're trying to store a (probably 8 byte) floating point vector as a vector of 4-byte integers multiplied by an 8 byte (double precision) floating point scale term and added to an 8 byte floating point offset term. That's not a lossless conversion in general, but it's what you have to do to store data in a standards compliant LAS file.

I would consider trying something like this (not tested):

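import numpy as np

# Converted coordinates, taken from columns 1-3 of the CSV array.
converted_x = converted_csv[:,1]
converted_y = converted_csv[:,2]
converted_z = converted_csv[:,3]

# Whole-number bounds of the data in each dimension.
xmin = np.floor(np.min(converted_x))
ymin = np.floor(np.min(converted_y))
zmin = np.floor(np.min(converted_z))

xmax = np.ceil(np.max(converted_x))
ymax = np.ceil(np.max(converted_y))
zmax = np.ceil(np.max(converted_z))

# Range of each dimension; a range of zero would give a scale of zero.
xrg = xmax - xmin
yrg = ymax - ymin
zrg = zmax - zmin

# Use the minimum as the offset, so stored integers start near zero.
van_taken.header.offset = [xmin,ymin,zmin]
# Spread each range over about half of the usable integer range.
safety_factor = 2
maxval = 2e9
van_taken.header.scale = [safety_factor*xrg/maxval, safety_factor*yrg/maxval, safety_factor*zrg/maxval]
# Assign the scaled values only after the header has been updated.
van_taken.x = converted_x
van_taken.y = converted_y
van_taken.z = converted_z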
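To judge whether a candidate scale and offset are good enough, it helps to measure how far the values move when they are rounded to integers and converted back. The sketch below does this with NumPy only, independent of laspy; `max_roundtrip_error`, the offset of 269873, and all X values except the first are assumptions, and the function does not check the four byte limit.

```python
import numpy as np

def max_roundtrip_error(coords, scale, offset):
    """Largest difference between the input and its scaled-integer round trip."""
    coords = np.asarray(coords, dtype=np.float64)
    stored = np.round((coords - offset) / scale)   # what the file would hold
    restored = stored * scale + offset             # what a reader would get back
    return np.max(np.abs(restored - coords))

# First value is from the question; the other two are made up.
x = np.array([269873.21570411, 270412.5, 271950.75])
x_offset = 269873.0

for scale in (0.01, 0.001, 0.0001):
    err = max_roundtrip_error(x, scale, x_offset)
    print(scale, err, err <= scale / 2 + 1e-9)
```

The round-trip error is bounded by half the scale (plus floating point noise), which is why the values read back never match the input exactly and why the scale should be chosen from the precision the data actually needs.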