Get Numpy array from SHAPE@WKB token

arcpyarraybuffernumpywell-known-binary

I am trying to get a numpy array from the SHAPE@WKB token that is obtained either using FeatureClassToNumpyArray or cursors, however what I get does not make much sense.

Specifically I am interested in obtaining the xy coordinates that make up different polylines. Although I would like to generalize the question a bit more for any geometry.

For instance:

>>> x = arcpy.da.FeatureClassToNumpyArray(fcPolylines,['SHAPE@WKB'])
>>> np.frombuffer(x[0]['SHAPE@WKB'])
array([  1.62969277e-311,   1.69495090e-156,   3.83453941e+085,
         1.64947523e-156,   3.85098414e+085,   1.63156857e-156,
         3.85745948e+085,   3.21142670e-322,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         ...,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
         0.00000000e+000,   0.00000000e+000,   0.00000000e+000 ])

I'm really looking to get the same information that you get when requesting the SHAPE@XY token next to the explode_to_points = True argument:

>>> arcpy.da.FeatureClassToNumPyArray("Perfiles",['SHAPE@XY'],explode_to_points=True)
array([([517465.97249554005, 4640473.617421979],),
       ([517467.13585821074, 4640570.8782546315],),
       ([517467.62161390204, 4640611.48898075],)], 
     dtype=[('SHAPE@XY', '<f8', (2,))])

After @mikewatt’s answer, I knew how to get what I want from the WKB attribute of an arcpy geometry object:

>>> polyline = arcpy.CopyFeatures_management("Perfiles",arcpy.Geometry())
>>> np.frombuffer(polyline[0].WKB[9:],np.float64).reshape((-1,2))
array([[  517465.9725,  4640473.6174],
       [  517467.1359,  4640570.8783],
       [  517467.6216,  4640611.489 ]])

However, although as seen this method does work when I extract the WKB from an arcpy object, it does not seem to work when I apply it on the extracted WKB as a token and I get the following error:

>>> x = arcpy.da.FeatureClassToNumpyArray(fcPolylines,['SHAPE@WKB'])
>>> x[0]['SHAPE@WKB']

>>> np.frombuffer(x[0]['SHAPE@WKB'][9:],np.float64).reshape((-1,2))
Runtime error 
Traceback (most recent call last):
  File "<string>", line 1, in <module>
IndexError: can't index void scalar without fields

Best Answer

The error is occurring because FeatureClassToNumpyArray is returning the WKB as a void datatype, in contrast to a Python bytestring like arcpy.Geometry().WKB gives. This means numpy isn't trying to interpret it as numerical data, so a lot of the methods it makes available for numerical data (like slicing) won't work. So we can just round trip back through bytes:

buff = x[0]['SHAPE@WKB'].tobytes()

But if you inspect that you'll notice it might be longer than it should be, padded out to some length with null bytes. We'll also have to figure out the correct slice of data we can feed into np.frombuffer(), so it doesn't try to digest garbage. We'll have to go back and expand upon the more complete example from my previous answer:

import arcpy
import numpy as np

records = arcpy.da.FeatureClassToNumPyArray(polyline_fc, ['SHAPE@WKB'])

dtype_struct = np.dtype([
    ('byteOrder', np.byte),
    ('wkbType', np.uint32),
    ('numPoints', np.uint32)
])
dtype_point = np.dtype((np.float64, 2))

for rec in records:
    # Convert from a void array to a bytestring we can slice up
    buff = rec['SHAPE@WKB'].tobytes()

    array_struct = np.frombuffer(buff[:dtype_struct.itemsize], dtype_struct)

    assert array_struct['byteOrder'] == 1
    assert array_struct['wkbType'] == 2  # This example is only for linestrings

    # Slice off the WKB header which describes the struct and the padding which
    # arcpy adds to the end of the string, leaving us with point data only.
    len_struct = dtype_struct.itemsize
    len_points = array_struct['numPoints'] * dtype_point.itemsize
    point_buff = buff[len_struct:len_struct + len_points]

    array_points = np.frombuffer(point_buff, dtype_point)

    print(array_points)

All this to say... it seems excessive to be writing your own WKB parser when you could just use the interfaces provided by arcpy's geometry objects. Unless you reeeallly need to be eeking out some optimization then keeping is simple is usually better, even if it might be a tad slower.

Related Question