Python XML – Using Python to Parse XML with GML Tags

I am trying to write a Python script that pulls GML tags out of an XML file and formats them as WKT for inserting into a PostGIS database. I have succeeded for an XML file containing a single-part polygon using the following code:

    import xml.etree.ElementTree as ET

    rootElement = ET.parse("GMLExample_Polygon.xml").getroot()
    wkt = ""
    # iter() replaces getiterator(), which was removed in Python 3.9
    for subelement in rootElement.iter():
        for subsub in subelement:
            if subsub.tag == "{http://www.opengis.net/gml}X":
                x = subsub.text
            if subsub.tag == "{http://www.opengis.net/gml}Y":
                y = subsub.text
                point_for_pol = "%s %s, " % (x, y)
                wkt += point_for_pol
    wkt = wkt[:-2]  # drop the trailing ", "
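
The loop above only produces a bare, comma-separated coordinate list. PostGIS's WKT parser expects a complete geometry such as POLYGON((x1 y1, ..., x1 y1)), and it rejects rings that are not closed (the first point must be repeated at the end). A minimal sketch, assuming the coordinates have already been collected as (x, y) string pairs; ring_to_wkt is a hypothetical helper name, not part of any library:

```python
def ring_to_wkt(points):
    """Format a list of (x, y) string pairs as a single-ring WKT POLYGON.

    Hypothetical helper: closes the ring by repeating the first point when
    the last point differs, because WKT/PostGIS polygon rings must be closed.
    Points are compared as strings, so "1" and "1.0" count as different.
    """
    if not points:
        raise ValueError("a polygon ring needs at least one point")
    ring = list(points)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    coords = ", ".join("%s %s" % (x, y) for x, y in ring)
    return "POLYGON((%s))" % coords


# The coordinates from the question's first polygon, as text pairs.
points = [("452847.6009", "18596.0496"),
          ("415847.6009", "184596.0496"),
          ("415847.6009", "184596.0496")]
print(ring_to_wkt(points))
```

This prints POLYGON((452847.6009 18596.0496, 415847.6009 184596.0496, 415847.6009 184596.0496, 452847.6009 18596.0496)), with the first point appended to close the ring.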

This code clearly won't work for multipolygons. I am unsure how to access the geometry for each polygon tag (<gml:Polygon srsName="BNG">) separately and pull out only the geometry nested under it. I am trying to use ElementTree, but I'm not sure whether it is the best module for this. The XML is structured as follows:

<Order xsi:noNamespaceSchemaLocation="http://lalala.com/xml_polygon_order.xsd">
    <OrderRequest>
        <CustomerReference>999998</CustomerReference>
        <SiteAddress>
            <Premise>456</Premise>
            <Street>long street</Street>
            <Locality/>
            <Town>London</Town>
            <County>London</County>
            <PostCode>PN1 1PN</PostCode>
        </SiteAddress>
        <SiteGeography>
            <gml:Polygon srsName="BNG">
                <gml:outerBoundaryIs>
                    <gml:LinearRing>
                        <gml:coord>
                            <gml:X>452847.6009</gml:X>
                            <gml:Y>18596.0496</gml:Y>
                        </gml:coord>
                        <gml:coord>
                            <gml:X>415847.6009</gml:X>
                            <gml:Y>184596.0496</gml:Y>
                        </gml:coord>
                        <gml:coord>
                            <gml:X>415847.6009</gml:X>
                            <gml:Y>184596.0496</gml:Y>
                        </gml:coord>
                    </gml:LinearRing>
                </gml:outerBoundaryIs>
            </gml:Polygon>
            <gml:Polygon srsName="BNG">
                <gml:outerBoundaryIs>
                    <gml:LinearRing>
                        <gml:coord>
                            <gml:X>452847.6009</gml:X>
                            <gml:Y>18596.0496</gml:Y>
                        </gml:coord>
                        <gml:coord>
                            <gml:X>415847.6009</gml:X>
                            <gml:Y>184596.0496</gml:Y>
                        </gml:coord>
                        <gml:coord>
                            <gml:X>415847.6009</gml:X>
                            <gml:Y>184596.0496</gml:Y>
                        </gml:coord>
                    </gml:LinearRing>
                </gml:outerBoundaryIs>
            </gml:Polygon>
        </SiteGeography>
    </OrderRequest>
</Order>

Thanks for any help.

Best Answer

I enjoy using ElementTree. It has been part of the Python standard library since 2.5, as xml.etree.ElementTree. Forgive me for being blunt, but you're using it wrong. When you know the structure of the data, I suggest using the find, findtext, and findall methods. Is Order your root element? If so,

>>> geography = rootElement.find('OrderRequest/SiteGeography')
>>> for polygon in geography.findall('{http://www.opengis.net/gml}Polygon'):
...     for coord in polygon.findall(
...             "{http://www.opengis.net/gml}outerBoundaryIs/"
...             "{http://www.opengis.net/gml}LinearRing/"
...             "{http://www.opengis.net/gml}coord"):
...         print(
...             coord.findtext("{http://www.opengis.net/gml}X"),
...             coord.findtext("{http://www.opengis.net/gml}Y"))
... 
452847.6009 18596.0496
415847.6009 184596.0496
415847.6009 184596.0496
452847.6009 18596.0496
415847.6009 184596.0496
415847.6009 184596.0496
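
To go from those coordinates to a single MULTIPOLYGON for PostGIS, the same findall approach can collect one ring per gml:Polygon. A minimal sketch, assuming the gml prefix is bound to http://www.opengis.net/gml (the XML posted above never declares xmlns:gml, so ElementTree would reject it; the sample below adds the declaration and uses made-up coordinates) and that each polygon has only an outer boundary. Passing a prefix map as the namespaces argument lets findall and findtext accept gml: instead of the full {...} URI. The function name polygons_to_multipolygon_wkt is hypothetical.

```python
import xml.etree.ElementTree as ET

GML_NS = {"gml": "http://www.opengis.net/gml"}

# Assumed sample: the question's structure, trimmed, with the gml namespace
# declared so that ElementTree can parse it. Coordinates are illustrative.
SAMPLE = """<Order xmlns:gml="http://www.opengis.net/gml">
  <OrderRequest>
    <SiteGeography>
      <gml:Polygon srsName="BNG">
        <gml:outerBoundaryIs><gml:LinearRing>
          <gml:coord><gml:X>0</gml:X><gml:Y>0</gml:Y></gml:coord>
          <gml:coord><gml:X>10</gml:X><gml:Y>0</gml:Y></gml:coord>
          <gml:coord><gml:X>10</gml:X><gml:Y>10</gml:Y></gml:coord>
        </gml:LinearRing></gml:outerBoundaryIs>
      </gml:Polygon>
      <gml:Polygon srsName="BNG">
        <gml:outerBoundaryIs><gml:LinearRing>
          <gml:coord><gml:X>20</gml:X><gml:Y>20</gml:Y></gml:coord>
          <gml:coord><gml:X>30</gml:X><gml:Y>20</gml:Y></gml:coord>
          <gml:coord><gml:X>30</gml:X><gml:Y>30</gml:Y></gml:coord>
        </gml:LinearRing></gml:outerBoundaryIs>
      </gml:Polygon>
    </SiteGeography>
  </OrderRequest>
</Order>"""


def polygons_to_multipolygon_wkt(root):
    """Build MULTIPOLYGON WKT from every gml:Polygon under SiteGeography."""
    geography = root.find("OrderRequest/SiteGeography")
    polygons = []
    for polygon in geography.findall("gml:Polygon", GML_NS):
        # One (x, y) text pair per gml:coord in the outer ring.
        ring = [
            (coord.findtext("gml:X", namespaces=GML_NS),
             coord.findtext("gml:Y", namespaces=GML_NS))
            for coord in polygon.findall(
                "gml:outerBoundaryIs/gml:LinearRing/gml:coord", GML_NS)
        ]
        if ring and ring[0] != ring[-1]:
            ring.append(ring[0])  # WKT rings must be closed
        polygons.append("((%s))" % ", ".join("%s %s" % p for p in ring))
    return "MULTIPOLYGON(%s)" % ", ".join(polygons)


root = ET.fromstring(SAMPLE)
print(polygons_to_multipolygon_wkt(root))
```

This prints MULTIPOLYGON(((0 0, 10 0, 10 10, 0 0)), ((20 20, 30 20, 30 30, 20 20))). If srsName="BNG" means British National Grid (EPSG:27700), which is an assumption here, the string could be inserted with ST_GeomFromText(wkt, 27700), preferably passed as a query parameter rather than interpolated into the SQL.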

http://pymotw.com/2/xml/etree/ElementTree/parse.html#finding-nodes-in-a-document has more advice on using ElementTree.