ArcGIS Metadata Update – How to Programmatically Edit and Update Metadata in ArcGIS

Has anyone succeeded in programmatically updating metadata in ArcGIS 10? I'm considering Python/arcpy, but ArcObjects (C# or Python/comtypes) is also a possibility.

I need to update both the FGDC and the ArcGIS-ISO format metadata. Whatever solution is used must retain the existing (non-blank) elements alongside the added elements; where the two conflict, the added elements should overwrite the existing ones.

Best Answer

We had a big need for a similar capability and ended up building a general, free, open source Python library for the purpose. You can find it at https://github.com/ucd-cws/arcpy_metadata or install it by running "pip install arcpy_metadata". There is some documentation of features and how to use it, with some additional contributions from the World Resources Institute. We tried to keep things relatively Pythonic so that it integrates well and can be learned quickly. Here's an example:

import arcpy_metadata as md
import datetime

metadata = md.MetadataEditor(path_to_some_feature_class)  # also has a feature_layer parameter if you're working with one, but edits get saved back to the source feature class
metadata.title = "The metadata title!"

generated_time = "This layer was generated on {0:s}".format(datetime.datetime.now().strftime("%m/%d/%Y %I:%M %p"))

metadata.purpose = "Layer represents locations of the rare Snipe."

metadata.abstract.append("generated by ___ software")
metadata.abstract.append(generated_time)  # .prepend also exists
metadata.tags.add(["foo", "bar", "baz"])  # tags.extend is equivalent to maintain list semantics

metadata.finish()  # save the metadata back to the original source feature class and cleanup. Without calling finish(), your edits are NOT saved!

It still has plenty that could be added, but is pretty extensible if you subclass the items that are already there, or configure them correctly. It's still about alpha quality software, but it works and we're happy with it.

For anyone looking for this capability within ArcGIS Pro: as of version 2.5, ArcGIS Pro includes a Python metadata API (the arcpy.metadata module). There are more details in the Metadata class documentation.
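
The example above sets individual fields. The merge rule from the question (keep existing non-blank elements, let added elements win where they conflict) can also be sketched directly against the metadata XML with the standard library. This is only an illustration under assumptions: merge_metadata is a hypothetical helper, not part of arcpy or arcpy_metadata, the sample documents are made up, and children are matched by tag name only, so repeated sibling tags (such as multiple keywords) would need extra handling.

```python
import xml.etree.ElementTree as ET


def merge_metadata(existing, added):
    """Merge `added` into `existing` in place (hypothetical helper).

    - non-blank leaf elements in `added` overwrite the matching element
    - blank leaves in `added` never erase an existing value
    - elements missing from `existing` are appended
    Children are matched by tag name only (first match wins).
    """
    for new_child in added:
        old_child = existing.find(new_child.tag)
        if old_child is None:
            # nothing to conflict with: attach the whole subtree
            existing.append(new_child)
        elif len(new_child):
            # container element: recurse into its children
            merge_metadata(old_child, new_child)
        elif (new_child.text or "").strip():
            # non-blank leaf: the added value wins the conflict
            old_child.text = new_child.text
    return existing


# Made-up FGDC-style fragments, for illustration only
existing = ET.fromstring(
    "<metadata><idinfo><descript>"
    "<abstract>Old abstract</abstract><purpose>Keep me</purpose>"
    "</descript></idinfo></metadata>"
)
added = ET.fromstring(
    "<metadata><idinfo><descript>"
    "<abstract>New abstract</abstract><purpose>  </purpose>"
    "<supplinf>Extra notes</supplinf>"
    "</descript></idinfo></metadata>"
)

merged = merge_metadata(existing, added)
print(ET.tostring(merged).decode("utf-8"))
```

Here the abstract is overwritten, the blank purpose in the added document leaves the existing purpose alone, and supplinf is appended because it did not exist before.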

Related Solutions

[GIS] Update Multiple Metadata Tags at Once

ElementTree does allow one to programmatically edit XML metadata files. I've used it on a recent project to update many tags using information stored in a database: for each shapefile, the data description, citation, abstract, etc. are stored in a table, which I read with a search cursor and write into the matching tags with ElementTree. I'm not an expert on the structure of the ElementTree library, but the long and short of it is that you create an "iterator" object over the parent tag of the "subelement(s)" you want to edit. Say, for example, you have the following tags and you want to change the publisher information:

<pubinfo>
  <publish>U.S. Geological Survey, Reston, VA</publish> 
</pubinfo>

You could write something like the following snippet in Python:

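import os
import xml.etree.ElementTree as et

#--get xml file and parse it (shpPath and xmlFile are your folder and metadata file name)
xmlPath = os.path.join(shpPath, xmlFile)
tree = et.parse(xmlPath)
root = tree.getroot()

#--publisher information
iterator = list(root.iter('pubinfo'))  # getiterator() is deprecated; list() keeps it indexable
for elem in iterator:
    subelem = 'publish'
    old_subelem = elem.find(subelem)
    if old_subelem is not None:
        elem.remove(old_subelem)
    new_subelem = et.SubElement(elem, subelem)
    new_subelem.text = 'New Publisher, Anytown, USA'

#--write the changes back to disk
tree.write(xmlPath)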

Note that the iterator object will search the entire XML file for the tag "pubinfo". If multiple tags share the same name, the iterator object will contain one element for each occurrence, so you will have to dig through your XML files to make sure you are working on the correct one. Say there are 2 instances of the tag pubinfo (one for the "local citation" and one for the "larger work citation") and you only want to change the subelement publish in the first one: you would replace the "for elem in iterator" loop with "elem = iterator[0]". If you need more, I can provide some scripts I wrote, though I'll refrain from posting them in their entirety here.
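
As an alternative to picking an occurrence by position, ElementTree's limited XPath support lets you spell out the parent chain so that only one pubinfo matches. The sketch below assumes an FGDC-style layout in which the main citation and the larger work citation each hold their own pubinfo; the sample document is made up for illustration, so check the paths against your own files.

```python
import xml.etree.ElementTree as ET

# Made-up FGDC-style fragment with two <pubinfo> blocks (illustrative only)
SAMPLE = """
<metadata>
  <idinfo>
    <citation>
      <citeinfo>
        <pubinfo><publish>Local Publisher</publish></pubinfo>
        <lworkcit>
          <citeinfo>
            <pubinfo><publish>Larger Work Publisher</publish></pubinfo>
          </citeinfo>
        </lworkcit>
      </citeinfo>
    </citation>
  </idinfo>
</metadata>
"""

root = ET.fromstring(SAMPLE)

# iter() walks the whole tree, so it finds both occurrences
all_pubinfo = list(root.iter("pubinfo"))

# A path from the root reaches only the local citation's publisher
local_publish = root.find("idinfo/citation/citeinfo/pubinfo/publish")
if local_publish is not None:
    # editing .text in place keeps the element's position and attributes
    local_publish.text = "New Publisher, Anytown, USA"

publishers = [p.findtext("publish") for p in all_pubinfo]
print(publishers)
```

Only the first publisher changes; the larger work citation is left untouched because the path never descends into lworkcit.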

[GIS] Editing ArcGIS metadata elements using Python

I have some comments/suggestions for you. You mention that you do not have admin privileges on your machine to install the metadata module. Good news: you don't need admin rights to install most Python packages/modules. You do if you're using a binary installer, but most modules can be installed using pip. You can just download pip and put it in your C:\Python27\ArcGIS10.x\Scripts folder.

You can also just download Python packages and place the modules somewhere on your PYTHONPATH, such as C:\Python27\ArcGIS10.x\Lib\site-packages. The flaw with this approach is that you could be missing some dependencies; that is where pip is the better option, as it should install all dependencies as well.
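
If even site-packages is off limits, a module can be imported from any folder you are allowed to write to by adding that folder to sys.path at the top of your script. A minimal sketch: the temporary folder stands in for a folder in your user profile or on a network share, and demo_helper is a made-up module name used only for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Stand-in for a folder you can write to without admin rights
module_folder = tempfile.mkdtemp(prefix="my_modules_")

# Stand-in for a downloaded pure-Python module (hypothetical name)
with open(os.path.join(module_folder, "demo_helper.py"), "w") as f:
    f.write("def greet():\n    return 'imported without admin rights'\n")

# Make the folder importable for this session only
if module_folder not in sys.path:
    sys.path.append(module_folder)

demo_helper = importlib.import_module("demo_helper")
message = demo_helper.greet()
print(message)
```

This only affects the running interpreter, so nothing in the ArcGIS Python installation is modified. It does not resolve dependencies for you, which is the same caveat as copying modules by hand.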

With all that being said, I have never used the metadata module, but I believe the built-in xml module will do everything you need. I built a wrapper a while back with convenience methods for working with XML files (see below). You can try it to see if it helps.

As for hardcoding indices in your script for the metadata, I would avoid doing this. I am not certain if ArcGIS will add future elements to the metadata, but if anything does get added/deleted, it could definitely mess up the indices in your current structure. It is best to get at the elements by name. You can use the xml.etree.ElementTree.Element.find() or xml.etree.ElementTree.Element.findall() methods to accomplish this.
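
To illustrate why name-based lookup is more robust, the sketch below reads a title by index, then simulates a new element being added at the front of the document. The sample XML is a made-up, heavily trimmed stand-in for ArcGIS-format metadata, and the inserted Esri element is just an assumed example of something the software might add.

```python
import xml.etree.ElementTree as ET

# Small made-up sample; real ArcGIS metadata has many more elements
root = ET.fromstring(
    "<metadata>"
    "<dataIdInfo><idCitation>"
    "<resTitle>Old title</resTitle>"
    "<date><reviseDate>2014-01-01T00:00:00</reviseDate></date>"
    "</idCitation></dataIdInfo>"
    "</metadata>"
)

# Index-based access: correct only while <dataIdInfo> is the first child
by_index = root[0][0][0]

# Simulate a new element being added ahead of <dataIdInfo>
root.insert(0, ET.Element("Esri"))

# Name-based access is unaffected by the new sibling
by_name = root.find("dataIdInfo/idCitation/resTitle")
by_name.text = "New title"

# findall() returns every match as a list; ".//" searches at any depth
revise_dates = root.findall(".//reviseDate")

# root[0] now points at the inserted element, not at <dataIdInfo>
print(root[0].tag)
print(by_name.text, len(revise_dates))
```

After the insert, root[0][0][0] no longer reaches the title (the new first child has no children at all), while the find() call keeps working unchanged.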

Here is the wrapper I built for working with xml files:

from xml.etree.ElementTree import ElementTree, Element, SubElement, Comment, tostring, parse, fromstring, fromstringlist
from xml.dom import minidom
from xml.sax.saxutils import escape, unescape
import os
import codecs

HTML = {
    '"': "&quot;",
    "'": "&apos;",
    ">": "&gt;",
    "<": "&lt;",
    }

HTML_UNESC = {v:k for k,v in HTML.iteritems()}

class BaseXML(object):
    def __init__(self, xml_file):
        """base class for xml files"""
        self.document = xml_file
        is_file = False
        if isinstance(xml_file, list):
            # we have a list of strings? wrap the root Element so getroot() works
            self.tree = ElementTree(fromstringlist(xml_file))

        elif isinstance(xml_file, basestring) and not os.path.isfile(xml_file) and '<' in xml_file:
            # we have a string? wrap the root Element so getroot() works
            self.tree = ElementTree(fromstring(xml_file))

        elif os.path.exists(xml_file):
            self.tree = parse(self.document)
            is_file = True

        else:
            raise IOError('Invalid Input for XML file')

        # the directory is only known when the input was a file path
        self.directory = os.path.dirname(self.document) if is_file else None
        self.root = self.tree.getroot()
        self.parent_map = {}

        # make static copy (works for file, string and list input)
        self._backup = fromstring(tostring(self.root))

        # initialize parent map
        self.updateParentMap()

    @staticmethod
    def iterElm(root, tag_name=None, childrenOnly=True, **kwargs):
        """return generator for tree

        Optional:
            tag_name -- name of tag
            kwargs -- optional key word args to filter by tag attributes

        """
        for tag in root.iter(tag_name):
            if all([tag.get(k) == v for k,v in kwargs.iteritems()]):
                if childrenOnly and tag != root:
                    yield tag

                elif not childrenOnly:
                    yield tag

    def elmHasTags(self, root, tag, **kwargs):
        """tests if there are valid tags

        tag_name -- name of tag to check for
        """
        gen = self.iterElm(root, tag, **kwargs)
        try:
            gen.next()
            return True

        except StopIteration:
            return False

    def findChild(self, parent, child_name, **kwargs):
        """find child anywhere under parent element

        child_name -- name of tag
        kwargs -- keyword args to filter
        """
        for c in self.iterElm(parent, child_name, **kwargs):
            return c

    def findChildren(self, parent, child_name, **kwargs):
        """find all children anywhere under parent element,
        returns a list of elements.

        child_name -- name of tag
        kwargs -- keyword args to filter
        """
        return [c for c in self.iterElm(parent, child_name, **kwargs)]

    def validateElm(self, elm, elm_name=None, **kwargs):
        """validates whether input is an Element name or Element object.  If it
        is an Element name, it will return the Element object with that name and
        any additional key word args

        Required:
            elm -- element name or Element object
            elm_name -- name of Element.tag, only used if elm is a string.

        Optional:
            kwargs -- keyword argument filters, required if elm is a string
        """
        if isinstance(elm, Element):
            return elm
        elif isinstance(elm, basestring):
            return self.getElm(elm_name, **kwargs)

    def updateParentMap(self):
        """updates the parent_map dictionary"""
        self.parent_map = {c:p for p in self.tree.iter() for c in p}

    def countParents(self, elm, parent_name, **kwargs):
        """Count the number of parents an element has of a certain name, does
        hierarchical search

        Required:
            elm -- child element for which to search parents
            parent_name -- name of parent tag

        Optional:
            kwargs -- keyword argument filters
        """
        count = 0
        parent = self.getParent(elm, parent_name, **kwargs)
        while parent is not None:
            count += 1
            parent = self.getParent(parent, parent_name, **kwargs)
        return count

    def getParent(self, child, parent_name=None, **kwargs):
        """get parent element by tag name or first parent

        Required:
            child -- child element for which to find parent
            parent_name -- name of parent tag

        Optional:
            kwargs -- optional key word args to filter by tag attributes

        """
        parent = self.parent_map.get(child)
        if parent is None:
            return None
        if parent_name is None:
            return parent
        else:
            if parent.tag == parent_name and all([parent.get(k) == v for k,v in kwargs.iteritems()]):
                return parent
            else:
                return self.getParent(parent, parent_name, **kwargs)

    def elmHasParentOfName(self, child, parent_name=None, **kwargs):
        """checks if a child element has a parent of an input name

        Required:
            child -- child element for which to find parent
            parent_name -- name of parent tag

        Optional:
            kwargs -- optional key word args to filter by tag attributes
        """
        return self.getParent(child, parent_name, **kwargs) is not None

    def getElm(self, tag_name, root=None, **kwargs):
        """get specific tag by name and kwargs filter

        Required:
            tag_name -- name of tag

        Optional:
            root -- root element to start with, defaults to the ElementTree
            kwargs -- optional key word args to filter by tag attributes
        """
        for tag in self.iterTags(tag_name, root=root, **kwargs):
            return tag

    def findChildrenWithKeys(self, elm, tag_name=None, keys=[]):
        """finds children of a parent Element of a specific tag and/or if that element has
        attributes matching the names found in input keys list

        Required:
            elm -- root element

        Optional: (should implement one or both of these)
            tag_name -- name of tags to search for
            keys -- list of attribute keys to check for
        """
        if isinstance(keys, basestring):
            keys = [keys]

        return [c for c in self.iterChildren(elm, tag_name) if c is not None and all(map(lambda k: k in c.keys(), keys))]

    @staticmethod
    def prettify(elem):
        """Return a pretty-printed XML string for the Element."""
        rough_string = tostring(elem, 'utf-8')
        reparsed = minidom.parseString(rough_string)
        pretty =  reparsed.toprettyxml(indent="  ").split('\n')
        return '\n'.join([l for l in pretty if l.strip()])

    def iterTags(self, tag_name=None, root=None, **kwargs):
        """return generator for tree

        Optional:
            tag_name -- name of tag
            root -- optional root tag to start from, if None specified defaults
                to the ElementTree
            kwargs -- optional key word args to filter by tag attributes
        """
        if isinstance(root, Element):
            return self.iterElm(root, tag_name, **kwargs)
        else:
            return self.iterElm(self.tree, tag_name, **kwargs)

    @staticmethod
    def iterChildren(parent, tag=None, childrenOnly=True, **kwargs):
        """iterate all children of an element based on **kwargs filter

        Required:
            parent -- element for which to search children

        Optional:
            tag -- name of tag for filter
            childrenOnly -- return children only, if false, iterator will start
                at parent
            kwargs -- optional key word args to filter by tag attributes
        """
        for elm in parent.iter(tag):
            if all([elm.get(k) == v for k,v in kwargs.iteritems()]):
                if childrenOnly and elm != parent:
                    yield elm

                elif not childrenOnly:
                    yield elm

    def hasTags(self, tag_name, root=None, **kwargs):
        """tests if there are valid tags

        tag_name -- name of tag to check for
        """
        gen = self.iterTags(tag_name, root=root, **kwargs)
        try:
            gen.next()
            return True

        except StopIteration:
            return False

    def addElm(self, tag_name, attrib={}, root=None, update_map=True):
        """add SubElement to site or existing element

        Required:
            tag_name -- name of new element

        Optional:
            attrib -- dictionary of attributes for new element
            root -- parent element for which to add element.  If none specified,
                element will be added to the document root.
            update_map -- option to update parent map, you may want to disable this
                when making many changes during an iterative process. Default is True.
        """
        if root is None:
            root = self.root
        sub = SubElement(root, tag_name, attrib)
        if update_map:
            self.updateParentMap()
        return sub

    def restore(self):
        """reverts all changes back to the state at which the XML document was
        when this class was initialized
        """
        self.__init__(self.document)

    def save(self):
        """saves the changes"""
        with codecs.open(self.document, 'w', 'utf-8') as f:
            f.write(self.prettify(self.root))

    def __iter__(self):
        """create generator"""
        for elm in self.tree.iter():
            yield elm

To use this, save it in your C:\Python27\ArcGIS10.x\Lib\site-packages (or better yet a network share where it can be imported from) as something like xmlhelper.py. To do some of the above stuff, you can do something like the following:

import xmlhelper # make sure this is importable, if not use sys.path.append(r'path_to_module_parent_folder') first
import arcpy
import os
import datetime
import glob

ws = arcpy.env.workspace = r"path/to/folder"
today = datetime.date.today()

# find all metadata files in path
for f in glob.glob(os.path.join(ws, '*.shp.xml')):

    # use wrapper here
    doc = xmlhelper.BaseXML(f)

    # get edition date and set it (getElm returns None if the tag is missing)
    editiondate = doc.getElm('resEdDate')
    if editiondate is not None:
        editiondate.text = today.strftime("%Y-%m-%d")

    # update revise date
    revisedate = doc.getElm('reviseDate')
    if revisedate is not None:
        revisedate.text = today.strftime("%Y-%m-%d") + "T00:00:00"

    # update title
    title = doc.getElm('resTitle')  # should be the same, regardless of shapefile?

    # may want to be a little more explicit with if statement here....
    if title is not None:
        if f.endswith('A.shp.xml'):
            title.text = 'MCMS (polygon)'
        elif f.endswith('Zones.shp.xml'):
            title.text = 'MCMS Exclusion Zones'

    # save it
    doc.save()

print('done')

To make sure it works, I would first copy the data to your desktop and try the script on that copy. If it works, you can then run it against your production data. The code above is untested.
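
Making the test copy can itself be scripted. A rough sketch, assuming only the .shp.xml sidecar files need to be copied because the script edits nothing else; copy_metadata_for_testing is a hypothetical helper, and the throwaway "production" folder exists only so the example runs anywhere.

```python
import glob
import os
import shutil
import tempfile


def copy_metadata_for_testing(source_folder, pattern="*.shp.xml"):
    """Copy matching metadata files to a new temp folder; return its path.

    Hypothetical helper: only the XML sidecar files are copied, which is
    enough to try an XML-editing script without touching the originals.
    """
    test_folder = tempfile.mkdtemp(prefix="metadata_test_")
    for path in glob.glob(os.path.join(source_folder, pattern)):
        shutil.copy2(path, test_folder)
    return test_folder


# Demo with a throwaway "production" folder
production = tempfile.mkdtemp(prefix="production_")
with open(os.path.join(production, "roads.shp.xml"), "w") as f:
    f.write("<metadata><dataIdInfo/></metadata>")

sandbox = copy_metadata_for_testing(production)
copied = sorted(os.listdir(sandbox))
print(copied)
```

Point the update script at the returned folder first, inspect the edited XML, and only then switch the workspace path to the real data.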
