ElementTree does allow you to programmatically edit XML metadata files. I used it on a recent project to update many tags using information stored in a database. For each shapefile I have a data description, citation, abstract, etc. stored in a table; I retrieve that metadata with a search cursor and write it to the tags using ElementTree. I'm not an expert on the structure of the ElementTree library, but the long and short of it is that you create an "iterator" object that holds every occurrence of the parent tag of the "subelement(s)" you want to edit. Say, for example, you have the following tags and you want to change the publisher information:
<pubinfo>
    <publish>U.S. Geological Survey, Reston, VA</publish>
</pubinfo>
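You could write something like the following snippet in Python:
import os
import xml.etree.ElementTree as et

#-- get xml file and parse it (shpPath and xmlFile are defined elsewhere)
xmlPath = os.path.join(shpPath, xmlFile)
tree = et.parse(xmlPath)
root = tree.getroot()

#-- collect every <pubinfo> element; list() keeps the result indexable
iterator = list(root.iter('pubinfo'))
for elem in iterator:
    subelem = 'publish'
    #-- drop the old <publish> tag, if there is one, and add a fresh one
    old_subelem = elem.find(subelem)
    if old_subelem is not None:
        elem.remove(old_subelem)
    new_subelem = et.SubElement(elem, subelem)
    new_subelem.text = 'New Publisher, Anytown, USA'

#-- write the changes back to the file
tree.write(xmlPath)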
Note that the iterator object will search the entire XML file looking for the tag "pubinfo". If multiple tags share the same name, the iterator object will contain one element for each occurrence. In that case you will have to dig through your XML files to make sure you are working on the correct one. Say there are 2 instances of the tag pubinfo (one could be for the "local citation" and one could be for the "larger work citation") and you only want to change the subelement publish in the first one: you would replace the line for elem in iterator: with elem = iterator[0] and dedent the loop body. If you need more I can provide you with some scripts I wrote, though I'll refrain from posting them in their entirety here.
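One way to avoid guessing which index is the right one is to address the tag by its full path instead. The sketch below assumes an FGDC-style layout in which the local citation sits under idinfo/citation/citeinfo and the larger work citation is nested under lworkcit. The sample XML string is made up for illustration, so check the paths against your own files before relying on them.

```python
import xml.etree.ElementTree as et

# Hypothetical sample: two <pubinfo> tags, one per citation
SAMPLE = """<metadata>
  <idinfo>
    <citation>
      <citeinfo>
        <pubinfo>
          <publish>U.S. Geological Survey, Reston, VA</publish>
        </pubinfo>
        <lworkcit>
          <citeinfo>
            <pubinfo>
              <publish>Larger Work Publisher</publish>
            </pubinfo>
          </citeinfo>
        </lworkcit>
      </citeinfo>
    </citation>
  </idinfo>
</metadata>"""

root = et.fromstring(SAMPLE)

# Every <pubinfo> in document order, as a list so it can be indexed
all_pubinfo = list(root.iter('pubinfo'))

# Path from the root to the local citation's publisher only
local_publish = root.find('./idinfo/citation/citeinfo/pubinfo/publish')
if local_publish is not None:
    local_publish.text = 'New Publisher, Anytown, USA'

# Collect the publisher of each <pubinfo> to confirm only one changed
publishers = [p.findtext('publish') for p in all_pubinfo]
```

Because the path names each parent explicitly, the larger work citation is left untouched even though it also contains a pubinfo tag.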
I have some comments/suggestions for you. You mention that you do not have admin privileges on your machine and so cannot install the metadata module. Good news here: you don't need admin rights to install Python packages/modules. You do if you're using a binary installer; however, most modules can be installed using pip. You can just download pip and put it in your C:\Python27\ArcGIS10.x\Scripts folder.
You can also download Python packages and place the modules somewhere on your PYTHONPATH, such as C:\Python27\ArcGIS10.x\Lib\site-packages. The flaw with this approach is that you could be missing some dependencies; that is where pip is the better option, as it should install all dependencies as well.
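If you cannot write to the ArcGIS Python folders at all, another option is to keep the module in any folder you can write to and add that folder to sys.path at run time. A minimal sketch, using a temporary folder and a made-up module named mytools purely for illustration:

```python
import os
import sys
import tempfile

# Stand-in for a folder you have write access to (e.g. a network share)
module_dir = tempfile.mkdtemp()

# Write a tiny, made-up module into that folder
with open(os.path.join(module_dir, 'mytools.py'), 'w') as f:
    f.write("def greet():\n    return 'hello from mytools'\n")

# Make the folder importable for this session only
if module_dir not in sys.path:
    sys.path.append(module_dir)

import mytools

message = mytools.greet()
```

The change to sys.path lasts only for the running session, so every script that needs the module has to repeat the append (or the folder has to be added to the PYTHONPATH environment variable).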
However, with all that being said, I have never used the metadata module, but I believe the built-in xml module will do everything you need. I actually built a wrapper a while back that has convenience methods for working with xml files (see below). You can try it to see if it helps.
As for hardcoding indices in your script for the metadata, I would avoid doing this. I am not certain whether ArcGIS will add elements to the metadata in the future, but if anything does get added or deleted, it could definitely mess up the indices in your current structure. It is best to get at the elements by name. You can use the xml.etree.ElementTree.Element.find() or xml.etree.ElementTree.Element.findall() methods to accomplish this.
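To illustrate the difference, here is a small sketch on a made-up, simplified metadata fragment (the tag names resemble ArcGIS metadata tags, but the structure and values are invented). Looking elements up by name keeps working when a new tag is inserted ahead of them, whereas a hardcoded index ends up pointing at the wrong element.

```python
import xml.etree.ElementTree as et

# Hypothetical, simplified metadata fragment
SAMPLE = """<metadata>
  <dataIdInfo>
    <idCitation>
      <resTitle>Old title</resTitle>
      <resEdDate>2015-01-01</resEdDate>
    </idCitation>
    <searchKeys>
      <keyword>zones</keyword>
      <keyword>polygon</keyword>
    </searchKeys>
  </dataIdInfo>
</metadata>"""

root = et.fromstring(SAMPLE)

# Fragile: depends on the exact position of every tag
title_by_index = root[0][0][0]

# Robust: find() returns the first match for a path, or None
title_by_name = root.find('./dataIdInfo/idCitation/resTitle')

# findall() returns a list of every match; './/' searches at any depth
keywords = [k.text for k in root.findall('.//keyword')]

# Simulate a new tag being inserted before <resTitle>
citation = root.find('./dataIdInfo/idCitation')
citation.insert(0, et.Element('resAltTitle'))

# The name lookup still finds the title; the index now hits the new tag
still_title = root.find('./dataIdInfo/idCitation/resTitle').text
index_now_points_at = root[0][0][0].tag
```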
Here is the wrapper I built for working with xml files:
from xml.etree.ElementTree import ElementTree, Element, SubElement, Comment, tostring, parse, fromstring, fromstringlist
from xml.dom import minidom
from xml.sax.saxutils import escape, unescape
import os
import codecs

try:
    basestring  # Python 2
except NameError:
    basestring = str  # Python 3 has no basestring

# characters that need escaping and their entities
HTML = {
    '"': "&quot;",
    "'": "&apos;",
    ">": "&gt;",
    "<": "&lt;",
}

HTML_UNESC = {v: k for k, v in HTML.items()}
class BaseXML(object):
    def __init__(self, xml_file):
        """base class for xml files"""
        self.document = xml_file
        if isinstance(xml_file, list):
            # we have a list of strings? wrap the parsed root in an ElementTree
            self.tree = ElementTree(fromstringlist(xml_file))
        elif isinstance(xml_file, basestring) and not os.path.isfile(xml_file) and '<' in xml_file:
            # we have a string? wrap the parsed root in an ElementTree
            self.tree = ElementTree(fromstring(xml_file))
        elif os.path.exists(xml_file):
            self.tree = parse(self.document)
        else:
            raise IOError('Invalid Input for XML file')

        # a directory only makes sense when the input was a file path
        if isinstance(xml_file, basestring) and os.path.isfile(xml_file):
            self.directory = os.path.dirname(self.document)
        else:
            self.directory = None
        self.root = self.tree.getroot()
        self.parent_map = {}

        # make static copy (round trip through a string so it is independent of self.tree)
        self._backup = fromstring(tostring(self.root))

        # initialize parent map
        self.updateParentMap()

    @staticmethod
    def iterElm(root, tag_name=None, childrenOnly=True, **kwargs):
        """return generator for tree

        Optional:
            tag_name -- name of tag
            childrenOnly -- skip root itself when True (default)
            kwargs -- optional key word args to filter by tag attributes
        """
        for tag in root.iter(tag_name):
            if all([tag.get(k) == v for k, v in kwargs.items()]):
                if not childrenOnly or tag is not root:
                    yield tag

    def elmHasTags(self, root, tag, **kwargs):
        """tests if there are valid tags

        root -- element to search under
        tag -- name of tag to check for
        """
        gen = self.iterElm(root, tag, **kwargs)
        try:
            next(gen)
            return True
        except StopIteration:
            return False

    def findChild(self, parent, child_name, **kwargs):
        """find first child anywhere under parent element, returns None if
        there is no match

        child_name -- name of tag
        kwargs -- keyword args to filter
        """
        for c in self.iterElm(parent, child_name, **kwargs):
            return c

    def findChildren(self, parent, child_name, **kwargs):
        """find all children anywhere under parent element,
        returns a list of elements.

        child_name -- name of tag
        kwargs -- keyword args to filter
        """
        return [c for c in self.iterElm(parent, child_name, **kwargs)]

    def validateElm(self, elm, elm_name=None, **kwargs):
        """validates whether input is an Element name or Element object.  If it
        is an Element name, it will return the Element object with that name and
        any additional key word args

        Required:
            elm -- element name or Element object
            elm_name -- name of Element.tag, only used if elm is a string.

        Optional:
            kwargs -- keyword argument filters, required if elm is a string
        """
        if isinstance(elm, Element):
            return elm
        elif isinstance(elm, basestring):
            return self.getElm(elm_name, **kwargs)

    def updateParentMap(self):
        """updates the parent_map dictionary (child element -> parent element)"""
        self.parent_map = {c: p for p in self.tree.iter() for c in p}

    def countParents(self, elm, parent_name, **kwargs):
        """Count the number of parents an element has of a certain name, does
        hierarchical search

        Required:
            elm -- child element for which to search parents
            parent_name -- name of parent tag

        Optional:
            kwargs -- keyword argument filters
        """
        count = 0
        parent = self.getParent(elm, parent_name, **kwargs)
        while parent is not None:
            count += 1
            parent = self.getParent(parent, parent_name, **kwargs)
        return count

    def getParent(self, child, parent_name=None, **kwargs):
        """get parent element by tag name or first parent

        Required:
            child -- child element for which to find parent

        Optional:
            parent_name -- name of parent tag, if None the immediate parent
                is returned
            kwargs -- optional key word args to filter by tag attributes
        """
        parent = self.parent_map.get(child)
        if parent is None:
            return None
        if parent_name is None:
            return parent
        if parent.tag == parent_name and all([parent.get(k) == v for k, v in kwargs.items()]):
            return parent
        # keep walking up the tree
        return self.getParent(parent, parent_name, **kwargs)

    def elmHasParentOfName(self, child, parent_name=None, **kwargs):
        """checks if a child element has a parent of an input name

        Required:
            child -- child element for which to find parent

        Optional:
            parent_name -- name of parent tag
            kwargs -- optional key word args to filter by tag attributes
        """
        return self.getParent(child, parent_name, **kwargs) is not None

    def getElm(self, tag_name, root=None, **kwargs):
        """get specific tag by name and kwargs filter, returns None if there
        is no match

        Required:
            tag_name -- name of tag

        Optional:
            root -- root element to start with, defaults to the ElementTree
            kwargs -- optional key word args to filter by tag attributes
        """
        for tag in self.iterTags(tag_name, root=root, **kwargs):
            return tag

    def findChildrenWithKeys(self, elm, tag_name=None, keys=None):
        """finds children of a parent Element of a specific tag and/or if that element has
        attributes matching the names found in input keys list

        Required:
            elm -- root element

        Optional: (should implement one or both of these)
            tag_name -- name of tags to search for
            keys -- list of attribute keys to check for
        """
        if keys is None:
            keys = []
        elif isinstance(keys, basestring):
            keys = [keys]
        return [c for c in self.iterChildren(elm, tag_name) if all(k in c.keys() for k in keys)]

    @staticmethod
    def prettify(elem):
        """Return a pretty-printed XML string for the Element."""
        rough_string = tostring(elem, 'utf-8')
        reparsed = minidom.parseString(rough_string)
        pretty = reparsed.toprettyxml(indent="  ").split('\n')
        # drop the blank lines that minidom adds
        return '\n'.join([line for line in pretty if line.strip()])

    def iterTags(self, tag_name=None, root=None, **kwargs):
        """return generator for tree

        Optional:
            tag_name -- name of tag
            root -- optional root tag to start from, if None specified defaults
                to the ElementTree
            kwargs -- optional key word args to filter by tag attributes
        """
        if isinstance(root, Element):
            return self.iterElm(root, tag_name, **kwargs)
        else:
            return self.iterElm(self.tree, tag_name, **kwargs)

    @staticmethod
    def iterChildren(parent, tag=None, childrenOnly=True, **kwargs):
        """iterate all children of an element based on **kwargs filter

        Required:
            parent -- element for which to search children

        Optional:
            tag -- name of tag for filter
            childrenOnly -- return children only, if false, iterator will start
                at parent
            kwargs -- optional key word args to filter by tag attributes
        """
        for elm in parent.iter(tag):
            if all([elm.get(k) == v for k, v in kwargs.items()]):
                if not childrenOnly or elm is not parent:
                    yield elm

    def hasTags(self, tag_name, root=None, **kwargs):
        """tests if there are valid tags

        tag_name -- name of tag to check for
        root -- optional root tag to start from
        """
        # pass root along; the original call dropped it
        gen = self.iterTags(tag_name, root=root, **kwargs)
        try:
            next(gen)
            return True
        except StopIteration:
            return False

    def addElm(self, tag_name, attrib=None, root=None, update_map=True):
        """add SubElement to the document root or an existing element

        Required:
            tag_name -- name of new element

        Optional:
            attrib -- dictionary of attributes for new element
            root -- parent element for which to add element. If none specified,
                element will be added to the document root.
            update_map -- option to update parent map, you may want to disable this
                when making many changes during an iterative process. Default is True.
        """
        if root is None:
            root = self.root
        sub = SubElement(root, tag_name, attrib or {})
        if update_map:
            self.updateParentMap()
        return sub

    def restore(self):
        """reverts all changes back to the state at which the XML document was
        when this class was initialized
        """
        self.__init__(self.document)

    def save(self):
        """saves the changes (only meaningful when initialized from a file path)"""
        with codecs.open(self.document, 'w', 'utf-8') as f:
            f.write(self.prettify(self.root))

    def __iter__(self):
        """create generator over every element in the tree"""
        for elm in self.tree.iter():
            yield elm
To use this, save it in your C:\Python27\ArcGIS10.x\Lib\site-packages folder (or better yet a network share where it can be imported from) as something like xmlhelper.py. To do some of the above stuff, you can do something like the following:
import xmlhelper  # make sure this is importable; if not, use sys.path.append(r'path_to_module_parent_folder') first
import arcpy
import os
import datetime
import glob

ws = arcpy.env.workspace = r"path/to/folder"
today = datetime.date.today()

# find all metadata files in path
for f in glob.glob(os.path.join(ws, '*.shp.xml')):

    # use wrapper here
    doc = xmlhelper.BaseXML(f)

    # get edition date and set it (getElm returns None if the tag is missing)
    editiondate = doc.getElm('resEdDate')
    if editiondate is not None:
        editiondate.text = today.strftime("%Y-%m-%d")

    # update revise date
    revisedate = doc.getElm('reviseDate')
    if revisedate is not None:
        revisedate.text = today.strftime("%Y-%m-%d") + "T00:00:00"

    # update title
    title = doc.getElm('resTitle')  # should be the same, regardless of shapefile?
    if title is not None:
        # may want to be a little more explicit with if statement here....
        if f.endswith('A.shp.xml'):
            title.text = 'MCMS (polygon)'
        elif f.endswith('Zones.shp.xml'):
            title.text = 'MCMS Exclusion Zones'

    # save it
    doc.save()

print('done')
To make sure it is working, I would make a copy of the data on your desktop first and try this out on the copy. If it works, then you can run it against your production data. The code above is untested.
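A quick way to get such a test copy is to copy the metadata files into a scratch folder and point the script at that folder instead. The sketch below is one possible approach; the function name backup_metadata and the folder arguments are assumptions, and it only copies the *.shp.xml sidecar files, not the shapefiles themselves.

```python
import glob
import os
import shutil


def backup_metadata(src_folder, backup_folder):
    """Copy every *.shp.xml file in src_folder into backup_folder.

    Returns the list of copied file paths.
    """
    if not os.path.isdir(backup_folder):
        os.makedirs(backup_folder)
    copied = []
    for path in glob.glob(os.path.join(src_folder, '*.shp.xml')):
        target = os.path.join(backup_folder, os.path.basename(path))
        shutil.copy2(path, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

Running the update script against the backup folder first leaves the production metadata untouched until you are happy with the result.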
Best Answer
We had a big need for a similar capability and ended up building a general, free, open source Python library for the purpose. You can find it at https://github.com/ucd-cws/arcpy_metadata or by running a "pip install arcpy_metadata". There is some documentation of features and how to use it, with some additional contributions from the World Resources Institute. We tried to keep things relatively Pythonic so that it integrates well and can be learned quickly. Here's an example:
It still has plenty that could be added, but is pretty extensible if you subclass the items that are already there, or configure them correctly. It's still about alpha quality software, but it works and we're happy with it.
For anyone looking for this capability within ArcGIS Pro: as of version 2.5, it includes a metadata API that is accessible from Python. There are more details in the Metadata class documentation.