coding Parsing XML in Python

Published on March 2nd, 2013 | by Daniel Khatkar

2

Parsing and Generating XML in Python

Alright bros, I finally got around to writing a post. I will be going through the process of how to parse and generate an XML document in Python. Both of these are fairly simple and extremely useful, considering the amount of XML documents you may come across in school or at work.

Parsing XML

The XML file we will be parsing and generating can be seen below.

1
2
3
4
5
6
7
8
<note>
        <to>Luke</to>
        <to>Dan</to>
        <from>Kalen</from>
        <heading title="Reminder">
                <body>Don't forget to study!</body>
        </heading>
</note>

The first thing you will want to do is import the Python library that provides functions to parse the XML file. The Python library that I found to be the easiest to use is the “xml.dom” library, which is included with your Python install. Depending on what your Python coding style is, you can import this library many ways. I chose to only import the “minidom” class, because this is the only one you need, but you can import the whole library if you wish.

1
 from xml.dom import minidom

Once you have imported the library, we can begin coding. The first thing you must do is, create the “minidom parser”. You must supply the path of the xml file when instantiating this object. In the example below I have set this object to a variable named “xmldom”. To grab an XML element within the file it is as simple as using the getElementsByTagName(“nameOfElement”) method with your minidom object. This method returns a list which will vary in length depending on how many of those elements are within your XML document. In the XML file above, there is are two “to” elements, so when we get an element by the name “to”, the returning list will be a length of 2.

To get the actual values of each element, we will need to iterate through the returning list. This can be done with a for loop. If you know that there is only one element to be returned from the XML document, you can access it directly, without a for loop. This can be seen in the section of the code below commented “Get and print each elements value”

To get an attribute of an element, you must first get the element by its name and then access the attribute. This can be seen in the section of the code below, commented with “Get Attribute”. Since there is only one “heading” element, it is accessed directly instead of a for loop, like I stated you can do earlier.

As seen in the example XML Document above, the “heading” element also has a child node. To access that child node, you will first have to get the list of “heading” elements, then get the list of child nodes. This can be seen in the code below, under the commented section “Get Child Elements”. Notice how I am accessing the headings list directly here again, if there were more “heading” elements I would have to use a for loop.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
def parse_xml():
    xmldom = minidom.parse(tool.xml_path)
 
    #Get "to" Element
    to_list = xmldom.getElementsByTagName("to")
 
    #Get and print each elements value.
    for each in to_list:
       print each.childNodes[0].nodeValue
 
    #Get Attribute
    headings_list = xmldom.getElementsByTagName("heading")
    title = heading_list[0].getAttribute('title')
    print title
 
    #Get Child Elements
    headings_list = xmldom.getElementsByTagName("heading")
    child_list = headings_list[0].getElementsByTagName("body")
    print child_list[0].childNodes[0].nodeValue

Generating XML

Generating XML is even simpler than parsing. The library that I use to generate an XML document is the lxml library and ElementTree class. It is also included in your Python install. The code below shows how to import it.

1
from lxml import etree as ET

The code below shows how to generate the XML Document provided at the top of this post. You will first need to create a root Element object and add Sub Element objects to it. The Element object takes the name of your root element. The Sub Element object takes the parent element of that sub element as a first argument and the name of that sub element as the second. You can set the value of each element with the “text” variable. You can also set attributes to each element using the “set” method, which takes the name of the attribute as the first argument and the value of that attribute as the second. To generate the XML string, you will need to create an ElementTree object by passing the root element. After doing this you can pass that ElementTree object to a “tostring() method. I provide the “pretty_print=True” option to the tostring() method so the XML document os formatted correctly.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
def generate_xml():
    root = ET.Element("note")
    to = ET.SubElement(root, "to")
    to.text = "Luke"
    to = ET.SubElement(root, "to") 
    to.text = "Dan"
    from_var = ET.SubElement(root, "from")
    from_var.text = "Kalen"
    heading = ET.SubElement(root, "heading")
    heading.set("title", "Reminder")
    body = ET.SubElement(heading, "body")
 
    tree = ET.ElementTree(root)
    xml_string = ET.tostring(tree, pretty_print=True)
    print xml_string

And it is as easy as that! Hopefully someone finds these steps useful, enjoy your coding.


About the Author

Dev who enjoys security, scripting, fitness and beers.



2 Responses to Parsing and Generating XML in Python

  1. brendan says:

    Any experience using the BeautifulSoup library? I’ve had some success using it to parse XML data but the xml.dom library looks much easier to use, and I’m wondering if you’re able to make a comparison. Great post!

    • Daniel Khatkar says:

      I have only used the xml.dom library to be honest. I know there are many other libraries that can take care of parsing xml, but I was taught with xml.dom and haven’t even thought of looking for another library. That’s how easy I found it. Glad you enjoyed the article.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

*

Back to Top ↑