Python XML Libraries

By | December 16, 2016

XML, or Extensible Markup Language, is a set of rules for encoding data and documents in a way that is both human- and machine-readable. XML excels where you need to transfer data between different systems, as there is almost always going to be an XML parser available. In this post I take a quick look at two Python XML libraries to help you work with XML data.

For a more comprehensive look at Python XML libraries, visit here.

ElementTree

The ElementTree module introduces the Element data type to Python. The Element type is a cross between a list and a dictionary: it holds its child elements in order, like a list, and its attributes by name, like a dictionary. It is designed to store hierarchical data structures (like XML) in memory. The ElementTree library can load XML files as trees of ‘Element’ objects within Python.
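To make the list/dictionary comparison concrete, here is a minimal sketch using the standard library's xml.etree.ElementTree. The tag names, attribute names and values are invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Build a small tree by hand. The tags and attributes are hypothetical examples.
book = ET.Element("book", {"id": "b1"})
title = ET.SubElement(book, "title")
title.text = "Example Title"
author = ET.SubElement(book, "author")
author.text = "A. Writer"

# List-like behaviour: len(), indexing and iteration all work on child elements.
child_count = len(book)
first_child_tag = book[0].tag
child_tags = [child.tag for child in book]

# Dictionary-like behaviour: attributes are read and written by key.
book.set("lang", "en")
book_id = book.get("id")
attribute_names = sorted(book.keys())

print(child_count, first_child_tag, child_tags, book_id, attribute_names)
```

Note that the list side of an Element is its children, while the dictionary side is its attributes; the element's own text lives separately in the text property.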

ElementTree can both read and write XML files, and it can search for sub-elements within XML data. You can also use it to create new XML documents from scratch. The original standalone ElementTree distribution also includes helper modules such as SimpleXMLWriter, which helps you generate well-formed XML data. ElementTree has come packaged with Python since version 2.5 (as xml.etree.ElementTree), so it should work without any extra installation.
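The reading, searching and writing described above might look something like the following sketch. The XML content is made up for the example, and an in-memory buffer stands in for a real file so the snippet runs without touching the disk.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; any well-formed XML would do.
XML_TEXT = """<library>
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
</library>"""

# Read: parse a string into a tree of Element objects.
# (ET.parse("some_file.xml") does the same for a file on disk.)
root = ET.fromstring(XML_TEXT)

# Search: findall() returns matching direct children, findtext() their text.
titles = [book.findtext("title") for book in root.findall("book")]

# Search with a simple attribute predicate.
second = root.find("book[@id='b2']")
second_title = second.findtext("title")

# Create: add a new element to the existing tree.
new_book = ET.SubElement(root, "book", {"id": "b3"})
ET.SubElement(new_book, "title").text = "Third"

# Write: serialise the tree, here into a bytes buffer instead of a file.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue()

print(titles, second_title)
print(output.decode("utf-8"))
```

To write to disk instead, pass a filename to write() in place of the buffer.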

Read more about ElementTree on Python Module of the Week.

lxml

lxml is a library that builds on and extends the functionality of ElementTree. It uses C libraries (libxml2 and libxslt) for parsing XML, and should be much faster than ElementTree for larger XML files. lxml is largely compatible with the ElementTree API, so to some extent you can use whichever is more convenient. There doesn’t seem to be much debate over which of the two to choose, and for simpler tasks you may find little difference between them.
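Because the two APIs overlap so much, a common pattern is to try lxml first and fall back to the standard library if it isn't installed. The sketch below assumes lxml was installed separately (for example with pip) and sticks to calls that both libraries provide. The sample XML is invented for illustration.

```python
try:
    # Third-party package; assumed to have been installed separately.
    from lxml import etree
except ImportError:
    # Standard-library fallback with a largely compatible API.
    import xml.etree.ElementTree as etree

# Hypothetical sample document.
root = etree.fromstring("<feed><entry>one</entry><entry>two</entry></feed>")

# iter(), SubElement() and tostring() exist in both libraries.
entries = [entry.text for entry in root.iter("entry")]
etree.SubElement(root, "entry").text = "three"
serialized = etree.tostring(root)

backend = etree.__name__
print(backend, entries)
```

This only holds for the shared subset of the API. Features specific to lxml, such as full XPath support, have no direct stdlib equivalent, so code that relies on them can't use the fallback.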

Get started with the lxml tutorial.

Dive Into Python has a crash course in XML, as well as how you can parse XML with both ElementTree and lxml.