How to display XML tree structure with Python?

sonicblind · Nov-15-2017, 08:34 PM

Thanks to all of you for the tips!
They helped me to achieve my goal.

wavic - My aim was to have no duplicates, so your code was almost perfect; I reworked it a bit so that it also includes the attributes.

Here is my final code in case it helps somebody else as well.
I will use it whenever I need a clear view of the structure of an XML file, so that I know all the tags and attributes I need to consider.

import collections
import re

from lxml import etree

xml = '''\
<data>
    <timestamp>not important</timestamp>
    <people>
        <person name="Blue" given="John">
            <occupation>not important</occupation>
            <age>not important</age>
        </person>
        <person name="Green" given="Peter">
            <occupation>not important</occupation>
            <age>not important</age>
            <degree />
        </person>
        <person name="Red" given="Angela" maiden="Orange">
            <occupation fulltime="yes">not important</occupation>
            <age>not important</age>
            <birthday>not important</birthday>
            <degree />
            <siblings >
                <brother attrib1="no" attrib2="yes">not important</brother>
                <brother attrib1="yes">not important</brother>
                <sister>not important</sister>
            </siblings>
        </person>
    </people>
    <cities>
        <city name="Tokyo">
            <country>not important</country>
            <continent>not important</continent>
            <capital />
        </city>
        <city name="Atlanta">
            <country>not important</country>
            <continent>not important</continent>
            <olympics count="1">
                <year>1996</year>
                <season>summer</season>
            </olympics>
        </city>
    </cities>
</data>
'''

xml_root = etree.fromstring(xml)
raw_tree = etree.ElementTree(xml_root)
nice_tree = collections.OrderedDict()

# Collect every distinct tag path together with the attribute names seen on it.
for tag in xml_root.iter():
    # getpath() returns e.g. /data/people/person[2]; strip the [n] index
    # so that repeated siblings collapse into a single entry.
    path = re.sub(r'\[[0-9]+\]', '', raw_tree.getpath(tag))
    if path not in nice_tree:
        nice_tree[path] = []
    nice_tree[path].extend(attrib for attrib in tag.keys() if attrib not in nice_tree[path])

# Print one line per path: indentation, depth, tag name and attributes ('-' if none).
for path, attribs in nice_tree.items():
    indent = path.count('/') - 1
    print('{0}{1}: {2} [{3}]'.format('    ' * indent, indent, path.split('/')[-1], ', '.join(attribs) if attribs else '-'))

This gives me the following result:

Output:
0: data [-]
    1: timestamp [-]
    1: people [-]
        2: person [name, given, maiden]
            3: occupation [fulltime]
            3: age [-]
            3: degree [-]
            3: birthday [-]
            3: siblings [-]
                4: brother [attrib1, attrib2]
                4: sister [-]
    1: cities [-]
        2: city [name]
            3: country [-]
            3: continent [-]
            3: capital [-]
            3: olympics [count]
                4: year [-]
                4: season [-]
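
In case lxml is not installed, the same idea can probably be done with only the standard library. This is just a rough sketch, not a drop-in replacement: the function names summarize_structure and format_structure are made up for illustration, and it assumes the XML has no namespaces (a namespace URI inside a tag name would throw off the '/' counting). Since xml.etree.ElementTree has no getpath(), the path is built by hand while walking the tree.

```python
import xml.etree.ElementTree as ET


def summarize_structure(xml_text):
    """Map each distinct tag path to the attribute names seen on it.

    Hypothetical helper; assumes the document uses no XML namespaces.
    """
    root = ET.fromstring(xml_text)
    structure = {}  # plain dicts keep insertion order on Python 3.7+

    def walk(element, parent_path):
        # Build the path manually, without sibling indexes,
        # so repeated tags collapse into one entry.
        path = parent_path + '/' + element.tag
        attribs = structure.setdefault(path, [])
        for name in element.keys():
            if name not in attribs:
                attribs.append(name)
        for child in element:
            walk(child, path)

    walk(root, '')
    return structure


def format_structure(structure):
    """Return one indented line per path, in the same layout as above."""
    lines = []
    for path, attribs in structure.items():
        depth = path.count('/') - 1
        tag = path.split('/')[-1]
        lines.append('{0}{1}: {2} [{3}]'.format(
            '    ' * depth, depth, tag, ', '.join(attribs) or '-'))
    return lines


if __name__ == '__main__':
    sample = '<data><people><person name="Blue"/><person maiden="Orange"><age/></person></people></data>'
    for line in format_structure(summarize_structure(sample)):
        print(line)
```

Returning the lines instead of printing them directly makes it easier to reuse the result, for example to write it to a file or compare the structure of two XML files.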
