How to display XML tree structure with Python?
Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development
How to display XML tree structure with Python? - sonicblind - Nov-10-2017

Hi,

I have a large, multi-level XML document with a complicated structure and no namespace definitions. I would like to generate a simplified tree view of its structure, so that every possible element from the XML is shown, and only once.

As a simplified example, take this XML:

    <data>
        <timestamp>...</timestamp>
        <people>
            <person>
                <name>...</name>
                <age>...</age>
            </person>
            <person>
                <name>...</name>
                <age>...</age>
                <degree />
            </person>
            <person>
                <name>...</name>
                <age>...</age>
                <degree />
                <siblings>
                    <brother>...</brother>
                    <brother>...</brother>
                    <sister>...</sister>
                </siblings>
            </person>
        </people>
        <cities>
            <city>
                <name>...</name>
                <country>...</country>
                <continent>...</continent>
                <capital />
            </city>
            <city>
                <name>...</name>
                <country>...</country>
                <continent>...</continent>
            </city>
        </cities>
    </data>

Using Python, I would like to generate a view of its structure, looking something like this:

    -data-
        -timestamp-
        -people-
            -person-
                -name-
                -age-
                -degree-
                -siblings-
                    -brother-
                    -sister-
        -cities-
            -city-
                -name-
                -country-
                -continent-
                -capital-

So, basically, I am not interested in the values, or in how many elements of the same type are in the XML. I only want to see which elements are in there.

I know there might be visual tools to achieve this, but I need to be able to generate such a tree view directly inside a Python script.

Thanks for any ideas.

RE: How to display XML tree structure with Python? - Larz60+ - Nov-10-2017

lxml has etree (pretty_print option), see: http://lxml.de/api.html

RE: How to display XML tree structure with Python? - sonicblind - Nov-10-2017

pretty_print does not help me, as it shows everything exactly as it is in the XML. That is what I want to avoid. I need no duplicates and no values or attributes, only the very basic tree structure.

RE: How to display XML tree structure with Python? - Larz60+ - Nov-10-2017

You can look here to see what is available as packages: https://pypi.python.org/pypi?%3Aaction=search&term=xml&submit=search
You may have to write it yourself if you can't find what you're looking for.

RE: How to display XML tree structure with Python? - snippsat - Nov-10-2017

Just getting the names of the tags works fine in both lxml and BeautifulSoup. Keeping the structure in the output can be a challenge, as I do not think either pretty_print (lxml) or prettify() (BS) works for text output.

Example getting tag names:

    from lxml import etree
    from bs4 import BeautifulSoup

    xml = '''\
    <data>
        <timestamp>...</timestamp>
        <people>
            <person>
                <name>...</name>
                <age>...</age>
            </person>
            <person>
                <name>...</name>
                <age>...</age>
                <degree />
            </person>
            <person>
                <name>...</name>
                <age>...</age>
                <degree />
                <siblings>
                    <brother>...</brother>
                    <brother>...</brother>
                    <sister>...</sister>
                </siblings>
            </person>
        </people>
        <cities>
            <city>
                <name>...</name>
                <country>...</country>
                <continent>...</continent>
                <capital />
            </city>
            <city>
                <name>...</name>
                <country>...</country>
                <continent>...</continent>
            </city>
        </cities>
    </data>
    '''

    root = etree.fromstring(xml)
    # The 'xml' parser keeps the document as XML; the 'lxml' HTML parser
    # would wrap it in extra <html> and <body> tags.
    soup = BeautifulSoup(xml, 'xml')

    # lxml: every element in document order
    for node in root.iter('*'):
        print(node.tag)

    # BS: every tag in document order
    for tag in soup.find_all(True):
        print(tag.name)
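The tag listings above lose the nesting and still repeat tags. A minimal sketch of one way to keep the structure and show each element only once, assuming only the standard library's xml.etree.ElementTree (not lxml or BeautifulSoup). The function names merge_structure, format_structure and structure_lines, and the shortened sample document, are hypothetical and introduced here for illustration only.

```python
import xml.etree.ElementTree as ET


def merge_structure(element, structure):
    """Record element.tag in the nested dict `structure`.

    Elements with the same tag under the same parent share one entry,
    so repeated elements are merged instead of listed again.
    """
    children = structure.setdefault(element.tag, {})
    for child in element:
        merge_structure(child, children)


def format_structure(structure, depth=0, indent="    "):
    """Turn the nested dict into indented '-tag-' lines."""
    lines = []
    for tag, children in structure.items():
        lines.append(f"{indent * depth}-{tag}-")
        lines.extend(format_structure(children, depth + 1, indent))
    return lines


def structure_lines(xml_text):
    """Parse xml_text and return its de-duplicated structure as lines."""
    root = ET.fromstring(xml_text)
    structure = {}
    merge_structure(root, structure)
    return format_structure(structure)


# Shortened, hypothetical sample in the spirit of the question's XML.
sample = """
<data>
    <timestamp>...</timestamp>
    <people>
        <person><name>...</name></person>
        <person><name>...</name><degree /></person>
    </people>
</data>
"""

print("\n".join(structure_lines(sample)))
```

Plain dicts keep insertion order in Python 3.7 and later, so tags are listed in order of first appearance under their parent.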
RE: How to display XML tree structure with Python? - wavic - Nov-10-2017

Pretty straightforward:

    from lxml import etree
    from collections import Counter

    xml = '''\
    <data>
        <timestamp>...</timestamp>
        <people>
            <person>
                <name>...</name>
                <age>...</age>
            </person>
            <person>
                <name>...</name>
                <age>...</age>
                <degree />
            </person>
            <person>
                <name>...</name>
                <age>...</age>
                <degree />
                <siblings>
                    <brother>...</brother>
                    <brother>...</brother>
                    <sister>...</sister>
                </siblings>
            </person>
        </people>
        <cities>
            <city>
                <name>...</name>
                <country>...</country>
                <continent>...</continent>
                <capital />
            </city>
            <city>
                <name>...</name>
                <country>...</country>
                <continent>...</continent>
            </city>
        </cities>
    </data>
    '''

    root = etree.fromstring(xml)

    # NOTE: `tree` is not defined at this point; the follow-up post
    # supplies the missing line.
    for tag in root.iter():
        path = tree.getpath(tag)
        # Four spaces per path level, so the space count encodes the depth
        path = path.replace('/', '    ')
        spaces = Counter(path)
        # Last path step, without a positional predicate such as [2]
        tag_name = path.split()[-1].split('[')[0]
        tag_name = ' ' * (spaces[' '] - 4) + tag_name
        print(tag_name)
RE: How to display XML tree structure with Python? - wavic - Nov-15-2017

I forgot to put tree = etree.ElementTree(root) before the for loop.
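The approach above rests on the paths that getpath returns: a repeated element carries a positional predicate such as [2], and the number of slashes gives the depth. A small sketch of that string handling on its own, using only the standard library. The paths are hard-coded illustrative values written in the shape lxml produces, and the helper names structural_path and indent_level are hypothetical.

```python
import re

# Matches positional predicates such as [1] or [27] in an XPath string.
INDEX = re.compile(r"\[\d+\]")


def structural_path(xpath):
    """Drop positional predicates so repeated elements share one path."""
    return INDEX.sub("", xpath)


def indent_level(xpath):
    """Depth of the element: 0 for the root, 1 for its children, ..."""
    return xpath.count("/") - 1


# Illustrative paths (assumed values, not produced by running lxml here).
paths = [
    "/data",
    "/data/people/person[1]/name",
    "/data/people/person[2]/name",
    "/data/people/person[3]/siblings/brother[2]",
]

# dict.fromkeys keeps the first occurrence of each path, in order.
unique = list(dict.fromkeys(structural_path(p) for p in paths))

for p in unique:
    print("    " * indent_level(p) + p.rsplit("/", 1)[-1])
```

Removing the predicates before de-duplicating is what collapses person[1], person[2] and person[3] into a single person entry.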
RE: How to display XML tree structure with Python? - sonicblind - Nov-15-2017

Thanks to all of you for the tips! They helped me achieve my goal.

wavic - my aim was to have no duplicates, so your code was almost perfect, but I reworked it a bit to also include the attributes.

Here is my final code, in case it helps somebody else as well. I will use it any time I need to see clearly the structure of an XML file, to know all the tags/attributes I need to consider.

    import collections
    import re

    from lxml import etree

    xml = '''\
    <data>
        <timestamp>not important</timestamp>
        <people>
            <person name="Blue" given="John">
                <occupation>not important</occupation>
                <age>not important</age>
            </person>
            <person name="Green" given="Peter">
                <occupation>not important</occupation>
                <age>not important</age>
                <degree />
            </person>
            <person name="Red" given="Angela" maiden="Orange">
                <occupation fulltime="yes">not important</occupation>
                <age>not important</age>
                <birthday>not important</birthday>
                <degree />
                <siblings>
                    <brother attrib1="no" attrib2="yes">not important</brother>
                    <brother attrib1="yes">not important</brother>
                    <sister>not important</sister>
                </siblings>
            </person>
        </people>
        <cities>
            <city name="Tokyo">
                <country>not important</country>
                <continent>not important</continent>
                <capital />
            </city>
            <city name="Atlanta">
                <country>not important</country>
                <continent>not important</continent>
                <olympics count="1">
                    <year>1996</year>
                    <season>summer</season>
                </olympics>
            </city>
        </cities>
    </data>
    '''

    xml_root = etree.fromstring(xml)
    raw_tree = etree.ElementTree(xml_root)
    nice_tree = collections.OrderedDict()

    for tag in xml_root.iter():
        # Strip positional predicates such as [2] so repeated elements share one path
        path = re.sub(r'\[[0-9]+\]', '', raw_tree.getpath(tag))
        if path not in nice_tree:
            nice_tree[path] = []
        # Collect each attribute name once per path
        if len(tag.keys()) > 0:
            nice_tree[path].extend(attrib for attrib in tag.keys() if attrib not in nice_tree[path])

    for path, attribs in nice_tree.items():
        # Depth of the element: 0 for the root
        indent = int(path.count('/') - 1)
        print('{0}{1}: {2} [{3}]'.format('    ' * indent, indent, path.split('/')[-1],
                                         ', '.join(attribs) if len(attribs) > 0 else '-'))

Which gives me the following result:
RE: How to display XML tree structure with Python? - wavic - Nov-15-2017

Good! At first I thought this would be a difficult task, but it seems that XPath is of great help.

RE: How to display XML tree structure with Python? - mreshko - Aug-12-2020

Hi sonicblind. Great code! Very useful. Thank you.

It would be great if you could add these two features to the code:

(1) Show the child's number after the level, e.g.

    3.0: occupation [fulltime]
    3.1: age [-]
    3.2: degree [-]
    3.3: birthday [-]
    3.4: siblings [-]

(2) Show the number of identical siblings. For example, if there were, say, 100 "person" elements, it would be displayed as

    2: person [name, given, maiden] [100]

Many thanks
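The two requested features could be sketched roughly as follows. This is not sonicblind's code: it is a standard-library variant (xml.etree.ElementTree instead of lxml), and the names Node, collect, render and summarize are hypothetical. It assumes that the "child number" is the zero-based position among the distinct child tags of a parent, in order of first appearance, and that the count is the total number of elements sharing the same structural path in the whole document. Both readings of the request are assumptions.

```python
import xml.etree.ElementTree as ET


class Node:
    """Summary of all elements that share one structural path."""

    def __init__(self):
        self.count = 0       # how many elements share this path
        self.attribs = []    # attribute names, in order of first appearance
        self.children = {}   # child tag -> Node, in order of first appearance


def collect(element, node):
    """Merge one element (and its subtree) into the summary node."""
    node.count += 1
    for name in element.attrib:
        if name not in node.attribs:
            node.attribs.append(name)
    for child in element:
        collect(child, node.children.setdefault(child.tag, Node()))


def render(tag, node, level=0, position=0, indent="    "):
    """Return 'level.position: tag [attribs] [count]' lines for a subtree."""
    attribs = ", ".join(node.attribs) or "-"
    lines = [f"{indent * level}{level}.{position}: {tag} [{attribs}] [{node.count}]"]
    for i, (child_tag, child) in enumerate(node.children.items()):
        lines.extend(render(child_tag, child, level + 1, i, indent))
    return lines


def summarize(xml_text):
    """Parse xml_text and return the summary lines."""
    root = ET.fromstring(xml_text)
    top = Node()
    collect(root, top)
    return render(root.tag, top)


# Shortened, hypothetical sample based on the XML used in the thread.
sample = """
<data>
    <people>
        <person name="Blue" given="John"><age>1</age></person>
        <person name="Red" given="Angela" maiden="Orange"><age>2</age><degree /></person>
    </people>
</data>
"""

print("\n".join(summarize(sample)))
```

Keeping the summary as a nested structure, rather than a flat dictionary of paths, makes the position among siblings available directly from enumerate.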