Python Forum
Parsing XML with lxml
#1
Hello :)
I need to parse an XML file.
For example, my XML file looks like this:
<root>
 <child1 attr1="value1" attr2="value2"/>
 <child2>
   <child3>
    some text
   </child3>
 </child2>
</root>
And when I use this code:

from sys import stdout

# root is the root element of the parsed XML document
walkAll = root.getiterator()
for elt in walkAll:
    atr = elt.attrib
    
    if elt.attrib:
        stdout.write('<%s ' %elt.tag)
        for name, value in elt.attrib.items():
            attributes =' {0:s} = "{1:s}"'.format(name, value)
            stdout.write(attributes)
            

        #print("<%s %s>" % (elt.tag, atr))
    else:
        print("<%s>" % elt.tag)
    
    if elt.text == None:
        continue
    
    print("</%s>" % elt.tag)
My output looks like:
Output:
<root> </root> <child1 attr1= "value1" attr2 = "value2"<child2> </child2> <child3> some text </child3>
I want it to look like this, without converting it to a string:
Output:
<root> <child1 attr1="value1" attr2="value2"/> <child2> <child3> some text </child3> </child2> </root>
#2
I don't know what library you're using, but I don't think getiterator() is what you want here. It looks like it gives you every element in the document, one after another. To format the output that way, you really want to handle one node at a time and recursively process its children.
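To illustrate the point about getting every element at once, here is a minimal sketch. It assumes lxml (with a fallback to the standard library's xml.etree.ElementTree, which offers the same basic calls) and uses iter(), the current name for what getiterator() does. The one-line sample document is a condensed stand-in for the XML in the first post.

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same basic API

root = etree.fromstring(
    '<root><child1 attr1="value1"/><child2><child3>some text</child3></child2></root>'
)

# iter() walks the whole tree in document order and yields a flat sequence,
# so the loop never learns where one element ends and its parent resumes.
tags = [elt.tag for elt in root.iter()]
print(tags)
```

Because the sequence is flat, a loop over it can only print a closing tag right after the opening tag, which is why the nesting is lost in the output above.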
#3
Can you suggest what I should use instead?
I'm using lxml etree now.
#4
Here's an example of a recursive function, which parses each node's children. Handling attributes and formatting is something I'll leave up to you :)

>>> doc = '''<root>
...  <child1 attr1="value1" attr2="value2"/>
...  <child2>
...    <child3>
...     some text
...    </child3>
...  </child2>
... </root>'''
>>> def parse(node):
...   print(f"<{node.tag}>")
...   for child in node:
...     parse(child)
...   print(f"</{node.tag}>")
...
>>> from lxml import etree
>>> root = etree.XML(doc)
>>> parse(root)
<root>
<child1>
</child1>
<child2>
<child3>
</child3>
</child2>
</root>
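One possible way to fill in the attributes and formatting is sketched below. It is only an illustration, not the only approach: the names render, depth and indent are made up for this example, and it falls back to the standard library's xml.etree.ElementTree if lxml is not installed. It also assumes that text after a closing tag (the element's tail) can be ignored and that attribute values need no escaping.

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same basic API

DOC = """<root>
 <child1 attr1="value1" attr2="value2"/>
 <child2>
   <child3>
    some text
   </child3>
 </child2>
</root>"""


def render(node, depth=0, indent="  "):
    """Return a list of output lines for node and its descendants."""
    pad = indent * depth
    # Build ' name="value"' pairs; empty string when there are no attributes.
    attrs = "".join(f' {name}="{value}"' for name, value in node.attrib.items())
    text = (node.text or "").strip()
    children = list(node)

    # No text and no children: emit a self-closing tag.
    if not text and not children:
        return [f"{pad}<{node.tag}{attrs}/>"]

    lines = [f"{pad}<{node.tag}{attrs}>"]
    if text:
        lines.append(f"{pad}{indent}{text}")
    for child in children:
        # Recurse so each child's closing tag lands before the parent's.
        lines.extend(render(child, depth + 1, indent))
    lines.append(f"{pad}</{node.tag}>")
    return lines


root = etree.fromstring(DOC)
print("\n".join(render(root)))
```

For the sample document this prints the nested layout asked for in the first post, with child1 written as a self-closing tag and each level indented by two spaces.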
#5
Thank you. :)
It was really helpful.