Python Forum

Full Version: Remove Empty tags in XML using plain python without lxml library
My use case is to remove empty tags from an XML document using plain Python 2.7. The lxml library is not available.

Sample XML:
<ArML>
  <MsgHeader>
    <Deal>
      <Attribute>
        <Name>First</Name>
        <Value>10</Value>
      </Attribute>
      <Attribute>
        <Name>Second</Name>
        <Value></Value>
      </Attribute>
      <Attribute>
        <Name>Third</Name>
        <Value></Value>
      </Attribute>
      <Attribute>
        <Name>Fourth</Name>
        <Value>40</Value>
      </Attribute>
    </Deal>
  </MsgHeader>
  <MsgHeader>
    <Deal>
      <Attribute>
        <Name>Fifth</Name>
        <Value>10</Value>
      </Attribute>
      <Attribute>
        <Name>Sixth</Name>
        <Value></Value>
      </Attribute>
      <Attribute>
        <Name>Seventh</Name>
        <Value>70</Value>
      </Attribute>
      <Attribute>
        <Name>Eight</Name>
        <Value></Value>
      </Attribute>
    </Deal>
  </MsgHeader>
</ArML>

I am using the code below, but it does not remove all of the empty tags. Please help.

for elem in root.iter('MsgHeader'):
    Deal = root.find("./MsgHeader/Deal")
    empty = root.find("./MsgHeader/Deal/Attribute/[Value='']")
    Deal.remove(empty)
print(ET.tostring(root, encoding='utf8').decode('utf8'))
Could you provide the actual code you're running, so we can try it out? For example, I don't even know which module you're using, as find() and nodelist.remove() aren't part of either the DOM or the SAX interface.
Below is the whole code that I'm using as of now.

 
import xml.etree.ElementTree as ET

tree = ET.parse("xml_test.txt")
root = tree.getroot()

for elem in root.iter('MsgHeader'):
    Deal = root.find("./MsgHeader/Deal")
    empty = root.find("./MsgHeader/Deal/Attribute/[Value='']")
    Deal.remove(empty)
print(ET.tostring(root, encoding='utf8').decode('utf8'))
The use case is that in some places in the sample XML, the <Value> tag is empty. Every time we encounter a tag like this, we need to remove the corresponding <Attribute> tag from the XML itself.
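A likely reason the loop above misses some tags: find() returns only the first match, and both calls search from root rather than from the current elem, so every pass looks at the first <Deal> again. Below is a minimal sketch of one way this might be done with xml.etree.ElementTree alone. The function name remove_empty_attributes and the shortened SAMPLE string are made up for illustration, and treating whitespace-only text as empty is an assumption, not something stated in the question.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the sample XML in the question (hypothetical data).
SAMPLE = """<ArML>
<MsgHeader>
<Deal>
<Attribute><Name>First</Name><Value>10</Value></Attribute>
<Attribute><Name>Second</Name><Value></Value></Attribute>
</Deal>
</MsgHeader>
<MsgHeader>
<Deal>
<Attribute><Name>Sixth</Name><Value></Value></Attribute>
<Attribute><Name>Seventh</Name><Value>70</Value></Attribute>
</Deal>
</MsgHeader>
</ArML>"""


def remove_empty_attributes(root):
    """Remove each <Attribute> whose <Value> has no text; return how many were removed."""
    removed = 0
    # Snapshot the <Deal> elements so the tree is not modified while it is being walked.
    for deal in list(root.iter('Deal')):
        # Snapshot the children too: removing from a list while iterating it skips items.
        for attribute in list(deal.findall('Attribute')):
            value = attribute.find('Value')
            # Assumption: a <Value> holding only whitespace also counts as empty.
            if value is not None and not (value.text or '').strip():
                deal.remove(attribute)
                removed += 1
    return removed


root = ET.fromstring(SAMPLE)
count = remove_empty_attributes(root)
remaining = [name.text for name in root.iter('Name')]
print("removed %d, kept %s" % (count, remaining))
```

With a file on disk, the same function could be called on ET.parse("xml_test.txt").getroot() instead of the inline string. An <Attribute> that has no <Value> child at all is left alone here; whether that is the desired behaviour depends on the data.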
I believe this requirement might not be possible without an extra library such as lxml, but as an implementation specialist I found a workaround.
You can replace each empty string with the keyword 'NULL' and then remove every line of the XML in which 'NULL' appears.
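A rough sketch of that text-based workaround, assuming each tag sits on its own line as in the sample. The function name drop_marked_lines and the marker parameter are hypothetical. Note that this removes only the <Value> line, so the enclosing <Attribute> and its <Name> stay in the document, and any line that legitimately contains the marker text would be dropped as well.

```python
def drop_marked_lines(xml_text, marker="NULL"):
    """Mark empty <Value> tags with a keyword, then drop every line containing it."""
    # Step 1: put the marker into each empty <Value></Value> pair.
    marked = xml_text.replace("<Value></Value>", "<Value>%s</Value>" % marker)
    # Step 2: keep only the lines that do not contain the marker.
    return "\n".join(line for line in marked.splitlines() if marker not in line)


# Hypothetical one-attribute input with one tag per line, as in the sample XML.
sample = "<Attribute>\n<Name>Second</Name>\n<Value></Value>\n</Attribute>"
result = drop_marked_lines(sample)
print(result)
```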