Python Forum
Remove Empty tags in XML using plain python without lxml library




Remove Empty tags in XML using plain python without lxml library - saurabhverma2412 - Aug-20-2018

My use case is to remove empty tags from an XML document using plain Python 2.7. The lxml library is not available.

Sample XML:
<ArML>
  <MsgHeader>
    <Deal>
      <Attribute>
        <Name>First</Name>
        <Value>10</Value>
      </Attribute>
      <Attribute>
        <Name>Second</Name>
        <Value></Value>
      </Attribute>
      <Attribute>
        <Name>Third</Name>
        <Value></Value>
      </Attribute>
      <Attribute>
        <Name>Fourth</Name>
        <Value>40</Value>
      </Attribute>
    </Deal>
  </MsgHeader>
  <MsgHeader>
    <Deal>
      <Attribute>
        <Name>Fifth</Name>
        <Value>10</Value>
      </Attribute>
      <Attribute>
        <Name>Sixth</Name>
        <Value></Value>
      </Attribute>
      <Attribute>
        <Name>Seventh</Name>
        <Value>70</Value>
      </Attribute>
      <Attribute>
        <Name>Eight</Name>
        <Value></Value>
      </Attribute>
    </Deal>
  </MsgHeader>
</ArML>

I am using the code below, but it does not remove all of the empty tags. Please help.

for elem in root.iter('MsgHeader'):
    Deal = root.find("./MsgHeader/Deal")
    empty = root.find("./MsgHeader/Deal/Attribute/[Value='']")
    Deal.remove(empty)
print(ET.tostring(root, encoding='utf8').decode('utf8'))



RE: Remove Empty tags in XML using plain python without lxml library - nilamo - Aug-20-2018

Could you provide the actual code you're running, so we can try it out? For example, I don't even know what module you're using, since neither find() nor nodelist.remove() is part of the DOM or SAX interfaces.


RE: Remove Empty tags in XML using plain python without lxml library - saurabhverma2412 - Aug-20-2018

Below is the whole code that I'm using as of now.

 
import xml.etree.ElementTree as ET

tree = ET.parse("xml_test.txt")
root = tree.getroot()

for elem in root.iter('MsgHeader'):
    Deal = root.find("./MsgHeader/Deal")
    empty = root.find("./MsgHeader/Deal/Attribute/[Value='']")
    Deal.remove(empty)
print(ET.tostring(root, encoding='utf8').decode('utf8'))

The use case is that in some places in the sample XML the <Value> tag is empty. Every time we encounter such a tag, we need to remove the enclosing <Attribute> element from the XML.


RE: Remove Empty tags in XML using plain python without lxml library - saurabhverma2412 - Aug-21-2018

I believe that, without an extra library such as lxml, this requirement might not be achievable, but as an implementation specialist I found a workaround.
You can replace each empty string with the keyword 'NULL' and then remove every line of the XML in which 'NULL' appears.
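
For comparison, the original requirement (drop every <Attribute> whose <Value> is empty) can probably be met with xml.etree.ElementTree alone. The posted loop always calls root.find(), which returns only the first match in the whole tree, so it never looks inside the second <Deal>. Below is a minimal sketch, not a definitive solution: the helper name remove_empty_attributes and the shortened SAMPLE_XML are made up for illustration, and treating a whitespace-only <Value> as empty is an assumption. It uses only calls that exist in both Python 2.7 and Python 3.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for the sample XML from the first post.
SAMPLE_XML = """
<ArML>
  <MsgHeader>
    <Deal>
      <Attribute><Name>First</Name><Value>10</Value></Attribute>
      <Attribute><Name>Second</Name><Value></Value></Attribute>
      <Attribute><Name>Third</Name><Value></Value></Attribute>
    </Deal>
  </MsgHeader>
  <MsgHeader>
    <Deal>
      <Attribute><Name>Sixth</Name><Value></Value></Attribute>
      <Attribute><Name>Seventh</Name><Value>70</Value></Attribute>
    </Deal>
  </MsgHeader>
</ArML>
"""


def remove_empty_attributes(root):
    """Remove each <Attribute> whose <Value> is empty; return how many were removed.

    Hypothetical helper, not part of ElementTree.
    """
    removed = 0
    # Elements have no parent pointer, so walk every <Deal> (the parent)
    # and remove matching children from it directly.
    for deal in root.iter('Deal'):
        # Iterate over a copy: removing from a parent while iterating
        # over it would skip siblings.
        for attribute in list(deal.findall('Attribute')):
            value = attribute.find('Value')
            # Assumption: a missing text node or whitespace-only text counts as empty.
            if value is not None and not (value.text or '').strip():
                deal.remove(attribute)
                removed += 1
    return removed


root = ET.fromstring(SAMPLE_XML.strip())
removed = remove_empty_attributes(root)
print(ET.tostring(root, encoding='utf8').decode('utf8'))
```

To run this against the file from the earlier post, replace the ET.fromstring() line with tree = ET.parse("xml_test.txt") followed by root = tree.getroot(). Because the helper visits every <Deal> in the tree, it should not matter how many <MsgHeader> blocks the file contains.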