Python Forum

Full Version: Preserve xml file format
Hi!
I would like to manipulate an XML file yet preserve its formatting (extra spaces, comments, carriage returns, ...).
lxml preserves comments out of the box, but I was unable to figure out the rest.

My current solution is to parse the file line by line.

For example:
<node name="test_basic" size="0x200" >
<field name="aaa" offset="0x0.0" size="0x100.0" subnode="xxx" descr="" />
<field name="bbb" offset="++" size="0x1.0" subnode="xxx" descr="" />
<field name="ccc" offset="++" size="0x1.2" subnode="xxx" descr="" />
</node>

<!-- test comment-->


<node name="test_bits" size="?" >
<field name="aaa" offset="0x0.0" size="0.1" descr="" />
<field name="bbb" offset="++" size="0x1.1" descr="" />
</node>
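
The line-by-line approach mentioned above could look roughly like the sketch below. It is only an illustration, not the poster's actual code: the helper name set_attribute is hypothetical, and it assumes that attribute values are double-quoted, that each attribute sits on a single line, and that the new value is already XML-escaped. Because it works on raw text, it will also match attribute-like text inside comments.

```python
import re


def set_attribute(line, attr, new_value):
    """Return `line` with the value of attribute `attr` replaced by `new_value`.

    Only the characters between the attribute's double quotes change, so
    spacing, comments and line endings elsewhere on the line are untouched.
    (Hypothetical helper; assumes double-quoted, single-line attributes.)
    """
    # The lookbehind stops "size" from matching inside a longer name such as "data-size".
    pattern = re.compile(r'(?<![\w:.-])(' + re.escape(attr) + r'\s*=\s*")[^"]*(")')
    # A function replacement avoids backslash-escape surprises in new_value.
    return pattern.sub(lambda m: m.group(1) + new_value + m.group(2), line, count=1)


line = '<field name="aaa"       offset="0x0.0"   size="0.1"    descr="" />\r\n'
changed = set_attribute(line, "size", "0x2.0")
print(repr(changed))
```

When applying this to a whole file, opening it with open(path, newline="") for both reading and writing turns off newline translation, so any carriage returns pass through unchanged.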
Please show what you have tried so far.
The formatting was removed by the forum editor as well :/

Should have looked like this:
  <field name="aaa"                     offset="0x0.0"      size="0.1"    subnode="xxx" descr="" />
The code I used to test the preservation (after a lot of googling) was:
#import xml.etree.ElementTree as ET
import lxml.etree as ET

tree = ET.parse('test.xml')
tree.write('new.xml')
The output (new.xml) no longer contains the extra spaces between attributes.
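
The effect can be reproduced without any files. The sketch below uses the standard-library xml.etree.ElementTree (the import commented out above) rather than lxml, purely to stay self-contained; the sample string is made up for illustration. Whitespace between elements is stored as text and survives the round trip, but whitespace inside a start tag is not part of the parsed tree, so the serializer has nothing to restore it from.

```python
import xml.etree.ElementTree as ET

# Made-up sample with extra spaces between attributes.
SOURCE = (
    '<node name="test_bits" size="?" >\n'
    '  <field name="aaa"      offset="0x0.0"   size="0.1" descr="" />\n'
    '</node>'
)

root = ET.fromstring(SOURCE)
result = ET.tostring(root, encoding="unicode")

# Line breaks and indentation between elements are kept (they are text/tail),
# but the runs of spaces between attributes come back as single spaces.
print(result)
```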
I believe that you can parse pure XML with BeautifulSoup, using 'lxml' as the parser, without modifying the spacing.
I use the same setup all the time for HTML and haven't noticed any change in format.
It's been a while since I used lxml.etree directly. As I recall, it was temperamental.