Python Forum

Write the XML file from ElementTree with hexadecimal encoding
Hi,

I am new to Python development. I am using ElementTree to parse and manipulate an XML file that contains special characters written as hexadecimal character references. After the manipulation, I want to write the XML back out in the same encoding format. When I write the XML, the special characters come out as decimal numbers by default. Is there an option, like the UTF-8 or ASCII encoding methods, to write the special characters as hexadecimal values? Please suggest.

Output:
<p aid:pstyle="TextInd" >In preparing the illustrations for this book, I&#x00A0;have relied on the generosity of private archive owners Aleksandr Lavrent&#x2019;ev, Natalia Galadzheva, Sophia Bogatyreva, Elena Radkovskaia, and Aleksandra Radkovskaia, as well as the helpful, considerate service of the staff at the Russian State Archive of Literature and the Arts (RGALI), The State Museum of V.V. Mayakovsky, The State Central Film Museum in Moscow<?AQ AQ:&#x00A0;Is &#x201C;The&#x201D; part of these names? If so cap ok. If not &#x201C;the&#x201D; should be lowercase?>, the U.S. National Library of Medicine, the Harvard University Archives, Cambridge University Library, Columbia University Library, and F.I.L.M. Archives Inc., New&#x00A0;York.</p>
Thanks,
Dillibabu.
Can you show a minimal reproducible example of what you are doing?
import xml.etree.ElementTree as ET
import codecs

tree = ET.parse('D:\master_config.xml')
docXMl = ET.parse('D:\Agency_Contingency.xml')
root = tree.getroot()
docRoot = docXMl.getroot()
xx=root.findall(".//addattandbreak")

print(len(xx))
for item in xx:
myXPath="."+item.attrib['xpath'].replace("\"","'")
print(myXPath)
elemt=docRoot.findall(myXPath)
print(len(elemt))
for docItem in elemt:
docItem.set(item.attrib['attname'], item.attrib['attvalue'])
docXMl.write("D:\med-9780199361335_out.xml")
Obviously it's not reproducible: we don't have the input files. Also, please fix your indentation.
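import xml.etree.ElementTree as ET
import time
import codecs

now = time.time()

# Raw strings keep the backslashes in the Windows paths literal.
tree = ET.parse(r'D:\master_config.xml')
docXMl = ET.parse(r'D:\Agency_Contingency.xml')
root = tree.getroot()
docRoot = docXMl.getroot()

# Each <addattandbreak> rule carries an XPath plus an attribute name and value.
xx = root.findall(".//addattandbreak")

print(len(xx))
for item in xx:
    # Make the XPath relative and switch double quotes to single quotes.
    myXPath = "." + item.attrib['xpath'].replace('"', "'")
    print(myXPath)
    elemt = docRoot.findall(myXPath)
    print(len(elemt))
    for docItem in elemt:
        # Set the configured attribute on every matching element.
        docItem.set(item.attrib['attname'], item.attrib['attvalue'])

docXMl.write(r"D:\med-9780199361335_out.xml")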
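
For what it's worth, the decimal numbers seem to come from that final `write()` call: with no `encoding` argument, ElementTree serializes as `us-ascii` and turns every non-ASCII character into a decimal character reference. A minimal sketch with an in-memory document, where the sample string is made up for illustration and is not taken from the input files:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the parser decodes &#x2013; into the en dash character,
# so the original hexadecimal spelling is not remembered by the tree.
root = ET.fromstring("<p>C2.P52&#x2013;C2.P57</p>")

# Default serialization is us-ascii: the en dash comes back out
# as a decimal character reference.
default_bytes = ET.tostring(root)
print(default_bytes)

# Asking for a str instead keeps the character itself, unescaped.
as_text = ET.tostring(root, encoding="unicode")
print(as_text)
```

So the tree itself only holds the plain character; decimal versus hexadecimal is decided at serialization time.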
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book SYSTEM "d:\test.dtd">
<book xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0">
<index xml:id="Ind1">
<info>
<title>Index</title>
</info>
<indexdiv>
<info>
<title>A</title>
</info>
<indexentry xml:id="ind-001"><primaryie>Abbott, Greg <link linkend="pageC4.P62">C4.P62</link></primaryie>
</indexentry>
<indexentry xml:id="ind-002"><primaryie>Abraham, John <linkgroup><link linkend="pageC2.P52">C2.P52</link>–<link linkend="pageC2.P57">C2.P57</link></linkgroup>, <linkgroup><link linkend="pageC2.P61">C2.P61</link>–<link linkend="pageC2.P67">C2.P67</link></linkgroup>, <linkgroup><link linkend="pageC3.P52">C3.P52</link>–<link linkend="pageC3.P54">C3.P54</link></linkgroup></primaryie>
</indexentry></indexdiv>
</index>
</book>


Output:
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE book SYSTEM "d:\test.dtd"> <book xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0"> <index xml:id="Ind1"> <info> <title>Index</title> </info> <indexdiv> <info> <title>A</title> </info> <indexentry xml:id="ind-001"><primaryie>Abbott, Greg <link linkend="pageC4.P62">C4.P62</link></primaryie> </indexentry> <indexentry xml:id="ind-002"><primaryie>Abraham, John <linkgroup><link linkend="pageC2.P52">C2.P52</link>&#x2013;<link linkend="pageC2.P57">C2.P57</link></linkgroup>, <linkgroup><link linkend="pageC2.P61">C2.P61</link>&#x2013;<link linkend="pageC2.P67">C2.P67</link></linkgroup>, <linkgroup><link linkend="pageC3.P52">C3.P52</link>&#x2013;<link linkend="pageC3.P54">C3.P54</link></linkgroup></primaryie> </indexentry></indexdiv> </index> </book>
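
As far as I know, ElementTree has no switch for hexadecimal references, so one possible workaround is to serialize to a string and convert the non-ASCII characters yourself. A rough sketch, not a definitive solution: `to_hex_refs` and `write_with_hex_refs` are hypothetical helper names, not part of ElementTree, and the code assumes non-ASCII characters only occur in text and attribute values, since a character reference is not valid inside a tag or attribute name.

```python
import re
import xml.etree.ElementTree as ET

# Matches any single character outside the ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def to_hex_refs(xml_text):
    """Replace each non-ASCII character with a reference like &#x2013;."""
    return NON_ASCII.sub(
        lambda m: "&#x{:04X};".format(ord(m.group())), xml_text
    )


def write_with_hex_refs(tree, path):
    """Write an ElementTree to `path` using hexadecimal character references."""
    # Serialize to str first so ElementTree does not insert decimal references.
    text = ET.tostring(tree.getroot(), encoding="unicode")
    # After conversion the text is pure ASCII, which is also valid UTF-8.
    with open(path, "w", encoding="ascii") as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        out.write(to_hex_refs(text))


# Small in-memory check with a made-up sample.
sample = ET.fromstring("<p>Lavrent&#x2019;ev, New&#x00A0;York</p>")
hex_text = to_hex_refs(ET.tostring(sample, encoding="unicode"))
print(hex_text)
```

In the script above, the last line would then become something like `write_with_hex_refs(docXMl, r"D:\med-9780199361335_out.xml")`. Note that ElementTree does not keep the DOCTYPE line when parsing, so this sketch does not reproduce it.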