Python Forum
Write the XML file from ElementTree with hexadecimal encoding
#1
Hi,

I am new to Python development. I am using ElementTree to parse and manipulate an XML file that contains special characters encoded as hexadecimal character references. After the manipulation, I want to write the XML with the same encoding format. When I try to write the XML, the special characters are written as decimal numbers by default. Is there an option, similar to choosing the UTF-8 or ASCII encoding method, to write the special characters as hexadecimal values? Please suggest.

Output:
<p aid:pstyle="TextInd" >In preparing the illustrations for this book, I&#x00A0;have relied on the generosity of private archive owners Aleksandr Lavrent&#x2019;ev, Natalia Galadzheva, Sophia Bogatyreva, Elena Radkovskaia, and Aleksandra Radkovskaia, as well as the helpful, considerate service of the staff at the Russian State Archive of Literature and the Arts (RGALI), The State Museum of V.V. Mayakovsky, The State Central Film Museum in Moscow<?AQ AQ:&#x00A0;Is &#x201C;The&#x201D; part of these names? If so cap ok. If not &#x201C;the&#x201D; should be lowercase?>, the U.S. National Library of Medicine, the Harvard University Archives, Cambridge University Library, Columbia University Library, and F.I.L.M. Archives Inc., New&#x00A0;York.</p>
Thanks,
Dillibabu.
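
The behaviour described in this post can be reproduced with a short sketch. The one-line document below is a hypothetical stand-in for the real file, and it assumes the standard-library xml.etree.ElementTree on Python 3, where tostring() and write() default to the "us-ascii" encoding:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the parser resolves the hex references to real
# characters (U+00A0, U+2019), so the original notation is not kept.
SAMPLE = "<p>I&#x00A0;have relied on Lavrent&#x2019;ev</p>"

root = ET.fromstring(SAMPLE)

# Default encoding is "us-ascii": characters that cannot be encoded are
# written as decimal character references.
ascii_bytes = ET.tostring(root)
print(ascii_bytes)  # b'<p>I&#160;have relied on Lavrent&#8217;ev</p>'

# With a Unicode-capable encoding the characters are written literally.
utf8_bytes = ET.tostring(root, encoding="utf-8")
print(utf8_bytes.decode("utf-8"))
```

Both forms describe the same characters to an XML parser; only the notation in the file differs.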
#2
Can you show a minimal reproducible example of what you are doing?
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#3
import xml.etree.ElementTree as ET
import codecs

tree = ET.parse('D:\master_config.xml')
docXMl = ET.parse('D:\Agency_Contingency.xml')
root = tree.getroot()
docRoot = docXMl.getroot()
xx=root.findall(".//addattandbreak")

print(len(xx))
for item in xx:
myXPath="."+item.attrib['xpath'].replace("\"","'")
print(myXPath)
elemt=docRoot.findall(myXPath)
print(len(elemt))
for docItem in elemt:
docItem.set(item.attrib['attname'], item.attrib['attvalue'])
docXMl.write("D:\med-9780199361335_out.xml")
#4
Obviously it's not reproducible: we don't have the input files. Also, please fix your indentation.
#5
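import xml.etree.ElementTree as ET
import time
import codecs

now = time.time()

# Raw strings keep the backslashes in the Windows paths literal.
tree = ET.parse(r'D:\master_config.xml')
docXMl = ET.parse(r'D:\Agency_Contingency.xml')
root = tree.getroot()
docRoot = docXMl.getroot()

# Each <addattandbreak> rule carries an XPath and an attribute to set.
xx = root.findall(".//addattandbreak")
print(len(xx))

for item in xx:
    # Prefix "." so the path is relative, and switch double quotes to single.
    myXPath = "." + item.attrib['xpath'].replace("\"", "'")
    print(myXPath)
    elemt = docRoot.findall(myXPath)
    print(len(elemt))
    for docItem in elemt:
        docItem.set(item.attrib['attname'], item.attrib['attvalue'])

# With no encoding argument, write() uses "us-ascii" and emits non-ASCII
# characters as decimal character references.
docXMl.write(r"D:\med-9780199361335_out.xml")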

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book SYSTEM "d:\test.dtd">
<book xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0">
<index xml:id="Ind1">
<info>
<title>Index</title>
</info>
<indexdiv>
<info>
<title>A</title>
</info>
<indexentry xml:id="ind-001"><primaryie>Abbott, Greg <link linkend="pageC4.P62">C4.P62</link></primaryie>
</indexentry>
<indexentry xml:id="ind-002"><primaryie>Abraham, John <linkgroup><link linkend="pageC2.P52">C2.P52</link>–<link linkend="pageC2.P57">C2.P57</link></linkgroup>, <linkgroup><link linkend="pageC2.P61">C2.P61</link>–<link linkend="pageC2.P67">C2.P67</link></linkgroup>, <linkgroup><link linkend="pageC3.P52">C3.P52</link>–<link linkend="pageC3.P54">C3.P54</link></linkgroup></primaryie>
</indexentry></indexdiv>
</index>
</book>


Output:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book SYSTEM "d:\test.dtd">
<book xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0">
<index xml:id="Ind1">
<info>
<title>Index</title>
</info>
<indexdiv>
<info>
<title>A</title>
</info>
<indexentry xml:id="ind-001"><primaryie>Abbott, Greg <link linkend="pageC4.P62">C4.P62</link></primaryie>
</indexentry>
<indexentry xml:id="ind-002"><primaryie>Abraham, John <linkgroup><link linkend="pageC2.P52">C2.P52</link>&#x2013;<link linkend="pageC2.P57">C2.P57</link></linkgroup>, <linkgroup><link linkend="pageC2.P61">C2.P61</link>&#x2013;<link linkend="pageC2.P67">C2.P67</link></linkgroup>, <linkgroup><link linkend="pageC3.P52">C3.P52</link>&#x2013;<link linkend="pageC3.P54">C3.P54</link></linkgroup></primaryie>
</indexentry></indexdiv>
</index>
</book>
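
As far as I know, ElementTree has no option for hexadecimal character references, so one possible workaround is to serialize with the "us-ascii" encoding and then rewrite the decimal references in the resulting bytes. The helper name decimal_refs_to_hex and the sample element below are hypothetical, and this is only a sketch of the idea, not a tested solution for the files in this thread:

```python
import re
import xml.etree.ElementTree as ET


def decimal_refs_to_hex(xml_bytes):
    """Rewrite decimal character references (&#8211;) as hex (&#x2013;)."""
    def to_hex(match):
        # Zero-pad to at least four hex digits, e.g. 160 -> 00A0.
        return b"&#x%04X;" % int(match.group(1))
    return re.sub(rb"&#(\d+);", to_hex, xml_bytes)


# Hypothetical element containing an en dash (U+2013).
root = ET.fromstring("<link>C2.P52\u2013C2.P57</link>")

# Step 1: ASCII serialization turns the en dash into &#8211;.
ascii_bytes = ET.tostring(root, encoding="us-ascii")

# Step 2: convert every decimal reference to its hexadecimal form.
hex_bytes = decimal_refs_to_hex(ascii_bytes)
print(hex_bytes)  # b'<link>C2.P52&#x2013;C2.P57</link>'
```

The converted bytes can then be written with open(path, "wb"). Note that this converts every non-ASCII character in the document, including ones that were literal characters in the input, and it does not restore the DOCTYPE line, which ElementTree does not keep when parsing.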