Remove tag several xml files

Python Forum (https://python-forum.io), General Coding Help, Thread: Remove tag several xml files (/thread-38231.html)
Remove tag several xml files - mfernandes - Sep-19-2022

Dear Python users,

I want to remove the same tag from several XML files in one folder. Here is a sample of one XML file:

    <?xml version='1.0' encoding='UTF-8'?>
    <compteRendu xmlns="http://schemas.assemblee-nationale.fr/referentiel">
      <uid>CRSANR5L15S2017E1N001</uid>
      <metadonnees>
        <day>04 juillet 2017</day>
      </metadonnees>
      <contenu>
        <quantiemes>
          <journee>Séance du mardi 04 juillet 2017</journee>
        </quantiemes>
        <openSession valeur="" id_syceron="981337" sommaire="1" code_parole="" code_style="Présidence" code_grammaire="OUV_SEAN_1_1" id_nomination_op="0" id_nomination_oe="0" id_mandat="PM722798" id_acteur="PA332747" ordre_absolu_seance="1" id_preparation="819540" ordinal_prise="1" valeur_ptsodj="0" nivpoint="1">
          <orateurs/>
          <texte>Présidence de M. François de Rugy</texte>
        </openSession>
      </contenu>
    </compteRendu>

Here is my code:

    path = "sourcedirection"  # Source
    dstpath = "whereIwanttosavenewxmlfiles"  # save as XML in different folder
    for filename in os.listdir(path):
        if filename.endswith('.xml'):
            tree = ET.parse(path+"/"+filename)  # full path of the XML file with its name
            roots = tree.findall("contenu")
            for root in roots:
                opensessions = root.findall("openSession")
                for opensession in opensessions:
                    tree.remove(opensessions)
            save = dstpath+filename
            tree.write(save, encoding="Latin-1")

Instead of removing the tag, the code adds "ns0" to every tag in my new XML file:
    <?xml version='1.0' encoding='Latin-1'?>
    <ns0:compteRendu xmlns:ns0="http://schemas.assemblee-nationale.fr/referentiel">
      <ns0:uid>CRSANR5L15S2017E1N001</ns0:uid>
      <ns0:metadonnees>
        <ns0:day>04 juillet 2017</ns0:day>
      </ns0:metadonnees>
      <ns0:contenu>
        <ns0:quantiemes>
          <ns0:journee>Séance du mardi 04 juillet 2017</ns0:journee>
        </ns0:quantiemes>
        <ns0:openSession valeur="" id_syceron="981337" sommaire="1" code_parole="" code_style="Présidence" code_grammaire="OUV_SEAN_1_1" id_nomination_op="0" id_nomination_oe="0" id_mandat="PM722798" id_acteur="PA332747" ordre_absolu_seance="1" id_preparation="819540" ordinal_prise="1" valeur_ptsodj="0" nivpoint="1">
          <ns0:orateurs />
          <ns0:texte>Présidence de M. François de Rugy</ns0:texte>
        </ns0:openSession>
      </ns0:contenu>
    </ns0:compteRendu>

What am I doing wrong?

RE: Remove tag several xml files - Larz60+ - Sep-19-2022

This may be of interest: https://stackoverflow.com/a/4681377

RE: Remove tag several xml files - mfernandes - Sep-19-2022

Thank you for your suggestion. I forgot to mention that I want to remove the tag and its text. In the link you mentioned, they want to remove the tag but keep the text.

RE: Remove tag several xml files - deanhystad - Sep-19-2022

You should go to the package web page. You can remove just the tags or remove the tags and the content. https://lxml.de/apidoc/lxml.html.clean.html

RE: Remove tag several xml files - mfernandes - Sep-19-2022

Thank you for your suggestions. Here is code that worked:

    import os
    from lxml import etree

    # path and dstpath are the source and destination folders from my first post
    for filename in os.listdir(path):
        if filename.endswith('.xml'):
            tree = etree.parse(os.path.join(path, filename))
            # "{*}" matches openSession in any namespace; the element and its content are removed
            etree.strip_elements(tree, "{*}openSession", with_tail=True)
            save = os.path.join(dstpath, filename)
            tree.write(save)

RE: Remove tag several xml files - Pedroski55 - Sep-20-2022

Bonjour Mesdames et Messieurs!

I am interested in this thread, but re and I are not good friends! This works, but how can I do this without removing the newline characters?
I tried using re.MULTILINE but couldn't get it to work.

    import re

    path2xml = '/home/pedro/myPython/xml/document2.xml'
    with open(path2xml) as dd:
        data = dd.read()

    # newline characters cause trouble for re.search and re.sub
    # easiest is to replace them with a placeholder
    newdata = re.sub('\n', 'XYZ', data)
    p2get = re.compile(r'<openSession(.*?)</openSession>')
    removed_stuff = re.sub(p2get, '', newdata)
    # put back the newline characters
    result = re.sub('XYZ', '\n', removed_stuff)
    savepath = '/home/pedro/myPython/xml/'
    with open(savepath + 'result.xml', 'w') as r:
        r.write(result)
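
On the newline question: re.MULTILINE only changes where ^ and $ match. The flag that lets "." match a newline is re.DOTALL, so the placeholder swap should not be needed. Below is a minimal sketch under that assumption. The sample string is a hypothetical stand-in for the contents of document2.xml, and the trailing \s* (my addition) also swallows the line break left behind after the removed element. Regex on XML stays fragile (nested or self-closing openSession elements would not be handled), so treat it as an illustration rather than a robust parser.

```python
import re

# Hypothetical sample standing in for the contents of document2.xml
sample = """<contenu>
<quantiemes>
<journee>Séance du mardi 04 juillet 2017</journee>
</quantiemes>
<openSession valeur="" nivpoint="1">
<orateurs/>
<texte>Présidence de M. François de Rugy</texte>
</openSession>
</contenu>"""

# re.DOTALL makes "." match newlines as well, so the non-greedy ".*?"
# can span a multi-line element without replacing "\n" first.
# \b stops the pattern from matching a longer tag name that merely
# starts with "openSession".
pattern = re.compile(r'<openSession\b.*?</openSession>\s*', re.DOTALL)

result = pattern.sub('', sample)
print(result)
```

The remaining lines keep their original line breaks, so nothing has to be put back afterwards.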
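
Going back to the original question, the first attempt can probably be repaired with the standard library alone. Three things seem to go wrong in it. The file declares a default namespace, so findall("contenu") matches nothing and the removal loop never runs. An ElementTree object has no remove() method; removal has to happen on the parent element. And the "ns0" prefix appears because no prefix is registered for the namespace when the tree is written. The sketch below addresses all three. The function name drop_open_sessions and the shortened SAMPLE document are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the sample file in the first post
NS = "http://schemas.assemblee-nationale.fr/referentiel"

# Shortened, hypothetical stand-in for one input file
SAMPLE = (
    f'<compteRendu xmlns="{NS}">'
    "<contenu>"
    "<quantiemes><journee>Séance du mardi 04 juillet 2017</journee></quantiemes>"
    '<openSession nivpoint="1"><orateurs/>'
    "<texte>Présidence de M. François de Rugy</texte></openSession>"
    "</contenu>"
    "</compteRendu>"
)


def drop_open_sessions(xml_text):
    """Return xml_text with every openSession element and its content removed."""
    # An empty prefix makes the serializer write xmlns="..." instead of "ns0:"
    ET.register_namespace("", NS)
    root = ET.fromstring(xml_text)
    # Elements carry no parent pointer, so visit each possible parent and
    # remove matching children from it. list() takes a snapshot so the
    # tree is not modified while it is being walked.
    for parent in list(root.iter()):
        for child in parent.findall(f"{{{NS}}}openSession"):
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")


cleaned = drop_open_sessions(SAMPLE)
print(cleaned)
```

For files on disk, the same loop should work on ET.parse(source_path).getroot(), followed by tree.write(target_path, encoding="utf-8", xml_declaration=True), inside the os.listdir() loop from the first post.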