Python Forum

Full Version: Overwrite values in XML file with values from another XML file
I have one main XML file (Mainfile_1.xml) in which some items have value = 'FAIL'. I want to replace those FAIL values with the correct values from another XML file (Fixfile_1.xml). The result should look like the picture below:

[Image: tYIVy.png]

As you can see, the values from Fixfile_1.xml should replace the 'FAIL' values in Mainfile_1.xml for the corresponding item name and object id.

So far I have written code that reads both XML files and prints only the data related to the FAIL values. My MAIN PROBLEM is how to save the result to a file so that the failed values are overwritten by the values from Fixfile_1.xml. For some reason, "tree.write" only removes the "<?xml version='1.0' encoding='UTF-8'?>" line.

Here is my code:

    import xml.etree.ElementTree as ET

    Mainfile = 'Mainfile_1.xml'
    tree = ET.parse(Mainfile)
    root = tree.getroot()
    fixfile = 'Fixfile_1.xml'
    tree2 = ET.parse(fixfile)
    root2 = tree2.getroot()
    for objects in root.iter('object'):
        # The sample files identify each object by its 'id' attribute.
        objid = objects.attrib.get('id')
        # Iterate the element directly; getchildren() was removed in Python 3.9.
        for attributes in objects:
            name = attributes.attrib.get('name')
            value = attributes.attrib.get('value')
            if value == 'FAIL':
                for objects2 in root2.iter('object'):
                    objid2 = objects2.attrib.get('id')
                    for attributes2 in objects2:
                        name2 = attributes2.attrib.get('name')
                        value2 = attributes2.attrib.get('value')
                        if objid2 == objid:
                            if name == name2:
                                # Print each FAIL item next to its replacement value.
                                print(name, name2, value, value2)
    tree.write('newMainfile_1.xml')
Here is Mainfile_1.xml:

    <?xml version='1.0' encoding='UTF-8'?>
    <Module bs='Mainfile_1'>
    <object name='namex' number='1' id='1000'>
        <item name='item0' value='100'/>
        <item name='item00' value='100'/>
    </object>
    <object name='namey' number='2' id='1001'>
        <item name='item1' value='100'/>
        <item name='item00' value='100'/>
    </object>
    <object name='name1' number='3' id='1234'>
        <item name='item1' value='FAIL'/>
        <item name='item2' value='233'/>
        <item name='item3' value='233'/>
        <item name='item4' value='FAIL'/>
    </object>
    <object name='name2' number='4' id='1238'>
        <item name='item8' value='FAIL'/>
        <item name='item9' value='233'/>
    </object>
    <object name='name32' number='5' id='2345'>
        <item name='item1' value='111'/>
        <item name='item2' value='FAIL'/>
    </object>
    <object name='name4' number='6' id='2347'>
        <item name='item1' value='FAIL'/>
        <item name='item2' value='FAIL'/>
        <item name='item3' value='233'/>
        <item name='item4' value='FAIL'/>
    </object>
    </Module>
And here is Fixfile_1.xml:

    <?xml version='1.0' encoding='UTF-8'?>
    <Module bs='Mainfile_1'>
    <object id='1234'>
        <item name='item1' value='something
    more of something'/>
        <item name='item4' value='something
    more of something'/>
    </object>
    <object id='1238'>
        <item name='item8' value='something12
    more of something'/>
    </object>
    <object id='2345'>
        <item name='item2' value='something
    more of something'/>
    </object>
    <object id='2347'>
        <item name='item1' value='something14
    more of something'/>
        <item name='item2' value='something
    more of something'/>
        <item name='item4' value='something14
    something14
    something12
    more of something'/>
    </object>
    </Module>
One more thing: because I have many corresponding file pairs like this (Mainfile_1.xml - Fixfile_1.xml, Mainfile_2.xml - Fixfile_2.xml, Mainfile_3.xml - Fixfile_3.xml, etc.), is there a way to open and overwrite them all at once?
I'm working on something that will help.
It'll take a while, but it should be done by tomorrow, my time (EDT).

I'll be back
(Mar-31-2022, 02:40 AM) Larz60+ wrote: I'm working on something that will help.
It'll take a while, but it should be done by tomorrow, my time (EDT).

I'll be back

Ok, thank you. I really appreciate any help
There are two access methods shown below:
  1. process_using_defusedxml: this uses an etree, but not xml.etree.ElementTree, which is not secure against maliciously constructed XML.
    Quote: Note: XML is not safe, see: https://docs.python.org/3/library/xml.ht...rabilities. Use defusedxml instead. Install with pip: 'pip install defusedxml'. See GitHub: https://github.com/tiran/defusedxml

  2. process_using_bs4: this is (my) preferred method and, as far as I know, safe. It uses BeautifulSoup4 to parse the input.

Using the second method, the code can be rearranged into a class with appropriate update methods.

from pathlib import Path
import os

def process_using_defusedxml(filename):
    import defusedxml.ElementTree as ET

    def tree_walk(root, level=0):
        indent = " " * (4 * level)
        for child in root:
            print(f"\n{indent}Type(child): {type(child)}")
            print(f"\n{indent}tag: {child.tag}")
            print(f"    {indent}attribute: {child.attrib}")
            print(f"    {indent}text: {child.text}")
            # Recurse one level deeper so nested elements are indented further.
            tree_walk(child, level + 1)

    tree = ET.parse(filename)
    root = tree.getroot()

    tree_walk(root)
    
# alternative method using Beautiful Soup
def process_using_bs4(filename):
    from bs4 import BeautifulSoup

    with filename.open('r') as fp:        
        xmldata = fp.read()
        soup = BeautifulSoup(xmldata, 'lxml')
        module = soup.find('module')
        modulename = module.get('bs')
        print(f"Module Name: {modulename}")

        objects = soup.find_all('object')
        print(f"\nobjects:")
        for n, obj in enumerate(objects):
            print(f"\nobject_id: {obj.get('id')} object name: {obj.get('name')}" \
                f" object number: {obj.get('number')}")
            items = obj.find_all('item')
            if items:
                print()
                for n1, item in enumerate(items):
                    if item:
                        print(f"    item number: {n1} name: {item.get('name')} " \
                            f"value: {item.get('value')}")

os.chdir(os.path.abspath(os.path.dirname(__file__)))
filename = Path('.') / 'Mainfile_1.xml'

# process_using_defusedxml(filename)
process_using_bs4(filename)
BeautifulSoup4 (bs4) method results:
Output:
Module Name: Mainfile_1

objects:

object_id: 1000 object name: namex object number: 1

    item number: 0 name: item0 value: 100
    item number: 1 name: item00 value: 100

object_id: 1001 object name: namey object number: 2

    item number: 0 name: item1 value: 100
    item number: 1 name: item00 value: 100

object_id: 1234 object name: name1 object number: 3

    item number: 0 name: item1 value: FAIL
    item number: 1 name: item2 value: 233
    item number: 2 name: item3 value: 233
    item number: 3 name: item4 value: FAIL

object_id: 1238 object name: name2 object number: 4

    item number: 0 name: item8 value: FAIL
    item number: 1 name: item9 value: 233

object_id: 2345 object name: name32 object number: 5

    item number: 0 name: item1 value: 111
    item number: 1 name: item2 value: FAIL

object_id: 2347 object name: name4 object number: 6

    item number: 0 name: item1 value: FAIL
    item number: 1 name: item2 value: FAIL
    item number: 2 name: item3 value: 233
    item number: 3 name: item4 value: FAIL
This should give you something to work with.
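To go from reading the data to actually overwriting it, a minimal sketch along these lines may help. It assumes both files follow the layout posted above (object elements identified by id, item elements identified by name). The function name apply_fixes and the shortened inline sample strings are made up for illustration. It uses xml.etree.ElementTree for brevity, which is reasonable only for files you trust; for untrusted input, the parse step could be done with defusedxml.ElementTree instead.

```python
import io
import xml.etree.ElementTree as ET

# Shortened stand-ins for Mainfile_1.xml and Fixfile_1.xml (illustrative only).
MAIN_XML = """<Module bs='Mainfile_1'>
<object name='name1' number='3' id='1234'>
    <item name='item1' value='FAIL'/>
    <item name='item2' value='233'/>
</object>
<object name='name2' number='4' id='1238'>
    <item name='item8' value='FAIL'/>
</object>
</Module>"""

FIX_XML = """<Module bs='Mainfile_1'>
<object id='1234'>
    <item name='item1' value='something'/>
</object>
<object id='1238'>
    <item name='item8' value='something12'/>
</object>
</Module>"""


def apply_fixes(main_root, fix_root):
    """Overwrite value='FAIL' items in main_root with values from fix_root.

    Returns the number of items that were replaced.
    """
    # Build a lookup keyed by (object id, item name) so each FAIL item
    # is resolved with one dictionary access instead of a nested scan.
    fixes = {}
    for obj in fix_root.iter('object'):
        for item in obj.iter('item'):
            fixes[(obj.get('id'), item.get('name'))] = item.get('value')

    replaced = 0
    for obj in main_root.iter('object'):
        for item in obj.iter('item'):
            key = (obj.get('id'), item.get('name'))
            if item.get('value') == 'FAIL' and key in fixes:
                # set() changes the tree in place; printing alone does not.
                item.set('value', fixes[key])
                replaced += 1
    return replaced


main_tree = ET.ElementTree(ET.fromstring(MAIN_XML))
fix_root = ET.fromstring(FIX_XML)
count = apply_fixes(main_tree.getroot(), fix_root)

# xml_declaration=True asks write() to emit the <?xml ...?> line,
# which is left out by default.
buffer = io.BytesIO()
main_tree.write(buffer, encoding='UTF-8', xml_declaration=True)
result = buffer.getvalue().decode('UTF-8')
print(count)
print(result)
```

With real files, ET.parse('Mainfile_1.xml') and a file name passed to write() would take the place of the inline strings and the buffer. One thing to be aware of: literal line breaks inside attribute values, as in Fixfile_1.xml, are normalized to spaces by an XML parser, so the multi-line values will come out on a single line.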
(Apr-01-2022, 08:27 AM) Larz60+ wrote: There are two access methods shown below:
  1. process_using_defusedxml: this uses an etree, but not xml.etree.ElementTree, which is not secure against maliciously constructed XML.
    Quote: Note: XML is not safe, see: https://docs.python.org/3/library/xml.ht...rabilities. Use defusedxml instead. Install with pip: 'pip install defusedxml'. See GitHub: https://github.com/tiran/defusedxml

    This should give you something to work with.


Thank you very much. Luckily I managed to find a solution on my own, but new problems occurred, which I described in another topic.
Keep in mind that xml.etree.ElementTree is not secure against maliciously constructed XML; it is vulnerable to several known attacks.
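On the earlier question about handling many file pairs at once, here is a minimal sketch. It assumes every Mainfile_N.xml sits in the same directory as its Fixfile_N.xml and that the names follow exactly that pattern. find_pairs is a hypothetical helper name, and what is done with each pair (for example, the replacement step) is left to the caller.

```python
import tempfile
from pathlib import Path


def find_pairs(directory):
    """Return (main_path, fix_path) tuples for every Mainfile_N.xml
    that has a matching Fixfile_N.xml in the same directory."""
    pairs = []
    for main_path in sorted(Path(directory).glob('Mainfile_*.xml')):
        # 'Mainfile_3.xml' -> suffix '3' -> 'Fixfile_3.xml'
        suffix = main_path.stem.split('_', 1)[1]
        fix_path = main_path.with_name(f'Fixfile_{suffix}.xml')
        if fix_path.exists():
            pairs.append((main_path, fix_path))
    return pairs


# Demonstration in a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('Mainfile_1.xml', 'Fixfile_1.xml',
                 'Mainfile_2.xml', 'Fixfile_2.xml',
                 'Mainfile_3.xml'):  # no Fixfile_3.xml on purpose
        (Path(tmp) / name).touch()
    pair_names = [(m.name, f.name) for m, f in find_pairs(tmp)]

print(pair_names)
```

Main files without a matching fix file are skipped rather than treated as errors; whether that is the right behavior depends on how the files are produced.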