Python Forum

Hi all,

I have a problem parsing specific text from an XML file.

What I need is the Value (BB001234) of the IDTAG, but I don't know how to grab it.

Here is my full .xml file and my Python code.
The problem is that the number of "<DataPoint> ### </DataPoint>" elements can change.

Hopefully someone can help me.

Thank you
TeraX

import os
from xml.etree import ElementTree

file_name = 'cumulus.xml'
full_file = os.path.abspath(os.path.join('data', file_name))
dom = ElementTree.parse(full_file)

assy = dom.findall('WorkOrders/CumulusWorkOrder/Assembly')

for c in assy:
    item = c.find('PartNumber').text
    serial = c.find('SerialLotNumber').text
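    # No .encode('utf-8') needed in Python 3; it would print b'...' bytes
    desc = c.find('Description').text
    # IDTAG is text inside a DataPoint, not a tag name, so find('IDTAG') returns None:
    # idtag = c.find('IDTAG').text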

    #print(' * {} - {} - {} - {}'.format(
    #    item, serial, desc, idtag
    #))
    print(' * {} - {} - {} - '.format(
        item, serial, desc
    ))

Results:

$ python 1.py
 * 1234567 - 1234567.abcdef - Item Description -
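For reference, here is a minimal sketch that stays with the standard library and keeps the original loop, but reads the IDTAG by walking whatever DataPoints are present, so their number doesn't matter. It assumes (based on the BeautifulSoup answer further down) that each DataPoint holds a Measurement element with the name and a Value element with the value; the sample XML and the names PointA/PointB are hypothetical, since the real cumulus.xml isn't shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for cumulus.xml. Assumption: each
# <DataPoint> holds a <Measurement> name and a <Value>.
SAMPLE_XML = """\
<CumulusWorkOrderGroup>
  <WorkOrders>
    <CumulusWorkOrder>
      <Assembly>
        <PartNumber>1234567</PartNumber>
        <SerialLotNumber>1234567.abcdef</SerialLotNumber>
        <Description>Item Description</Description>
        <DataPoints>
          <DataPoint><Measurement>PointA</Measurement><Value>No</Value></DataPoint>
          <DataPoint><Measurement>PointB</Measurement><Value/></DataPoint>
          <DataPoint><Measurement>IDTAG</Measurement><Value>BB001234</Value></DataPoint>
        </DataPoints>
      </Assembly>
    </CumulusWorkOrder>
  </WorkOrders>
</CumulusWorkOrderGroup>
"""


def data_points(assembly):
    """Map each Measurement name to its Value text ('' for an empty <Value/>)."""
    points = {}
    for dp in assembly.iterfind('DataPoints/DataPoint'):
        name = dp.findtext('Measurement', default='').strip()
        points[name] = dp.findtext('Value', default='').strip()
    return points


root = ET.fromstring(SAMPLE_XML)  # for a real file: ET.parse(path).getroot()
for assy in root.findall('WorkOrders/CumulusWorkOrder/Assembly'):
    item = assy.findtext('PartNumber')
    serial = assy.findtext('SerialLotNumber')
    desc = assy.findtext('Description')
    idtag = data_points(assy).get('IDTAG')  # None if this Assembly has no IDTAG point
    print(' * {} - {} - {} - {}'.format(item, serial, desc, idtag))
```

With the sample above this prints " * 1234567 - 1234567.abcdef - Item Description - BB001234". Building a dict also gives you every other DataPoint by name without counting positions.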

cumulus.xml


You can use lxml:

from lxml import etree
import os


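# Work relative to the script's folder (abspath avoids an empty dirname on older Pythons)
os.chdir(os.path.dirname(os.path.abspath(__file__)))
tree = etree.parse('cumulus.xml')
# print(etree.tostring(tree))
elementPath = '/CumulusWorkOrderGroup/WorkOrders/CumulusWorkOrder/Assembly/DataPoints/DataPoint/Value'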
element = tree.xpath(elementPath)
print(element[22].text.strip())

Output:
BB001234

If you play with it a bit, you can get a better path. The IDTAG value is at index 22 (the 23rd DataPoint, since indexing starts at 0), which is why the index 22 is used here:

print(element[22].text.strip())
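A hard-coded index breaks as soon as the number of DataPoints changes. A minimal sketch of selecting the DataPoint by its content instead, using the standard library's ElementTree, whose limited XPath does support the [tag='text'] predicate. It assumes each DataPoint has a Measurement child next to its Value (as the BeautifulSoup answer below suggests); the sample XML and the names PointA/PointB are hypothetical. With lxml, the equivalent full-XPath call would be tree.xpath("//DataPoint[Measurement='IDTAG']/Value/text()").

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the DataPoint/Measurement/Value layout is an assumption.
SAMPLE_XML = """\
<CumulusWorkOrderGroup><WorkOrders><CumulusWorkOrder><Assembly><DataPoints>
  <DataPoint><Measurement>PointA</Measurement><Value>No</Value></DataPoint>
  <DataPoint><Measurement>IDTAG</Measurement><Value>BB001234</Value></DataPoint>
  <DataPoint><Measurement>PointB</Measurement><Value>True</Value></DataPoint>
</DataPoints></Assembly></CumulusWorkOrder></WorkOrders></CumulusWorkOrderGroup>
"""

root = ET.fromstring(SAMPLE_XML)

# [Measurement='IDTAG'] keeps only DataPoints whose Measurement child has exactly
# that text, so the position of the IDTAG point no longer matters.
value = root.find(".//DataPoint[Measurement='IDTAG']/Value")
idtag = value.text.strip() if value is not None and value.text else None
print(idtag)  # BB001234
```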

Note that you can iterate through 'element' if you don't know what the index is:

for n, item in enumerate(element):
    print(f'{n}: {etree.tostring(item)}')

Output:
0: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
1: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
2: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
3: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
4: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">No</Value>\n'
5: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
6: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
7: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
8: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
9: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">Optiklot</Value>\n'
10: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">No</Value>\n'
11: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
12: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
13: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
14: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
15: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">No</Value>\n'
16: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">No</Value>\n'
17: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">No</Value>\n'
18: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
19: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
20: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
21: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
22: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">BB001234</Value>\n'
23: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">True</Value>\n'
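Most of those Value elements are empty (<Value .../>), and an empty element's .text is None, so calling element[n].text.strip() on one of them would raise AttributeError; lxml elements behave the same way as ElementTree here. A minimal sketch that prints the text per index and guards against None, using the standard library and a hypothetical three-element sample:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mixing empty and filled <Value> elements.
SAMPLE_XML = """\
<DataPoints>
  <DataPoint><Value/></DataPoint>
  <DataPoint><Value>No</Value></DataPoint>
  <DataPoint><Value>BB001234</Value></DataPoint>
</DataPoints>
"""

root = ET.fromstring(SAMPLE_XML)
values = root.findall('DataPoint/Value')

# An empty <Value/> has .text == None, so fall back to '' before stripping.
texts = [(v.text or '').strip() for v in values]
for n, text in enumerate(texts):
    print(f'{n}: {text!r}')

# Keep only the indexes that actually carry a value.
nonempty = [(n, text) for n, text in enumerate(texts) if text]
print(nonempty)  # [(1, 'No'), (2, 'BB001234')]
```

Printing the text rather than etree.tostring(item) makes the non-empty entries much easier to spot in a long list.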

The XML parsers in the standard library are limited (ElementTree, for example, supports only a subset of XPath), so you're better off using lxml, as Larz60+ showed, or BeautifulSoup with lxml as the chosen parser.

from bs4 import BeautifulSoup

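# 'with' closes the file after parsing
with open("cumulus.xml") as f:
    soup = BeautifulSoup(f, 'lxml')
# The 'lxml' HTML parser lowercases tag names, so search with lowercase names
id_tag = soup.find("measurement", string="IDTAG")
print(id_tag.find_next_sibling().text)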

Output:
BB001234

Thanks to both of you!
This is really helpful.

Best Regards

TeraX

Larz60+

snippsat

TeraX