 XML Parsing - Find a specific text (ElementTree)
#1
Hi all,

I'm having trouble parsing a specific piece of text from an XML file.

What I need is the value (BB001234) of the IDTAG, but I don't know how to get it.

Here is my full .xml file and my Python code.
The problem is that the number of "<DataPoint> ### </DataPoint>" elements can change.

Hopefully someone can help me.

Thank you
TeraX

import os
from xml.etree import ElementTree

file_name = 'cumulus.xml'
full_file = os.path.abspath(os.path.join('data', file_name))
dom = ElementTree.parse(full_file)

assy = dom.findall('WorkOrders/CumulusWorkOrder/Assembly')

for c in assy:
    item = c.find('PartNumber').text
    serial = c.find('SerialLotNumber').text
    desc = c.find('Description').text.encode('utf-8')
    # idtag = c.find('IDTAG').text

    #print(' * {} - {} - {} - {}'.format(
    #    item, serial, desc, idtag
    #))
    print(' * {} - {} - {} - '.format(
        item, serial, desc
    ))
results:

Output:
$ python 1.py
 * 1234567 - 1234567.abcdef - Item Description -
Attachment: cumulus.xml
#2
You can use lxml:
from lxml import etree
import os


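# abspath() guards against dirname() returning '' when __file__ is a bare file name
os.chdir(os.path.dirname(os.path.abspath(__file__)))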
tree = etree.parse('cumulus.xml')
# print(etree.tostring(tree))
elementPath ='/CumulusWorkOrderGroup/WorkOrders/CumulusWorkOrder/Assembly/DataPoints/DataPoint/Value'
element = tree.xpath(elementPath)
print(element[22].text.strip())
output:
Output:
BB001234
If you play with it a bit, you can get a better path. The value is in the DataPoint at index 22 (counting from 0), which is why the index is 22 here:
print(element[22].text.strip())
Note that you can iterate through 'element' if you don't know what the index is:
for n, item in enumerate(element):
    print(f'{n}: {etree.tostring(item)}')
output:
Output:
0: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
1: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
2: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
3: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
4: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">No</Value>\n'
5: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
6: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
7: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
8: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
9: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">Optiklot</Value>\n'
10: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">No</Value>\n'
11: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
12: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
13: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
14: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
15: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">No</Value>\n'
16: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">No</Value>\n'
17: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">No</Value>\n'
18: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
19: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
20: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
21: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
22: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">BB001234</Value>\n'
23: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">True</Value>\n'
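Since the number of DataPoint elements can change, a fixed index like 22 may stop pointing at the IDTAG value. Here is a minimal sketch that selects the Value by the name of its sibling instead, using only the standard-library ElementTree. It assumes each DataPoint holds a Measurement child (with the text IDTAG) next to its Value child; the element name Measurement, the helper find_idtags and the sample XML are made up for illustration, not taken from the real cumulus.xml.

```python
from xml.etree import ElementTree

# Made-up sample that mimics the ASSUMED layout of cumulus.xml.
SAMPLE = """\
<CumulusWorkOrderGroup>
  <WorkOrders>
    <CumulusWorkOrder>
      <Assembly>
        <PartNumber>1234567</PartNumber>
        <DataPoints>
          <DataPoint><Measurement>LOT</Measurement><Value>Optiklot</Value></DataPoint>
          <DataPoint><Measurement>IDTAG</Measurement><Value>BB001234</Value></DataPoint>
        </DataPoints>
      </Assembly>
    </CumulusWorkOrder>
  </WorkOrders>
</CumulusWorkOrderGroup>
"""


def find_idtags(root, name='IDTAG'):
    """Return one IDTAG value (or None) per Assembly (hypothetical helper)."""
    found = []
    for assembly in root.findall('WorkOrders/CumulusWorkOrder/Assembly'):
        # [Measurement='IDTAG'] keeps only DataPoints whose Measurement
        # child has exactly that text, wherever they sit in the list.
        value = assembly.find(
            "DataPoints/DataPoint[Measurement='{}']/Value".format(name))
        if value is not None and value.text:
            found.append(value.text.strip())
        else:
            found.append(None)
    return found


root = ElementTree.fromstring(SAMPLE)
print(find_idtags(root))
```

The predicate compares the full text of Measurement, so it would not match if the real file has extra whitespace around IDTAG.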
#3
The parsers in the standard library are not the best; you are better off using lxml as Larz60+ shows, or BeautifulSoup with lxml as the chosen parser.
from bs4 import BeautifulSoup

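# Close the file when done; the 'lxml' HTML parser lower-cases tag names,
# which is why the search below uses "measurement".
with open("cumulus.xml") as fp:
    soup = BeautifulSoup(fp, 'lxml')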
id_tag = soup.find("measurement", string="IDTAG")
print(id_tag.find_next_sibling().text)
Output:
BB001234
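If more than one measurement is needed, another option is to collect every DataPoint into a dict once and look values up by name. This is only a sketch with the standard-library ElementTree, assuming each DataPoint has a Measurement child and a Value child; the function name datapoints_to_dict and the sample XML are hypothetical.

```python
from xml.etree import ElementTree

# Made-up fragment; the real file is assumed to follow the same pattern.
SAMPLE = """\
<Assembly>
  <DataPoints>
    <DataPoint><Measurement>EMPTY</Measurement><Value/></DataPoint>
    <DataPoint><Measurement>IDTAG</Measurement><Value> BB001234 </Value></DataPoint>
  </DataPoints>
</Assembly>
"""


def datapoints_to_dict(assembly):
    """Map each Measurement name to its Value text (hypothetical helper)."""
    result = {}
    for dp in assembly.iter('DataPoint'):
        # findtext() returns None when the child is missing, so fall back
        # to an empty string before stripping whitespace.
        name = (dp.findtext('Measurement') or '').strip()
        value = (dp.findtext('Value') or '').strip()
        if name:
            result[name] = value
    return result


assembly = ElementTree.fromstring(SAMPLE)
points = datapoints_to_dict(assembly)
print(points.get('IDTAG'))
```

Because both texts are stripped, this version tolerates stray whitespace and empty Value elements, and it does not depend on how many DataPoint entries come before IDTAG.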
#4
Thanks to both of you!
This is really helpful.

Best Regards