XML Parsing - Find a specific text (ElementTree) - TeraX - Oct-05-2018
Hi all,
I'm having trouble extracting a specific piece of text from an XML file.
What I need is the Value (BB001234) of the IDTAG measurement, but I don't know how to grab it.
Here are my full .xml file and my Python code.
The problem is that the number of "<DataPoint> ### </DataPoint>" elements can change.
Hopefully someone can help me.
Thank you
TeraX
import os
from xml.etree import ElementTree

file_name = 'cumulus.xml'
full_file = os.path.abspath(os.path.join('data', file_name))
dom = ElementTree.parse(full_file)
assy = dom.findall('WorkOrders/CumulusWorkOrder/Assembly')
for c in assy:
    item = c.find('PartNumber').text
    serial = c.find('SerialLotNumber').text
    desc = c.find('Description').text.encode('utf-8')
    # idtag = c.find('IDTAG').text
    #print(' * {} - {} - {} - {}'.format(
    #    item, serial, desc, idtag
    #))
    print(' * {} - {} - {} - '.format(
        item, serial, desc
    ))
results:
Output: $ python 1.py
* 1234567 - 1234567.abcdef - Item Description -
cumulus.xml
<?xml version="1.0" encoding="utf-8"?>
<CumulusWorkOrderGroup xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<CumulusBatchNumber />
<WorkOrders>
<CumulusWorkOrder>
<Assembly>
<PartNumber>1234567</PartNumber>
<Description>Item Description</Description>
<Revision>AA</Revision>
<SerialLotNumber>1234567.abcdef</SerialLotNumber>
<Attributes>
<Attribute>
<Attribute>Revision</Attribute>
<DataType>Text</DataType>
<Value>AA</Value>
</Attribute>
</Attributes>
<Materials />
<Tags>
<Tag>
<TagName>(ALL)</TagName>
<Member>true</Member>
</Tag>
<Tag>
<TagName>TEST</TagName>
<Member>false</Member>
</Tag>
<Tag>
<TagName>Test</TagName>
<Member>false</Member>
</Tag>
</Tags>
<DataPoints>
<DataPoint>
<Measurement>Operator</Measurement>
<Value />
</DataPoint>
<DataPoint>
<Measurement>Productive time</Measurement>
<Value />
</DataPoint>
<DataPoint>
<Measurement>Dead time</Measurement>
<Value />
</DataPoint>
<DataPoint>
<Measurement>Process Age</Measurement>
<Value />
</DataPoint>
<DataPoint>
<Measurement>Parts picking complete</Measurement>
<Value>No</Value>
</DataPoint>
<DataPoint>
<Measurement>Operator</Measurement>
<Value />
</DataPoint>
<DataPoint>
<Measurement>Productive time</Measurement>
<Value />
</DataPoint>
<DataPoint>
<Measurement>Dead time</Measurement>
<Value />
</DataPoint>
<DataPoint>
<Measurement>Process Age</Measurement>
<Value />
</DataPoint>
<DataPoint>
<Measurement>Enter optic lot number</Measurement>
<Value>Optiklot</Value>
</DataPoint>
<DataPoint>
<Measurement>Optic parts picking complete</Measurement>
<Value>No</Value>
</DataPoint>
<DataPoint>
<Measurement>Operator</Measurement>
<Value />
</DataPoint>
<DataPoint>
<Measurement>Productive time</Measurement>
<Value />
</DataPoint>
<DataPoint>
<Measurement>Dead time</Measurement>
<Value />
</DataPoint>
<DataPoint>
<Measurement>Process Age</Measurement>
<Value />
</DataPoint>
<DataPoint>
<Measurement>Pressure test done</Measurement>
<Value>No</Value>
</DataPoint>
<DataPoint>
<Measurement>Stress test done</Measurement>
<Value>No</Value>
</DataPoint>
<DataPoint>
<Measurement>Optics checked</Measurement>
<Value>No</Value>
</DataPoint>
<DataPoint>
<Measurement>Operator</Measurement>
<Value />
</DataPoint>
<DataPoint>
<Measurement>Productive time</Measurement>
<Value />
</DataPoint>
<DataPoint>
<Measurement>Dead time</Measurement>
<Value />
</DataPoint>
<DataPoint>
<Measurement>Process Age</Measurement>
<Value />
</DataPoint>
<DataPoint>
<Measurement>IDTAG</Measurement>
<Value>BB001234</Value>
</DataPoint>
<DataPoint>
<Measurement>Write Label</Measurement>
<Value>True</Value>
</DataPoint>
</DataPoints>
</Assembly>
<Scheduling>
<Customer />
<SalesOrder />
<ScheduleDate xsi:nil="true" />
<StartDate xsi:nil="true" />
<DueDate xsi:nil="true" />
</Scheduling>
<CumulusWorkOrderNumber>1056453</CumulusWorkOrderNumber>
<OracleWorkOrderNumber />
<QTY>3</QTY>
<WorkOrderType>Engineering</WorkOrderType>
<WorkOrderStatus>Active</WorkOrderStatus>
<Created>2018-05-29T14:35:55.983</Created>
<Completed xsi:nil="true" />
</CumulusWorkOrder>
</WorkOrders>
</CumulusWorkOrderGroup>
RE: XML Parsing - Find a specific text (ElementTree) - Larz60+ - Oct-05-2018
You can use lxml:
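from lxml import etree
import os

# Work from the script's own directory so the relative file name resolves.
# abspath() guards against __file__ being a bare name such as '1.py'.
os.chdir(os.path.dirname(os.path.abspath(__file__)))
tree = etree.parse('cumulus.xml')
# print(etree.tostring(tree))
elementPath = '/CumulusWorkOrderGroup/WorkOrders/CumulusWorkOrder/Assembly/DataPoints/DataPoint/Value'
element = tree.xpath(elementPath)
print(element[22].text.strip())
output: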
Output: BB001234
If you play with it a bit, you can get a better path. The IDTAG value is the 23rd Value element (index 22, counting from zero); that's why the index is 22 here:
print(element[22].text.strip())
Note that you can iterate through 'element' if you don't know what the index is:
for n, item in enumerate(element):
    print(f'{n}: {etree.tostring(item)}')
output:
Output: 0: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
1: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
2: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
3: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
4: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">No</Value>\n'
5: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
6: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
7: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
8: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
9: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">Optiklot</Value>\n'
10: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">No</Value>\n'
11: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
12: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
13: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
14: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
15: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">No</Value>\n'
16: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">No</Value>\n'
17: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">No</Value>\n'
18: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
19: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
20: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
21: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>\n'
22: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">BB001234</Value>\n'
23: b'<Value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">True</Value>\n'
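Since the number of DataPoint elements can change, a fixed index such as 22 may not hold for other files. Here is a minimal sketch of selecting the value by its Measurement name instead, using only the standard-library ElementTree from the original question. XML_SAMPLE is a shortened stand-in for cumulus.xml, and find_measurement is a hypothetical helper name, not part of any library.

```python
from xml.etree import ElementTree

# Shortened stand-in for cumulus.xml (assumption: same element layout).
XML_SAMPLE = """\
<CumulusWorkOrderGroup>
  <WorkOrders>
    <CumulusWorkOrder>
      <Assembly>
        <PartNumber>1234567</PartNumber>
        <DataPoints>
          <DataPoint>
            <Measurement>Operator</Measurement>
            <Value />
          </DataPoint>
          <DataPoint>
            <Measurement>IDTAG</Measurement>
            <Value>BB001234</Value>
          </DataPoint>
        </DataPoints>
      </Assembly>
    </CumulusWorkOrder>
  </WorkOrders>
</CumulusWorkOrderGroup>
"""


def find_measurement(assembly, name):
    """Return the Value text of the first DataPoint whose Measurement
    equals `name`, or None if it is missing or empty.

    Hypothetical helper; `name` must not contain a single quote.
    """
    # The [Measurement='...'] predicate keeps only DataPoint elements
    # that have a Measurement child with exactly that text.
    path = "DataPoints/DataPoint[Measurement='{}']/Value".format(name)
    value = assembly.find(path)
    return value.text if value is not None else None


root = ElementTree.fromstring(XML_SAMPLE)
for assembly in root.findall('WorkOrders/CumulusWorkOrder/Assembly'):
    item = assembly.find('PartNumber').text
    idtag = find_measurement(assembly, 'IDTAG')
    print(' * {} - {}'.format(item, idtag))
```

With the real file, replacing ElementTree.fromstring(XML_SAMPLE) with ElementTree.parse(full_file).getroot() should behave the same way, as long as the layout matches the one posted above.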
RE: XML Parsing - Find a specific text (ElementTree) - snippsat - Oct-06-2018
The parsers in the standard library are not the best; you're better off using lxml as Larz60+ shows, or BeautifulSoup with lxml as the chosen parser.
from bs4 import BeautifulSoup

# The 'lxml' parser lowercases tag names, hence "measurement" below.
with open("cumulus.xml") as f:
    soup = BeautifulSoup(f, 'lxml')
id_tag = soup.find("measurement", string="IDTAG")
print(id_tag.find_next_sibling().text)
Output: BB001234
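The same "find the Measurement, then read its Value" idea can also be written as a plain loop. The sketch below uses only the standard library and assumes the DataPoint layout from cumulus.xml; SAMPLE and collect_values are made-up names for illustration. It groups values by Measurement name, because names such as Operator occur more than once in the file and a plain dict would keep only one of them.

```python
from collections import defaultdict
from xml.etree import ElementTree

# Shortened stand-in for the <DataPoints> section of cumulus.xml.
SAMPLE = """\
<DataPoints>
  <DataPoint><Measurement>Operator</Measurement><Value /></DataPoint>
  <DataPoint><Measurement>Parts picking complete</Measurement><Value>No</Value></DataPoint>
  <DataPoint><Measurement>Operator</Measurement><Value /></DataPoint>
  <DataPoint><Measurement>IDTAG</Measurement><Value>BB001234</Value></DataPoint>
</DataPoints>
"""


def collect_values(data_points):
    """Map each Measurement name to the list of its Value texts.

    Empty or missing Value elements are stored as None.
    """
    values = defaultdict(list)
    for point in data_points.iter('DataPoint'):
        name = point.findtext('Measurement')
        if name is None:
            continue  # skip a DataPoint without a Measurement child
        # findtext() returns '' for <Value /> and None if Value is absent;
        # `or None` normalizes both cases to None.
        values[name].append(point.findtext('Value') or None)
    return dict(values)


values = collect_values(ElementTree.fromstring(SAMPLE))
print(values['IDTAG'][0])
print(len(values['Operator']))
```

Keeping a list per name is a deliberate choice here: it makes the repeated measurements visible instead of silently overwriting them.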
RE: XML Parsing - Find a specific text (ElementTree) - TeraX - Oct-09-2018
Thanks to both of you!
This is really helpful.
Best Regards