Python Forum

Full Version: xml.etree.ElementTree extract string values
Hi all,
I am new to Python. I am trying to parse an XML file and extract the values between ">" and "<", i.e. the text content of the elements. Here is the XML example:

<?xml version="1.0" encoding="UTF-8"?>
<DisplayDefinitionTable>
	<rows>
		<row>
			<object_tag tag="tagstr" uid="uidstr"/>
			<row_element column="0" component_tag="223011" property_name="property1">VALUE_STR_1</row_element>
			<row_element column="1" component_tag="223011" property_name="property2">VALUE_STR_2</row_element>
			<row_element column="2" component_tag="223011" property_name="property3">VALUE_STR_3</row_element>
			<row_element column="3" component_tag="223011" property_name="property4">VALUE_STR_4</row_element>
			<row_element column="4" component_tag="223011" property_name="property5">VALUE_STR_5</row_element>
			<row_element column="5" component_tag="1182129" property_name="property6">VALUE_STR_6</row_element>
			<row_element column="6" component_tag="81988" property_name="property7">VALUE_STR_7</row_element>
			<row_element column="7" component_tag="223011" property_name="property8">VALUE_STR_8</row_element>
			<row_element column="8" component_tag="223011" property_name="property9">VALUE_STR_9</row_element>
			<row_element column="9" component_tag="223011" property_name="property10">VALUE_STR_10</row_element>
		</row>
		
	</rows>
</DisplayDefinitionTable>

I am trying to extract the value string for property1 between ">" and "<" (VALUE_STR_1).
Here is the code:

from pathlib import Path
import os
import tempfile
import xml.etree.ElementTree as ET

srcpath = Path(__file__).parent.absolute()
os.chdir(srcpath)

tree = ET.parse("example.xml")
root = tree.iter()
#root = tree.getroot()

value= ""
PropertyName =""
for child in root:
     print(child.tag, child.attrib)
     if child.tag == "row_element":
        #print(child.tag,child.attrib)
        PropertyName=child.attrib.get('property_name')
        print('>>',PropertyName)
        value=child.findtext('PropertyName')
        print ("Value from ",PropertyName,":",value)
Here is the corresponding output:

DisplayDefinitionTable {}
rows {}
row {}
object_tag {'tag': 'tagstr', 'uid': 'uidstr'}
row_element {'column': '0', 'component_tag': '223011', 'property_name': 'property1'} 
>> property1
Value from  property1 : None
I tried various approaches without success. I am under the impression that the root element does not contain those values at all. Any help or hint is highly appreciated.

Thx
Matthias
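findtext('PropertyName') searches for a child element whose tag is literally PropertyName. A row_element has no child elements, so the call returns None. You can access the text content simply with the element's text attribute: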
for child in root:
    print(child.tag, child.attrib)
    if child.tag == "row_element":
        # print(child.tag,child.attrib)
        PropertyName = child.attrib.get('property_name')
        print(f'>> {PropertyName}')
        value = child.text
        print(f'Value from {PropertyName}: {value}')
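For reference, here is the same idea as a self-contained sketch. It assumes a shortened copy of the XML from the question, inlined as a string (XML_TEXT is a made-up name) so it runs without example.xml. It collects every property_name and its text into a dict, and also shows why findtext() returned None in the original attempt.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question, inlined so the sketch runs on its own.
XML_TEXT = """<DisplayDefinitionTable>
    <rows>
        <row>
            <object_tag tag="tagstr" uid="uidstr"/>
            <row_element column="0" component_tag="223011" property_name="property1">VALUE_STR_1</row_element>
            <row_element column="1" component_tag="223011" property_name="property2">VALUE_STR_2</row_element>
        </row>
    </rows>
</DisplayDefinitionTable>
"""

root = ET.fromstring(XML_TEXT)

# iter("row_element") walks the whole tree and yields only elements with that tag.
# elem.text is the string between the opening and closing tag.
values = {
    elem.get("property_name"): elem.text
    for elem in root.iter("row_element")
}

for name, value in values.items():
    print(f"Value from {name}: {value}")

# findtext() looks for a *child element* with the given tag.
# row_element has no children, so nothing matches and None is returned.
first = root.find(".//row_element")
print(first.findtext("PropertyName"))
```

With a file on disk, ET.parse("example.xml").getroot() can take the place of ET.fromstring(XML_TEXT).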
Here is one with BeautifulSoup, as I never use ElementTree (it has caused a lot of unnecessary problems for many people through the years).
from bs4 import BeautifulSoup

# The 'xml' parser needs the lxml package to be installed.
with open('comp.xml', encoding='utf-8') as f:
    soup = BeautifulSoup(f, 'xml')
for row in soup.find_all('row_element'):
    print(row.text)
Output:
VALUE_STR_1
VALUE_STR_2
VALUE_STR_3
VALUE_STR_4
VALUE_STR_5
VALUE_STR_6
VALUE_STR_7
VALUE_STR_8
VALUE_STR_9
VALUE_STR_10
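If only one value is needed, such as property1 from the original question, ElementTree's limited XPath support can select the element by attribute directly. The sketch below is a possible approach, not something posted in the thread. get_property_value and XML_TEXT are hypothetical names, and the XML is a shortened inline copy of the question's file.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the question's XML (an assumption, to keep the sketch runnable).
XML_TEXT = """<DisplayDefinitionTable>
    <rows>
        <row>
            <object_tag tag="tagstr" uid="uidstr"/>
            <row_element column="0" component_tag="223011" property_name="property1">VALUE_STR_1</row_element>
            <row_element column="1" component_tag="223011" property_name="property2">VALUE_STR_2</row_element>
        </row>
    </rows>
</DisplayDefinitionTable>
"""


def get_property_value(root, property_name):
    """Return the text of the first row_element with this property_name, or None."""
    # [@attr='value'] is one of the XPath predicates ElementTree understands.
    # This simple formatting assumes property_name contains no single quote.
    path = f".//row_element[@property_name='{property_name}']"
    return root.findtext(path)


root = ET.fromstring(XML_TEXT)
print(get_property_value(root, "property1"))
print(get_property_value(root, "no_such_property"))
```

Here findtext() works because the path points at the row_element itself, whereas in the original code it was asked for a non-existent child of row_element.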