Python Forum

XML using xml.etree.ElementTree Question
Please advise me on the following issue. There are three sections below: the data, my Python code, and the output. With iterparse(), the "start" event does not always give me a value in elem.text; sometimes it is None. So the start event is not 100% reliable for getting a tag's value, while the "end" event seems to provide the value every time. Is this a bug in Python? Is there a better way to get the tag's value from the start event?

File name: XMLD.DOCUMT.xml

Line 148:
<didFundLendSecurities>N</didFundLendSecurities>

Line 322:
<didFundLendSecurities>N</didFundLendSecurities>

Line 495:
<didFundLendSecurities>N</didFundLendSecurities>

--

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

v_file_name = "XMLD.DOCUMT.xml"

# Stream the file and receive both "start" and "end" events for every element
for event, elem in ET.iterparse(v_file_name, events=("start", "end")):
    # Strip the "{namespace}" prefix from the tag, if there is one
    v_elem_tag = elem.tag.split('}')[-1]
    # Tags without attrib go here
    if not elem.attrib:
        if event == "start" and v_elem_tag == "didFundLendSecurities":
            print("Start: ", v_elem_tag, elem.text)
        if event == "end" and v_elem_tag == "didFundLendSecurities":
            print("End: ", v_elem_tag, elem.text)
    # Tags with attrib go here
    else:
        pass

--

Start: didFundLendSecurities N
End: didFundLendSecurities N

Start: didFundLendSecurities None
End: didFundLendSecurities N

Start: didFundLendSecurities N
End: didFundLendSecurities N
It is not a bug. The documentation for iterparse() says:

Note: iterparse() only guarantees that it has seen the “>” character of a starting tag when it emits a “start” event, so the attributes are defined, but the contents of the text and tail attributes are undefined at that point. The same applies to the element children; they may or may not be present.
If you need a fully populated element, look for “end” events instead.
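
Following that advice, here is a minimal sketch that reads the value only on "end" events. The sample document, its namespace URI, and the helper names local_name and collect_values are made up for illustration; they stand in for XMLD.DOCUMT.xml, which I have not seen in full.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for XMLD.DOCUMT.xml; the namespace URI is invented.
SAMPLE = b"""<?xml version="1.0"?>
<doc xmlns="http://example.com/ns">
  <fund><didFundLendSecurities>N</didFundLendSecurities></fund>
  <fund><didFundLendSecurities>Y</didFundLendSecurities></fund>
</doc>"""


def local_name(tag):
    """Return the tag without its '{namespace}' prefix, if it has one."""
    return tag.rsplit("}", 1)[-1]


def collect_values(source, wanted):
    """Collect the text of every element named `wanted`, using only "end" events."""
    values = []
    for event, elem in ET.iterparse(source, events=("end",)):
        # At "end" the element is fully populated, so elem.text is safe to read.
        if local_name(elem.tag) == wanted:
            values.append(elem.text)
    return values


values = collect_values(io.BytesIO(SAMPLE), "didFundLendSecurities")
print(values)  # ['N', 'Y']
```

To run it against the real file, pass the file name instead of the BytesIO object.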

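The reason is that the element's text may not have been read yet when the iterative parser emits the start event. iterparse() feeds the file to the parser in chunks, so whether elem.text is already filled in at "start" depends on where a chunk boundary happens to fall. That is why two of your three start events print N and one prints None.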
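
The effect can be reproduced by hand with XMLPullParser, where you control the chunks yourself. This is only a sketch: the tiny document and the helper name texts_seen are invented for the demonstration, and the chunk boundary is placed deliberately between the start tag and its text.

```python
import xml.etree.ElementTree as ET


def texts_seen(chunks):
    """Feed `chunks` one by one and record elem.text as it is when each event is read."""
    parser = ET.XMLPullParser(events=("start", "end"))
    seen = []
    for chunk in chunks:
        parser.feed(chunk)
        # Events are read only after the whole chunk has been parsed.
        for event, elem in parser.read_events():
            if elem.tag == "didFundLendSecurities":
                seen.append((event, elem.text))
    parser.close()
    return seen


# The whole element arrives in a single chunk.
whole = texts_seen(
    ["<root><didFundLendSecurities>N</didFundLendSecurities></root>"]
)

# The chunk boundary falls right after the start tag.
split = texts_seen(
    ["<root><didFundLendSecurities>", "N</didFundLendSecurities></root>"]
)

print("one chunk: ", whole)
print("two chunks:", split)
```

In the one-chunk case the start event should already show N, because the text was parsed before the event was read. In the two-chunk case the start event should show None, and only the end event has the value.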