Python Forum

Full Version: Adding a line number to an lxml Element
I am trying to find a way to attach the line number from the original XML text to each element in an XML tree built with Python's lxml module. Ideally I could get an element from the tree and simply do something like:

line_num = element.xml_line_num
I know lxml tree elements have a sourceline property, but in my experience it is not reliable: the value is not always correct. With the basic ElementTree that ships with Python 2.7, the default parser had a method called GetInputContext(), which I could use by looping through the lines of the original XML text and comparing each one against the text returned by that method. I quickly discovered that CurrentLineNumber from the default parser is not reliable either, so I had to resort to comparing XML source strings.

Anyhow, I simply don't understand subclassing parts of modules well enough to get this working. Mature, widely used Python modules tend to be very thorough and extensive in their use of so many aspects of Python, and once you get down to the C implementations of the underlying base classes it gets confusing. So some help, and maybe some example code, would be really appreciated. Thanks in advance!

Let me add some clarification. I would like to have code that looks like this:

from lxml import etree as ET

xml_tree = ET.fromstringlist(xml_file_lines)

for xml_element in xml_tree.iter():
    # xml_line_num is the attribute I would like to have; lxml does not provide it
    line_num = xml_element.xml_line_num
'xml_file_lines' is a list of the lines read in from an XML file, some of which are empty. What I can say about the files I am processing is that every line that is neither whitespace nor a comment contains an opening tag, a closing tag, a self-closing element, or an element opened and closed on the same line. That is to say, like this:

<tag>text</tag>
<tag1 />
<tag2 id = "yum" />
<tag2 id = "delicious" name = "tasty">text</tag2>
<tag3>
<tag4>hungry</tag4>
</tag3>

There will never be more than one opening or one closing XML tag on any single line in these XML files. Any ideas?
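One possible direction, given only as a minimal sketch and not as a tested solution for the files described above: instead of lxml, it uses the standard library's xml.parsers.expat together with xml.etree.ElementTree.TreeBuilder, and records the parser's CurrentLineNumber at the moment each start tag is reported. The names LineElement and parse_with_line_numbers are hypothetical, chosen here for illustration. The sketch assumes every entry in xml_file_lines still ends with its newline character (as returned by readlines()); without the newlines the parser cannot count lines. Whether this avoids the unreliability mentioned earlier has not been checked against the real files.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class LineElement(ET.Element):
    """Hypothetical Element subclass that can carry an xml_line_num attribute."""

    xml_line_num = None  # filled in while parsing


def parse_with_line_numbers(xml_file_lines):
    """Build an ElementTree-style tree whose elements know their source line.

    Assumes each string in xml_file_lines keeps its trailing newline.
    """
    builder = ET.TreeBuilder(element_factory=LineElement)
    parser = expat.ParserCreate()

    def start(tag, attrs):
        # TreeBuilder.start() returns the element it has just opened.
        element = builder.start(tag, attrs)
        # Line (1-based) on which the current start tag begins.
        element.xml_line_num = parser.CurrentLineNumber

    parser.StartElementHandler = start
    parser.EndElementHandler = builder.end
    parser.CharacterDataHandler = builder.data

    # Feed the document one line at a time, then signal the end of input.
    for line in xml_file_lines:
        parser.Parse(line, False)
    parser.Parse("", True)

    return builder.close()


if __name__ == "__main__":
    sample = ["<tag3>\n", "\n", "<tag4>hungry</tag4>\n", "<tag1 />\n", "</tag3>\n"]
    root = parse_with_line_numbers(sample)
    for xml_element in root.iter():
        print(xml_element.tag, xml_element.xml_line_num)
```

The subclass is there because the built-in Element type does not accept arbitrary new attributes, while a Python subclass of it does. The resulting tree is a standard-library ElementTree, not an lxml tree, so lxml-only features such as full XPath support are not available on it.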