Python Forum
lxml saves empty tags with None text
I am modifying an XML file to replace each , with <,> (serialized as &lt;,&gt;), and each ; with <;>. However, when an element has no text, the string None ends up in the output.
My original XML value would be something like:
Output:
<element SourceRef="List_01" GroupName="Lists"> <body> <p><span class="bold">ListElements</span>Element1 val1, Element2 val2; secondElement List</p> </body> </element>
And my code returns:
Output:
<element SourceRef="List_01" GroupName="Lists"> <body> <p>None<span class="bold">ListElements</span>Element1 val1&lt;,&gt; Element2 val2&lt;;&gt; secondElement List</p> </body> </element>
However my desired output should be:
Output:
<element SourceRef="List_01" GroupName="Lists"> <body> <p><span class="bold">ListElements</span>Element1 val1&lt;,&gt; Element2 val2&lt;;&gt; secondElement List</p> </body> </element>
My code is:
from argparse import ArgumentParser
from pathlib import Path
from lxml import etree as et

# nsd (the namespace prefix map passed to xpath below) is defined elsewhere
# and is not shown in this post.

if __name__=='__main__':
    parser = ArgumentParser()
    parser.add_argument('-f',  '--file',  help='Path to the file to process')
    args = parser.parse_args()
    xpath_expr = "//Default:Element[starts-with(SourceRef,'List_') \
    or GroupName='Lists']/descendant::*"
    if args.file:
        #elements = get_element(args.file, xpath_expr,  nsd)
        filepath = Path(args.file)
        elements = []
        if filepath.exists() and filepath.is_file():
            parser = et.parse(str(args.file))
            root = parser.getroot()
            elements = root.xpath(xpath_expr, namespaces=nsd)
        for element in elements:
            print(element)
            if element.text != None or element.tail != None:
                text = str(element.text)
                if element.text == None:
                    print('Empty element: ', element.text)
                text = text.replace(',', '<,>')
                text = text.replace(';', '<;>')
                tail = str(element.tail)
                tail = tail.replace(',', '<,>')
                tail = tail.replace(';', '<;>')
                if text != None:
                    element.text = text
                if tail != None:
                    element.tail = tail
                #element = stringify_children(element)
                #print('Element: ', element,', Text: ',  element.text, '\n', 
                #    'Element: ', element, ', Tail: ', element.tail)
                tree = et.ElementTree(root)
                print(tree)
            if element.text == None or element.tail == None:
                element.clear()
                tree = et.ElementTree(root)
            tree.write(args.file,  pretty_print=True)
Any idea why I get None for the empty <p> tags, and how can I prevent it?
Thank you,
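
A likely cause, judging only from the code above: `str(element.text)` turns a missing text value (`None`) into the literal string 'None', so the later `if text != None` check always passes and that string is written back into the element. The same applies to `element.tail`. Below is a minimal sketch of one way around it, not a drop-in replacement for the script. It assumes a hypothetical helper named `mark_separators`, and it uses the standard library's `xml.etree.ElementTree` so it runs without extra installs; `lxml.etree` elements expose the same `.text` and `.tail` attributes. The sample XML is modeled on the post, with the final tag closed as `</element>`.

```python
import xml.etree.ElementTree as ET  # lxml.etree elements also have .text/.tail


def mark_separators(value):
    """Wrap ',' and ';' in angle brackets; pass None through unchanged.

    Hypothetical helper name, introduced only for this illustration.
    """
    if value is None:
        return None  # str(None) would yield the literal text 'None'
    return value.replace(',', '<,>').replace(';', '<;>')


# Sample modeled on the post (closing tag assumed to be </element>).
xml = (
    '<element SourceRef="List_01" GroupName="Lists"><body>'
    '<p><span class="bold">ListElements</span>'
    'Element1 val1, Element2 val2; secondElement List</p>'
    '</body></element>'
)
root = ET.fromstring(xml)

# Update text and tail independently; None stays None, so nothing is
# written for elements that have no text of their own (such as <p> here).
for element in root.iter():
    element.text = mark_separators(element.text)
    element.tail = mark_separators(element.tail)

result = ET.tostring(root, encoding='unicode')
print(result)
```

With the `None` values left alone, the `element.clear()` branch should no longer be needed. It is also worth double-checking that branch, since `clear()` removes an element's children and attributes as well as its text.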