Python Forum

lxml saves empty tags with None text
I am modifying an XML file to change , into <,> (serialized as &lt;,&gt;) and ; into <;>. However, when an element has no text, the string None is written into it.
My original XML value would be something like:
Output:
<element SourceRef="List_01" GroupName="Lists"> <body> <p><span class="bold">ListElements</span>Element1 val1, Element2 val2; secondElement List</p> </body> </element>
And my code returns:
Output:
<element SourceRef="List_01" GroupName="Lists"> <body> <p>None<span class="bold">ListElements</span>Element1 val1&lt;,&gt; Element2 val2&lt;;&gt; secondElement List</p> </body> </element>
However, my desired output is:
Output:
<element SourceRef="List_01" GroupName="Lists"> <body> <p><span class="bold">ListElements</span>Element1 val1&lt;,&gt; Element2 val2&lt;;&gt; secondElement List</p> </body> </element>
My code is:
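from argparse import ArgumentParser
from pathlib import Path

from lxml import etree as et

# Note: nsd (the namespace prefix map passed to xpath() below) is defined
# elsewhere in the script and is not shown in this post.

if __name__=='__main__':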
    parser = ArgumentParser()
    parser.add_argument('-f',  '--file',  help='Path to the file to process')
    args = parser.parse_args()
    xpath_expr = "//Default:Element[starts-with(SourceRef,'List_') \
    or GroupName='Lists']/descendant::*"
    if args.file:
        #elements = get_element(args.file, xpath_expr,  nsd)
        filepath = Path(args.file)
        elements = []
        if filepath.exists() and filepath.is_file():
            parser = et.parse(str(args.file))
            root = parser.getroot()
            elements = root.xpath(xpath_expr, namespaces=nsd)
        for element in elements:
            print(element)
            if element.text != None or element.tail != None:
                text = str(element.text)
                if element.text == None:
                    print('Empty element: ', element.text)
                text = text.replace(',', '<,>')
                text = text.replace(';', '<;>')
                tail = str(element.tail)
                tail = tail.replace(',', '<,>')
                tail = tail.replace(';', '<;>')
                if text != None:
                    element.text = text
                if tail != None:
                    element.tail = tail
                #element = stringify_children(element)
                #print('Element: ', element,', Text: ',  element.text, '\n', 
                #    'Element: ', element, ', Tail: ', element.tail)
                tree = et.ElementTree(root)
                print(tree)
            if element.text == None or element.tail == None:
                element.clear()
                tree = et.ElementTree(root)
            tree.write(args.file,  pretty_print=True)
Any idea why I get None for the empty <p> tags, and how I can prevent it?
Thank you,
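
A likely cause, judging only from the code posted above: the <p> element has no text before its first child, so element.text is None, and str(element.text) turns that into the literal string 'None'. The later check "if text != None" can never fail, because text is already a string by then, so 'None' gets assigned back to the element. The same applies to element.tail. Below is a minimal sketch of one way around it, not a drop-in replacement for the script. The helper name mark_separators and the inline sample document are made up for illustration, the namespace and XPath from the post are left out, and the import falls back to the standard library's ElementTree (whose text and tail attributes work the same way) when lxml is not installed.

```python
try:
    from lxml import etree as et
except ImportError:  # fallback for illustration only; text/tail behave the same
    import xml.etree.ElementTree as et


def mark_separators(value):
    """Wrap ',' and ';' in angle brackets; leave None untouched.

    Hypothetical helper, not part of the original script.
    """
    if value is None:
        return None
    return value.replace(',', '<,>').replace(';', '<;>')


# Stand-in for the real file, based on the sample in the post.
xml = (
    '<element SourceRef="List_01" GroupName="Lists"><body>'
    '<p><span class="bold">ListElements</span>'
    'Element1 val1, Element2 val2; secondElement List</p>'
    '</body></element>'
)
root = et.fromstring(xml)

for element in root.iter():
    # No str() call here: None stays None, so nothing is written
    # into elements that have no text or no tail.
    element.text = mark_separators(element.text)
    element.tail = mark_separators(element.tail)

result = et.tostring(root, encoding='unicode')
print(result)
```

With this approach the serializer escapes the inserted < and > as &lt; and &gt; on its own. Separately, it may be worth reconsidering the element.clear() branch in the posted code: clear() also removes an element's children and attributes, which is probably not intended for elements that merely lack text or a tail.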