Parsing XML files >3 GB using lxml iterparse
I am not able to parse a huge XML file (over 3 GB) using an lxml tree. What I learned from my research is that lxml's iterparse loads the XML file until it reaches the tag it is looking for. This is a snippet of my code:

import sys
from lxml import etree

for event, child in etree.iterparse(xml_file, tag='test'):
    print(sys.getsizeof(child))
It is not even reaching the print statement; the process is getting killed. I am running this on a server. Any help on this matter?
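For what it is worth, one way to see where the memory goes is to watch the process's peak RSS while iterating: iterparse with a tag filter still builds the whole tree in memory, it just only reports the matching elements. A small diagnostic sketch, assuming a Linux server and using only the standard library ('huge.xml' is a placeholder path; ru_maxrss is reported in kilobytes on Linux):

import resource
from lxml import etree

# Iterate over every closing tag so the loop yields frequently; peak RSS keeps
# climbing because nothing is cleared from the partially built tree.
for i, (event, elem) in enumerate(etree.iterparse('huge.xml', events=('end',))):
    if i % 100000 == 0:
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(i, elem.tag, 'peak RSS ~%d KB' % peak_kb)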
Please show enough of a code snippet so that it can be run.
Attach a small sample of the XML file.
from datetime import datetime
from lxml import etree

results = {}
for event, x in etree.iterparse(file, tag='status'):
    start = datetime.strptime(x.attrib['starttime'], '%Y%m%d %H:%M:%S.%f')
    end = datetime.strptime(x.attrib['endtime'], '%Y%m%d %H:%M:%S.%f')
    results['script_exec_end_time'] = x.attrib['endtime']
    results['script_exec_start_time'] = x.attrib['starttime']
    results['script_exec_duration'] = str((end - start).total_seconds())
    results['script_result'] = x.attrib['status']
    x.clear()  # release the element's contents after processing
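Calling x.clear() alone is often not enough with lxml's iterparse, because the cleared elements stay attached to the partially built root, and so do all of their earlier siblings. A commonly recommended pattern is to also delete the already-processed siblings after each element. A minimal sketch, assuming the same 'status' tag as above ('huge.xml' is a placeholder file name):

from lxml import etree

for event, elem in etree.iterparse('huge.xml', tag='status'):
    # ... read whatever is needed from elem.attrib here ...
    elem.clear()  # drop the element's own children and text
    while elem.getprevious() is not None:
        del elem.getparent()[0]  # drop siblings that were already processed

With that cleanup the resident memory stays roughly proportional to a single element and its ancestors rather than to the whole document, which is what makes a >3 GB file parseable.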