Python Forum

How to import an xml file to Pandas
Hello all,
I am new to Pandas and am just getting familiar with it. I'm fine with importing CSV files and similar data tables.
However, for a new task the data comes in XML files, whose contents have to be read and then concatenated into a single file.
My problem is reading them.
I can't traverse all the levels of the tree below the root element.
The code I use is as follows:
# In [1]:
import xml.etree.ElementTree as et
# In [2]:
import pandas as pd
# In [3]:
xml_data = open("C:\\Adatok\\DAC-6\\teszt1.xml", 'r').read()  
# In [4]:
root = et.XML(xml_data) 

data = []
cols = []
for i, child in enumerate(root):
    data.append([subchild.text for subchild in child])
    cols.append(child.tag)
for i, subchild in enumerate(child):
    data.append([subsubchild.text for subsubchild in subchild])
    cols.append(subchild.tag)
for i, subsubchild in enumerate(subchild):
    data.append([subsubsubchild.text for subsubsubchild in subsubchild])
    cols.append(subsubchild.tag)
for i, subsubsubchild in enumerate(subsubchild):
    data.append([subsubsubsubchild.text for subsubsubsubchild in subsubsubchild])
    cols.append(subsubsubchild.tag)
for i, subsubsubsubchild in enumerate(subsubsubchild):
    data.append([subsubsubsubsubchild.text for subsubsubsubsubchild in subsubsubsubchild])
    cols.append(subsubsubsubchild.tag)
for i, subsubsubsubsubchild in enumerate(subsubsubsubchild):
    data.append([subsubsubsubsubsubchild.text for subsubsubsubsubsubchild in subsubsubsubsubchild])
    cols.append(subsubsubsubsubchild.tag)
for i, subsubsubsubsubsubchild in enumerate(subsubsubsubsubchild):
    data.append([subsubsubsubsubsubsubchild.text for subsubsubsubsubsubsubchild in subsubsubsubsubsubchild])
    cols.append(subsubsubsubsubsubchild.tag)
Error:
NameError                                 Traceback (most recent call last)
<ipython-input-35-c374e6ba6497> in <module>
     21     data.append([subsubsubsubsubsubchild.text for subsubsubsubsubsubchild in subsubsubsubsubchild])
     22     cols.append(subsubsubsubsubchild.tag)
---> 23 for i, subsubsubsubsubsubchild in enumerate(subsubsubsubsubchild):
     24     data.append([subsubsubsubsubsubsubchild.text for subsubsubsubsubsubsubchild in subsubsubsubsubsubchild])
     25     cols.append(subsubsubsubsubsubchild.tag)

NameError: name 'subsubsubsubsubchild' is not defined
# In [ ]:
df = pd.DataFrame(data).T  
# In [ ]:
df.columns = cols  
# In [ ]:
df.head()
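A note on the NameError above: the loops in cell 4 are written one after another rather than nested, so each loop only walks the children of whichever element the previous loop ended on. If that element has no children, the loop body never runs and the loop variable is never created, which seems to be what happens at the reported depth. Below is a minimal sketch of that behaviour, using a tiny made-up XML string (not the attached file) and shorter variable names:

```python
import xml.etree.ElementTree as ET

# Hypothetical three-level document, only for illustration.
root = ET.fromstring("<a><b><c/></b></a>")

for child in root:            # runs once, leaves child = <b>
    pass
for subchild in child:        # runs once, leaves subchild = <c>
    pass
for subsubchild in subchild:  # <c> has no children, so the body never runs
    pass

# The last loop variable was never assigned, so reading it raises NameError.
try:
    subsubchild
    was_bound = True
except NameError:
    was_bound = False

print(was_bound)
```

In other words, the error does not necessarily mean the file is unreadable at that depth; it may only mean that the last element visited one level up happens to be a leaf.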
I have attached a sample of the XML file I am trying to load.


I would be very grateful for any help, or for an explanation of why I can't read this multi-level structure beyond that depth.

All the best to everyone
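
One possible way to reach every level without writing a separate loop per depth is a recursive walk over the tree. The sketch below is only an illustration under assumptions: the helper name `iter_leaves` and the sample XML are made up, and the real DAC-6 file will have different tags (and, if it uses XML namespaces, ElementTree will report tags in the form `{namespace-uri}tag`).

```python
import xml.etree.ElementTree as ET


def iter_leaves(elem, path=""):
    """Yield (path, text) for every element that has no child elements."""
    here = f"{path}/{elem.tag}" if path else elem.tag
    children = list(elem)
    if not children:
        # Leaf element: report its text (empty string if there is none).
        yield here, (elem.text or "").strip()
    for child in children:
        # Recurse, so any nesting depth is handled by the same code.
        yield from iter_leaves(child, here)


# Hypothetical sample with uneven nesting depth.
sample = (
    "<root>"
    "<item><name>A</name><detail><code>1</code></detail></item>"
    "<item><name>B</name></item>"
    "</root>"
)

rows = list(iter_leaves(ET.fromstring(sample)))
for row in rows:
    print(row)
```

The resulting list of tuples could then be passed to Pandas, for example `pd.DataFrame(rows, columns=["path", "text"])`, and reshaped from there. Depending on the Pandas version, `pandas.read_xml` (available since 1.3) may also be worth trying, although it works best on fairly flat, table-like XML.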