Python Forum
Parse data from xml file - Printable Version


Parse data from xml file - klllmmm - Jun-06-2019

I'm trying to parse data from an XML file downloaded from https://www.treasury.gov/ofac/downloads/consolidated/consolidated.xml

A sample of the XML file is attached.

My attempt below was not successful: the output I get for "firstname" is an empty list for every node.

I would appreciate it if someone could help with this.


import xml.etree.ElementTree as ET

file = ET.parse(r'D:\path\to\file\test.xml')

for node in file.getroot():
    print(node)
    firstname = node.findall('firstName')
    print(firstname)
Output:
<Element '{http://tempuri.org/sdnList.xsd}publshInformation' at 0x000001A18F7F8098>
[]
<Element '{http://tempuri.org/sdnList.xsd}sdnEntry' at 0x000001A18F8091D8>
[]
<Element '{http://tempuri.org/sdnList.xsd}sdnEntry' at 0x000001A18F809E08>
[]



RE: Parse data from xml file - heiner55 - Jun-06-2019

#!/usr/bin/python3
import xml.etree.ElementTree as ET

file = ET.parse(r'test.xml')

for node in file.getroot():
    print(node)
    # Every tag in this file lives in a namespace, so the name must be qualified
    firstname = node.findall('{http://tempuri.org/sdnList.xsd}firstName')
    print(firstname)
Output:
<Element '{http://tempuri.org/sdnList.xsd}publshInformation' at 0xb74ffd24>
[]
<Element '{http://tempuri.org/sdnList.xsd}sdnEntry' at 0xb74ffdc4>
[<Element '{http://tempuri.org/sdnList.xsd}firstName' at 0xb74ffe14>]
<Element '{http://tempuri.org/sdnList.xsd}sdnEntry' at 0xb7502554>
[<Element '{http://tempuri.org/sdnList.xsd}firstName' at 0xb75025a4>]
...
...
...
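
The unqualified name 'firstName' finds nothing because ElementTree stores each tag together with its namespace URI. Besides spelling out the '{uri}name' form, findall() also accepts a prefix-to-URI mapping. The following is a minimal sketch of that variant; the inline sample and the root element name sdnList are assumptions made so the snippet runs without the real file, and the prefix "sdn" is an arbitrary alias.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry sample built from values shown in this thread;
# it is not the real OFAC file, and the root name "sdnList" is assumed.
SAMPLE = """\
<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <publshInformation/>
  <sdnEntry>
    <uid>9639</uid>
    <firstName>Ismail Abdul Salah</firstName>
    <lastName>HANIYA</lastName>
    <sdnType>Individual</sdnType>
  </sdnEntry>
  <sdnEntry>
    <uid>26182</uid>
    <firstName>Evren</firstName>
    <lastName>KAYAKIRAN</lastName>
    <sdnType>Individual</sdnType>
  </sdnEntry>
</sdnList>
"""
root = ET.fromstring(SAMPLE)

# "sdn" is a local alias chosen here; only the URI has to match the file.
NS = {'sdn': 'http://tempuri.org/sdnList.xsd'}

# Without a namespace the path matches nothing.
unqualified = root.findall('sdnEntry/firstName')
# With the prefix mapping the same path matches both entries.
qualified = root.findall('sdn:sdnEntry/sdn:firstName', NS)

print(unqualified)
print([e.text for e in qualified])
```

Either spelling selects the same elements; the mapping mainly keeps long paths readable.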



RE: Parse data from xml file - klllmmm - Jun-07-2019

Thanks for the answer.

Using the suggested approach, I managed to parse some data.

import pandas as pd
import xml.etree.ElementTree as ET

file = ET.parse(r'test.xml')

# Collect one row per node and build the dataframe once at the end
# (DataFrame.append was removed in pandas 2.0)
Data_columns=['uid','firstName','lastName','sdnType']
rows = []

for node in file.getroot():
    uid= [uid.text for uid in node.findall('{http://tempuri.org/sdnList.xsd}uid')]
    firstname= [firstname.text for firstname in node.findall('{http://tempuri.org/sdnList.xsd}firstName')]
    lastName= [lastName.text for lastName in node.findall('{http://tempuri.org/sdnList.xsd}lastName')]
    sdnType= [sdnType.text for sdnType in node.findall('{http://tempuri.org/sdnList.xsd}sdnType')]
    # Each value is still a list here, which is why the output shows brackets
    rows.append([uid,firstname,lastName,sdnType])

table = pd.DataFrame(rows,columns=Data_columns)

print(table)
Output:
Out[37]:
       uid             firstName     lastName       sdnType
0       []                    []           []            []
1   [9639]  [Ismail Abdul Salah]     [HANIYA]  [Individual]
2  [26182]               [Evren]  [KAYAKIRAN]  [Individual]
How can I get the values without the brackets?

I would appreciate it if someone could help with this.


RE: Parse data from xml file - heiner55 - Jun-07-2019

Because each value is a list (findall() returns a list):

uid    ==> the whole list
uid[0] ==> first element of the list
uid[1] ==> second element


RE: Parse data from xml file - klllmmm - Jun-07-2019

(Jun-07-2019, 04:22 PM)heiner55 Wrote: Because it is an array:

uid == array
uid[0] ==> first element of array
uid[1] ==> second element

Thanks for the answer.

I'm not sure how to get an element like uid[0], because sometimes the list is empty ([]).

I would appreciate it if someone could show how to turn the value in the list into a string.


RE: Parse data from xml file - heiner55 - Jun-07-2019

if uid == []:
    name = "none"
else:
    name = uid[0]
or

name = uid[0] if uid != [] else "none"
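
The same "first value or a fallback" logic is also available directly from ElementTree through findtext(), which returns the text of the first matching child or a default when nothing matches. A small sketch, assuming a made-up fragment in which the first child has no uid; the root name sdnList and the prefix "sdn" are illustrative choices, not taken from the thread.

```python
import xml.etree.ElementTree as ET

NS = {'sdn': 'http://tempuri.org/sdnList.xsd'}

# Hypothetical fragment: the first child has no uid, the second one does.
root = ET.fromstring(
    '<sdnList xmlns="http://tempuri.org/sdnList.xsd">'
    '<publshInformation/>'
    '<sdnEntry><uid>9639</uid></sdnEntry>'
    '</sdnList>'
)

# findtext() gives the text of the first match, or `default` if there is none,
# so no explicit empty-list check is needed.
uids = [node.findtext('sdn:uid', default='', namespaces=NS) for node in root]
print(uids)
```

This avoids building an intermediate list per field, at the cost of silently ignoring any second match.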



RE: Parse data from xml file - klllmmm - Jun-08-2019

(Jun-07-2019, 05:17 PM)heiner55 Wrote: uid[0] if uid != [] else "none"

Thanks for the answer

I have adjusted my code accordingly

import pandas as pd
import xml.etree.ElementTree as ET

file = ET.parse(r'test.xml')

# Collect one row per node and build the dataframe once at the end
# (DataFrame.append was removed in pandas 2.0)
Data_columns=['uid','firstName','lastName','sdnType']
rows = []

for node in file.getroot():
    uid= [uid.text for uid in node.findall('{http://tempuri.org/sdnList.xsd}uid')]
    firstname= [firstname.text for firstname in node.findall('{http://tempuri.org/sdnList.xsd}firstName')]
    lastName= [lastName.text for lastName in node.findall('{http://tempuri.org/sdnList.xsd}lastName')]
    sdnType= [sdnType.text for sdnType in node.findall('{http://tempuri.org/sdnList.xsd}sdnType')]
    # Take the first match of each field, or '' when the list is empty
    rows.append([uid[0] if uid != [] else '',firstname[0] if firstname != [] else '',lastName[0] if lastName != [] else '',sdnType[0] if sdnType != [] else ''])

table = pd.DataFrame(rows,columns=Data_columns)

print(table)
Output:
Out[52]:
     uid           firstName   lastName     sdnType
0
1   9639  Ismail Abdul Salah     HANIYA  Individual
2  26182               Evren  KAYAKIRAN  Individual
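
Row 0 is empty because the loop also visits publshInformation, which has none of the four fields. One possible way to avoid that row is to iterate over the sdnEntry elements only and collect plain dictionaries. This is a sketch with standard-library code only; the sample document, the root name sdnList, the prefix "sdn" and the name FIELDS are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = {'sdn': 'http://tempuri.org/sdnList.xsd'}
FIELDS = ['uid', 'firstName', 'lastName', 'sdnType']

# Hypothetical sample built from the values shown in the thread.
SAMPLE = """\
<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <publshInformation/>
  <sdnEntry>
    <uid>9639</uid>
    <firstName>Ismail Abdul Salah</firstName>
    <lastName>HANIYA</lastName>
    <sdnType>Individual</sdnType>
  </sdnEntry>
  <sdnEntry>
    <uid>26182</uid>
    <firstName>Evren</firstName>
    <lastName>KAYAKIRAN</lastName>
    <sdnType>Individual</sdnType>
  </sdnEntry>
</sdnList>
"""
root = ET.fromstring(SAMPLE)

# Select only sdnEntry children, so publshInformation yields no row at all.
rows = [
    {field: entry.findtext(f'sdn:{field}', default='', namespaces=NS)
     for field in FIELDS}
    for entry in root.findall('sdn:sdnEntry', NS)
]

for row in rows:
    print(row)
```

If pandas is installed, pd.DataFrame(rows, columns=FIELDS) should turn this list into the same table in a single call.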



RE: Parse data from xml file - heiner55 - Jun-08-2019

Now it looks better.


RE: Parse data from xml file - klllmmm - Jun-19-2019

I tried to parse the value in "programList/program":
<ns0:programList>
<ns0:program>FSE-IR</ns0:program>
</ns0:programList>
1. I managed to get the value using the following code, but is there a better way to do this?
1st try
for node in file.getroot():
    for programList in node.findall('{http://tempuri.org/sdnList.xsd}programList'):
        for program in programList.findall('{http://tempuri.org/sdnList.xsd}program'):
            print(program.text)
2nd try
def cleanaa(a):
    cleana = a[0] if a != [] else ''
    return cleana 

for node in file.getroot():
    programList1 = cleanaa([[program.text for program in programList.findall('{http://tempuri.org/sdnList.xsd}program')] for programList in node.findall('{http://tempuri.org/sdnList.xsd}programList')])
    print(programList1)
The second output seems more appropriate, as it builds a list on each iteration and captures multiple values when there are several (there can be at most two).
Eg:

Output:
['UKRAINE-EO13662']
['SYRIA', 'UKRAINE-EO13662']
['UKRAINE-EO13662']
2. Since there can be one or two values, can I unpack them into two variables, so that the second variable is an empty string ('') when there is only one value?

I would appreciate any input on this.
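
On question 1, findall() accepts a path, so the nested loops can likely be collapsed into a single call per entry. A minimal sketch, assuming a made-up sample whose program values mirror the output above; the root name sdnList and the prefix "sdn" are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {'sdn': 'http://tempuri.org/sdnList.xsd'}

# Hypothetical sample: three entries with one or two programs each.
SAMPLE = """\
<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <sdnEntry>
    <programList><program>UKRAINE-EO13662</program></programList>
  </sdnEntry>
  <sdnEntry>
    <programList>
      <program>SYRIA</program>
      <program>UKRAINE-EO13662</program>
    </programList>
  </sdnEntry>
  <sdnEntry>
    <programList><program>UKRAINE-EO13662</program></programList>
  </sdnEntry>
</sdnList>
"""
root = ET.fromstring(SAMPLE)

# One path expression walks programList -> program for each entry.
programs_per_entry = [
    [p.text for p in entry.findall('sdn:programList/sdn:program', NS)]
    for entry in root.findall('sdn:sdnEntry', NS)
]

for programs in programs_per_entry:
    print(programs)
```

An entry without any programList simply produces an empty list here, so no separate check is needed.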


RE: Parse data from xml file - heiner55 - Jun-25-2019

Maybe this helps:

#!/usr/bin/python3

def cleanaa(a):
    cleana = a[0] if a != [] else ''
    return cleana 

[x, *y] = ['UKRAINE-EO13662']
print(x, cleanaa(y))

[x, *y] = ['SYRIA', 'UKRAINE-EO13662']
print(x, cleanaa(y))
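
Note that [x, *y] = [] raises a ValueError, so the unpacking above needs at least one value. An alternative sketch that also copes with an empty list pads the list before slicing; the helper name first_two and its fill parameter are hypothetical.

```python
def first_two(values, fill=''):
    """Return exactly two items: the first two of `values`, padded with `fill`."""
    # Appending two fillers guarantees the slice always has two elements.
    first, second = (list(values) + [fill, fill])[:2]
    return first, second

print(first_two(['UKRAINE-EO13662']))
print(first_two(['SYRIA', 'UKRAINE-EO13662']))
print(first_two([]))
```

Any values beyond the second are dropped, which matches the stated maximum of two programs per entry.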