Python Forum
Parse data from xml file
#1
I'm trying to parse data from an XML file downloaded from https://www.treasury.gov/ofac/downloads/...idated.xml

A sample of the XML file is attached.

I tried to parse the data, but without success: the output I get for "firstname" is an empty list.

I'd appreciate it if someone could help with this.


import xml.etree.ElementTree as ET

file = ET.parse(r'D:\path\to\file\test.xml')

for node in file.getroot():
    print(node)
    firstname = node.findall('firstName')
    print(firstname)
Output:
<Element '{http://tempuri.org/sdnList.xsd}publshInformation' at 0x000001A18F7F8098>
[]
<Element '{http://tempuri.org/sdnList.xsd}sdnEntry' at 0x000001A18F8091D8>
[]
<Element '{http://tempuri.org/sdnList.xsd}sdnEntry' at 0x000001A18F809E08>
[]

Attached Files

.xml   test.xml (Size: 2.37 KB / Downloads: 352)
Reply
#2
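#!/usr/bin/python3
import xml.etree.ElementTree as ET

file = ET.parse(r'test.xml')

# Tags in this file are namespace-qualified, so the namespace URI
# must be included in braces in front of the tag name.
for node in file.getroot():
    print(node)
    firstname = node.findall('{http://tempuri.org/sdnList.xsd}firstName')
    print(firstname)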
Output:
<Element '{http://tempuri.org/sdnList.xsd}publshInformation' at 0xb74ffd24>
[]
<Element '{http://tempuri.org/sdnList.xsd}sdnEntry' at 0xb74ffdc4>
[<Element '{http://tempuri.org/sdnList.xsd}firstName' at 0xb74ffe14>]
<Element '{http://tempuri.org/sdnList.xsd}sdnEntry' at 0xb7502554>
[<Element '{http://tempuri.org/sdnList.xsd}firstName' at 0xb75025a4>]
...
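Some background on why the plain 'firstName' search came back empty: the printed tags show that every element lives in the namespace http://tempuri.org/sdnList.xsd, and ElementTree only matches a tag when the namespace is part of the search. Besides spelling out the URI in braces, findall also accepts a prefix map. A minimal sketch of that variant follows; the inline SAMPLE document, the placeholder values in it, and the prefix name `sdn` are assumptions for illustration, since the attached test.xml is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for test.xml; only the namespace URI comes from the thread.
SAMPLE = """<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <sdnEntry>
    <uid>1</uid>
    <firstName>Alice</firstName>
  </sdnEntry>
</sdnList>"""

root = ET.fromstring(SAMPLE)

# Map a prefix of our own choosing to the namespace URI.
ns = {'sdn': 'http://tempuri.org/sdnList.xsd'}

for node in root:
    unqualified = node.findall('firstName')        # no namespace given: matches nothing
    qualified = node.findall('sdn:firstName', ns)  # prefix resolved through the map
    print(len(unqualified), [e.text for e in qualified])
```

The prefix is local to the script; it does not have to match whatever prefix (if any) the XML file itself uses.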
Reply
#3
(Jun-06-2019, 04:41 AM)heiner55 Wrote:
#!/usr/bin/python3
import xml.etree.ElementTree as ET

file = ET.parse(r'test.xml')

for node in file.getroot():
    print(node)
    firstname = node.findall('{http://tempuri.org/sdnList.xsd}firstName')
    print(firstname)
Output:
<Element '{http://tempuri.org/sdnList.xsd}publshInformation' at 0xb74ffd24>
[]
<Element '{http://tempuri.org/sdnList.xsd}sdnEntry' at 0xb74ffdc4>
[<Element '{http://tempuri.org/sdnList.xsd}firstName' at 0xb74ffe14>]
<Element '{http://tempuri.org/sdnList.xsd}sdnEntry' at 0xb7502554>
[<Element '{http://tempuri.org/sdnList.xsd}firstName' at 0xb75025a4>]
...


Thanks for the answer.

Using the suggested approach, I managed to parse some data.

import pandas as pd
import xml.etree.ElementTree as ET

file = ET.parse(r'test.xml')

# Namespace prefix shared by every tag in the file
NS = '{http://tempuri.org/sdnList.xsd}'
Data_columns = ['uid', 'firstName', 'lastName', 'sdnType']

# Collect one row per child of the root, then build the DataFrame once
# (DataFrame.append was removed in pandas 2.0).
rows = []
for node in file.getroot():
    uid = [e.text for e in node.findall(NS + 'uid')]
    firstname = [e.text for e in node.findall(NS + 'firstName')]
    lastName = [e.text for e in node.findall(NS + 'lastName')]
    sdnType = [e.text for e in node.findall(NS + 'sdnType')]
    rows.append([uid, firstname, lastName, sdnType])

table = pd.DataFrame(rows, columns=Data_columns)
print(table)
Output:
Out[37]:
       uid             firstName     lastName       sdnType
0       []                    []           []            []
1   [9639]  [Ismail Abdul Salah]     [HANIYA]  [Individual]
2  [26182]               [Evren]  [KAYAKIRAN]  [Individual]
How can I get the values without the brackets?

I'd appreciate it if someone could help with this.
Reply
#4
Because each value is a list:

uid is a list
uid[0] ==> first element of the list
uid[1] ==> second element
Reply
#5
(Jun-07-2019, 04:22 PM)heiner55 Wrote: Because each value is a list:

uid is a list
uid[0] ==> first element of the list
uid[1] ==> second element

Thanks for the answer.

I'm not sure how to get an element like uid[0], since the list is sometimes empty, like [].

I'd appreciate it if someone could show how to turn the value in the list into a string.
Reply
#6
if uid == []:
    name = "none"
else:
    name = uid[0]
or

name = uid[0] if uid != [] else "none"
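A possibly shorter route to the same result, sketched under the assumption that only the first matching child is wanted: Element.findtext returns the text of the first match, or the given default when nothing matches, so neither the list nor the emptiness check is needed. The SAMPLE document and its placeholder values below are made up for illustration; only the namespace URI and tag names come from the thread.

```python
import xml.etree.ElementTree as ET

NS = '{http://tempuri.org/sdnList.xsd}'

# Hypothetical stand-in for test.xml: one child without a uid, one with.
SAMPLE = """<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <publshInformation><Record_Count>1</Record_Count></publshInformation>
  <sdnEntry><uid>1</uid><lastName>EXAMPLE</lastName></sdnEntry>
</sdnList>"""

root = ET.fromstring(SAMPLE)

results = []
for node in root:
    # Text of the first <uid> child, or "none" when there is no such child.
    name = node.findtext(NS + 'uid', default='none')
    results.append(name)

print(results)
```

Note that findtext only looks at the first match, so it is not a replacement when several values per node are needed.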
Reply
#7
(Jun-07-2019, 05:17 PM)heiner55 Wrote: uid[0] if uid != [] else "none"

Thanks for the answer

I have adjusted my code accordingly

import pandas as pd
import xml.etree.ElementTree as ET

file = ET.parse(r'test.xml')

# Namespace prefix shared by every tag in the file
NS = '{http://tempuri.org/sdnList.xsd}'
Data_columns = ['uid', 'firstName', 'lastName', 'sdnType']

# Collect one row per child of the root, then build the DataFrame once
# (DataFrame.append was removed in pandas 2.0).
rows = []
for node in file.getroot():
    uid = [e.text for e in node.findall(NS + 'uid')]
    firstname = [e.text for e in node.findall(NS + 'firstName')]
    lastName = [e.text for e in node.findall(NS + 'lastName')]
    sdnType = [e.text for e in node.findall(NS + 'sdnType')]
    # Take the first match of each field, or '' when the field is absent
    rows.append([values[0] if values else ''
                 for values in (uid, firstname, lastName, sdnType)])

table = pd.DataFrame(rows, columns=Data_columns)
print(table)
Output:
Out[52]:
     uid           firstName   lastName     sdnType
0
1   9639  Ismail Abdul Salah     HANIYA  Individual
2  26182               Evren  KAYAKIRAN  Individual
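The blank row 0 appears because the first child of the root is publshInformation, which has none of the four fields. One way to avoid it, as a rough sketch: iterate only over the sdnEntry elements instead of over every child. The SAMPLE document and its values are hypothetical stand-ins for test.xml, and pandas is left out so the sketch has no dependencies; the resulting list of dicts could be passed to pd.DataFrame(rows).

```python
import xml.etree.ElementTree as ET

NS = '{http://tempuri.org/sdnList.xsd}'
FIELDS = ['uid', 'firstName', 'lastName', 'sdnType']

# Hypothetical stand-in for test.xml; the second entry has no firstName.
SAMPLE = """<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <publshInformation><Record_Count>2</Record_Count></publshInformation>
  <sdnEntry>
    <uid>1</uid><firstName>Alice</firstName>
    <lastName>EXAMPLE</lastName><sdnType>Individual</sdnType>
  </sdnEntry>
  <sdnEntry>
    <uid>2</uid><lastName>EXAMPLE CORP</lastName><sdnType>Entity</sdnType>
  </sdnEntry>
</sdnList>"""

root = ET.fromstring(SAMPLE)

rows = []
# Only direct <sdnEntry> children are visited, so publshInformation is skipped.
for entry in root.findall(NS + 'sdnEntry'):
    # findtext yields '' for a field that is missing from this entry.
    rows.append({field: entry.findtext(NS + field, default='') for field in FIELDS})

for row in rows:
    print(row)
```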
Reply
#8
Now it looks better.
Reply
#9
I tried to parse the value in "programList/program":
<ns0:programList>
<ns0:program>FSE-IR</ns0:program>
</ns0:programList>
1. I managed to get the value using the following code, but is there a better way to do this?
1st try
for node in file.getroot():
    for programList in node.findall('{http://tempuri.org/sdnList.xsd}programList'):
        for program in programList.findall('{http://tempuri.org/sdnList.xsd}program'):
            print(program.text)
2nd try
def cleanaa(a):
    cleana = a[0] if a != [] else ''
    return cleana 

for node in file.getroot():
    programList1 = cleanaa([[program.text for program in programList.findall('{http://tempuri.org/sdnList.xsd}program')] for programList in node.findall('{http://tempuri.org/sdnList.xsd}programList')])
    print(programList1)
The second output seems more appropriate, as it creates a list and captures multiple values if there are several (there can be at most two) for each iteration.
Eg:

Output:
['UKRAINE-EO13662']
['SYRIA', 'UKRAINE-EO13662']
['UKRAINE-EO13662']
2. Since there can be one or two values, can I get them into two variables, so that the second variable is an empty string ('') when there is only one value?

I'd appreciate any input on this.
Reply
#10
Maybe this helps:

#!/usr/bin/python3

def cleanaa(a):
    cleana = a[0] if a != [] else ''
    return cleana 

[x, *y] = ['UKRAINE-EO13662']
print(x, cleanaa(y))

[x, *y] = ['SYRIA', 'UKRAINE-EO13662']
print(x, cleanaa(y))
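Both parts of the question can also be handled together. As a sketch: a path expression reaches the program elements without nested loops, and padding the list before unpacking always yields exactly two variables. The helper name first_two and the SAMPLE document are hypothetical; the program codes are the ones shown in the output above.

```python
import xml.etree.ElementTree as ET

NS = '{http://tempuri.org/sdnList.xsd}'

# Hypothetical stand-in for test.xml with one and two programs per entry.
SAMPLE = """<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <sdnEntry>
    <programList><program>UKRAINE-EO13662</program></programList>
  </sdnEntry>
  <sdnEntry>
    <programList>
      <program>SYRIA</program><program>UKRAINE-EO13662</program>
    </programList>
  </sdnEntry>
</sdnList>"""


def first_two(values, fill=''):
    """Return the first two items of values, padded with fill if too short."""
    padded = list(values) + [fill, fill]
    return padded[0], padded[1]


root = ET.fromstring(SAMPLE)

pairs = []
for entry in root.findall(NS + 'sdnEntry'):
    # "programList/program" as a single path, each step namespace-qualified.
    programs = [p.text for p in entry.findall(NS + 'programList/' + NS + 'program')]
    program1, program2 = first_two(programs)
    pairs.append((program1, program2))
    print(repr(program1), repr(program2))
```

Padding rather than star-unpacking also covers an entry with no program at all, where `[x, *y] = []` would raise a ValueError.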
Reply

