Python Forum
Parse data from xml file
#1
I'm trying to parse data from an XML file downloaded from https://www.treasury.gov/ofac/downloads/...idated.xml

A sample of the XML file is attached.

I tried to parse the data, but without success: the output I get for "firstname" is an empty list.

I'd appreciate it if someone could help with this.


import xml.etree.ElementTree as ET

file = ET.parse(r'D:\path\to\file\test.xml')

for node in file.getroot():
    print(node)
    firstname = node.findall('firstName')
    print(firstname)
Output:
<Element '{http://tempuri.org/sdnList.xsd}publshInformation' at 0x000001A18F7F8098>
[]
<Element '{http://tempuri.org/sdnList.xsd}sdnEntry' at 0x000001A18F8091D8>
[]
<Element '{http://tempuri.org/sdnList.xsd}sdnEntry' at 0x000001A18F809E08>
[]

Attached Files

.xml   test.xml (Size: 2.37 KB / Downloads: 565)
Reply
#2
#!/usr/bin/python3
import xml.etree.ElementTree as ET

file = ET.parse(r'test.xml')

for node in file.getroot():
    print(node)
    firstname = node.findall('{http://tempuri.org/sdnList.xsd}firstName')
    print(firstname)
Output:
<Element '{http://tempuri.org/sdnList.xsd}publshInformation' at 0xb74ffd24>
[]
<Element '{http://tempuri.org/sdnList.xsd}sdnEntry' at 0xb74ffdc4>
[<Element '{http://tempuri.org/sdnList.xsd}firstName' at 0xb74ffe14>]
<Element '{http://tempuri.org/sdnList.xsd}sdnEntry' at 0xb7502554>
[<Element '{http://tempuri.org/sdnList.xsd}firstName' at 0xb75025a4>]
...
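
Some background on why the prefix is needed, as a minimal sketch: the file declares a default namespace, so ElementTree stores every tag as `{namespace-uri}tag`, and a bare 'firstName' matches nothing. `findall` also accepts a prefix-to-URI mapping, which saves repeating the URI in every call. The inline XML below is a made-up stand-in for test.xml (only the namespace URI is taken from the thread), and the prefix name `sdn` is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for test.xml; only the namespace URI comes from the thread.
SAMPLE = """<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <publshInformation><Publish_Date>01/01/2019</Publish_Date></publshInformation>
  <sdnEntry><uid>1</uid><firstName>Alice</firstName><lastName>EXAMPLE</lastName></sdnEntry>
  <sdnEntry><uid>2</uid><lastName>NOFIRSTNAME</lastName></sdnEntry>
</sdnList>"""

# Prefix-to-URI mapping; the prefix "sdn" is arbitrary.
NS = {"sdn": "http://tempuri.org/sdnList.xsd"}

root = ET.fromstring(SAMPLE)

# Bare tag name: never matches, because the stored tag is "{uri}firstName".
bare = [len(node.findall("firstName")) for node in root]

# Prefixed tag name, resolved through the mapping.
prefixed = [len(node.findall("sdn:firstName", NS)) for node in root]

print(bare)      # number of matches per child of the root, bare name
print(prefixed)  # number of matches per child of the root, prefixed name
```

With this sample, the bare lookup finds nothing for any node, while the prefixed lookup finds the one firstName element that exists.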
Reply
#3
(Jun-06-2019, 04:41 AM)heiner55 Wrote:
#!/usr/bin/python3
import xml.etree.ElementTree as ET

file = ET.parse(r'test.xml')

for node in file.getroot():
    print(node)
    firstname = node.findall('{http://tempuri.org/sdnList.xsd}firstName')
    print(firstname)
Output:
<Element '{http://tempuri.org/sdnList.xsd}publshInformation' at 0xb74ffd24>
[]
<Element '{http://tempuri.org/sdnList.xsd}sdnEntry' at 0xb74ffdc4>
[<Element '{http://tempuri.org/sdnList.xsd}firstName' at 0xb74ffe14>]
<Element '{http://tempuri.org/sdnList.xsd}sdnEntry' at 0xb7502554>
[<Element '{http://tempuri.org/sdnList.xsd}firstName' at 0xb75025a4>]
...


Thanks for the answer.

Using the suggested approach, I managed to parse some data.

import pandas as pd
import xml.etree.ElementTree as ET

file = ET.parse(r'test.xml')

# Column names for the resulting DataFrame
Data_columns = ['uid', 'firstName', 'lastName', 'sdnType']

# Collect one row per node and build the DataFrame once at the end
# (DataFrame.append was removed in pandas 2.0)
rows = []

for node in file.getroot():
    uid = [uid.text for uid in node.findall('{http://tempuri.org/sdnList.xsd}uid')]
    firstname = [firstname.text for firstname in node.findall('{http://tempuri.org/sdnList.xsd}firstName')]
    lastName = [lastName.text for lastName in node.findall('{http://tempuri.org/sdnList.xsd}lastName')]
    sdnType = [sdnType.text for sdnType in node.findall('{http://tempuri.org/sdnList.xsd}sdnType')]
    rows.append([uid, firstname, lastName, sdnType])

table = pd.DataFrame(rows, columns=Data_columns)

print(table)
Output:
Out[37]:
       uid             firstName     lastName       sdnType
0       []                    []           []            []
1   [9639]  [Ismail Abdul Salah]     [HANIYA]  [Individual]
2  [26182]               [Evren]  [KAYAKIRAN]  [Individual]
How can I get the values without the brackets?

I'd appreciate it if someone could help with this.
Reply
#4
Because it is a list (Python's array type):

uid    ==> the whole list
uid[0] ==> first element of the list
uid[1] ==> second element
Reply
#5
(Jun-07-2019, 04:22 PM)heiner55 Wrote: Because it is an array:

uid == array
uid[0] ==> first element of array
uid[1] ==> second element

Thanks for the answer.

I'm not sure how to get an element like uid[0], since the array is sometimes empty ([]).

I'd appreciate it if someone could show how to turn the value in the array into a string.
Reply
#6
if uid == []:
    name = "none"
else:
    name = uid[0]
or

name = uid[0] if uid != [] else "none"
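
Another option, sketched here on the assumption that only the first match per tag is needed: `Element.findtext` returns the text of the first matching child, or a default when nothing matches, so no list has to be built in the first place. The XML snippet and the prefix `sdn` are illustrative and not taken from the attachment.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping; the prefix "sdn" is arbitrary.
NS = {"sdn": "http://tempuri.org/sdnList.xsd"}

# Illustrative entry that has a uid but no firstName child.
entry = ET.fromstring(
    '<sdnEntry xmlns="http://tempuri.org/sdnList.xsd">'
    "<uid>42</uid><lastName>EXAMPLE</lastName>"
    "</sdnEntry>"
)

# Text of the first match, or the default if the tag is absent.
uid = entry.findtext("sdn:uid", default="none", namespaces=NS)
firstname = entry.findtext("sdn:firstName", default="none", namespaces=NS)

print(uid, firstname)
```

Note that `findtext` returns an empty string, not the default, when the tag exists but has no text.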
Reply
#7
(Jun-07-2019, 05:17 PM)heiner55 Wrote: uid[0] if uid != [] else "none"

Thanks for the answer

I have adjusted my code accordingly

import pandas as pd
import xml.etree.ElementTree as ET

file = ET.parse(r'test.xml')

# Column names for the resulting DataFrame
Data_columns = ['uid', 'firstName', 'lastName', 'sdnType']

# Collect one row per node and build the DataFrame once at the end
# (DataFrame.append was removed in pandas 2.0)
rows = []

for node in file.getroot():
    uid = [uid.text for uid in node.findall('{http://tempuri.org/sdnList.xsd}uid')]
    firstname = [firstname.text for firstname in node.findall('{http://tempuri.org/sdnList.xsd}firstName')]
    lastName = [lastName.text for lastName in node.findall('{http://tempuri.org/sdnList.xsd}lastName')]
    sdnType = [sdnType.text for sdnType in node.findall('{http://tempuri.org/sdnList.xsd}sdnType')]
    # Take the first match for each field, or '' when the tag is absent
    rows.append([
        uid[0] if uid != [] else '',
        firstname[0] if firstname != [] else '',
        lastName[0] if lastName != [] else '',
        sdnType[0] if sdnType != [] else '',
    ])

table = pd.DataFrame(rows, columns=Data_columns)

print(table)
Output:
Out[52]:
     uid           firstName   lastName     sdnType
0
1   9639  Ismail Abdul Salah     HANIYA  Individual
2  26182               Evren  KAYAKIRAN  Individual
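
The blank row 0 corresponds to the publshInformation element, which has none of these child tags. A possible variation, assuming only sdnEntry elements are wanted: select those directly and collect one dict per entry. The sample XML and the names `NS`, `FIELDS`, and `rows` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for test.xml; only the namespace URI comes from the thread.
SAMPLE = """<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <publshInformation><Publish_Date>01/01/2019</Publish_Date></publshInformation>
  <sdnEntry>
    <uid>1</uid><firstName>Alice</firstName>
    <lastName>EXAMPLE</lastName><sdnType>Individual</sdnType>
  </sdnEntry>
  <sdnEntry>
    <uid>2</uid><lastName>SOME COMPANY</lastName><sdnType>Entity</sdnType>
  </sdnEntry>
</sdnList>"""

NS = {"sdn": "http://tempuri.org/sdnList.xsd"}  # arbitrary prefix
FIELDS = ["uid", "firstName", "lastName", "sdnType"]

root = ET.fromstring(SAMPLE)

# One dict per sdnEntry; missing tags become ''. publshInformation is
# skipped because only sdnEntry children of the root are selected.
rows = [
    {field: entry.findtext(f"sdn:{field}", default="", namespaces=NS) for field in FIELDS}
    for entry in root.findall("sdn:sdnEntry", NS)
]

for row in rows:
    print(row)
```

A list of dicts like `rows` can be passed directly to `pd.DataFrame(rows, columns=FIELDS)`.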
Reply
#8
Now it looks better.
Reply
#9
I tried to parse the value in "programList/program":
<ns0:programList>
<ns0:program>FSE-IR</ns0:program>
</ns0:programList>
1. I managed to get the value using the following code, but is there a better way to do this?
1st try
for node in file.getroot():
    for programList in node.findall('{http://tempuri.org/sdnList.xsd}programList'):
        for program in programList.findall('{http://tempuri.org/sdnList.xsd}program'):
            print(program.text)
2nd try
def cleanaa(a):
    cleana = a[0] if a != [] else ''
    return cleana 

for node in file.getroot():
    programList1 = cleanaa([[program.text for program in programList.findall('{http://tempuri.org/sdnList.xsd}program')] for programList in node.findall('{http://tempuri.org/sdnList.xsd}programList')])
    print(programList1)
The second output seems more appropriate, as it creates a list for each iteration and collects multiple values when there are several (there can be at most two).
E.g.:

Output:
['UKRAINE-EO13662']
['SYRIA', 'UKRAINE-EO13662']
['UKRAINE-EO13662']
2. Since there can be one or two values, can I get them into two variables, so that the second variable is an empty string ('') when there is only one value?

I'd appreciate any input on this.
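
On the first question, one possibility (a sketch, not tested against the real file): `findall` accepts a path, so the two nested loops can collapse into a single call per node. The XML below is invented to mirror the programList/program structure shown above, and the prefix `sdn` is arbitrary.

```python
import xml.etree.ElementTree as ET

# Invented sample mirroring the programList/program structure.
SAMPLE = """<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <sdnEntry>
    <uid>1</uid>
    <programList><program>UKRAINE-EO13662</program></programList>
  </sdnEntry>
  <sdnEntry>
    <uid>2</uid>
    <programList>
      <program>SYRIA</program>
      <program>UKRAINE-EO13662</program>
    </programList>
  </sdnEntry>
</sdnList>"""

NS = {"sdn": "http://tempuri.org/sdnList.xsd"}  # arbitrary prefix

root = ET.fromstring(SAMPLE)

# "programList/program" walks both levels in one findall call.
programs_per_entry = [
    [program.text for program in node.findall("sdn:programList/sdn:program", NS)]
    for node in root
]

for programs in programs_per_entry:
    print(programs)
```

A node without a programList simply yields an empty list, so no special case is needed.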
Reply
#10
Maybe this helps:

#!/usr/bin/python3

def cleanaa(a):
    cleana = a[0] if a != [] else ''
    return cleana 

[x, *y] = ['UKRAINE-EO13662']
print(x, cleanaa(y))

[x, *y] = ['SYRIA', 'UKRAINE-EO13662']
print(x, cleanaa(y))
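
A variation on the same idea, assuming at most two values are of interest as stated in the question: pad the list with empty strings and unpack exactly two names. `split_two` is a hypothetical helper name, not something from the thread.

```python
def split_two(values, filler=""):
    """Return the first two items of values, padding with filler if short."""
    padded = list(values[:2]) + [filler] * 2
    first, second = padded[:2]
    return first, second


one = split_two(['UKRAINE-EO13662'])
two = split_two(['SYRIA', 'UKRAINE-EO13662'])
none = split_two([])

print(one)
print(two)
print(none)
```

Unlike `[x, *y] = ...`, this also works for an empty list, where star unpacking would raise a ValueError because there is no value for `x`.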
Reply

