Parse data from xml file - Printable Version
Python Forum (https://python-forum.io), Forum: Python Coding (https://python-forum.io/forum-7.html), Forum: General Coding Help (https://python-forum.io/forum-8.html), Thread: Parse data from xml file (/thread-18912.html)
Parse data from xml file - klllmmm - Jun-06-2019

I'm trying to parse data from an XML file downloaded from https://www.treasury.gov/ofac/downloads/consolidated/consolidated.xml (a sample of the XML file is attached). I tried to parse the data, but it was not successful: the output I'm getting for "firstname" is an empty list. I'd appreciate it if someone could help with this.

    import xml.etree.ElementTree as ET

    file = ET.parse(r'D:\path\to\file\test.xml')
    for node in file.getroot():
        print(node)
        firstname = node.findall('firstName')
        print(firstname)

RE: Parse data from xml file - heiner55 - Jun-06-2019

    #!/usr/bin/python3
    import xml.etree.ElementTree as ET

    file = ET.parse(r'test.xml')

    for node in file.getroot():
        print(node)
        firstname = node.findall('{http://tempuri.org/sdnList.xsd}firstName')
        print(firstname)
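
The fix above works because the elements in the file belong to an XML namespace, so ElementTree stores each tag as `{namespace}name` and a search for the bare name `firstName` matches nothing. As a sketch of an alternative spelling, `findall` also accepts a prefix-to-URI mapping. The prefix `sdn`, the `sdnEntry` element name, and the inline sample document below are assumptions made for illustration; only the namespace URI comes from the answer above.

```python
import xml.etree.ElementTree as ET

# Inline sample standing in for test.xml; its structure is assumed.
SAMPLE = """<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <sdnEntry>
    <uid>1</uid>
    <firstName>Jane</firstName>
    <lastName>Doe</lastName>
  </sdnEntry>
  <sdnEntry>
    <uid>2</uid>
    <lastName>ACME LTD</lastName>
  </sdnEntry>
</sdnList>"""

# "sdn" is an arbitrary prefix chosen here; only the URI must match the file.
NS = {"sdn": "http://tempuri.org/sdnList.xsd"}

root = ET.fromstring(SAMPLE)

# Searching for the bare tag name finds nothing in a namespaced document.
unqualified = [node.findall("firstName") for node in root]

# With the prefix map, the same search matches the namespaced elements.
qualified = [
    [el.text for el in node.findall("sdn:firstName", NS)]
    for node in root
]

print(unqualified)
print(qualified)
```

The prefix form is equivalent to writing the full `{http://tempuri.org/sdnList.xsd}firstName` tag each time; it only saves typing when many fields are read.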

RE: Parse data from xml file - klllmmm - Jun-07-2019

(Jun-06-2019, 04:41 AM) heiner55 wrote:

    #!/usr/bin/python3
    import xml.etree.ElementTree as ET

    file = ET.parse(r'test.xml')

    for node in file.getroot():
        print(node)
        firstname = node.findall('{http://tempuri.org/sdnList.xsd}firstName')
        print(firstname)

Thanks for the answer. Using the approach you suggested, I managed to parse some data.

    import pandas as pd
    import xml.etree.ElementTree as ET

    file = ET.parse(r'test.xml')

    # Create an empty dataframe
    Data_columns = ['uid', 'firstName', 'lastName', 'sdnType']
    table = pd.DataFrame(columns=Data_columns)
    table = pd.DataFrame()

    for node in file.getroot():
        uid = [uid.text for uid in node.findall('{http://tempuri.org/sdnList.xsd}uid')]
        firstname = [firstname.text for firstname in node.findall('{http://tempuri.org/sdnList.xsd}firstName')]
        lastName = [lastName.text for lastName in node.findall('{http://tempuri.org/sdnList.xsd}lastName')]
        sdnType = [sdnType.text for sdnType in node.findall('{http://tempuri.org/sdnList.xsd}sdnType')]
        table_List = [[uid, firstname, lastName, sdnType]]
        table1 = pd.DataFrame(table_List, columns=Data_columns)
        # Note: DataFrame.append was removed in pandas 2.0;
        # on newer versions use pd.concat([table, table1], ignore_index=True).
        table = table.append(table1, ignore_index=True)

    print(table)

How can I get the values without brackets? I'd appreciate it if someone could help with this.

RE: Parse data from xml file - heiner55 - Jun-07-2019

Because it is an array:

    uid == array
    uid[0] ==> first element of array
    uid[1] ==> second element

RE: Parse data from xml file - klllmmm - Jun-07-2019

(Jun-07-2019, 04:22 PM) heiner55 wrote: Because it is an array:

Thanks for the answer. I'm not sure how to get an element like uid[0], as sometimes it is an empty array like []. I'd appreciate it if someone could indicate how to turn the value in the array into a string.

RE: Parse data from xml file - heiner55 - Jun-07-2019

    if uid == []:
        name = "none"
    else:
        name = uid[0]

or

    name = uid[0] if uid != [] else "none"

RE: Parse data from xml file - klllmmm - Jun-08-2019

(Jun-07-2019, 05:17 PM) heiner55 wrote: uid[0] if uid != [] else "none"

Thanks for the answer. I have adjusted my code accordingly.

    import pandas as pd
    import xml.etree.ElementTree as ET

    file = ET.parse(r'test.xml')

    # Create an empty dataframe
    Data_columns = ['uid', 'firstName', 'lastName', 'sdnType']
    table = pd.DataFrame(columns=Data_columns)
    table = pd.DataFrame()

    for node in file.getroot():
        uid = [uid.text for uid in node.findall('{http://tempuri.org/sdnList.xsd}uid')]
        firstname = [firstname.text for firstname in node.findall('{http://tempuri.org/sdnList.xsd}firstName')]
        lastName = [lastName.text for lastName in node.findall('{http://tempuri.org/sdnList.xsd}lastName')]
        sdnType = [sdnType.text for sdnType in node.findall('{http://tempuri.org/sdnList.xsd}sdnType')]
        # Take the first match of each field, or '' when the field is missing
        table_List = [[uid[0] if uid != [] else '',
                       firstname[0] if firstname != [] else '',
                       lastName[0] if lastName != [] else '',
                       sdnType[0] if sdnType != [] else '']]
        table1 = pd.DataFrame(table_List, columns=Data_columns)
        # Note: DataFrame.append was removed in pandas 2.0;
        # on newer versions use pd.concat([table, table1], ignore_index=True).
        table = table.append(table1, ignore_index=True)

    print(table)
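
A possible shortening of the adjusted code, offered as a sketch rather than as the thread's own solution: `Element.findtext` returns the text of the first matching child, or a default when there is none, so the "build a list, then take `[0]`" step can be skipped. The prefix `sdn`, the helper name `entry_to_row`, the `sdnEntry` element name, and the inline sample are all hypothetical; only the namespace URI and the four field names come from the posts above.

```python
import xml.etree.ElementTree as ET

# Inline sample standing in for test.xml; its structure is assumed.
SAMPLE = """<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <sdnEntry>
    <uid>1</uid>
    <firstName>Jane</firstName>
    <lastName>Doe</lastName>
    <sdnType>Individual</sdnType>
  </sdnEntry>
  <sdnEntry>
    <uid>2</uid>
    <lastName>ACME LTD</lastName>
    <sdnType>Entity</sdnType>
  </sdnEntry>
</sdnList>"""

NS = {"sdn": "http://tempuri.org/sdnList.xsd"}  # prefix is arbitrary
COLUMNS = ["uid", "firstName", "lastName", "sdnType"]


def entry_to_row(node):
    """Return one dict per entry; missing fields become ''."""
    return {
        col: node.findtext(f"sdn:{col}", default="", namespaces=NS)
        for col in COLUMNS
    }


# Collect plain rows first instead of growing a DataFrame inside the loop.
rows = [entry_to_row(node) for node in ET.fromstring(SAMPLE)]
print(rows)
```

The resulting `rows` list can then be passed to `pd.DataFrame(rows, columns=COLUMNS)` in a single call, which avoids calling `append` once per entry.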

RE: Parse data from xml file - heiner55 - Jun-08-2019

Now it looks better.

RE: Parse data from xml file - klllmmm - Jun-19-2019

I tried to parse the value in "programList/program":

    <ns0:programList>
        <ns0:program>FSE-IR</ns0:program>
    </ns0:programList>

1. I managed to get the value using the following code. But is there a better way to get this?

1st try

    for node in file.getroot():
        for programList in node.findall('{http://tempuri.org/sdnList.xsd}programList'):
            for program in programList.findall('{http://tempuri.org/sdnList.xsd}program'):
                print(program.text)

2nd try

    def cleanaa(a):
        cleana = a[0] if a != [] else ''
        return cleana

    for node in file.getroot():
        programList1 = cleanaa([[program.text for program in programList.findall('{http://tempuri.org/sdnList.xsd}program')]
                                for programList in node.findall('{http://tempuri.org/sdnList.xsd}programList')])
        print(programList1)

The second output seems more appropriate, as it creates a list for each iteration and picks up multiple values if there are several (there can be at most two).

2. Since there can be one or two values, can I get the two values into two variables, so that if there is only one value the second variable is an empty string ('')? I'd appreciate any input on this.

RE: Parse data from xml file - heiner55 - Jun-25-2019

Maybe this helps:

    #!/usr/bin/python3
    def cleanaa(a):
        cleana = a[0] if a != [] else ''
        return cleana

    [x, *y] = ['UKRAINE-EO13662']
    print(x, cleanaa(y))

    [x, *y] = ['SYRIA', 'UKRAINE-EO13662']
    print(x, cleanaa(y))
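
Another way to end up with exactly two variables, sketched under the same assumptions as the thread (at most two programs per entry): read the programs with one path expression, pad the list with empty strings, and slice it to length two. Unlike `[x, *y] = ...`, this also copes with an entry that has no program at all. The prefix `sdn`, the helper name `two_programs`, the `sdnEntry` element name, and the inline sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Inline sample standing in for test.xml; its structure is assumed.
SAMPLE = """<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <sdnEntry>
    <programList>
      <program>SYRIA</program>
      <program>UKRAINE-EO13662</program>
    </programList>
  </sdnEntry>
  <sdnEntry>
    <programList>
      <program>FSE-IR</program>
    </programList>
  </sdnEntry>
  <sdnEntry/>
</sdnList>"""

NS = {"sdn": "http://tempuri.org/sdnList.xsd"}  # prefix is arbitrary


def two_programs(node):
    """Return (first, second) program names, using '' for missing ones."""
    # The path walks programList -> program in a single findall call.
    programs = [el.text for el in node.findall("sdn:programList/sdn:program", NS)]
    # Pad to at least two items, then keep exactly two.
    first, second = (programs + ["", ""])[:2]
    return first, second


pairs = [two_programs(node) for node in ET.fromstring(SAMPLE)]
for first, second in pairs:
    print(repr(first), repr(second))
```

If an entry ever carried more than two programs, the slice would silently drop the extras, so this only fits as long as the "maximum of two" assumption holds.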