Python Forum

Parse the data in XML metadata field
I'm trying to parse data from an XML file downloaded from https://scsanctions.un.org/resources/xml...idated.xml

A sample of the XML file is shown below.

<?xml version="1.0" encoding="UTF-8"?>
<CONSOLIDATED_LIST xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://www.un.org/sc/resources/sc-sanctions.xsd" dateGenerated="2019-06-17T19:04:28.013-04:00">

I tried to read the value of the "dateGenerated" metadata attribute, but was not successful.

I'd appreciate it if someone could help with this.


import pandas as pd
import xml.etree.ElementTree as ET

file1 = ET.parse(r'scsanctions.un.org_copy.xml')

for node in file1.getroot():
    print(ET.tostring(node, encoding='utf8').decode('utf8'))
    print(node)
    for i in node:
        dataid = [item.text for item in i.findall('DATAID')]
        print(dataid)
    # Try 1
    d = node.findall('dateGenerated')
    print(d.text)

# Try 2
d1 = file1.findall('dateGenerated')
print(d1)
>>> doc = ET.parse('scsanctions.un.org_copy.xml')
>>> root = doc.getroot()
>>> root
<Element 'CONSOLIDATED_LIST' at 0x0000026D52CB0BD8>
>>> root.attrib['dateGenerated']
'2019-06-17T19:04:28.013-04:00'
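
For anyone wondering why the findall() attempts came back empty: dateGenerated is an attribute of the root element, not a child element, and findall() only searches for child elements by tag name. Here is a minimal sketch of the difference. It assumes an inline copy of the root tag from the sample (the SAMPLE string and the variable names are just for illustration, and the tag is closed by hand so it parses on its own) rather than the real file.

```python
import xml.etree.ElementTree as ET

# Illustrative inline copy of the root element from the sample, closed by hand
# so that it parses on its own. The real file has child elements inside.
SAMPLE = (
    '<CONSOLIDATED_LIST '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:noNamespaceSchemaLocation="https://www.un.org/sc/resources/sc-sanctions.xsd" '
    'dateGenerated="2019-06-17T19:04:28.013-04:00">'
    '</CONSOLIDATED_LIST>'
)

root = ET.fromstring(SAMPLE)

# findall() matches child elements by tag name, so searching for an
# attribute name finds nothing.
as_child = root.findall('dateGenerated')

# Attributes are stored in the element's attrib dict. get() is a shortcut
# that returns None instead of raising KeyError when the attribute is missing.
date_generated = root.get('dateGenerated')

print(as_child)
print(date_generated)
```

root.attrib['dateGenerated'] and root.get('dateGenerated') return the same string here. get() is just the safer choice if the attribute might be absent.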
(Jun-19-2019, 10:12 AM) stranac wrote:
>>> doc = ET.parse('scsanctions.un.org_copy.xml')
>>> root = doc.getroot()
>>> root
<Element 'CONSOLIDATED_LIST' at 0x0000026D52CB0BD8>
>>> root.attrib['dateGenerated']
'2019-06-17T19:04:28.013-04:00'

Thank you! :)