Python Forum

Full Version: Unexpected Output after Running Parsing Script
Been trying to convert the following XML file to CSV format
[xml]
<ReturnHeader binaryAttachmentCount="0">
  <Timestamp>2012-04-21T10:23:00-06:00</Timestamp>
  <TaxPeriodEndDate>2011-12-31</TaxPeriodEndDate>
  <ReturnType>990PF</ReturnType>
  <TaxPeriodBeginDate>2011-01-01</TaxPeriodBeginDate>
  <Filer>
    <EIN>586449065</EIN>
    <Name>
      <BusinessNameLine1>LAVINA MICHL WRIGHT SCHOLARSHIP</BusinessNameLine1>
    </Name>
    <NameControl>WRIG</NameControl>
    <Phone>3367478182</Phone>
    <USAddress>
      <AddressLine1>1525 W WT HARRIS BLVD D1114-044</AddressLine1>
      <City>CHARLOTTE</City>
      <State>NC</State>
      <ZIPCode>28288</ZIPCode>
    </USAddress>
  </Filer>
  <Officer>
    <Name>WELLS FARGO BANK NA</Name>
    <Title>Trustee</Title>
    <Phone>3367478182</Phone>
    <DateSigned>2012-04-13</DateSigned>
  </Officer>
  <TaxYear>2011</TaxYear>
  <BuildTS>2016-02-24 21:20:13Z</BuildTS>
</ReturnHeader>
[/xml]
The Python code I am running is:
import xml.etree.ElementTree as ET
import csv

tree = ET.parse("/home/hotsea/XML/test_extract.xml")
root = tree.getroot()

f = open('/home/hotsea/XML/CSV/test_extract_result.csv', 'w')

csvwriter = csv.writer(f)

count = 0

head = ['TaxPeriodEndDate','ReturnType','TaxPeriodBeginDate','EIN','BusinessNameLine1','State','TaxYear']

csvwriter.writerow(head)

for returnheader in root.findall('ReturnHeader'):
    row = []
    taskperiodenddate = returnheader.find('TaxPeriodEndDate').text
    row.append(taskperiodenddate)
    returntype = returnheader.find('ReturnType').text
    row.append(returntype)
    taxperiodbegindate = returnheader.find('TaxPeriodBeginDate').find('Name').text
    row.append(taxperiodbegindate)
    ein = returnheader.find('EIN').text
    row.append(ein)
    businessnameline1 = returnheader.find('BusinessNameLine1').text
    row.append(businessnameline1)
    state = returnheader.find('State').text
    row.append(State)
    taxyear = returnheader.find('TaxYear').text
    row.append(taxyear)
    csvwriter.writerow(row)
f.close()
I want the output fields to be:
TaxPeriodEndDate,ReturnType,TaxPeriodBeginDate,EIN,BusinessNameLine1,State,TaxYear
2011-12-31,990PF,2011-01-01,586449065,LAVINA MICHL WRIGHT SCHOLARSHIP,NC,2011

Does anyone out there know how to tweak my code so that it gives me that?
What is the current output?
I only get the headers with no actual data.
It doesn't find the element because <ReturnHeader> is the root element, and findall() with a plain tag name only searches the direct children of the element it is called on. If you wrap the contents of the XML file in <foo> and </foo> tags, it finds the node. There are other errors too: for example, TaxPeriodBeginDate doesn't have a Name child.
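To see the root-element behaviour in isolation, here is a minimal sketch. It assumes a shortened copy of the posted XML inlined as a string (the `XML` constant and the variable names are made up for this demo) rather than the poster's actual file.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical copy of the posted XML, inlined for the demo.
XML = """<ReturnHeader binaryAttachmentCount="0">
  <TaxPeriodEndDate>2011-12-31</TaxPeriodEndDate>
  <ReturnType>990PF</ReturnType>
</ReturnHeader>"""

root = ET.fromstring(XML)

# The root *is* <ReturnHeader>, so searching its children for that tag
# finds nothing and the for-loop in the original script never runs.
matches = root.findall('ReturnHeader')
root_tag = root.tag

# Calling find() on the root itself works for its direct children.
end_date = root.find('TaxPeriodEndDate').text

print(matches)
print(root_tag)
print(end_date)
```

So instead of wrapping the file in extra tags, another option is to drop the loop and read the fields straight from `root`.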
Thanks very much for that.
What would your script look like, if you don't mind me asking? Bear in mind that I want the listed headers to contain data.
Well, you only need to keep changing the code until Python no longer throws exceptions. For example,

taxperiodbegindate = returnheader.find('TaxPeriodBeginDate').find('Name').text
could be changed to

taxperiodbegindate = returnheader.find('TaxPeriodBeginDate').text
Use Python's error messages to find which parts of the code need to be updated.
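As a rough sketch of where that error-driven process might end up for the seven columns originally asked for: EIN, BusinessNameLine1 and State are nested inside <Filer>, so find() on the top-level element returns None for them, and calling .text on None is what raises the AttributeError. The example below assumes a trimmed copy of the posted XML as an inline string and writes to an in-memory buffer instead of a file; the `PATHS` mapping and the other names are invented for illustration and are not from the thread.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of the posted XML, inlined for the demo.
XML = """<ReturnHeader binaryAttachmentCount="0">
  <TaxPeriodEndDate>2011-12-31</TaxPeriodEndDate>
  <ReturnType>990PF</ReturnType>
  <TaxPeriodBeginDate>2011-01-01</TaxPeriodBeginDate>
  <Filer>
    <EIN>586449065</EIN>
    <Name>
      <BusinessNameLine1>LAVINA MICHL WRIGHT SCHOLARSHIP</BusinessNameLine1>
    </Name>
    <USAddress>
      <State>NC</State>
    </USAddress>
  </Filer>
  <TaxYear>2011</TaxYear>
</ReturnHeader>"""

root = ET.fromstring(XML)

# A plain tag name only matches direct children, so the nested EIN is not found.
direct_ein = root.find('EIN')  # None; direct_ein.text would raise AttributeError

# Map each wanted column to its path relative to the root element.
PATHS = {
    'TaxPeriodEndDate': 'TaxPeriodEndDate',
    'ReturnType': 'ReturnType',
    'TaxPeriodBeginDate': 'TaxPeriodBeginDate',
    'EIN': 'Filer/EIN',
    'BusinessNameLine1': 'Filer/Name/BusinessNameLine1',
    'State': 'Filer/USAddress/State',
    'TaxYear': 'TaxYear',
}

# findtext() returns the element's text, or None if the path does not match.
row = [root.findtext(path) for path in PATHS.values()]

buffer = io.StringIO()  # stands in for the CSV file on disk
writer = csv.writer(buffer)
writer.writerow(list(PATHS))
writer.writerow(row)
print(buffer.getvalue())
```

Keeping the column-to-path mapping in one dictionary means the header row and the data row cannot drift out of sync when a column is added or removed.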
Much appreciated!
This is what I've finally come up with as a solution, and it works.
import xml.etree.ElementTree as ET
import csv

tree = ET.parse("/home/hotsea/XML/test_extract.xml")
root = tree.getroot()

# newline='' lets the csv module control line endings, as the csv docs recommend
f = open('/home/hotsea/XML/CSV/test_extract_result.csv', 'w', newline='')

csvwriter = csv.writer(f)

count = 0

head = ['EIN','BusinessNameLine1','NameControl','Phone','AddressLine1','City','State','ZIPCode']

csvwriter.writerow(head)

for filer in root.findall('Filer'):
    row = []
    ein = filer.find('EIN').text
    row.append(ein)
    businessNameLine1 = filer.find('Name').find('BusinessNameLine1').text
    row.append(businessNameLine1)
    namecontrol = filer.find('NameControl').text
    row.append(namecontrol)
    phone = filer.find('Phone').text
    row.append(phone)
    addressline1 = filer.find('USAddress').find('AddressLine1').text
    row.append(addressline1)
    city = filer.find('USAddress').find('City').text
    row.append(city)
    state = filer.find('USAddress').find('State').text
    row.append(state)
    zipcode = filer.find('USAddress').find('ZIPCode').text
    row.append(zipcode)
    csvwriter.writerow(row)
f.close()
The output is as follows:
Output:
EIN,BusinessNameLine1,NameControl,Phone,AddressLine1,City,State,ZIPCode
586449065,LAVINA MICHL WRIGHT SCHOLARSHIP,WRIG,3367478182,1525 W WT HARRIS BLVD D1114-044,CHARLOTTE,NC,28288