Python Forum
Unexpected Output after Running Parsing Script
#1
I have been trying to convert the following XML file to CSV format:
[xml]
<ReturnHeader binaryAttachmentCount="0">
  <Timestamp>2012-04-21T10:23:00-06:00</Timestamp>
  <TaxPeriodEndDate>2011-12-31</TaxPeriodEndDate>
  <ReturnType>990PF</ReturnType>
  <TaxPeriodBeginDate>2011-01-01</TaxPeriodBeginDate>
  <Filer>
    <EIN>586449065</EIN>
    <Name>
      <BusinessNameLine1>LAVINA MICHL WRIGHT SCHOLARSHIP</BusinessNameLine1>
    </Name>
    <NameControl>WRIG</NameControl>
    <Phone>3367478182</Phone>
    <USAddress>
      <AddressLine1>1525 W WT HARRIS BLVD D1114-044</AddressLine1>
      <City>CHARLOTTE</City>
      <State>NC</State>
      <ZIPCode>28288</ZIPCode>
    </USAddress>
  </Filer>
  <Officer>
    <Name>WELLS FARGO BANK NA</Name>
    <Title>Trustee</Title>
    <Phone>3367478182</Phone>
    <DateSigned>2012-04-13</DateSigned>
  </Officer>
  <TaxYear>2011</TaxYear>
  <BuildTS>2016-02-24 21:20:13Z</BuildTS>
</ReturnHeader>
[/xml]
The Python code I am running is:
import xml.etree.ElementTree as ET
import csv

tree = ET.parse("/home/hotsea/XML/test_extract.xml")
root = tree.getroot()

f = open('/home/hotsea/XML/CSV/test_extract_result.csv', 'w')

csvwriter = csv.writer(f)

count = 0

head = ['TaxPeriodEndDate','ReturnType','TaxPeriodBeginDate','EIN','BusinessNameLine1','State','TaxYear']

csvwriter.writerow(head)

for returnheader in root.findall('ReturnHeader'):
    row = []
    taskperiodenddate = returnheader.find('TaxPeriodEndDate').text
    row.append(taskperiodenddate)
    returntype = returnheader.find('ReturnType').text
    row.append(returntype)
    taxperiodbegindate = returnheader.find('TaxPeriodBeginDate').find('Name').text
    row.append(taxperiodbegindate)
    ein = returnheader.find('EIN').text
    row.append(ein)
    businessnameline1 = returnheader.find('BusinessNameLine1').text
    row.append(businessnameline1)
    state = returnheader.find('State').text
    row.append(State)
    taxyear = returnheader.find('TaxYear').text
    row.append(taxyear)
    csvwriter.writerow(row)
f.close()
I want the output fields to be:
TaxPeriodEndDate,ReturnType,TaxPeriodBeginDate,EIN,BusinessNameLine1,State,TaxYear
2011-12-31,990PF,2011-01-01,586449065,LAVINA MICHL WRIGHT SCHOLARSHIP,NC,2011

Does anyone know how to tweak my code so that it gives me that?
#2
What is the current output?
#3
I only get the headers with no actual data.
#4
It doesn't find the element because <ReturnHeader> is the root element, and findall() only searches the subelements of the element it is called on. If you wrap the contents of the XML file in <foo> and </foo> tags, it finds the node. There are other errors too: for example, TaxPeriodBeginDate doesn't have a Name child.
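The behaviour described here can be reproduced with a small sketch. It assumes a shortened copy of the posted XML held in an inline string rather than the file on disk, and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the posted XML; <ReturnHeader> is the root element.
xml_text = """<ReturnHeader binaryAttachmentCount="0">
  <TaxPeriodEndDate>2011-12-31</TaxPeriodEndDate>
  <ReturnType>990PF</ReturnType>
</ReturnHeader>"""

root = ET.fromstring(xml_text)

# findall() with a bare tag name looks only at the direct children of root,
# so searching for the root's own tag matches nothing and the loop body
# in the original script never runs.
matches_on_root = root.findall('ReturnHeader')
print(root.tag, len(matches_on_root))

# Wrapping the document in an outer element (here <foo>) turns
# <ReturnHeader> into a child, so findall() can see it.
wrapped = ET.fromstring('<foo>' + xml_text + '</foo>')
matches_on_wrapped = wrapped.findall('ReturnHeader')
print(len(matches_on_wrapped))

# Without wrapping, the root itself can be queried directly.
end_date = root.find('TaxPeriodEndDate').text
print(end_date)
```

Querying the root directly avoids having to edit the XML file at all, which is probably the simpler option when the file holds a single <ReturnHeader>.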
#5
Thanks very much for that.
What would your script look like, if you don't mind me asking? Bear in mind that I want the headers listed above to contain data.
#6
Well, you only need to keep changing the code until Python no longer throws exceptions. For example,

taxperiodbegindate = returnheader.find('TaxPeriodBeginDate').find('Name').text
could be changed to

taxperiodbegindate = returnheader.find('TaxPeriodBeginDate').text
Use Python's error messages to find which parts of the code need to be updated.
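Putting the fixes from this thread together, one possible version for the columns requested in the first post might look like the sketch below. It is not the only way to do it. It assumes a trimmed copy of the XML in an inline string and writes to an in-memory buffer (io.StringIO) as a stand-in for the real CSV file, and the `fields` mapping is a hypothetical helper introduced here for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the posted XML, keeping only the elements that are needed.
xml_text = """<ReturnHeader binaryAttachmentCount="0">
  <TaxPeriodEndDate>2011-12-31</TaxPeriodEndDate>
  <ReturnType>990PF</ReturnType>
  <TaxPeriodBeginDate>2011-01-01</TaxPeriodBeginDate>
  <Filer>
    <EIN>586449065</EIN>
    <Name>
      <BusinessNameLine1>LAVINA MICHL WRIGHT SCHOLARSHIP</BusinessNameLine1>
    </Name>
    <USAddress>
      <State>NC</State>
    </USAddress>
  </Filer>
  <TaxYear>2011</TaxYear>
</ReturnHeader>"""

# <ReturnHeader> is the root element, so it is queried directly.
returnheader = ET.fromstring(xml_text)

# Column name -> path relative to <ReturnHeader>. Nested elements such as
# EIN need the full path, because find() with a bare tag name only looks
# at direct children.
fields = {
    'TaxPeriodEndDate': 'TaxPeriodEndDate',
    'ReturnType': 'ReturnType',
    'TaxPeriodBeginDate': 'TaxPeriodBeginDate',
    'EIN': 'Filer/EIN',
    'BusinessNameLine1': 'Filer/Name/BusinessNameLine1',
    'State': 'Filer/USAddress/State',
    'TaxYear': 'TaxYear',
}

out = io.StringIO()  # stand-in for the output CSV file
csvwriter = csv.writer(out)
csvwriter.writerow(fields.keys())
csvwriter.writerow([returnheader.find(path).text for path in fields.values()])

csv_text = out.getvalue()
print(csv_text)
```

With the real files, `ET.parse(...).getroot()` would replace `ET.fromstring(xml_text)`, and an opened file object would replace the buffer.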
#7
Much appreciated!
#8
This is what I've finally come up with as a solution, and it works.
import xml.etree.ElementTree as ET
import csv

tree = ET.parse("/home/hotsea/XML/test_extract.xml")
root = tree.getroot()

f = open('/home/hotsea/XML/CSV/test_extract_result.csv', 'w')

csvwriter = csv.writer(f)

count = 0

head = ['EIN','BusinessNameLine1','NameControl','Phone','AddressLine1','City','State','ZIPCode']

csvwriter.writerow(head)

for filer in root.findall('Filer'):
    row = []
    ein = filer.find('EIN').text
    row.append(ein)
    businessNameLine1 = filer.find('Name').find('BusinessNameLine1').text
    row.append(businessNameLine1)
    namecontrol = filer.find('NameControl').text
    row.append(namecontrol)
    phone = filer.find('Phone').text
    row.append(phone)
    addressline1 = filer.find('USAddress').find('AddressLine1').text
    row.append(addressline1)
    city = filer.find('USAddress').find('City').text
    row.append(city)
    state = filer.find('USAddress').find('State').text
    row.append(state)
    zipcode = filer.find('USAddress').find('ZIPCode').text
    row.append(zipcode)
    csvwriter.writerow(row)
f.close()
The output is as follows:
EIN,BusinessNameLine1,NameControl,Phone,AddressLine1,City,State,ZIPCode
586449065,LAVINA MICHL WRIGHT SCHOLARSHIP,WRIG,3367478182,1525 W WT HARRIS BLVD D1114-044,CHARLOTTE,NC,28288
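For anyone adapting this, a slightly more defensive variant is sketched below. It is only a sketch under a few assumptions: the XML is a shortened inline copy, the CSV goes to a temporary directory instead of /home/hotsea/XML/CSV/, and the `paths` list is a hypothetical helper. It uses a `with` block so the file is closed even if an error occurs, passes `newline=''` as the csv module documentation recommends, and uses `findtext()` with a default so that a missing element yields an empty cell instead of an AttributeError.

```python
import csv
import os
import tempfile
import xml.etree.ElementTree as ET

# Shortened copy of the posted XML; <ZIPCode> is left out on purpose
# to show what happens when an element is missing.
xml_text = """<ReturnHeader>
  <Filer>
    <EIN>586449065</EIN>
    <NameControl>WRIG</NameControl>
    <USAddress>
      <City>CHARLOTTE</City>
      <State>NC</State>
    </USAddress>
  </Filer>
</ReturnHeader>"""

root = ET.fromstring(xml_text)

# Paths relative to each <Filer>; the last path segment is the column name.
paths = ['EIN', 'NameControl', 'USAddress/City', 'USAddress/State',
         'USAddress/ZIPCode']

# Temporary location used only so that the sketch runs anywhere.
csv_path = os.path.join(tempfile.mkdtemp(), 'test_extract_result.csv')

with open(csv_path, 'w', newline='') as f:
    csvwriter = csv.writer(f)
    csvwriter.writerow([p.split('/')[-1] for p in paths])
    for filer in root.findall('Filer'):
        # findtext() returns the default when the element is absent.
        csvwriter.writerow([filer.findtext(p, default='') for p in paths])

# Read the file back to check what was written.
with open(csv_path, newline='') as f:
    rows = list(csv.reader(f))
print(rows)
```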

