Python Forum
ElementTree
#1
Hello, I have the XML file below and want to create a CSV file from it (the desired CSV output is shown first). I'm having trouble repeating the category on every row. I would like to parse the data using ElementTree. Any help would be appreciated.

category,decade_years,title,format,year,rating,description
Action,1980,Indiana Jones: The raiders of the lost Ark,DVD,1981,PG,Archaeologist and adventurer Indiana Jones is hired by the U.S. government to find the Ark of the Covenant before the Nazis.
Action,1980,THE KARATE KID,DVD & Online,1984,PG,None provided.
Action,1980,Back 2 the Future,Blu-ray,1985,PG,Marty McFly
Action,1990,X-Men,dvd & digital,2000,PG-13,Two mutants come to a private academy for their kind whose resident superhero team must oppose a terrorist organization with similar powers.
Action,1990,Batman Returns,VHS,1992,PG13,NA.
Action,1990,Reservoir Dogs,Online,1992,R,WhAtEvER I Want!!!?!
Thriller,1970s,ALIEN,DVD,1979,R,"""""""""
Thriller,1980S,Ferris Bueller's Day Off,DVD,1986,PG13,Funny movie about a funny guy
Thriller,1980S,American Psycho,blue-ray,2000,R,psychopathic Bateman


<?xml version="1.0"?>
<collection>
<genre category="Action">
<decade years="1980s">
<movie favorite="True" title="Indiana Jones: The raiders of the lost Ark">
<format multiple="No">DVD</format>
<year>1981</year>
<rating>PG</rating>
<description>
'Archaeologist and adventurer Indiana Jones
is hired by the U.S. government to find the Ark of the
Covenant before the Nazis.'
</description>
</movie>
<movie favorite="True" title="THE KARATE KID">
<format multiple="Yes">DVD,Online</format>
<year>1984</year>
<rating>PG</rating>
<description>None provided.</description>
</movie>
<movie favorite="False" title="Back 2 the Future">
<format multiple="False">Blu-ray</format>
<year>1985</year>
<rating>PG</rating>
<description>Marty McFly</description>
</movie>
</decade>
<decade years="1990s">
<movie favorite="False" title="X-Men">
<format multiple="Yes">dvd, digital</format>
<year>2000</year>
<rating>PG-13</rating>
<description>Two mutants come to a private academy for their kind whose resident superhero team must
oppose a terrorist organization with similar powers.</description>
</movie>
<movie favorite="True" title="Batman Returns">
<format multiple="No">VHS</format>
<year>1992</year>
<rating>PG13</rating>
<description>NA.</description>
</movie>
<movie favorite="False" title="Reservoir Dogs">
<format multiple="No">Online</format>
<year>1992</year>
<rating>R</rating>
<description>WhAtEvER I Want!!!?!</description>
</movie>
</decade>
</genre>

<genre category="Thriller">
<decade years="1970s">
<movie favorite="False" title="ALIEN">
<format multiple="Yes">DVD</format>
<year>1979</year>
<rating>R</rating>
<description>"""""""""</description>
</movie>
</decade>
<decade years="1980s">
<movie favorite="True" title="Ferris Bueller's Day Off">
<format multiple="No">DVD</format>
<year>1986</year>
<rating>PG13</rating>
<description>Funny movie about a funny guy</description>
</movie>
<movie favorite="FALSE" title="American Psycho">
<format multiple="No">blue-ray</format>
<year>2000</year>
<rating>Unrated</rating>
<description>psychopathic Bateman</description>
</movie>
</decade>
</genre>
</collection>
#2
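import xml.etree.ElementTree as ET
import csv


def xml_parser(file):
    """Yield one row per <movie>, repeating its genre category and decade."""
    tree = ET.parse(file)
    genres = tree.findall('genre')

    for genre in genres:
        #print(genre.attrib)
        for decade in genre:
            #print(decade.attrib)
            for movie in decade:
                #print(movie.attrib)
                fmt = movie.find('format')
                year = movie.find('year')
                rate = movie.find('rating')
                desc = movie.find('description')
                #print(fmt.text, year.text, rate.text)
                #print(fmt.attrib, year.attrib, rate.attrib)
                # genre and decade from the outer loops are still in scope here,
                # so every movie row repeats its category and decade.
                # ' '.join(...split()) collapses the line breaks inside <description>.
                yield (genre.get('category'), decade.get('years'), movie.get('title'),
                       fmt.text, year.text, rate.text, ' '.join(desc.text.split()))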
It's a generator, so you have to consume it with your CSV writer. For example, you can pass it to the writerows() method of a csv.writer object. I left some commented-out print statements in the nested for loops. If you put the code on module level without the function, you can try it step by step in the REPL. The Element attributes and methods you need to know: attrib, tag, text, get(), find(), findall().
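A minimal, self-contained sketch of that approach: it consumes the generator with csv.writer's writerows() and tries a few of the Element attributes the way you would in the REPL. Assumptions: a trimmed copy of the question's XML is embedded as SAMPLE_XML, io.StringIO stands in for real files, and the names xml_to_csv and HEADER are made up for this example. Collapsing the description whitespace with ' '.join(text.split()) is also just one possible choice.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: a trimmed copy of the XML from the question.
SAMPLE_XML = """<?xml version="1.0"?>
<collection>
<genre category="Action">
<decade years="1980s">
<movie favorite="True" title="THE KARATE KID">
<format multiple="Yes">DVD,Online</format>
<year>1984</year>
<rating>PG</rating>
<description>None provided.</description>
</movie>
<movie favorite="False" title="Back 2 the Future">
<format multiple="False">Blu-ray</format>
<year>1985</year>
<rating>PG</rating>
<description>Marty McFly</description>
</movie>
</decade>
<decade years="1990s">
<movie favorite="False" title="X-Men">
<format multiple="Yes">dvd, digital</format>
<year>2000</year>
<rating>PG-13</rating>
<description>Two mutants come to a private academy for their kind whose resident superhero team must
oppose a terrorist organization with similar powers.</description>
</movie>
</decade>
</genre>
</collection>
"""

# Hypothetical header, matching the columns of the desired CSV.
HEADER = ['category', 'decade_years', 'title', 'format', 'year', 'rating', 'description']


def xml_parser(file):
    """Yield one tuple per <movie>, repeating the enclosing category and decade."""
    tree = ET.parse(file)
    for genre in tree.findall('genre'):
        for decade in genre.findall('decade'):
            for movie in decade.findall('movie'):
                # Collapse the line breaks and indentation inside <description>
                description = ' '.join(movie.find('description').text.split())
                yield (genre.get('category'), decade.get('years'), movie.get('title'),
                       movie.find('format').text, movie.find('year').text,
                       movie.find('rating').text, description)


def xml_to_csv(xml_file, csv_file):
    """Hypothetical helper: write the header, then every row from the generator."""
    writer = csv.writer(csv_file)
    writer.writerow(HEADER)
    writer.writerows(xml_parser(xml_file))  # writerows() consumes the generator


if __name__ == '__main__':
    # Exploring the Element API, as you would in the REPL
    root = ET.fromstring(SAMPLE_XML)
    genre = root.find('genre')
    print(genre.tag, genre.attrib)                    # genre {'category': 'Action'}
    print(genre.get('category'))                      # Action
    print(root.find('genre/decade/movie/year').text)  # 1984

    out = io.StringIO()
    xml_to_csv(io.StringIO(SAMPLE_XML), out)
    print(out.getvalue())
```

Fields that contain a comma, like "DVD,Online", are quoted automatically by the csv writer. For real files, pass the XML file name and open the CSV with newline='' as the csv module documentation recommends, e.g. with open('movies.csv', 'w', newline='') as f: xml_to_csv('movies.xml', f) (both file names are placeholders). Using findall('decade') and findall('movie') instead of iterating over all children skips any element that isn't a decade or movie; for this XML both give the same rows.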
#3
Thank you, it worked.