how to read data from xml file

Raj · (This post was last modified: Apr-13-2018, 02:54 PM by snippsat.)

I have an XML file and I use the code below:

# -*- coding: utf-8 -*-
"""
Created on Fri Apr 13 20:33:17 2018

@author: user
"""

#from xml.dom import minidom
#
#doc = minidom.parse("D:\Mekala_Backupdata\PythonCodes\input.xml")

import xml.etree.cElementTree as ET
tree = ET.ElementTree('D:\Mekala_Backupdata\PythonCodes\input.xml')
root = tree.getroot()
for books in root:
    if (books.tag=='book'):
        print books.get('id')
        for attr in books:
            if (attr.tag==author'):
                print (attr.text)

But it does not work. Can someone help me?
My XML file is:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-11-17</publish_date>
      <description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description>
   </book>
   <book id="bk104">
      <author>Corets, Eva</author>
      <title>Oberon's Legacy</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-03-10</publish_date>
      <description>In post-apocalypse England, the mysterious 
      agent known only as Oberon helps to create a new life 
      for the inhabitants of London. Sequel to Maeve 
      Ascendant.</description>
   </book>
   <book id="bk105">
      <author>Corets, Eva</author>
      <title>The Sundered Grail</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-09-10</publish_date>
      <description>The two daughters of Maeve, half-sisters, 
      battle one another for control of England. Sequel to 
      Oberon's Legacy.</description>
   </book>
   <book id="bk106">
      <author>Randall, Cynthia</author>
      <title>Lover Birds</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2000-09-02</publish_date>
      <description>When Carla meets Paul at an ornithology 
      conference, tempers fly as feathers get ruffled.</description>
   </book>
   <book id="bk107">
      <author>Thurman, Paula</author>
      <title>Splish Splash</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2000-11-02</publish_date>
      <description>A deep sea diver finds true love twenty 
      thousand leagues beneath the sea.</description>
   </book>
   <book id="bk108">
      <author>Knorr, Stefan</author>
      <title>Creepy Crawlies</title>
      <genre>Horror</genre>
      <price>4.95</price>
      <publish_date>2000-12-06</publish_date>
      <description>An anthology of horror stories about roaches,
      centipedes, scorpions  and other insects.</description>
   </book>
   <book id="bk109">
      <author>Kress, Peter</author>
      <title>Paradox Lost</title>
      <genre>Science Fiction</genre>
      <price>6.95</price>
      <publish_date>2000-11-02</publish_date>
      <description>After an inadvertant trip through a Heisenberg
      Uncertainty Device, James Salway discovers the problems 
      of being quantum.</description>
   </book>
   <book id="bk110">
      <author>O'Brien, Tim</author>
      <title>Microsoft .NET: The Programming Bible</title>
      <genre>Computer</genre>
      <price>36.95</price>
      <publish_date>2000-12-09</publish_date>
      <description>Microsoft's .NET initiative is explored in 
      detail in this deep programmer's reference.</description>
   </book>
   <book id="bk111">
      <author>O'Brien, Tim</author>
      <title>MSXML3: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>36.95</price>
      <publish_date>2000-12-01</publish_date>
      <description>The Microsoft MSXML3 parser is covered in 
      detail, with attention to XML DOM interfaces, XSLT processing, 
      SAX and more.</description>
   </book>
   <book id="bk112">
      <author>Galos, Mike</author>
      <title>Visual Studio 7: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>49.95</price>
      <publish_date>2001-04-16</publish_date>
      <description>Microsoft Visual Studio 7 is explored in depth,
      looking at how Visual Basic, Visual C++, C#, and ASP+ are 
      integrated into a comprehensive development 
      environment.</description>
   </book>
</catalog>

dwill · Apr-13-2018, 04:09 PM

Hi..

You have:

 if (attr.tag==author'):

...but it looks like you left off the opening single quote before author. It should be:

 if (attr.tag=='author'):

**Larz60+** · Apr-13-2018, 08:18 PM

Take a look at the w3schools XML tutorial here:
The 'Books.xml' file looks almost identical: https://www.w3schools.com/xml/xml_usedfor.asp

Raj · (This post was last modified: Apr-14-2018, 05:26 AM by Raj.)

It still gives an error.
Below is my code:

import xml.etree.cElementTree as ET
tree = ET.ElementTree('input.xml')
root = tree.getroot()
for books in root:
    if (books.tag=='book'):
        print books.get('id')   # <- here it says invalid syntax
        for attr in books:
            if (attr.tag=='author'):
                print (attr.text)

dwill · Apr-14-2018, 06:39 AM

Hi

I changed your code to the version below. I used the "parse" function to parse the XML file into the "tree" object, and put parentheses around the print argument. Also, I put the source file in my Temp folder, but you can change that back to the location you were running from.

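import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

# Raw string so the backslashes in the Windows path are taken literally
source_file = r'C:\Temp\input.xml'
tree = ET.parse(source_file)  # parse() reads the file and builds the tree
root = tree.getroot()
for books in root:
    if books.tag == 'book':
        print(books.get('id'))
        for attr in books:
            if attr.tag == 'author':
                print(attr.text)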

The output I get is:

bk101
Gambardella, Matthew
bk102
Ralls, Kim
bk103
Corets, Eva
bk104
Corets, Eva
bk105
Corets, Eva
bk106
Randall, Cynthia
bk107
Thurman, Paula
bk108
Knorr, Stefan
bk109
Kress, Peter
bk110
O'Brien, Tim
bk111
O'Brien, Tim
bk112
Galos, Mike
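
On the difference between the two calls: as far as I understand it, `ET.ElementTree(...)` expects an element as its first argument rather than a file name, so a path passed positionally is not parsed, while `ET.parse(source)` (or the `file=` keyword of the constructor) does read the source. A minimal sketch, assuming a cut-down inline copy of the catalog read through `io.StringIO` so it runs without a file on disk; `SAMPLE` and `ids_and_authors` are names made up for the illustration:

```python
import io
import xml.etree.ElementTree as ET

# Cut-down copy of the catalog from the first post (illustration only).
SAMPLE = """<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
   </book>
</catalog>
"""


def ids_and_authors(root):
    """Return (id, author) pairs for every <book> directly under root."""
    return [(book.get('id'), book.findtext('author'))
            for book in root.findall('book')]


# ET.parse accepts a file name or a file object and returns an ElementTree.
tree = ET.parse(io.StringIO(SAMPLE))

# Equivalent: the constructor parses only when the source is passed as file=.
same_tree = ET.ElementTree(file=io.StringIO(SAMPLE))

for book_id, author in ids_and_authors(tree.getroot()):
    print(book_id, author)
```

With a real file, the `io.StringIO(SAMPLE)` argument would simply be the path, as in the code above.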

Raj · (This post was last modified: Apr-14-2018, 10:45 AM by Larz60+.)

# -*- coding: utf-8 -*-
"""
Created on Fri Apr 13 20:33:17 2018

@author: user
"""

#from xml.dom import minidom
#
#doc = minidom.parse("D:\Mekala_Backupdata\PythonCodes\input.xml")

 
#import xml.etree.cElementTree as ET
#tree = ET.ElementTree('D:\Mekala_Backupdata\PythonCodes\input.xml')
#root = tree.getroot()
#for books in root:
#    if (books.tag=='book'):
#        print books.get('id')
#        for attr in books:
#            if (attr.tag=='author'):
#                print (attr.text)
                
import xml.etree.cElementTree as ET
source_file = r'D:\Mekala_Backupdata\PythonCodes\input.xml'
tree = ET.parse(source_file)
root = tree.getroot()
for books in root:
    if (books.tag=='book'):
        print(books.get('id'))
        for attr in books:
            if (attr.tag=='author' or attr.tag=='title' or attr.tag=='price'):
                print (attr.text)

The code is working,
but for the title I only want the first part, before the ":".

For example, the last title in my XML file above is:
Visual Studio 7: A Comprehensive Guide
I only want the part up to "Visual Studio 7" (or, if I want the part after the ":", "A Comprehensive Guide").

Kindly help me with how to do this.

***snippsat*** · (This post was last modified: Apr-14-2018, 11:49 AM by snippsat.)

(Apr-14-2018, 10:31 AM)Raj Wrote: For example, the last title in my XML file above is:
Visual Studio 7: A Comprehensive Guide
I only want the part up to "Visual Studio 7" (or, if I want the part after the ":", "A Comprehensive Guide").

Split it up; you can also make a dictionary of it.
Here is also an alternative way with BeautifulSoup or lxml; in my opinion these are better and more actively updated parsers than the ones in the standard library.
They are also easier to use.

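from bs4 import BeautifulSoup

# The 'xml' parser of BeautifulSoup needs lxml to be installed
with open(r'C:\1_py\my.xml') as f:
    soup = BeautifulSoup(f, 'xml')
book = soup.find('book', id="bk112")
title = book.title.text
print(title)

# Make dictionary (works when the title contains exactly one ':')
lst = title.split(':')
d = dict([lst])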

Test:

>>> print(title)
Visual Studio 7: A Comprehensive Guide

>>> d
{'Visual Studio 7': ' A Comprehensive Guide'}
>>> d['Visual Studio 7']
' A Comprehensive Guide'

>>> d.keys()
dict_keys(['Visual Studio 7'])
>>> d.values()
dict_values([' A Comprehensive Guide'])
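
One caveat with `dict([lst])`: it only works when the title contains exactly one colon, because the list must be a single key/value pair. A hedged sketch using `str.partition` from the standard library, which also copes with titles that have no colon and trims the leading space seen above; `split_title` is a hypothetical helper name, not part of bs4:

```python
def split_title(title):
    """Split a title at the first ':' and trim whitespace on both parts."""
    # partition() always returns three items, even when ':' is absent.
    main, _sep, rest = title.partition(':')
    return main.strip(), rest.strip()


print(split_title('Visual Studio 7: A Comprehensive Guide'))
print(split_title('Midnight Rain'))
```

The first element of the returned tuple is the part before the colon; the second is the part after it, or an empty string when the title has no colon.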

Raj · (This post was last modified: Apr-14-2018, 12:15 PM by Raj.)

Which command should I write after
if (attr.tag=='author' or attr.tag=='title' or attr.tag=='price'):
    print (attr.text)

Not just for one title, but for all titles, I wish to take only the first part (before the ":").
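
One possible way to do this for every title, sketched under the assumption that the file has the same layout as the catalog above: inside the loop, handle the `title` tag separately and keep only the text before the first colon. The inline `SAMPLE` string and the `first_part` helper are made up for the illustration; with the real file, `ET.parse(source_file).getroot()` would take the place of `ET.fromstring(SAMPLE)`.

```python
import xml.etree.ElementTree as ET

# Two books copied from the catalog above (illustration only).
SAMPLE = """<catalog>
   <book id="bk110">
      <author>O'Brien, Tim</author>
      <title>Microsoft .NET: The Programming Bible</title>
      <price>36.95</price>
   </book>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <price>5.95</price>
   </book>
</catalog>
"""


def first_part(title):
    """Return the text before the first ':' (the whole title if there is none)."""
    return title.split(':', 1)[0].strip()


root = ET.fromstring(SAMPLE)
lines = []
for books in root:
    if books.tag == 'book':
        lines.append(books.get('id'))
        for attr in books:
            if attr.tag == 'title':
                # Only the title is shortened; author and price stay as they are.
                lines.append(first_part(attr.text))
            elif attr.tag in ('author', 'price'):
                lines.append(attr.text)

print('\n'.join(lines))
```

To take the part after the colon instead, `title.partition(':')[2].strip()` would be the analogous expression.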
