Python Forum
how to read data from xml file
#1
I have an XML file, and I use the code below:
# -*- coding: utf-8 -*-
"""
Created on Fri Apr 13 20:33:17 2018

@author: user
"""

#from xml.dom import minidom
#
#doc = minidom.parse("D:\Mekala_Backupdata\PythonCodes\input.xml")

import xml.etree.cElementTree as ET
tree = ET.ElementTree('D:\Mekala_Backupdata\PythonCodes\input.xml')
root = tree.getroot()
for books in root:
    if (books.tag=='book'):
        print books.get('id')
        for attr in books:
            if (attr.tag==author'):
                print (attr.text)
But it does not work. Can someone help me?
My XML file is:
<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>An in-depth look at creating applications with XML.</description>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-12-16</publish_date>
    <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
  </book>
  <book id="bk103">
    <author>Corets, Eva</author>
    <title>Maeve Ascendant</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-11-17</publish_date>
    <description>After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society.</description>
  </book>
  <book id="bk104">
    <author>Corets, Eva</author>
    <title>Oberon's Legacy</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2001-03-10</publish_date>
    <description>In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant.</description>
  </book>
  <book id="bk105">
    <author>Corets, Eva</author>
    <title>The Sundered Grail</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2001-09-10</publish_date>
    <description>The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon's Legacy.</description>
  </book>
  <book id="bk106">
    <author>Randall, Cynthia</author>
    <title>Lover Birds</title>
    <genre>Romance</genre>
    <price>4.95</price>
    <publish_date>2000-09-02</publish_date>
    <description>When Carla meets Paul at an ornithology conference, tempers fly as feathers get ruffled.</description>
  </book>
  <book id="bk107">
    <author>Thurman, Paula</author>
    <title>Splish Splash</title>
    <genre>Romance</genre>
    <price>4.95</price>
    <publish_date>2000-11-02</publish_date>
    <description>A deep sea diver finds true love twenty thousand leagues beneath the sea.</description>
  </book>
  <book id="bk108">
    <author>Knorr, Stefan</author>
    <title>Creepy Crawlies</title>
    <genre>Horror</genre>
    <price>4.95</price>
    <publish_date>2000-12-06</publish_date>
    <description>An anthology of horror stories about roaches, centipedes, scorpions and other insects.</description>
  </book>
  <book id="bk109">
    <author>Kress, Peter</author>
    <title>Paradox Lost</title>
    <genre>Science Fiction</genre>
    <price>6.95</price>
    <publish_date>2000-11-02</publish_date>
    <description>After an inadvertant trip through a Heisenberg Uncertainty Device, James Salway discovers the problems of being quantum.</description>
  </book>
  <book id="bk110">
    <author>O'Brien, Tim</author>
    <title>Microsoft .NET: The Programming Bible</title>
    <genre>Computer</genre>
    <price>36.95</price>
    <publish_date>2000-12-09</publish_date>
    <description>Microsoft's .NET initiative is explored in detail in this deep programmer's reference.</description>
  </book>
  <book id="bk111">
    <author>O'Brien, Tim</author>
    <title>MSXML3: A Comprehensive Guide</title>
    <genre>Computer</genre>
    <price>36.95</price>
    <publish_date>2000-12-01</publish_date>
    <description>The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more.</description>
  </book>
  <book id="bk112">
    <author>Galos, Mike</author>
    <title>Visual Studio 7: A Comprehensive Guide</title>
    <genre>Computer</genre>
    <price>49.95</price>
    <publish_date>2001-04-16</publish_date>
    <description>Microsoft Visual Studio 7 is explored in depth, looking at how Visual Basic, Visual C++, C#, and ASP+ are integrated into a comprehensive development environment.</description>
  </book>
</catalog>
Reply
#2
Hi..

You have:
 if (attr.tag==author'):
...but it looks like you left off the opening single quote before author. It should be:
 if (attr.tag=='author'):
Reply
#3
Take a look at the W3Schools XML tutorial; the 'Books.xml' file there looks almost identical: https://www.w3schools.com/xml/xml_usedfor.asp
Reply
#4
It still gives an error.
Below is my code:

import xml.etree.cElementTree as ET
tree = ET.ElementTree('input.xml')
root = tree.getroot()
for books in root:
    if (books.tag=='book'):
        print books.get('id')  # here it says invalid syntax
        for attr in books:
            if (attr.tag=='author'):
                print (attr.text)
Reply
#5
Hi

I changed your code to the version below. I used the parse() function to read the XML file into the "tree" object. I also put the source file in my temp folder, but you can change that back to the location you were running from.

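import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

# Raw string so the backslashes in the Windows path are taken literally
source_file = r'C:\Temp\input.xml'
tree = ET.parse(source_file)  # parse() actually reads and parses the file
root = tree.getroot()
for books in root:
    if books.tag == 'book':
        print(books.get('id'))
        for attr in books:
            if attr.tag == 'author':
                print(attr.text)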
The output I get is:
Output:
bk101
Gambardella, Matthew
bk102
Ralls, Kim
bk103
Corets, Eva
bk104
Corets, Eva
bk105
Corets, Eva
bk106
Randall, Cynthia
bk107
Thurman, Paula
bk108
Knorr, Stefan
bk109
Kress, Peter
bk110
O'Brien, Tim
bk111
O'Brien, Tim
bk112
Galos, Mike
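A note on why switching to parse matters, as far as I can tell: the first positional argument of ET.ElementTree is a root element, not a file name, so ET.ElementTree('input.xml') never reads the file. The "invalid syntax" message is a separate issue: print books.get('id') is Python 2 syntax, and Python 3 needs print(...). Below is a minimal sketch of the two ways to load the document. XML_SAMPLE is my own two-book stand-in for input.xml (an assumption, so the snippet runs without a file on disk).

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical two-book stand-in for input.xml
XML_SAMPLE = """<catalog>
  <book id="bk101"><author>Gambardella, Matthew</author></book>
  <book id="bk102"><author>Ralls, Kim</author></book>
</catalog>"""

# ET.parse() reads and parses its source (a path or a file-like object).
tree = ET.parse(io.StringIO(XML_SAMPLE))

# Passing the source through the file= keyword does the same job.
same_tree = ET.ElementTree(file=io.StringIO(XML_SAMPLE))

ids = [book.get('id') for book in tree.getroot().iter('book')]
same_ids = [book.get('id') for book in same_tree.getroot().iter('book')]
print(ids)
```

With a real file you would pass the path instead of the StringIO object, e.g. ET.parse(source_file).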
Reply
#6
# -*- coding: utf-8 -*-
"""
Created on Fri Apr 13 20:33:17 2018

@author: user
"""

#from xml.dom import minidom
#
#doc = minidom.parse("D:\Mekala_Backupdata\PythonCodes\input.xml")

 
#import xml.etree.cElementTree as ET
#tree = ET.ElementTree('D:\Mekala_Backupdata\PythonCodes\input.xml')
#root = tree.getroot()
#for books in root:
#    if (books.tag=='book'):
#        print books.get('id')
#        for attr in books:
#            if (attr.tag=='author'):
#                print (attr.text)
                
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

# Raw string so the backslashes in the Windows path are taken literally
source_file = r'D:\Mekala_Backupdata\PythonCodes\input.xml'
tree = ET.parse(source_file)
root = tree.getroot()
for books in root:
    if books.tag == 'book':
        print(books.get('id'))
        for attr in books:
            # Print only the child elements we are interested in
            if attr.tag in ('author', 'title', 'price'):
                print(attr.text)
The code is working now.
But for the title, I only want the first part, before the ":".

For example, the last title in my XML file above is:
Visual Studio 7: A Comprehensive Guide
I only want the part up to "Visual Studio 7" (or, if I want the part after the ":", "A Comprehensive Guide").

Kindly help with how to do this.
Reply
#7
(Apr-14-2018, 10:31 AM)Raj Wrote: For example, the last title in my XML file above is:
Visual Studio 7: A Comprehensive Guide
I only want the part up to "Visual Studio 7" (or, if I want the part after the ":", "A Comprehensive Guide").
Split it on the ":"; you can also make a dictionary out of the two parts.
Here is also an alternative way with BeautifulSoup or lxml, which I find better and more up to date than the parsers in the standard library.
They are also easier to use.
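from bs4 import BeautifulSoup  # the 'xml' parser also needs lxml installed

# Open the file in a with-block so it is closed again afterwards
with open(r'C:\1_py\my.xml') as xml_file:
    soup = BeautifulSoup(xml_file, 'xml')
book = soup.find('book', id="bk112")
title = book.title.text
print(title)

# Make a dictionary: split on the first colon only, so lst has two items
lst = title.split(':', 1)
d = dict([lst])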
Test:
>>> print(title)
Visual Studio 7: A Comprehensive Guide

>>> d
{'Visual Studio 7': ' A Comprehensive Guide'}
>>> d['Visual Studio 7']
' A Comprehensive Guide'

>>> d.keys()
dict_keys(['Visual Studio 7'])
>>> d.values()
dict_values([' A Comprehensive Guide'])
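If you would rather stay with the standard library, something similar seems doable with ElementTree's limited XPath support. This is only a sketch: XML_SAMPLE is a cut-down stand-in for the real file, and the names main and subtitle are my own. str.partition is used instead of split because it always returns three parts, even when a title has no ":".

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down stand-in for the real file
XML_SAMPLE = """<catalog>
  <book id="bk111"><title>MSXML3: A Comprehensive Guide</title></book>
  <book id="bk112"><title>Visual Studio 7: A Comprehensive Guide</title></book>
</catalog>"""

root = ET.fromstring(XML_SAMPLE)

# [@id='bk112'] selects the <book> child whose id attribute matches.
book = root.find("book[@id='bk112']")
title = book.findtext('title')

# partition() splits on the first ':' and returns (before, ':', after).
main, _, subtitle = title.partition(':')
main = main.strip()
subtitle = subtitle.strip()
print(main)
print(subtitle)
```

The strip() calls get rid of the leading space that shows up in ' A Comprehensive Guide' above.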
Reply
#8
Which command should I write after
if (attr.tag=='author' or attr.tag=='title' or attr.tag=='price'):
    print (attr.text)

Not just for one title: for all titles, I want to take only the first part (before the ":").
Reply
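One possible way to do this for every title is to treat the title tag separately inside the loop and split its text on the first ":". A minimal sketch follows, with a couple of assumptions: XML_SAMPLE stands in for input.xml, and short_title is a hypothetical helper name of mine, not something from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-book stand-in for input.xml
XML_SAMPLE = """<catalog>
  <book id="bk110">
    <author>O'Brien, Tim</author>
    <title>Microsoft .NET: The Programming Bible</title>
    <price>36.95</price>
  </book>
  <book id="bk108">
    <author>Knorr, Stefan</author>
    <title>Creepy Crawlies</title>
    <price>4.95</price>
  </book>
</catalog>"""


def short_title(text):
    """Return the part of a title before the first ':' (hypothetical helper)."""
    return text.split(':', 1)[0].strip()


root = ET.fromstring(XML_SAMPLE)
lines = []
for book in root.iter('book'):
    lines.append(book.get('id'))
    for child in book:
        if child.tag == 'title':
            # Only titles are shortened; a title without ':' is kept whole.
            lines.append(short_title(child.text))
        elif child.tag in ('author', 'price'):
            lines.append(child.text)

print('\n'.join(lines))
```

To get the part after the ":" instead, text.partition(':')[2].strip() should work; it gives an empty string for titles that have no ":".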

