Python Forum
Read XML-File
#1
Hello Forum,

I've got an XML file with a lot of names, ages and numbers, as follows:

## inputfile ##
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>
## inputfile ##

How can I read the XML file and write the output into a new text file like this:

## outputfile ##
Name: Tim
Age: 23
Number: 1234

Name: Jenny
Age: 23
Number: 4321
## outputfile ##

Thanks a lot for your help!
#2
Show what you have tried so far.
#3
Hello,

I tried a little bit and found this:
import xml.etree.ElementTree as ET
tree = ET.parse('c:/data.xml')
root = tree.getroot()

But how can I now search for, e.g., all Names and print them into a new file?

<data>
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>
</data>
#4
You need to learn how to parse an XML file.
I would suggest the following tutorial: https://lxml.de/parsing.html
In that tutorial, search for: "The lxml.etree Tutorial"
#5
Do you know of a description that is closer to my example?
#6
Quote: do you know a description, which is closer to the example

You will find some examples of using lxml etree in snippsat's web scraping tutorials here:
part1
part2

Note that these are for HTML, but the process for XML and HTML is very similar.

Remember, XML is a Markup Language and learning how to use it is not as simple as reading a description.
#7
There are two ways to parse XML. If your file is huge or streaming (so you don't have the whole document when starting), SAX is the faster and more memory-efficient way. If neither of those is a concern, then DOM is easier to work with. DOM is slower and uses more memory because it parses the whole document before you can interact with it, while SAX is event-based: you work with the document as it's being parsed.
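To make the event-based idea concrete, here is a minimal sketch using the standard library's xml.sax. It assumes the sample data from this thread wrapped in a "data" root node; the NameHandler class and the DOCUMENT constant are names made up for this illustration, not something from the original file.

```python
import xml.sax

# Sample data from this thread, wrapped in an assumed <data> root node.
DOCUMENT = (
    b"<data><Name>Tim</Name><Age>23</Age><Number>1234</Number>"
    b"<Name>Jenny</Name><Age>23</Age><Number>4321</Number></data>"
)


class NameHandler(xml.sax.ContentHandler):
    """Hypothetical handler: collects the text of every <Name> element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Called when an opening tag is seen.
        if name == "Name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # May be called more than once for a single text node, so buffer it.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, name):
        # Called when a closing tag is seen.
        if name == "Name":
            self.names.append("".join(self._buffer))
            self._in_name = False


handler = NameHandler()
xml.sax.parseString(DOCUMENT, handler)
print(handler.names)
```

The handler never holds the whole document, only the text of the element it is currently inside, which is why this style suits very large or streamed input.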

Also, your sample file isn't well-formed XML, as there isn't a root node. For this example, I've pretended that you actually do have a root node of "data":
>>> document = '''<data><Name>Tim</Name>
... <Age>23</Age>
... <Number>1234</Number>
... <Name>Jenny</Name>
... <Age>23</Age>
... <Number>4321</Number></data>'''
>>> from xml.dom import minidom
>>> doc = minidom.parseString(document)
>>> for node in doc.getElementsByTagName("Name"):
...   text_node = node.childNodes[0]
...   print(text_node.nodeValue)
...
Tim
Jenny
But BeautifulSoup (previously mentioned in this thread) is much easier to work with:
>>> from bs4 import BeautifulSoup as bs
>>> soup = bs(document, "html.parser")  # name the parser explicitly to avoid bs4's "no parser specified" warning
>>> for node in soup.find_all("name"):
...   print(node.text)
...
Tim
Jenny
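For the ElementTree approach tried earlier in the thread, something along these lines might produce the requested output format. This is only a sketch: it assumes the "data" root node, and it assumes every record starts with a Name element. The format_records function and the output path in the comment are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

# Sample data from this thread, wrapped in an assumed <data> root node.
DOCUMENT = """<data>
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>
</data>"""


def format_records(xml_text):
    """Return 'Tag: text' lines, with a blank line before each new <Name>.

    Assumption: every record begins with a <Name> element.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for element in root:  # direct children of the root, in document order
        if element.tag == "Name" and lines:
            lines.append("")  # blank line separates records
        lines.append(f"{element.tag}: {element.text}")
    return "\n".join(lines) + "\n"


output = format_records(DOCUMENT)
print(output, end="")

# To write a text file instead (the path is a placeholder):
# with open("output.txt", "w", encoding="utf-8") as f:
#     f.write(output)
```

When reading from disk, ET.parse(path).getroot() gives the same kind of root element that ET.fromstring returns here.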
#8
Minidom works fine for printing names, but I cannot get all 3 attributes out of my input file with nested for loops. What might be the issue?
#9
If you look at my answer in your other thread, you can use that code.
I see no point in using Minidom.
from bs4 import BeautifulSoup

data = '''\
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>'''

soup = BeautifulSoup(data, 'lxml')
for item in soup.find_all(['name', 'age', 'number']):
    print(f'{item.name.capitalize()}:{item.text}')
Output:
Name:Tim
Age:23
Number:1234
Name:Jenny
Age:23
Number:4321
#10
(Dec-14-2018, 10:10 PM)yuyu Wrote: Minidom works fine for printing names, but I cannot get all 3 attributes out of my input file with nested for loops. What might be the issue?
None of the xml you've shown so far has had any attributes at all. And finding the node values works with any node name. So if it's not working for you, you'll need to share your code, and the input file that isn't working right.
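As a rough illustration of that point, the same minidom lookup can be run once per node name and the results paired up, with no nested loops. This is a sketch under assumptions: the "data" root node is assumed, every record is assumed to contain all three elements in the same order (otherwise zip would misalign them), and the values helper is a name invented for the example.

```python
from xml.dom import minidom

# Sample data from this thread, wrapped in an assumed <data> root node.
DOCUMENT = (
    "<data><Name>Tim</Name><Age>23</Age><Number>1234</Number>"
    "<Name>Jenny</Name><Age>23</Age><Number>4321</Number></data>"
)

doc = minidom.parseString(DOCUMENT)


def values(tag):
    """Hypothetical helper: text of every element with the given tag name."""
    return [node.firstChild.nodeValue for node in doc.getElementsByTagName(tag)]


# Pair up the three lists; assumes each record has all three elements.
records = list(zip(values("Name"), values("Age"), values("Number")))

for name, age, number in records:
    print(f"Name: {name}")
    print(f"Age: {age}")
    print(f"Number: {number}")
    print()
```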

