Python Forum

Read XML-File
Hello Forum,

I've got an XML file with a lot of names, ages and numbers, as follows:

## inputfile ##
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>
## inputfile ##

How can I read the XML file and write the output into a new text file like this:

## outputfile ##
Name: Tim
Age: 23
Number: 1234

Name: Jenny
Age: 23
Number: 4321
## outputfile ##

Thanks a lot for help!
Show what you have tried so far.
Hello,

I tried a little bit and found this:
import xml.etree.ElementTree as ET
tree = ET.parse('c:/data.xml')
root = tree.getroot()

But how can I now search for, e.g., all names and print them into a new file?

<data>
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>
</data>
you need to learn how to parse an XML file.
I would suggest the following tutorial: https://lxml.de/parsing.html
In that tutorial, search for: "The lxml.etree Tutorial"
do you know a description, which is closer to the example?
Quote:do you know a description, which is closer to the example

you will find some examples of using lxml etree in Snippsatt's web scraping tutorials here:
part1
part2

Note that these are written for HTML, but the process for XML and HTML is very similar.

Remember, XML is a Markup Language and learning how to use it is not as simple as reading a description.
There are two main ways to parse XML. If your file is huge or streaming (so you don't have the whole document when you start), SAX is the faster and more memory-efficient way. If neither of those things is a concern, then DOM is easier to work with. DOM is slower and uses more memory, because it parses the whole document before you can interact with it, while SAX is event-based and you work with the data as it's being parsed.
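
As a rough illustration of the event-based style, here is a minimal sketch using the standard library's xml.sax module. The NameHandler class is a made-up name for this example, and the <data> root element is an assumption added so the sample parses; neither comes from the original input file.

```python
import xml.sax

# Sample data from the thread, wrapped in an assumed <data> root element.
DOCUMENT = b"""<data>
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>
</data>"""


class NameHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text of every <Name> element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Called when the parser reaches an opening tag.
        if name == "Name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it in a buffer.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, name):
        # Called when the parser reaches a closing tag.
        if name == "Name":
            self.names.append("".join(self._buffer))
            self._in_name = False


handler = NameHandler()
xml.sax.parseString(DOCUMENT, handler)
print(handler.names)
```

The handler never holds the whole document in memory; it only reacts to the start, text and end events as the parser emits them, which is what makes this style suitable for large or streaming input.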

Also, your sample file isn't well-formed XML, as it has no single root element. For this example, I've pretended that you actually do have a root element named "data":
>>> document = '''<data><Name>Tim</Name>
... <Age>23</Age>
... <Number>1234</Number>
... <Name>Jenny</Name>
... <Age>23</Age>
... <Number>4321</Number></data>'''
>>> from xml.dom import minidom
>>> doc = minidom.parseString(document)
>>> for node in doc.getElementsByTagName("Name"):
...   text_node = node.childNodes[0]
...   print(text_node.nodeValue)
...
Tim
Jenny
But BeautifulSoup (previously mentioned in this thread) is much easier to work with:
>>> from bs4 import BeautifulSoup as bs
>>> soup = bs(document, "html.parser")  # name the parser explicitly; HTML parsers lowercase tag names
>>> for node in soup.find_all("name"):
...   print(node.text)
...
Tim
Jenny
Minidom works fine for printing names, but I cannot get all 3 attributes out of my input file with nested for loops. What might be the issue?
If you look at my answer in your other thread, you can use that code.
I see no point in using Minidom.
from bs4 import BeautifulSoup

data = '''\
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>'''

soup = BeautifulSoup(data, 'lxml')
for item in soup.find_all(['name', 'age', 'number']):
    print(f'{item.name.capitalize()}:{item.text}')
Output:
Name:Tim
Age:23
Number:1234
Name:Jenny
Age:23
Number:4321
(Dec-14-2018, 10:10 PM)yuyu Wrote: Minidom works fine for printing names, but I cannot get all 3 attributes out of my inputfile with nested forloops, what might be the issue?
None of the xml you've shown so far has had any attributes at all. And finding the node values works with any node name. So if it's not working for you, you'll need to share your code, and the input file that isn't working right.
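
To illustrate that the lookup works for any node name without nested loops, here is a minimal sketch using xml.etree.ElementTree, the module from the first attempt in this thread. It assumes the input has a <data> root element and that every record starts with a <Name> element; the format_records function and the output file name are made up for this example.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Sample data from the thread, with the <data> root element.
XML_TEXT = """<data>
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>
</data>"""


def format_records(xml_text):
    """Return 'Tag: value' lines, with a blank line between records."""
    root = ET.fromstring(xml_text)
    lines = []
    # Iterating over the root yields its child elements in document order.
    for element in root:
        # Assumption: each record begins with <Name>, so a new <Name>
        # (other than the first) marks the start of the next record.
        if element.tag == "Name" and lines:
            lines.append("")
        lines.append(f"{element.tag}: {element.text}")
    return "\n".join(lines) + "\n"


output_text = format_records(XML_TEXT)

# Hypothetical output location; replace with the path you actually want.
output_path = Path(tempfile.gettempdir()) / "outputfile.txt"
output_path.write_text(output_text, encoding="utf-8")

print(output_text, end="")
```

To read from a file instead of a string, ET.parse(path).getroot() can stand in for ET.fromstring(xml_text).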