Python Forum
Read XML-File - Printable Version


Read XML-File - yuyu - Dec-12-2018

Hello Forum,

I've got an XML file with a lot of names, ages, and numbers, as follows:

## inputfile ##
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>
## inputfile ##

How can I read the XML file and write the output into a new text file like this?

## outputfile ##
Name: Tim
Age: 23
Number: 1234

Name: Jenny
Age: 23
Number: 4321
## outputfile ##

Thanks a lot for help!


RE: Read XML-File - Larz60+ - Dec-12-2018

Show what you have tried so far.


RE: Read XML-File - yuyu - Dec-13-2018

Hello,

I tried a little bit and found this:
import xml.etree.ElementTree as ET
tree = ET.parse('c:/data.xml')
root = tree.getroot()

But how can I now search for, e.g., all Names and print them into a new file?

<data>
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>
</data>


RE: Read XML-File - Larz60+ - Dec-13-2018

You need to learn how to parse an XML file.
I would suggest the following tutorial: https://lxml.de/parsing.html
In that tutorial, search for: "The lxml.etree Tutorial"
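
As a rough starting point, here is a minimal sketch using the standard library's xml.etree.ElementTree, the module already imported in the post above (lxml.etree offers a largely compatible API). It assumes the sample data is wrapped in a "data" root element and uses an inline string in place of the file; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Sample data from this thread, wrapped in a <data> root element (assumption).
document = """<data>
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>
</data>"""

# For a file on disk, ET.parse(path).getroot() gives the same kind of object.
root = ET.fromstring(document)

# findall() returns the direct children of root that have the given tag.
names = [element.text for element in root.findall("Name")]
print(names)

# Iterating over root yields every child element in document order.
lines = [f"{child.tag}: {child.text}" for child in root]
print("\n".join(lines))
```

Note that tag names are case-sensitive here, so "Name" and "name" are different tags.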


RE: Read XML-File - yuyu - Dec-13-2018

Do you know of a description that is closer to my example?


RE: Read XML-File - Larz60+ - Dec-13-2018

Quote: do you know a description, which is closer to the example

You will find some examples of using lxml etree in snippsat's web scraping tutorials here:
part1
part2

Note that these are for HTML, but the process for XML and HTML is very similar.

Remember, XML is a Markup Language and learning how to use it is not as simple as reading a description.


RE: Read XML-File - nilamo - Dec-13-2018

There are two ways to parse XML. If your file is huge or streaming (so you don't have the whole document when starting), SAX is the faster and more memory-efficient way. If neither of those things is a concern, then DOM is easier to work with. DOM is slower and uses more memory because it parses the whole document before you can interact with it, while SAX is event-based, so you work with the data as it's being parsed.

Also, your sample file isn't well-formed XML, as there isn't a single root node. For this example, I've pretended that you actually do have a root node of "data":
>>> document = '''<data><Name>Tim</Name>
... <Age>23</Age>
... <Number>1234</Number>
... <Name>Jenny</Name>
... <Age>23</Age>
... <Number>4321</Number></data>'''
>>> from xml.dom import minidom
>>> doc = minidom.parseString(document)
>>> for node in doc.getElementsByTagName("Name"):
...   text_node = node.childNodes[0]
...   print(text_node.nodeValue)
...
Tim
Jenny
But BeautifulSoup (previously mentioned in this thread) is much easier to work with:
>>> from bs4 import BeautifulSoup as bs
>>> soup = bs(document, "html.parser")
>>> for node in soup.find_all("name"):
...   print(node.text)
...
Tim
Jenny
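
For comparison, the event-based (SAX) approach described at the top of this post could look roughly like the sketch below. It uses the standard library's xml.sax and the same sample document with an assumed "data" root; the NameHandler class is a hypothetical name made up for this illustration.

```python
import xml.sax


class NameHandler(xml.sax.ContentHandler):
    """Collects the text of every <Name> element while the document is parsed."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Called when an opening tag is seen.
        if name == "Name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # May be called several times for one text node, so accumulate.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, name):
        # Called when a closing tag is seen.
        if name == "Name":
            self.names.append("".join(self._buffer))
            self._in_name = False


document = """<data><Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number></data>"""

handler = NameHandler()
xml.sax.parseString(document.encode("utf-8"), handler)
print(handler.names)
```

The handler never holds the whole document in memory, which is the point of SAX, but you have to track the parser state yourself (here with the _in_name flag).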



RE: Read XML-File - yuyu - Dec-14-2018

Minidom works fine for printing names, but I cannot get all 3 attributes out of my input file with nested for loops. What might be the issue?


RE: Read XML-File - snippsat - Dec-14-2018

If you look at my answer in your other thread, you can use that code.
I see no point in using Minidom.
from bs4 import BeautifulSoup

data = '''\
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>'''

soup = BeautifulSoup(data, 'lxml')
for item in soup.find_all(['name', 'age', 'number']):
    print(f'{item.name.capitalize()}:{item.text}')
Output:
Name:Tim
Age:23
Number:1234
Name:Jenny
Age:23
Number:4321



RE: Read XML-File - nilamo - Dec-15-2018

(Dec-14-2018, 10:10 PM) yuyu Wrote: Minidom works fine for printing names, but I cannot get all 3 attributes out of my input file with nested for loops. What might be the issue?
None of the XML you've shown so far has had any attributes at all. And finding the node values works with any node name. So if it's not working for you, you'll need to share your code and the input file that isn't working right.
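
To illustrate that the same lookup works for any node name, here is a minimal minidom sketch that reads all three elements and builds the output format asked for in the first post. It assumes the sample data with a "data" root and that every record has exactly one Name, Age, and Number in that order; node_text and format_records are hypothetical helper names.

```python
from xml.dom import minidom

document = """<data><Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number></data>"""


def node_text(node):
    """Return the text directly inside an element, or '' if it is empty."""
    return "".join(
        child.nodeValue
        for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


def format_records(xml_text, tags=("Name", "Age", "Number")):
    """Build 'Tag: value' lines, with a blank line between records."""
    doc = minidom.parseString(xml_text)
    # One list of values per tag, each in document order.
    columns = [
        [node_text(node) for node in doc.getElementsByTagName(tag)]
        for tag in tags
    ]
    records = []
    # zip() pairs the i-th Name with the i-th Age and the i-th Number.
    for values in zip(*columns):
        records.append(
            "\n".join(f"{tag}: {value}" for tag, value in zip(tags, values))
        )
    return "\n\n".join(records) + "\n"


text = format_records(document)
print(text)
```

To get a file instead of console output, the returned string can be written with the usual open(path, "w") and write(text). No nested for loops are needed, since each tag is looked up on its own.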