Read XML-File
+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: Read XML-File (/thread-14697.html)
Read XML-File - yuyu - Dec-12-2018

Hello Forum,

I've got an XML file with a lot of names, ages and numbers, as follows:

## inputfile ##
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>
## inputfile ##

How can I read the XML file and write the output into a new text file?

## outputfile ##
Name: Tim
Age: 23
Number: 1234
Name: Jenny
Age: 23
Number: 4321
## outputfile ##

Thanks a lot for your help!

RE: Read XML-File - Larz60+ - Dec-12-2018

Show what you have tried so far.

RE: Read XML-File - yuyu - Dec-13-2018

Hello,

I tried a little bit and found this:

    import xml.etree.ElementTree as ET
    tree = ET.parse('c:/data.xml')
    root = tree.getroot()

But how can I now search for, e.g., all the names and print them into a new file?

    <data>
    <Name>Tim</Name>
    <Age>23</Age>
    <Number>1234</Number>
    <Name>Jenny</Name>
    <Age>23</Age>
    <Number>4321</Number>
    </data>

RE: Read XML-File - Larz60+ - Dec-13-2018

You need to learn how to parse an XML file. I would suggest the following tutorial: https://lxml.de/parsing.html
In that tutorial, search for: "The lxml.etree Tutorial"

RE: Read XML-File - yuyu - Dec-13-2018

Do you know of a description that is closer to the example?

RE: Read XML-File - Larz60+ - Dec-13-2018

Quote: do you know a description, which is closer to the example

You will find some examples of using lxml etree in snippsat's web scraping tutorials here: part1 part2
Note that these are for HTML, but the process for XML and HTML is very similar.
Remember, XML is a markup language, and learning how to use it is not as simple as reading a description.

RE: Read XML-File - nilamo - Dec-13-2018

There are two ways to parse XML. If your file is huge or streaming (so you don't have the whole document when starting), SAX is the faster and more memory-efficient way. If neither of those things is a concern, then DOM is easier to work with.
DOM is slower and uses more memory, because it parses the whole document before you can interact with it, while SAX is event-based and you work with the document as it's being parsed.

Also, your sample file isn't valid XML, as there isn't a root node. For this example, I've pretended that you actually do have a root node of "data":

    >>> document = '''<data><Name>Tim</Name>
    ... <Age>23</Age>
    ... <Number>1234</Number>
    ... <Name>Jenny</Name>
    ... <Age>23</Age>
    ... <Number>4321</Number></data>'''
    >>> from xml.dom import minidom
    >>> doc = minidom.parseString(document)
    >>> for node in doc.getElementsByTagName("Name"):
    ...     text_node = node.childNodes[0]
    ...     print(text_node.nodeValue)
    ...
    Tim
    Jenny

But BeautifulSoup (previously mentioned in this thread) is much easier to work with:

    >>> from bs4 import BeautifulSoup as bs
    >>> soup = bs(document)
    >>> for node in soup.find_all("name"):
    ...     print(node.text)
    ...
    Tim
    Jenny

RE: Read XML-File - yuyu - Dec-14-2018

Minidom works fine for printing names, but I cannot get all 3 attributes out of my input file with nested for loops. What might be the issue?

RE: Read XML-File - snippsat - Dec-14-2018

If you look at my answer in your other thread, you can use that code.
I see no point in using Minidom.

    from bs4 import BeautifulSoup

    data = '''\
    <Name>Tim</Name>
    <Age>23</Age>
    <Number>1234</Number>
    <Name>Jenny</Name>
    <Age>23</Age>
    <Number>4321</Number>'''

    soup = BeautifulSoup(data, 'lxml')
    for item in soup.find_all(['name', 'age', 'number']):
        print(f'{item.name.capitalize()}:{item.text}')
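
For comparison, the same flat, in-order walk can be sketched with the standard library's xml.etree.ElementTree, which is what the original poster started with. This is only a sketch under a few assumptions: the input is wrapped in the "data" root element suggested earlier in the thread, and the names XML_TEXT, format_records and write_records are made up for illustration and do not come from any post above.

```python
import xml.etree.ElementTree as ET

# Sample input, assuming the <data> root element suggested in the thread.
XML_TEXT = """<data>
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>
</data>"""


def format_records(xml_text):
    """Return one 'Tag: text' line per child of the root, in document order."""
    root = ET.fromstring(xml_text)
    # Iterating over an Element yields its direct children in order,
    # so no nested loops are needed for this flat layout.
    return [f"{child.tag}: {(child.text or '').strip()}" for child in root]


def write_records(xml_text, out_path):
    """Write the formatted lines to out_path (a hypothetical file name)."""
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(format_records(xml_text)) + "\n")


if __name__ == "__main__":
    for line in format_records(XML_TEXT):
        print(line)
```

To read from a file instead of a string, ET.parse(path).getroot() could stand in for ET.fromstring, as in the snippet posted on Dec-13. Unlike the BeautifulSoup version with the 'lxml' parser, ElementTree keeps the original tag case, so no capitalize() call is needed.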
RE: Read XML-File - nilamo - Dec-15-2018

(Dec-14-2018, 10:10 PM)yuyu Wrote: Minidom works fine for printing names, but I cannot get all 3 attributes out of my inputfile with nested forloops, what might be the issue?

None of the XML you've shown so far has had any attributes at all. And finding the node values works with any node name. So if it's not working for you, you'll need to share your code and the input file that isn't working right.
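
As a rough illustration of the point that the minidom lookup works with any node name, the sketch below repeats the getElementsByTagName call once per tag and pairs the results up by position, with no nested loops. It assumes every person has exactly one Name, one Age and one Number, in that order, and that the "data" root element is present; the helper names values_for and collect_records are hypothetical.

```python
from xml.dom import minidom

# Sample input, assuming the <data> root element suggested in the thread.
XML_TEXT = """<data><Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number></data>"""


def values_for(doc, tag_name):
    """Return the text of every element called tag_name, in document order."""
    values = []
    for node in doc.getElementsByTagName(tag_name):
        first = node.firstChild
        # An empty element has no child text node.
        values.append(first.nodeValue if first is not None else "")
    return values


def collect_records(xml_text):
    """Pair names, ages and numbers by position (assumes complete records)."""
    doc = minidom.parseString(xml_text)
    return list(zip(values_for(doc, "Name"),
                    values_for(doc, "Age"),
                    values_for(doc, "Number")))


if __name__ == "__main__":
    for name, age, number in collect_records(XML_TEXT):
        print(f"Name: {name}")
        print(f"Age: {age}")
        print(f"Number: {number}")
```

Pairing by position is fragile: if one record lacks a field, the lists go out of step, which is one reason a per-person wrapper element in the XML would be easier to process.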