Read XML-File - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html) +--- Thread: Read XML-File (/thread-14697.html) Pages:
1
2
|
RE: Read XML-File - yuyu - Dec-15-2018 BeautifulSoup works fine for attached example, bt unfortunately not on a big XML with different grouped XML-Tags for Name, Age and Number which are the relevant text values. Is this somehow possible for the following example to read Name, Age and Number and write it to a new file? <data> <friends> <human> <Name>Tim</Name> <Age>23</Age> <Number>1234</Number> </human> <human> <Name>Jenny</Name> <Age>23</Age> <Number>4321</Number> </human> <animal> <Name>Wuff</Name> <Age>4</Age> <Number>2323</Number> </animal> </friends> </data> Thanks a lot. RE: Read XML-File - snippsat - Dec-15-2018 (Dec-15-2018, 01:18 PM)yuyu Wrote: Is this somehow possible for the following example to read Name, Age and Number and write it to a new file?Yes of course,you should try yourself It's not so common to parse a whole XML file to text with tags names. Use BBCode code tag,and a XML file usually have have indentation then is easier to the see structure. Example: <?xml version="1.0" encoding="UTF-8"?> <data> <friends> <human> <Name>Tim</Name> <Age>23</Age> <Number>1234</Number> </human> <human> <Name>Jenny</Name> <Age>23</Age> <Number>4321</Number> </human> <animal> <Name>Wuff</Name> <Age>4</Age> <Number>2323</Number> </animal> </friends> </data>So it a read this file and do some test. from bs4 import BeautifulSoup soup = BeautifulSoup(open('test.xml'), 'lxml')Test: >>> for tag in soup.find_all('human'): ... for item in tag.find_all(['name', 'age', 'number']): ... print(item) ... <name>Tim</name> <age>23</age> <number>1234</number> <name>Jenny</name> <age>23</age> <number>4321</number>This get all under human,so if use friends will get both human/animal.>>> for tag in soup.find_all('friends'): ... for item in tag.find_all(['name', 'age', 'number']): ... print(item) ... <name>Tim</name> <age>23</age> <number>1234</number> <name>Jenny</name> <age>23</age> <number>4321</number> <name>Wuff</name> <age>4</age> <number>2323</number>So can look like this,write to file you can try yourself. from bs4 import BeautifulSoup soup = BeautifulSoup(open('test.xml'), 'lxml') for tag in soup.find_all('friends'): for item in tag.find_all(['name', 'age', 'number']): print(f'{item.name.capitalize()}:{item.text}')
RE: Read XML-File - yuyu - Dec-15-2018 When I adapt and apply the last script for another big XML-File, then it's not coming over the firsts loop? RE: Read XML-File - snippsat - Dec-15-2018 Yes for a bigger file you will probably need to make changes,so try that i don't know how it look RE: Read XML-File - yuyu - Dec-15-2018 Thanks, but I was wondering me if there is a more generic way? RE: Read XML-File - yuyu - Dec-15-2018 Is it possible to use a regex instead of 'human' 'animal', because in real file are blocks with Tags like <Param-One-Block> and <Param-Two-Block>. Means a RegEx for One and Two in middle, because the rest is always the same. Quote:for tag in soup.find_all('friends'): Algo: Search for block with Regex and retrieve same subparameter (here Name, Age, Number) until EOF. Is this somehow possible? <?xml version="1.0" encoding="UTF-8"?> <data> <friends> <Param-One-Block> <Name>Tim</Name> <Age>23</Age> <Number>1234</Number> </Param-One-Block> <Param-One-Block> <Name>Jenny</Name> <Age>23</Age> <Number>4321</Number> </Param-One-Block> <Param-Two-Block> <Name>Wuff</Name> <Age>4</Age> <Number>2323</Number> </Param-Two-Block> </friends> </data> RE: Read XML-File - snippsat - Dec-15-2018 yuyu Wrote:Thanks, but I was wondering me if there is a more generic way?If you need all name,age and number trough the file,then find_all() with those parameter.test1.xml your last code.from bs4 import BeautifulSoup soup = BeautifulSoup(open('test1.xml'), 'lxml') for item in soup.find_all(['name', 'age', 'number']): print(f'{item.name.capitalize()}:{item.text}')
|