Python Forum

Full Version: Read XML-File
BeautifulSoup works fine for the attached example, but unfortunately not on a big XML file where the relevant text values (Name, Age and Number) sit under differently named grouping tags. Is it possible to read Name, Age and Number from the following example and write them to a new file?

<data>
<friends>
<human>
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
</human>
<human>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>
</human>
<animal>
<Name>Wuff</Name>
<Age>4</Age>
<Number>2323</Number>
</animal>
</friends>
</data>

Thanks a lot.
(Dec-15-2018, 01:18 PM) yuyu Wrote: Is this somehow possible for the following example to read Name, Age and Number and write it to a new file?
Yes, of course. You should try it yourself first.
It's not so common to dump a whole XML file to text together with the tag names.

Use the BBCode code tag when posting. An XML file is also usually indented, which makes the structure easier to see.
Example:
<?xml version="1.0" encoding="UTF-8"?>
<data>
  <friends>
    <human>
      <Name>Tim</Name>
      <Age>23</Age>
      <Number>1234</Number>
    </human>
    <human>
      <Name>Jenny</Name>
      <Age>23</Age>
      <Number>4321</Number>
    </human>
    <animal>
      <Name>Wuff</Name>
      <Age>4</Age>
      <Number>2323</Number>
    </animal>
  </friends>
</data>
So I read this file and do some tests.
from bs4 import BeautifulSoup

# The 'lxml' (HTML) parser lowercases tag names, so search for 'name', not 'Name'.
with open('test.xml') as f:
    soup = BeautifulSoup(f, 'lxml')
Test:
>>> for tag in soup.find_all('human'):
...     for item in tag.find_all(['name', 'age', 'number']):
...         print(item)
...         
<name>Tim</name>
<age>23</age>
<number>1234</number>
<name>Jenny</name>
<age>23</age>
<number>4321</number>
This gets everything under human. If you search under friends instead, you get both human and animal.
>>> for tag in soup.find_all('friends'):
...     for item in tag.find_all(['name', 'age', 'number']):
...         print(item)
...         
<name>Tim</name>
<age>23</age>
<number>1234</number>
<name>Jenny</name>
<age>23</age>
<number>4321</number>
<name>Wuff</name>
<age>4</age>
<number>2323</number>
So the script can look like this. Writing to a file you can try yourself.
from bs4 import BeautifulSoup

# Close the file once it has been parsed.
with open('test.xml') as f:
    soup = BeautifulSoup(f, 'lxml')
for tag in soup.find_all('friends'):
    for item in tag.find_all(['name', 'age', 'number']):
        print(f'{item.name.capitalize()}:{item.text}')
Output:
Name:Tim
Age:23
Number:1234
Name:Jenny
Age:23
Number:4321
Name:Wuff
Age:4
Number:2323
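For the write-to-file part, here is a minimal sketch of one way it could be done. Note that it swaps BeautifulSoup for the standard library's xml.etree.ElementTree so that it runs without third-party packages; unlike the 'lxml' parser above, ElementTree keeps the original tag case (Name, not name). The helper names extract_lines and write_lines and the output file name friends_output.txt are made up for illustration, and the XML is embedded as a string instead of being read from test.xml.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Same data as test.xml, embedded here so the sketch is self-contained.
XML = """<data>
  <friends>
    <human><Name>Tim</Name><Age>23</Age><Number>1234</Number></human>
    <human><Name>Jenny</Name><Age>23</Age><Number>4321</Number></human>
    <animal><Name>Wuff</Name><Age>4</Age><Number>2323</Number></animal>
  </friends>
</data>"""

WANTED = {"Name", "Age", "Number"}  # ElementTree preserves tag case


def extract_lines(xml_text):
    """Return 'Tag:text' strings for every wanted element, in document order."""
    root = ET.fromstring(xml_text)
    return [f"{el.tag}:{el.text}" for el in root.iter() if el.tag in WANTED]


def write_lines(lines, path):
    """Write one entry per line to a new text file."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n".join(lines) + "\n")


lines = extract_lines(XML)
# Hypothetical output location; any writable path would do.
out_path = os.path.join(tempfile.gettempdir(), "friends_output.txt")
write_lines(lines, out_path)
```

To read from a real file instead of the embedded string, ET.parse('test.xml').getroot() could replace ET.fromstring(XML). The same write step should also work with the BeautifulSoup loop above by collecting the formatted strings in a list instead of printing them.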
When I adapt the last script and apply it to another big XML file, it doesn't get past the first loop?
Yes, for a bigger file you will probably need to make changes, so try that. I don't know what your file looks like.
Thanks, but I was wondering if there is a more generic way?
Is it possible to use a regex instead of 'human' and 'animal'? In the real file there are blocks with tags like <Param-One-Block> and <Param-Two-Block>, so the regex would only need to match the One or Two in the middle, because the rest is always the same.

Quote:for tag in soup.find_all('friends'):

Algo:
Search for blocks with a regex and retrieve the same sub-parameters (here Name, Age, Number) from each one until EOF. Is this somehow possible?

<?xml version="1.0" encoding="UTF-8"?>
<data>
  <friends>
    <Param-One-Block>
      <Name>Tim</Name>
      <Age>23</Age>
      <Number>1234</Number>
    </Param-One-Block>
    <Param-One-Block>
      <Name>Jenny</Name>
      <Age>23</Age>
      <Number>4321</Number>
    </Param-One-Block>
    <Param-Two-Block>
      <Name>Wuff</Name>
      <Age>4</Age>
      <Number>2323</Number>
    </Param-Two-Block>
  </friends>
</data>
yuyu Wrote: Thanks, but I was wondering if there is a more generic way?
Search for blocks with a regex and retrieve the same sub-parameters (here Name, Age, Number) from each one until EOF. Is this somehow possible?
If you need every name, age and number throughout the file, then call find_all() with those tag names directly.
test1.xml is your last XML example.
from bs4 import BeautifulSoup

# Close the file once it has been parsed.
with open('test1.xml') as f:
    soup = BeautifulSoup(f, 'lxml')
for item in soup.find_all(['name', 'age', 'number']):
    print(f'{item.name.capitalize()}:{item.text}')
Output:
Name:Tim
Age:23
Number:1234
Name:Jenny
Age:23
Number:4321
Name:Wuff
Age:4
Number:2323
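If the block names do matter (for example to keep each Name/Age/Number group together and know which block it came from), matching the block tags with a regex is possible. Below is a rough sketch of that idea, again using the standard library's xml.etree.ElementTree plus re instead of BeautifulSoup so it runs without extra packages. The pattern Param-\w+-Block is only a guess based on the two tag names in the question, and read_blocks is a made-up helper name.

```python
import re
import xml.etree.ElementTree as ET

# Same data as test1.xml, embedded here so the sketch is self-contained.
XML = """<data>
  <friends>
    <Param-One-Block><Name>Tim</Name><Age>23</Age><Number>1234</Number></Param-One-Block>
    <Param-One-Block><Name>Jenny</Name><Age>23</Age><Number>4321</Number></Param-One-Block>
    <Param-Two-Block><Name>Wuff</Name><Age>4</Age><Number>2323</Number></Param-Two-Block>
  </friends>
</data>"""

# Assumed pattern: only the middle part of the tag name varies.
BLOCK = re.compile(r"Param-\w+-Block")
FIELDS = ("Name", "Age", "Number")


def read_blocks(xml_text):
    """Return one dict per block whose tag name matches BLOCK."""
    root = ET.fromstring(xml_text)
    records = []
    for el in root.iter():  # walks the whole tree in document order
        if BLOCK.fullmatch(el.tag):
            record = {"block": el.tag}
            for field in FIELDS:
                # findtext returns None when the child element is missing
                record[field] = el.findtext(field)
            records.append(record)
    return records


records = read_blocks(XML)
for record in records:
    print(record)
```

BeautifulSoup's find_all() also accepts a compiled pattern as the tag name, so something like soup.find_all(re.compile('^param-.+-block$')) should work in the scripts above; keep in mind that the 'lxml' parser lowercases tag names, so the pattern would need to be lowercase too.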