Python Forum

BeautifulSoup works fine for the attached example, but unfortunately not on a big XML file in which the relevant text values (Name, Age and Number) sit inside differently named group tags. Is it possible to read Name, Age and Number from the following example and write them to a new file?

<data>
<friends>
<human>
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
</human>
<human>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>
</human>
<animal>
<Name>Wuff</Name>
<Age>4</Age>
<Number>2323</Number>
</animal>
</friends>
</data>

Thanks a lot.

(Dec-15-2018, 01:18 PM) yuyu Wrote: Is this somehow possible for the following example to read Name, Age and Number and write it to a new file?

Yes, of course. You should try it yourself first.

It's not that common to dump a whole XML file to text together with the tag names.

Use the BBCode code tag when posting. An XML file is also usually indented, which makes the structure easier to see.
Example:

<?xml version="1.0" encoding="UTF-8"?>
<data>
  <friends>
    <human>
      <Name>Tim</Name>
      <Age>23</Age>
      <Number>1234</Number>
    </human>
    <human>
      <Name>Jenny</Name>
      <Age>23</Age>
      <Number>4321</Number>
    </human>
    <animal>
      <Name>Wuff</Name>
      <Age>4</Age>
      <Number>2323</Number>
    </animal>
  </friends>
</data>

So I read this file and run some tests.

from bs4 import BeautifulSoup

# 'lxml' is an HTML parser, so tag names are lowercased (<Name> becomes <name>)
with open('test.xml') as f:
    soup = BeautifulSoup(f, 'lxml')

Test:

>>> for tag in soup.find_all('human'):
...     for item in tag.find_all(['name', 'age', 'number']):
...         print(item)
...         
<name>Tim</name>
<age>23</age>
<number>1234</number>
<name>Jenny</name>
<age>23</age>
<number>4321</number>

This gets everything under human. If you search under friends instead, you get both human and animal.

>>> for tag in soup.find_all('friends'):
...     for item in tag.find_all(['name', 'age', 'number']):
...         print(item)
...         
<name>Tim</name>
<age>23</age>
<number>1234</number>
<name>Jenny</name>
<age>23</age>
<number>4321</number>
<name>Wuff</name>
<age>4</age>
<number>2323</number>

So it can look like this. Writing to a file you can try yourself.

from bs4 import BeautifulSoup

with open('test.xml') as f:
    soup = BeautifulSoup(f, 'lxml')

# Walk every <friends> block and print each wanted child as "Tag:value"
for tag in soup.find_all('friends'):
    for item in tag.find_all(['name', 'age', 'number']):
        print(f'{item.name.capitalize()}:{item.text}')

Output:
Name:Tim
Age:23
Number:1234
Name:Jenny
Age:23
Number:4321
Name:Wuff
Age:4
Number:2323
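
The file-writing step left as an exercise above could be sketched as follows. This is a minimal sketch, not the poster's code: the helper names `extract_lines` and `write_lines`, the output name `result.txt`, and the inline `SAMPLE` string are assumptions made for illustration. The built-in `html.parser` is used only so the sketch runs without lxml installed; like `'lxml'`, it lowercases tag names.

```python
from bs4 import BeautifulSoup

# Inline copy of the example data (hypothetical stand-in for test.xml)
SAMPLE = """
<data>
  <friends>
    <human><Name>Tim</Name><Age>23</Age><Number>1234</Number></human>
    <human><Name>Jenny</Name><Age>23</Age><Number>4321</Number></human>
    <animal><Name>Wuff</Name><Age>4</Age><Number>2323</Number></animal>
  </friends>
</data>
"""

def extract_lines(xml_text):
    """Return 'Tag:value' strings for every name, age and number tag."""
    soup = BeautifulSoup(xml_text, 'html.parser')
    return [f'{item.name.capitalize()}:{item.text}'
            for item in soup.find_all(['name', 'age', 'number'])]

def write_lines(lines, path):
    """Write one entry per line to a new text file."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write('\n'.join(lines) + '\n')

if __name__ == '__main__':
    write_lines(extract_lines(SAMPLE), 'result.txt')
```

Keeping extraction and writing in separate functions means the same `extract_lines` result can be printed, written to a file, or checked in a test.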

When I adapt the last script and apply it to another big XML file, it doesn't get past the first loop?

Yes, for a bigger file you will probably need to make changes, so try that. I don't know what your file looks like.
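
One possible reason a loop finds nothing on a different file (an assumption, since the real file is not shown) is tag-name case. HTML parsers such as `'lxml'` and the built-in `'html.parser'` lowercase tag names, so a search has to use lowercase even when the XML says `<Name>`. A small check, using `html.parser` only to avoid needing lxml; the one-line `xml_text` is made up for illustration:

```python
from bs4 import BeautifulSoup

xml_text = "<friends><human><Name>Tim</Name><Age>23</Age></human></friends>"
soup = BeautifulSoup(xml_text, 'html.parser')

# The HTML parser stored the tags in lowercase, so the original spelling misses.
as_written = soup.find_all('Name')
lowercased = soup.find_all('name')

print(len(as_written), len(lowercased))
```

If the original case matters, BeautifulSoup's `'xml'` parser (which requires lxml) keeps tag names as written.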

Thanks, but I was wondering whether there is a more generic way.

Is it possible to use a regex instead of 'human' and 'animal'? In the real file the blocks have tags like <Param-One-Block> and <Param-Two-Block>, so I would need a regex for the One and Two in the middle, because the rest is always the same.

Quote: for tag in soup.find_all('friends'):

Algorithm:
Search for blocks with a regex and retrieve the same sub-parameters (here Name, Age, Number) from each one until EOF. Is this somehow possible?

<?xml version="1.0" encoding="UTF-8"?>
<data>
  <friends>
    <Param-One-Block>
      <Name>Tim</Name>
      <Age>23</Age>
      <Number>1234</Number>
    </Param-One-Block>
    <Param-One-Block>
      <Name>Jenny</Name>
      <Age>23</Age>
      <Number>4321</Number>
    </Param-One-Block>
    <Param-Two-Block>
      <Name>Wuff</Name>
      <Age>4</Age>
      <Number>2323</Number>
    </Param-Two-Block>
  </friends>
</data>
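
For the regex question above: `find_all()` accepts a compiled regular expression and matches it against tag names, so the block tags can be selected by pattern. A minimal sketch, assuming the block names always follow `Param-<something>-Block`. The pattern is lowercase because the HTML parser lowercases names, the function name `read_blocks` is hypothetical, and `html.parser` stands in for `'lxml'` only so that lxml is not required.

```python
import re
from bs4 import BeautifulSoup

# Inline copy of the example data (hypothetical stand-in for the real file)
SAMPLE = """
<data>
  <friends>
    <Param-One-Block><Name>Tim</Name><Age>23</Age><Number>1234</Number></Param-One-Block>
    <Param-One-Block><Name>Jenny</Name><Age>23</Age><Number>4321</Number></Param-One-Block>
    <Param-Two-Block><Name>Wuff</Name><Age>4</Age><Number>2323</Number></Param-Two-Block>
  </friends>
</data>
"""

# Matches param-one-block, param-two-block, and any other middle word
BLOCK = re.compile(r'^param-\w+-block$')

def read_blocks(xml_text):
    """Return (block tag name, {field: value}) for every matching block."""
    soup = BeautifulSoup(xml_text, 'html.parser')
    result = []
    for block in soup.find_all(BLOCK):
        fields = {item.name: item.text
                  for item in block.find_all(['name', 'age', 'number'])}
        result.append((block.name, fields))
    return result

for tag_name, fields in read_blocks(SAMPLE):
    print(tag_name, fields)
```

Grouping the values per block keeps each Name, Age and Number together, which a flat search over the whole document does not guarantee if a block is missing one of the tags.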

yuyu Wrote: Thanks, but I was wondering whether there is a more generic way.
Search for blocks with a regex and retrieve the same sub-parameters (here Name, Age, Number) from each one until EOF. Is this somehow possible?

If you need every name, age and number throughout the file, call find_all() with those tag names.
test1.xml contains the XML from your last post.

from bs4 import BeautifulSoup

with open('test1.xml') as f:
    soup = BeautifulSoup(f, 'lxml')

# Search the whole document; the enclosing block names no longer matter
for item in soup.find_all(['name', 'age', 'number']):
    print(f'{item.name.capitalize()}:{item.text}')

Output:
Name:Tim
Age:23
Number:1234
Name:Jenny
Age:23
Number:4321
Name:Wuff
Age:4
Number:2323
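
As an alternative that needs no third-party package, the same extraction can be sketched with the standard library's `xml.etree.ElementTree`. This swaps in a different parser from the one used in the thread. It keeps tag names case-sensitive (`Name`, not `name`), and the names `collect`, `WANTED` and `SAMPLE` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Inline copy of the example data (hypothetical stand-in for test1.xml)
SAMPLE = """
<data>
  <friends>
    <Param-One-Block><Name>Tim</Name><Age>23</Age><Number>1234</Number></Param-One-Block>
    <Param-One-Block><Name>Jenny</Name><Age>23</Age><Number>4321</Number></Param-One-Block>
    <Param-Two-Block><Name>Wuff</Name><Age>4</Age><Number>2323</Number></Param-Two-Block>
  </friends>
</data>
"""

# Tag names are matched exactly as written in the XML
WANTED = {'Name', 'Age', 'Number'}

def collect(xml_text):
    """Return 'Tag:value' strings for every wanted tag, in document order."""
    root = ET.fromstring(xml_text.strip())
    # iter() walks the whole tree, so the enclosing block names do not matter
    return [f'{elem.tag}:{elem.text}' for elem in root.iter() if elem.tag in WANTED]

print('\n'.join(collect(SAMPLE)))
```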

yuyu

snippsat

yuyu

snippsat

yuyu

yuyu

snippsat