Python Forum
Read XML-File
#11
BeautifulSoup works fine for the attached example, but unfortunately not on a big XML file where the relevant text values (Name, Age and Number) sit inside differently named group tags. Is it possible to read Name, Age and Number from the following example and write them to a new file?

<data>
<friends>
<human>
<Name>Tim</Name>
<Age>23</Age>
<Number>1234</Number>
</human>
<human>
<Name>Jenny</Name>
<Age>23</Age>
<Number>4321</Number>
</human>
<animal>
<Name>Wuff</Name>
<Age>4</Age>
<Number>2323</Number>
</animal>
</friends>
</data>

Thanks a lot.
Reply
#12
(Dec-15-2018, 01:18 PM)yuyu Wrote: Is this somehow possible for the following example to read Name, Age and Number and write it to a new file?
Yes, of course. You should try it yourself first.
It's not that common to convert a whole XML file to text that includes the tag names.

Use the BBCode code tag when posting. An XML file is usually indented, which makes the structure easier to see.
Example:
<?xml version="1.0" encoding="UTF-8"?>
<data>
  <friends>
    <human>
      <Name>Tim</Name>
      <Age>23</Age>
      <Number>1234</Number>
    </human>
    <human>
      <Name>Jenny</Name>
      <Age>23</Age>
      <Number>4321</Number>
    </human>
    <animal>
      <Name>Wuff</Name>
      <Age>4</Age>
      <Number>2323</Number>
    </animal>
  </friends>
</data>
So I read this file and run some tests.
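from bs4 import BeautifulSoup

# The 'lxml' HTML parser lowercases tag names, so <Name> is searched for as 'name'
with open('test.xml', encoding='utf-8') as f:
    soup = BeautifulSoup(f, 'lxml')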
Test:
>>> for tag in soup.find_all('human'):
...     for item in tag.find_all(['name', 'age', 'number']):
...         print(item)
...         
<name>Tim</name>
<age>23</age>
<number>1234</number>
<name>Jenny</name>
<age>23</age>
<number>4321</number>
This gets everything under human. Searching under friends instead returns both the human and the animal entries.
>>> for tag in soup.find_all('friends'):
...     for item in tag.find_all(['name', 'age', 'number']):
...         print(item)
...         
<name>Tim</name>
<age>23</age>
<number>1234</number>
<name>Jenny</name>
<age>23</age>
<number>4321</number>
<name>Wuff</name>
<age>4</age>
<number>2323</number>
So it can look like this. Writing to a file is something you can try yourself.
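from bs4 import BeautifulSoup

# Close the file automatically once parsing is done
with open('test.xml', encoding='utf-8') as f:
    soup = BeautifulSoup(f, 'lxml')

for tag in soup.find_all('friends'):
    # Tag names are lowercase here because the 'lxml' HTML parser lowercases them
    for item in tag.find_all(['name', 'age', 'number']):
        print(f'{item.name.capitalize()}:{item.text}')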
Output:
Name:Tim
Age:23
Number:1234
Name:Jenny
Age:23
Number:4321
Name:Wuff
Age:4
Number:2323
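
The write-to-file step left as an exercise above could be sketched as follows. This is only one possible approach, and it swaps BeautifulSoup for the standard library's xml.etree.ElementTree so that it runs without extra packages. The sample XML is copied from the thread; the function names and the output file name friends.txt are made up for illustration. Note that ElementTree keeps the original tag case, so the tags are matched as Name, Age and Number.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Sample data copied from the thread; a real script would read it from a file.
XML_TEXT = """<data>
  <friends>
    <human>
      <Name>Tim</Name>
      <Age>23</Age>
      <Number>1234</Number>
    </human>
    <human>
      <Name>Jenny</Name>
      <Age>23</Age>
      <Number>4321</Number>
    </human>
    <animal>
      <Name>Wuff</Name>
      <Age>4</Age>
      <Number>2323</Number>
    </animal>
  </friends>
</data>"""

# ElementTree does not lowercase tag names, so match the original spelling.
WANTED = ("Name", "Age", "Number")


def extract_lines(xml_text):
    """Return 'Tag:value' strings for every wanted element, in document order."""
    root = ET.fromstring(xml_text)
    return [f"{elem.tag}:{elem.text}" for elem in root.iter() if elem.tag in WANTED]


def write_lines(lines, path):
    """Write one entry per line to a text file."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n".join(lines) + "\n")


lines = extract_lines(XML_TEXT)
# Hypothetical output location: the temp directory keeps the sketch side-effect free.
out_path = os.path.join(tempfile.gettempdir(), "friends.txt")
write_lines(lines, out_path)
```

To read from a file instead of a string, ET.parse('test.xml').getroot() can replace the ET.fromstring() call.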
Reply
#13
When I adapt the last script and apply it to another, larger XML file, it never gets past the first loop.
Reply
#14
Yes, for a bigger file you will probably need to make changes, so try that. I don't know what your file looks like.
Reply
#15
Thanks, but I was wondering if there is a more generic way?
Reply
#16
Is it possible to use a regex instead of 'human' and 'animal'? The real file has blocks with tags like <Param-One-Block> and <Param-Two-Block>, so I would need a regex for the One and Two in the middle, because the rest of the tag name is always the same.

Quote:for tag in soup.find_all('friends'):

Algorithm:
Search for each block with a regex and retrieve the same sub-parameters (here Name, Age, Number) until EOF. Is this somehow possible?

<?xml version="1.0" encoding="UTF-8"?>
<data>
  <friends>
    <Param-One-Block>
      <Name>Tim</Name>
      <Age>23</Age>
      <Number>1234</Number>
    </Param-One-Block>
    <Param-One-Block>
      <Name>Jenny</Name>
      <Age>23</Age>
      <Number>4321</Number>
    </Param-One-Block>
    <Param-Two-Block>
      <Name>Wuff</Name>
      <Age>4</Age>
      <Number>2323</Number>
    </Param-Two-Block>
  </friends>
</data>
Reply
#17
yuyu Wrote:Thanks, but I was wondering me if there is a more generic way?
Search for block with Regex and retrieve same subparameter (here Name, Age, Number) until EOF. Is this somehow possible?
If you need every name, age and number throughout the file, then call find_all() with those tag names.
test1.xml is the XML from your last post.
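from bs4 import BeautifulSoup

# Close the file automatically once parsing is done
with open('test1.xml', encoding='utf-8') as f:
    soup = BeautifulSoup(f, 'lxml')

# No need to name the enclosing blocks: search the whole document at once
for item in soup.find_all(['name', 'age', 'number']):
    print(f'{item.name.capitalize()}:{item.text}')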
Output:
Name:Tim
Age:23
Number:1234
Name:Jenny
Age:23
Number:4321
Name:Wuff
Age:4
Number:2323
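
If the values should stay grouped per block, the regex idea from the question can also work. BeautifulSoup's find_all() accepts a compiled regular expression as the tag filter, but with the 'lxml' HTML parser the pattern would have to be lowercase, because that parser lowercases tag names. The sketch below shows the same idea with the standard library's xml.etree.ElementTree instead, which keeps the original case and needs no extra packages. The pattern Param-\w+-Block is an assumption based on the two tag names given in the thread, and read_blocks is a made-up helper name.

```python
import re
import xml.etree.ElementTree as ET

# Sample data copied from the thread; a real script would read it from a file.
XML_TEXT = """<data>
  <friends>
    <Param-One-Block>
      <Name>Tim</Name>
      <Age>23</Age>
      <Number>1234</Number>
    </Param-One-Block>
    <Param-One-Block>
      <Name>Jenny</Name>
      <Age>23</Age>
      <Number>4321</Number>
    </Param-One-Block>
    <Param-Two-Block>
      <Name>Wuff</Name>
      <Age>4</Age>
      <Number>2323</Number>
    </Param-Two-Block>
  </friends>
</data>"""

# Assumed naming scheme: only the middle part of the block tag varies.
BLOCK_PATTERN = re.compile(r"Param-\w+-Block")
FIELDS = ("Name", "Age", "Number")


def read_blocks(xml_text):
    """Return one dict per element whose tag name fully matches BLOCK_PATTERN."""
    root = ET.fromstring(xml_text)
    records = []
    for elem in root.iter():  # walks every element in document order
        if BLOCK_PATTERN.fullmatch(elem.tag):
            record = {"block": elem.tag}
            for field in FIELDS:
                # findtext() returns None if the block has no such child
                record[field] = elem.findtext(field)
            records.append(record)
    return records


for record in read_blocks(XML_TEXT):
    print(record)
```

Each record keeps the name of the block it came from, so entries from Param-One-Block and Param-Two-Block can still be told apart after extraction.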
Reply