Feb-23-2019, 03:17 PM
(This post was last modified: Feb-23-2019, 03:17 PM by erdem_ustunmu.)
Thank you so much for answering.
I reviewed the Beautiful Soup package you mentioned.
I've also tried it on XML that is as simple as basic HTML, or that has only a few elements.
But I've never been successful with nested tags.
How do I get the tags, elements and attributes between the <dataDscr> ... </dataDscr> tags and transfer them to CSV?
How do I write a find and a loop for the <dataDscr><var> ... </var></dataDscr> tags?
There are too many XML files. Do I have to define all the tags, and the attributes that depend on them, one by one?
For example, this is located at the end of the XML file:
<var ID="V541" name="ilo_neet" files="F6" dcml="0" intrvl="discrete">
There are multiple categories for this variable
<catgry>
<catValu>
1
</catValu>
<labl>
Youth not in education, employment or training
</labl>
<catStat type="freq">
4967
</catStat>
</catgry>
<catgry missing="Y">
<catValu>
Sysmiss
</catValu>
<catStat type="freq">
71241
</catStat>
</catgry>
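For a nested structure like the <var> / <catgry> fragment above, one possible approach is to loop over every <var>, then over each <catgry> inside it, and write one CSV row per category. The following is only a minimal sketch using the standard library (xml.etree.ElementTree and csv) rather than Beautiful Soup. The SAMPLE string, the column names and the helper names local, child_text and category_rows are assumptions made for illustration; the sample also adds the closing </var> and a <dataDscr> wrapper so that it is well-formed. The local helper drops a namespace prefix in case the real files declare one.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: the fragment from the post, closed with </var> and
# wrapped in <dataDscr> so that it parses as well-formed XML.
SAMPLE = """<dataDscr>
  <var ID="V541" name="ilo_neet" files="F6" dcml="0" intrvl="discrete">
    <catgry>
      <catValu>
        1
      </catValu>
      <labl>
        Youth not in education, employment or training
      </labl>
      <catStat type="freq">
        4967
      </catStat>
    </catgry>
    <catgry missing="Y">
      <catValu>
        Sysmiss
      </catValu>
      <catStat type="freq">
        71241
      </catStat>
    </catgry>
  </var>
</dataDscr>"""


def local(tag):
    """Return the tag name without a '{namespace}' prefix, if there is one."""
    return tag.rsplit("}", 1)[-1]


def child_text(parent, name):
    """Stripped text of the first direct child called `name`, else ''."""
    for child in parent:
        if local(child.tag) == name:
            return (child.text or "").strip()
    return ""


def category_rows(root):
    """Yield one dict per <catgry>, repeating the parent <var> attributes."""
    for var in root.iter():
        if local(var.tag) != "var":
            continue
        for cat in var:
            if local(cat.tag) != "catgry":
                continue
            yield {
                "var_id": var.get("ID", ""),
                "var_name": var.get("name", ""),
                "value": child_text(cat, "catValu"),
                "label": child_text(cat, "labl"),
                "freq": child_text(cat, "catStat"),
                "missing": cat.get("missing", ""),
            }


rows = list(category_rows(ET.fromstring(SAMPLE)))

# Written to an in-memory buffer here; passing a file opened with
# open("out.csv", "w", newline="") to DictWriter would write a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

For a file on disk, ET.parse(path).getroot() can take the place of ET.fromstring(SAMPLE). A category without a <labl>, like the Sysmiss one, simply gets an empty label column.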
(Feb-23-2019, 12:22 PM)snippsat Wrote: You use a parser. Python has two big ones that most people use: BeautifulSoup and lxml.
To give a start example that parses a couple of values:
from bs4 import BeautifulSoup

soup = BeautifulSoup(open('NPL_2008_LFS_v01_M_v01_A_ILOVAR.xml', encoding='utf-8'), 'xml')
title = soup.find('titl')
producer = soup.find('producer')
print(title.text.strip())
print(producer.attrs.get('affiliation'))
Output:
NPL_2008_LFS_v01_M
International Labour Organization
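On the question of defining every attribute one by one: attrs in the example above is a plain dict, and the standard library's ElementTree exposes the same thing as attrib. So it should be possible to collect whatever attributes each <var> happens to have and build the column list from the union of the names seen. This is a rough sketch under that assumption, not a tested solution for the real files. The two DOCS strings are made-up stand-ins for separate XML files, and local and var_attributes are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the contents of two different XML files.
DOCS = [
    '<dataDscr><var ID="V1" name="age" dcml="0"/></dataDscr>',
    '<dataDscr><var ID="V2" name="sex" intrvl="discrete"/></dataDscr>',
]


def local(tag):
    """Return the tag name without a '{namespace}' prefix, if there is one."""
    return tag.rsplit("}", 1)[-1]


def var_attributes(xml_text):
    """Return one attribute dict per <var>, whatever attributes it carries."""
    root = ET.fromstring(xml_text)
    return [dict(el.attrib) for el in root.iter() if local(el.tag) == "var"]


# One record per <var> across all documents.
records = [rec for doc in DOCS for rec in var_attributes(doc)]

# Column names are the union of every attribute name that was seen,
# so nothing has to be listed by hand.
columns = []
for rec in records:
    for key in rec:
        if key not in columns:
            columns.append(key)

print(columns)
for rec in records:
    # Attributes missing from a given <var> become empty cells.
    print([rec.get(col, "") for col in columns])
```

With real files, glob.glob("*.xml") can supply the paths and ET.parse(path).getroot() the root element for each one; the rows can then go to csv.DictWriter with columns as the field names.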