Nov-10-2017, 06:54 PM
Just getting the names of the tags works fine in both lxml and BeautifulSoup.
Keeping the structure in the output can be a challenge, as I do not think either pretty_print (lxml) or prettify() (BS) works for text output.
Example getting tag names:
from lxml import etree
from bs4 import BeautifulSoup

xml = '''\
<data>
  <timestamp>...</timestamp>
  <people>
    <person>
      <name>...</name>
      <age>...</age>
    </person>
    <person>
      <name>...</name>
      <age>...</age>
      <degree />
    </person>
    <person>
      <name>...</name>
      <age>...</age>
      <degree />
      <siblings>
        <brother>...</brother>
        <brother>...</brother>
        <sister>...</sister>
      </siblings>
    </person>
  </people>
  <cities>
    <city>
      <name>...</name>
      <country>...</country>
      <continent>...</continent>
      <capital />
    </city>
    <city>
      <name>...</name>
      <country>...</country>
      <continent>...</continent>
    </city>
  </cities>
</data>
'''

root = etree.fromstring(xml)
# Use the 'xml' parser: the 'lxml' HTML parser would wrap the document
# in extra html/body tags, which would then show up in the tag list.
soup = BeautifulSoup(xml, 'xml')

# lxml: iterate over every element in document order
for node in root.iter('*'):
    print(node.tag)

# BS: find_all(True) matches every tag in document order
for tag in soup.find_all(True):
    print(tag.name)
Output:
data
timestamp
people
person
name
age
person
name
age
degree
person
name
age
degree
siblings
brother
brother
sister
cities
city
name
country
continent
capital
city
name
country
continent
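
To keep the structure in text output, one option is to walk the tree yourself and indent each tag name by its depth. This is only a rough sketch, not something built into lxml or BS: the helper name tag_outline and the shortened sample document are made up for illustration, and it uses the standard library's xml.etree.ElementTree so it runs without extra installs. The same function should also work on a root from lxml.etree.fromstring, since child iteration and .tag behave the same way there for plain elements.

```python
import xml.etree.ElementTree as ET


def tag_outline(node, depth=0, indent="  "):
    """Return one line per element, indented by its nesting depth."""
    lines = [indent * depth + node.tag]
    # Iterating an element yields its direct children in document order.
    for child in node:
        lines.extend(tag_outline(child, depth + 1, indent))
    return lines


# Shortened, hypothetical sample with the same kind of nesting as above.
sample = (
    "<data><people><person><name>...</name><degree /></person></people>"
    "<cities /></data>"
)

outline = tag_outline(ET.fromstring(sample))
print("\n".join(outline))
```

Each level of nesting adds one indent step, so the parent/child relationship stays visible in plain text.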