Python Forum

Help with Beautiful Soup
I have a large XML document and am able to process the data OK.
I could make it more efficient if my find_all statements searched only tags
that are direct children, not grandchildren, of the node of interest.
The following statement gets me all the tags under the parent:
tags = [tag.name for tag in one_entry.find_all()]
but it includes grandchildren tags too.
How can I adjust it so that it finds only the immediate child tags?
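For comparison, find_all() itself accepts recursive=False, which limits the search to the direct children of the tag (and, like the default call, returns only tags, not text). A minimal sketch, assuming a small hypothetical entry document since the real XML isn't shown in the thread, and using the built-in "html.parser" only so it runs without lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the real document from the question isn't shown.
XML = """<entry>
  <doc-id>RFC0001</doc-id>
  <author><name>Example Author</name></author>
</entry>"""

# "html.parser" is assumed here only to avoid depending on lxml.
soup = BeautifulSoup(XML, "html.parser")
one_entry = soup.find("entry")

# Default find_all() walks all descendants, so the grandchild <name> shows up.
all_tags = [tag.name for tag in one_entry.find_all()]

# recursive=False restricts the search to direct children of one_entry.
child_tags = [tag.name for tag in one_entry.find_all(recursive=False)]

print(all_tags)    # ['doc-id', 'author', 'name']
print(child_tags)  # ['doc-id', 'author']
```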
Not sure, but I think you should use .contents or .children:
https://www.crummy.com/software/Beautifu...d-children
e.g.
tags = [tag.name for tag in one_entry.contents]
buran, actually they both work:
tags = [tag.name for tag in entry.children]
print('tags: {}'.format(tags))
Result:
Output:
tags: [None, 'doc-id', None, 'title', None, 'author', None, 'date', None, 'format', None, 'current-status', None, 'publication-status', None, 'stream', None, 'doi', None]
Now I have to figure out how to filter the Nones out of my list comprehension, but there are no grandchildren, which is what I was looking for.
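To see where the None entries come from, the sketch below (a hypothetical sample again, html.parser assumed) prints the type and .name of every child. The whitespace between elements comes back as a NavigableString whose .name is None, yet the node itself is not None, which is why a test like "tag is not None" keeps it:

```python
from bs4 import BeautifulSoup

# Hypothetical sample with whitespace between the child elements.
XML = "<entry>\n  <doc-id>RFC0001</doc-id>\n  <date>2017</date>\n</entry>"
entry = BeautifulSoup(XML, "html.parser").find("entry")

# Pair each child's class name with its .name attribute.
kinds = [(type(node).__name__, node.name) for node in entry.children]
print(kinds)
# [('NavigableString', None), ('Tag', 'doc-id'), ('NavigableString', None),
#  ('Tag', 'date'), ('NavigableString', None)]

# None of the children is None itself; only the strings' .name is None.
all_not_none = all(node is not None for node in entry.children)
print(all_not_none)  # True
```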
OK, what am I doing wrong here?

tags = [tag.name for tag in entry.children if tag is not None]
print('tags: {}'.format(tags))
Result:
Output:
tags: [None, 'doc-id', None, 'title', None, 'author', None, 'date', None, 'format', None, 'current-status', None, 'publication-status', None, 'stream', None, 'doi', None]
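I think it should be the following. The whitespace between the elements comes back as NavigableString objects, which are never None themselves; only their .name is None, so test tag.name instead: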
tags = [tag.name for tag in entry.children if tag.name]
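A quick check of that fix on a hypothetical sample, plus an isinstance(node, Tag) variant that some may find more explicit; both assume bs4 with html.parser:

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical sample; whitespace text nodes sit between the elements.
XML = "<entry>\n  <doc-id>RFC0001</doc-id>\n  <date>2017</date>\n</entry>"
entry = BeautifulSoup(XML, "html.parser").find("entry")

# Text nodes have .name == None, so the truthiness test drops them.
tags = [tag.name for tag in entry.children if tag.name]

# Alternative: keep only Tag instances explicitly.
tags_by_type = [node.name for node in entry.children if isinstance(node, Tag)]

print('tags: {}'.format(tags))  # tags: ['doc-id', 'date']
```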

(Jul-18-2017, 05:26 PM) Larz60+ wrote: buran, actually they both work
I suspected so: one returns a list, the other an iterator.
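A small sketch of that difference (hypothetical sample, html.parser assumed). The exact iterator type returned by .children may vary between bs4 versions, so the check uses collections.abc.Iterator rather than a specific class:

```python
from collections.abc import Iterator

from bs4 import BeautifulSoup

entry = BeautifulSoup("<entry><a>1</a><b>2</b></entry>", "html.parser").find("entry")

print(isinstance(entry.contents, list))       # True
print(isinstance(entry.children, Iterator))   # True

# An iterator is consumed by one pass; a list can be reused or indexed.
children = entry.children
first_pass = [node.name for node in children]
second_pass = [node.name for node in children]
print(first_pass, second_pass)  # ['a', 'b'] []
```

In practice .children is fine for a single loop or list comprehension, while .contents is the one to reach for when you need indexing or more than one pass.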
Thank you