Help with beautiful soup

Help with beautiful soup - Larz60+ - Jul-18-2017

I have a large XML document and am able to process the data OK.
I could make it more efficient by running my find_all statements only on tags
that are direct children of the node of interest, not grandchildren.
The following statement gets me all the tags under the parent:

tags = [tag.name for tag in one_entry.find_all()]

but it also includes the grandchildren's tags.
How can I adjust it so that I only find the immediate child tags?

RE: Help with beautiful soup - buran - Jul-18-2017

Not sure, but I think you should use .contents or .children:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#contents-and-children
i.e.

tags = [tag.name for tag in one_entry.contents]
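For comparison, here is a minimal sketch of another way to stay at the first level: find_all() accepts recursive=False, which limits the search to direct children. The sample XML and its values are made up for illustration (only the element names echo this thread), and html.parser is used so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document; only the element names echo the thread.
xml = """<entry>
  <doc-id>X1</doc-id>
  <title>Example</title>
  <author><name>Somebody</name></author>
</entry>"""

soup = BeautifulSoup(xml, "html.parser")
one_entry = soup.find("entry")

# Default search is recursive, so the grandchild <name> is included.
all_tags = [tag.name for tag in one_entry.find_all()]
print(all_tags)    # ['doc-id', 'title', 'author', 'name']

# recursive=False restricts the search to direct children only.
child_tags = [tag.name for tag in one_entry.find_all(recursive=False)]
print(child_tags)  # ['doc-id', 'title', 'author']
```

Because find_all() only returns tags, this variant should not pick up the whitespace between elements.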

RE: Help with beautiful soup - Larz60+ - Jul-18-2017

buran, actually they both work:

tags = [tag.name for tag in entry.children]
print('tags: {}'.format(tags))

Results:

Output:
tags: [None, 'doc-id', None, 'title', None, 'author', None, 'date', None, 'format', None, 'current-status', None, 'publication-status', None, 'stream', None, 'doi', None]

Now I have to figure out how to filter the Nones out of my list comprehension, but there are no grandchildren, which is what I was looking for.

RE: Help with beautiful soup - Larz60+ - Jul-18-2017

OK, what am I doing wrong here?

tags = [tag.name for tag in entry.children if tag is not None]
print('tags: {}'.format(tags))

Result:

Output:
tags: [None, 'doc-id', None, 'title', None, 'author', None, 'date', None, 'format', None, 'current-status', None, 'publication-status', None, 'stream', None, 'doi', None]

RE: Help with beautiful soup - buran - Jul-18-2017

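I think it should be the following. The filter has to test tag.name rather than tag, because the text nodes between the elements are not None themselves; only their .name is: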

tags = [tag.name for tag in entry.children if tag.name]
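A small sketch of why the first filter had no effect, assuming the Nones come from the newlines between elements (the sample XML is hypothetical). Each newline is a text node: the object itself is not None, but its .name is. Checking the type with isinstance(child, Tag) is shown as an alternative to testing .name.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical sample; the newlines between elements become text nodes.
xml = "<entry>\n<doc-id>X1</doc-id>\n<title>Example</title>\n</entry>"
entry = BeautifulSoup(xml, "html.parser").find("entry")

children = list(entry.children)  # '\n', <doc-id>, '\n', <title>, '\n'

# The text nodes are real objects, so "is not None" filters nothing out.
unfiltered = [child.name for child in children if child is not None]
print(unfiltered)  # [None, 'doc-id', None, 'title', None]

# Testing .name drops the text nodes, whose name is None.
by_name = [child.name for child in children if child.name]
print(by_name)     # ['doc-id', 'title']

# Checking the type makes the intent explicit: keep elements only.
by_type = [child.name for child in children if isinstance(child, Tag)]
print(by_type)     # ['doc-id', 'title']
```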

(Jul-18-2017, 05:26 PM)Larz60+ Wrote: buran, Actually they both work,

I suspect so: .contents returns a list, while .children returns a generator.
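To illustrate the practical difference, a short sketch on a hypothetical sample: .contents can be indexed and measured with len(), while .children is consumed as it is iterated, so a second pass over the same object yields nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the newlines between elements become text nodes.
xml = "<entry>\n<doc-id>X1</doc-id>\n<title>Example</title>\n</entry>"
entry = BeautifulSoup(xml, "html.parser").find("entry")

contents = entry.contents
print(isinstance(contents, list))  # True
print(len(contents))               # 5 (three newlines plus two tags)

children = entry.children
first_pass = [child.name for child in children if child.name]
second_pass = [child.name for child in children if child.name]
print(first_pass)   # ['doc-id', 'title']
print(second_pass)  # [] because the iterator is already exhausted
```

For a single list comprehension, as in this thread, either attribute gives the same result.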

RE: Help with beautiful soup - Larz60+ - Jul-18-2017

Thank you