Help with beautiful soup - Python Forum (https://python-forum.io), Web Scraping & Web Development (/thread-4035.html)
Help with beautiful soup - Larz60+ - Jul-18-2017

I have a large XML document and am able to process the data OK. I can make it more efficient if I run my find_all statements only on tags that are direct children, and not grandchildren, of the node of interest. The following statement gets me all tags of the parent:

tags = [tag.name for tag in one_entry.find_all()]

but it includes grandchildren tags. How can I adjust it so that I only find immediate children tags?

RE: Help with beautiful soup - buran - Jul-18-2017

Not sure, but I think you should use .contents or .children:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#contents-and-children

i.e.

tags = [tag.name for tag in one_entry.contents]

RE: Help with beautiful soup - Larz60+ - Jul-18-2017

buran,
Actually they both work:

tags = [tag.name for tag in entry.children]
print('tags: {}'.format(tags))

results:

Now I have to figure out how to filter the Nones from my list comprehension, but there are no grandchildren, which is what I was looking for.
RE: Help with beautiful soup - Larz60+ - Jul-18-2017

OK, what am I doing wrong here?

tags = [tag.name for tag in entry.children if tag is not None]
print('tags: {}'.format(tags))

result:
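A likely reason the `is not None` filter changes nothing: the items yielded by `.children` are never `None` themselves. The whitespace between tags comes back as `NavigableString` objects, and it is their `.name` attribute that is `None`. The sketch below illustrates this on a made-up document; the sample XML and the use of the built-in `html.parser` (chosen only so no extra parser has to be installed) are assumptions, not the poster's actual setup.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical sample data, standing in for one entry of the large XML file.
doc = """<entry>
<id>1</id>
<author><name>Ann</name></author>
</entry>"""

soup = BeautifulSoup(doc, "html.parser")
entry = soup.find("entry")

# Every child is a real object: either a Tag or a NavigableString.
children = list(entry.children)
none_children = [child for child in children if child is None]

# The Nones only appear once .name is read from the string children.
names = [child.name for child in children]
string_children = [c for c in children if isinstance(c, NavigableString)]
tag_children = [c for c in children if isinstance(c, Tag)]

print("children that are None:", none_children)
print("names:", names)
```

So the test has to look at `child.name` (or at the child's type) rather than at the child itself.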
RE: Help with beautiful soup - buran - Jul-18-2017

I think it should be

tags = [tag.name for tag in entry.children if tag.name]

(Jul-18-2017, 05:26 PM)Larz60+ Wrote: buran, Actually they both work,

I suspect so, one returns a list, the other a generator.

RE: Help with beautiful soup - Larz60+ - Jul-18-2017

Thank you
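For anyone finding this thread later, here is a minimal sketch that puts the suggested filter next to one alternative, `find_all(recursive=False)`, which the Beautiful Soup documentation describes as restricting the search to direct children. The sample document and the `html.parser` choice are assumptions made for illustration, not the original data.

```python
from bs4 import BeautifulSoup

# Hypothetical sample data: <name> is a grandchild of <entry>.
doc = """<entry>
<id>1</id>
<author><name>Ann</name></author>
</entry>"""

soup = BeautifulSoup(doc, "html.parser")
entry = soup.find("entry")

# Filter from the thread: keep only children whose .name is set (real tags).
child_tags = [child.name for child in entry.children if child.name]

# Alternative: let find_all skip strings and stop at direct children.
direct_tags = [tag.name for tag in entry.find_all(recursive=False)]

# For comparison, the original call also descends into grandchildren.
all_tags = [tag.name for tag in entry.find_all()]

print("filtered children:", child_tags)
print("recursive=False:  ", direct_tags)
print("default find_all: ", all_tags)
```

Both approaches give the same direct-child tag names here. As for `.contents` versus `.children`: `.contents` is a plain list, while `.children` iterates over the same items, so either works in a comprehension.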