Python Forum
Help with beautiful soup
#1
I have a large XML document and am able to process the data OK.
I could make it more efficient if my find_all calls looked only at tags
that are direct children of the node of interest, not grandchildren.
The following statement gets me the names of all tags under the parent:
tags = [tag.name for tag in one_entry.find_all()]
but it also includes grandchild tags.
How can I adjust it so that I only find immediate child tags?
#2
Not sure, but I think you should use .contents or .children:
https://www.crummy.com/software/Beautifu...d-children
For example:
tags = [tag.name for tag in one_entry.contents]
#3
buran, actually they both work:
tags = [tag.name for tag in entry.children]
print('tags: {}'.format(tags))
results:
Output:
tags: [None, 'doc-id', None, 'title', None, 'author', None, 'date', None, 'format', None, 'current-status', None, 'publication-status', None, 'stream', None, 'doi', None]
Now I have to figure out how to filter the Nones out in my list comprehension, but there are no grandchildren, which is what I was looking for.
#4
OK, what am I doing wrong here? The Nones are still in the list:

tags = [tag.name for tag in entry.children if tag is not None]
print('tags: {}'.format(tags))
result:
Output:
tags: [None, 'doc-id', None, 'title', None, 'author', None, 'date', None, 'format', None, 'current-status', None, 'publication-status', None, 'stream', None, 'doi', None]
#5
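I think the test should be on tag.name rather than on tag. The child objects themselves are never None; only their .name is, so it should be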
tags = [tag.name for tag in entry.children if tag.name]
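
If it helps, here is a minimal sketch of two other ways to get only the direct child tags. It assumes the None entries come from the whitespace text nodes sitting between the tags, which is my guess from the output above. The sample XML is made up (only the tag names doc-id, date and author come from the thread; the name tag and all values are hypothetical), and I'm using the built-in "html.parser" just so the snippet runs without extra dependencies.

```python
from bs4 import BeautifulSoup, Tag

# Made-up sample data; <name> and all the values are hypothetical.
xml = """<entry>
  <doc-id>X0001</doc-id>
  <date>2017</date>
  <author>
    <name>A. Author</name>
  </author>
</entry>"""

soup = BeautifulSoup(xml, "html.parser")
entry = soup.find("entry")

# Option 1: walk the direct children and keep only Tag objects,
# which skips the whitespace text nodes between them.
child_tags = [child.name for child in entry.children if isinstance(child, Tag)]

# Option 2: let find_all() do the filtering by switching off recursion.
direct_tags = [tag.name for tag in entry.find_all(recursive=False)]

# For comparison: the default find_all() also descends into <author>.
all_tags = [tag.name for tag in entry.find_all()]

print(child_tags)   # ['doc-id', 'date', 'author']
print(direct_tags)  # ['doc-id', 'date', 'author']
print(all_tags)     # ['doc-id', 'date', 'author', 'name']
```

Option 2 is probably the closest to the original find_all() code, since recursive=False is a documented argument and only the one call has to change.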

(Jul-18-2017, 05:26 PM)Larz60+ Wrote: buran, Actually they both work,
I'd expect so: .contents is a list, while .children is an iterator over the same nodes.
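
A small sketch of that difference, in case it matters for your code. The one-line sample document is made up, and "html.parser" is only used so this runs without lxml. The practical point is that .children can be consumed only once, while .contents can be indexed and looped over repeatedly.

```python
from bs4 import BeautifulSoup

# Hypothetical sample with no whitespace between tags, so there are no text nodes.
soup = BeautifulSoup(
    "<entry><doc-id>X0001</doc-id><date>2017</date></entry>", "html.parser"
)
entry = soup.find("entry")

contents = entry.contents   # a plain list of the direct children
children = entry.children   # an iterator over those same children

is_list = isinstance(contents, list)
is_iterator = iter(children) is children   # true for any iterator

first_pass = [child.name for child in children]
second_pass = [child.name for child in children]   # iterator is already used up

print(is_list, is_iterator)  # True True
print(first_pass)            # ['doc-id', 'date']
print(second_pass)           # []
print(contents[0].name)      # doc-id
```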
#6
Thank you