[SOLVED] [BeautifulSoup] Why are some elements ignored?

Winfried

Sep-02-2024, 05:38 PM (This post was last modified: Sep-04-2024, 09:35 AM by Winfried.)

Hello,

I need BeautifulSoup (BS) to process a book formatted as XHTML.

Each page is a <div>.

Within each page, I need to grab the footnotes, which can contain either plain text only or one or more <i> sub-elements.

The following code does grab the plain footnotes, but ignores those that contain italics. Why is that?

Thank you.

import re
from bs4 import BeautifulSoup as BS

with open("input.xhtml", mode='rb') as file:
    fileContent = file.read()
soup = BS(fileContent, features="xml")

# Sample of the input:
"""
<div id="page14"><p>18 <i>Some chapter</i></p>
text body
<p>1. footnote</p>
<p>2. footnote <i>blah</i>, blah</p>
</div>
"""
#TODO extract page number only
divs = soup.find_all('div', id=re.compile(r"^page\d+$"))
for div in divs:
    #Why ignored if it contains sub-elements, eg. "<p>4. Some note <i>some sub-element</i> Blah, 2003.</p>" ?
    ps = div.find_all("p", string=re.compile(r"^\d+\. "))
    for p in ps:
        print(p.string)
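The likely cause, shown as a minimal sketch (assuming bs4 is installed; `html.parser` is used only so the example does not need lxml, and `.string` behaves the same way with `features="xml"`): when `find_all()` is given both a tag name and `string=`, BeautifulSoup tests the regex against each tag's `.string`. That attribute is only set when a tag has a single child; a `<p>` that mixes text with an `<i>` element has several children, so its `.string` is `None` and the regex never gets a chance to match.

```python
import re

from bs4 import BeautifulSoup

# Same sample markup as in the post.
SAMPLE = """<div id="page14"><p>18 <i>Some chapter</i></p>
text body
<p>1. footnote</p>
<p>2. footnote <i>blah</i>, blah</p>
</div>"""

soup = BeautifulSoup(SAMPLE, features="html.parser")

# .string is None for every <p> that has more than one child node.
paragraphs = soup.find_all("p")
strings = [p.string for p in paragraphs]
for p in paragraphs:
    print(len(p.contents), "child node(s) ->", repr(p.string))

# string= is checked against .string, so only the plain footnote matches.
matched = soup.find_all("p", string=re.compile(r"^\d+\. "))
print([p.get_text() for p in matched])
```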

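One possible fix, sketched under the same assumptions: drop `string=` and test each `<p>`'s `get_text()`, which concatenates the text of all descendants, including the parts inside `<i>`. The sketch also captures the digits of the page `id` with a regex group, which would cover the "extract page number only" TODO. The function name `extract_footnotes` and its dictionary return shape are hypothetical choices for illustration, not something from the original post.

```python
import re

from bs4 import BeautifulSoup

# Page ids look like "page14"; the group captures the page number.
PAGE_ID = re.compile(r"^page(\d+)$")
# Footnotes start with a number, a dot and a space, e.g. "2. ".
FOOTNOTE = re.compile(r"^\d+\. ")


def extract_footnotes(markup, features="xml"):
    """Hypothetical helper: return {page_number: [footnote text, ...]}.

    features="xml" matches the post and requires lxml.
    """
    soup = BeautifulSoup(markup, features=features)
    pages = {}
    for div in soup.find_all("div", id=PAGE_ID):
        page_number = int(PAGE_ID.match(div["id"]).group(1))
        notes = []
        for p in div.find_all("p"):
            # get_text() joins all descendant text, so <i> content is
            # included; .string would be None for this mixed content.
            text = p.get_text()
            if FOOTNOTE.match(text):
                notes.append(text)
        pages[page_number] = notes
    return pages


# Usage with the sample from the post (html.parser avoids the lxml dependency).
sample = """<div id="page14"><p>18 <i>Some chapter</i></p>
text body
<p>1. footnote</p>
<p>2. footnote <i>blah</i>, blah</p>
</div>"""
print(extract_footnotes(sample, features="html.parser"))
```

Matching on `get_text()` means nested markup no longer hides a footnote. The trade-off is that the returned text loses the `<i>` tags; if the italics must be kept, collect the matching `p` tags themselves rather than their text.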
Messages In This Thread

[SOLVED] [BeautifulSoup] Why are some elements ignored? - by Winfried - Sep-02-2024, 05:38 PM

RE: [BeautifulSoup] Why are some elements ignored? - by snippsat - Sep-02-2024, 07:10 PM

RE: [BeautifulSoup] Why are some elements ignored? - by Winfried - Sep-03-2024, 11:29 AM

RE: [BeautifulSoup] Why are some elements ignored? - by Winfried - Sep-03-2024, 11:49 AM

RE: [BeautifulSoup] Why are some elements ignored? - by snippsat - Sep-03-2024, 02:15 PM

RE: [BeautifulSoup] Why are some elements ignored? - by Winfried - Sep-04-2024, 09:34 AM
