Python Forum
[SOLVED] [BeautifulSoup] Why are some elements ignored?
#1
Hello,

I need BeautifulSoup (BS) to process a book formatted as XHTML.

Each page is a <div>.

Within each page, I need to grab the footnotes, which can contain either plain text or one or more <i> sub-elements.

The following code does grab the plain footnotes, but ignores those that contain italics. Why is that?

Thank you.

import re
from bs4 import BeautifulSoup as BS

# Read the XHTML file as bytes and let the XML parser handle the encoding
with open("input.xhtml", mode='rb') as file:
    fileContent = file.read()
soup = BS(fileContent, features="xml")

"""
<div id="page14"><p>18 <i>Some chapter</i></p>
text body
<p>1. footnote</p>
<p>2. footnote <i>blah</i>, blah</p>
</div>
"""
#TODO extract page number only
# Each page is a <div> whose id looks like "page14"
divs = soup.find_all('div', id=re.compile(r"^page\d+$"))
for div in divs:
    #Why ignored if contains sub-elements, eg. "<p>4. Some note <i>some sub-element</i> Blah, 2003.</p>" ?
    ps = div.find_all("p", string=re.compile(r"^\d+\. "))
    for p in ps:
        print(p.string)
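A likely explanation, offered as a sketch and not as the thread's accepted answer: when find_all() is given both a tag name and string=, BeautifulSoup compares the pattern against each tag's .string, and .string is None as soon as a tag has more than one child. A <p> containing text plus an <i> therefore never matches. One way around it is to filter with a function that tests get_text() instead. The names is_footnote and extract_footnotes are made up for illustration, and the sample uses "html.parser" only to avoid depending on lxml; the same filter is expected to work with features="xml".

```python
import re
from bs4 import BeautifulSoup

# Sample page copied from the post above.
SAMPLE = """
<div id="page14"><p>18 <i>Some chapter</i></p>
text body
<p>1. footnote</p>
<p>2. footnote <i>blah</i>, blah</p>
</div>
"""

FOOTNOTE = re.compile(r"^\d+\. ")

def is_footnote(tag):
    # Hypothetical helper: match on the full text of the <p>,
    # including text that sits inside child tags such as <i>.
    return tag.name == "p" and FOOTNOTE.match(tag.get_text()) is not None

def extract_footnotes(markup, parser="html.parser"):
    # Hypothetical helper: collect footnote text from every page <div>.
    soup = BeautifulSoup(markup, parser)
    notes = []
    for div in soup.find_all("div", id=re.compile(r"^page\d+$")):
        for p in div.find_all(is_footnote):
            # get_text() joins all descendant strings; .string would be None here.
            notes.append(p.get_text())
    return notes

footnotes = extract_footnotes(SAMPLE)
for note in footnotes:
    print(note)
```

Note that the result is printed with get_text() as well, since p.string would print None for the footnotes that contain italics.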


Messages In This Thread
[SOLVED] [BeautifulSoup] Why are some elements ignored? - by Winfried - Sep-02-2024, 05:38 PM
