Python Forum
how to parse with BeautifulSoup
#1
This format is confusing me.

What is the best way to parse this into individual components with BeautifulSoup?

html:
Output:
<td class="small"> <b> [Amend] </b> <b> [Cover] </b> Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A)) <br/> Acc-no: 0001609177-23-000017 (34 Act) Size: 3 KB </td>
Desired results:
Output:
[Amend] [Cover] Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A)) Acc-no: 0001609177-23-000017 (34 Act) Size: 3 KB
I have tried:
td.get_text(strip=True).split('\n')
which results in a list of length 1:
Output:
['[Amend][Cover]Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A))Acc-no: 0001609177-23-000017\xa0(34 Act)\xa0 Size: 3 KB']
I also tried numerous other methods, none of which gave the right results.
#2
I don't know if this is what you're looking for, but it works:
from bs4 import BeautifulSoup as bs
import io

some_text = io.StringIO('''
<td class="small">
    <b>
      [Amend]
    </b>
    <b>
      [Cover]
    </b>
     Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A))
    <br/>
     Acc-no: 0001609177-23-000017 (34 Act)  Size: 3 KB
  </td>
''')


soup = bs(some_text, 'html.parser')

# Split the cell's text on newlines, then strip each line and drop the blank ones
data = soup.find('td', attrs={'class': 'small'}).text.split('\n')
data = [line.strip() for line in data if line.strip()]

for details in data:
    print(details)
Output:
[Amend]
[Cover]
Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A))
Acc-no: 0001609177-23-000017 (34 Act)  Size: 3 KB
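One caveat: splitting on '\n' only works here because the markup was re-indented with line breaks. The single-line HTML from the first post has no newlines in it, so the same split returns one long string. A minimal sketch that does not depend on line breaks, assuming the cell looks like the one in post #1 (including the \xa0 non-breaking spaces seen in the attempted output), could use BeautifulSoup's stripped_strings generator. The names html, td and parts are only for illustration.

```python
from bs4 import BeautifulSoup

# Single-line markup as in the first post; the \xa0 characters are assumed
# from the get_text() output shown there.
html = (
    '<td class="small"> <b> [Amend] </b> <b> [Cover] </b> '
    'Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A)) <br/> '
    'Acc-no: 0001609177-23-000017\xa0(34 Act)\xa0 Size: 3 KB </td>'
)

soup = BeautifulSoup(html, 'html.parser')
td = soup.find('td', class_='small')

# stripped_strings yields every text node inside the cell with leading and
# trailing whitespace removed, skipping nodes that are only whitespace.
# ' '.join(s.split()) then collapses inner whitespace runs (including \xa0)
# into single spaces.
parts = [' '.join(s.split()) for s in td.stripped_strings]

for part in parts:
    print(part)
```

If a single string is preferred over a list, td.get_text('\n', strip=True) joins the same stripped pieces with newlines, which is probably what the original get_text(strip=True).split('\n') attempt was aiming for.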
#3
Quote: I don't know if this is what you're looking for, but it works
Yes, it is. Just couldn't wrap my mind around it this morning!
Thank you
#4
(Oct-18-2023, 02:03 PM)Larz60+ Wrote:
Quote: I don't know if this is what you're looking for, but it works
Yes, it is. Just couldn't wrap my mind around it this morning!
Thank you

You're welcome