Python Forum

How to parse with BeautifulSoup
This format is confusing me.

What is the best way to parse this into its individual components with BeautifulSoup?

html:
Output:
<td class="small"> <b> [Amend] </b> <b> [Cover] </b> Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A)) <br/> Acc-no: 0001609177-23-000017 (34 Act) Size: 3 KB </td>
Desired results:
Output:
[Amend]
[Cover]
Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A))
Acc-no: 0001609177-23-000017 (34 Act) Size: 3 KB
I have tried:
td.get_text(strip=True).split('\n')
which results in a list of length 1:
Output:
['[Amend][Cover]Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A))Acc-no: 0001609177-23-000017\xa0(34 Act)\xa0 Size: 3 KB']
I also tried numerous other methods, none of which gave the right result.
I don't know if this is what you're looking for, but it works:
from bs4 import BeautifulSoup as bs
import io

some_text = io.StringIO('''
<td class="small">
    <b>
      [Amend]
    </b>
    <b>
      [Cover]
    </b>
     Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A))
    <br/>
     Acc-no: 0001609177-23-000017 (34 Act)  Size: 3 KB
  </td>
''')


soup = bs(some_text, 'html.parser')

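# Grab the cell's text and split it on the newlines between child nodes
cell = soup.find('td', attrs={'class': 'small'})
lines = cell.text.split('\n')
# Drop the blank entries and trim the indentation from the rest
data = [line.strip() for line in lines if line.strip()]

for details in data:
    print(details)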
Output:
[Amend]
[Cover]
Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A))
Acc-no: 0001609177-23-000017 (34 Act) Size: 3 KB
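One caveat: splitting on '\n' relies on the markup containing line breaks. If the cell arrives on a single line, as it appears in the question, there is nothing to split on. And get_text(strip=True) joins the stripped fragments with an empty separator, which is probably why that attempt returned a list of length 1. A minimal sketch of an alternative, assuming the single-line markup from the question with the \xa0 characters seen in its output (the html string and the names parts and same are only illustrative):

```python
from bs4 import BeautifulSoup

# Assumed input: the cell on one line, with non-breaking spaces (\xa0)
html = ('<td class="small"> <b> [Amend] </b> <b> [Cover] </b> '
        'Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A)) <br/> '
        'Acc-no: 0001609177-23-000017\xa0(34 Act)\xa0 Size: 3 KB </td>')

soup = BeautifulSoup(html, 'html.parser')
td = soup.find('td', class_='small')

# stripped_strings yields each text fragment with surrounding whitespace
# removed and skips fragments that are empty after stripping.
# ' '.join(s.split()) then collapses inner whitespace, including \xa0.
parts = [' '.join(s.split()) for s in td.stripped_strings]

# Equivalent idea with get_text: give it a separator to split on afterwards
same = td.get_text('\n', strip=True).split('\n')

for part in parts:
    print(part)
```

Both approaches should give four items here. The second one keeps the \xa0 characters unless they are replaced separately.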
Quote: I don't know if this is what you're looking for, but it works
Yes, it is. Just couldn't wrap my mind around it this morning!
Thank you
(Oct-18-2023, 02:03 PM) Larz60+ wrote:
Quote: Yes, it is. Just couldn't wrap my mind around it this morning!
Thank you

You're welcome