Python Forum

This format is confusing me.

What is the best way to parse this into its individual components with BeautifulSoup?

HTML:

Output:<td class="small">
  <b>
    [Amend]
  </b>
  <b>
    [Cover]
  </b>
   Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A))
  <br/>
   Acc-no: 0001609177-23-000017 (34 Act)  Size: 3 KB
</td>

Desired results:

Output:[Amend]
[Cover]
Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A))
Acc-no: 0001609177-23-000017 (34 Act)  Size: 3 KB

I have tried:

td.get_text(strip=True).split('\n')

which results in a list of length 1:

Output:
['[Amend][Cover]Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A))Acc-no: 0001609177-23-000017\xa0(34 Act)\xa0 Size: 3 KB']

I've also tried several other methods, none of which gave the right result.
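For reference, the single-element list is expected. With strip=True, get_text() strips every text fragment (newlines included) and joins the pieces with its default separator, an empty string, so split('\n') has nothing left to split on. A minimal sketch, assuming the same markup as in the question, of passing a separator to get_text() instead:

```python
from bs4 import BeautifulSoup

html = """
<td class="small">
  <b>
    [Amend]
  </b>
  <b>
    [Cover]
  </b>
   Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A))
  <br/>
   Acc-no: 0001609177-23-000017 (34 Act)  Size: 3 KB
</td>
"""

soup = BeautifulSoup(html, "html.parser")
td = soup.find("td", class_="small")

# strip=True strips each fragment and joins them with separator="" by default,
# so no '\n' survives and split('\n') returns a single element.
joined = td.get_text(strip=True)

# Supplying a separator puts a delimiter back between the stripped fragments.
parts = td.get_text("\n", strip=True).split("\n")

for part in parts:
    print(part)
```

Any separator that cannot occur in the text itself works; '\n' is convenient because the fragments have already been stripped of their own newlines.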

I don't know if this is what you're looking for, but it works:

from bs4 import BeautifulSoup as bs

html = '''
<td class="small">
    <b>
      [Amend]
    </b>
    <b>
      [Cover]
    </b>
     Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A))
    <br/>
     Acc-no: 0001609177-23-000017 (34 Act)  Size: 3 KB
  </td>
'''

soup = bs(html, 'html.parser')

# .text keeps the original newlines, so split on them,
# then strip each piece and drop the whitespace-only ones.
lines = soup.find('td', attrs={'class': 'small'}).text.split('\n')
lines = [line.strip() for line in lines if line.strip()]

for line in lines:
    print(line)

Output:[Amend]
[Cover]
Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A))
Acc-no: 0001609177-23-000017 (34 Act)  Size: 3 KB
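BeautifulSoup also provides the stripped_strings generator, which yields each non-empty text fragment already stripped, so the split-and-filter step can be skipped. The sketch below assumes the live page encodes the gaps as &nbsp; (suggested by the '\xa0' characters in the question's output) and uses unicodedata.normalize("NFKC", ...) to turn those non-breaking spaces into ordinary ones. The labels/details split on the <b> tags is an illustrative extra, not part of the answer above.

```python
import unicodedata

from bs4 import BeautifulSoup

# Assumed sample: same content as the question, but with &nbsp; entities,
# since the question's output shows '\xa0' in the Acc-no line.
html = """
<td class="small">
  <b>[Amend]</b>
  <b>[Cover]</b>
   Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A))
  <br/>
   Acc-no: 0001609177-23-000017&nbsp;(34 Act)&nbsp; Size: 3 KB
</td>
"""

soup = BeautifulSoup(html, "html.parser")
td = soup.find("td", class_="small")

# stripped_strings skips whitespace-only fragments and strips the rest;
# NFKC maps the non-breaking space U+00A0 to a plain space U+0020.
parts = [unicodedata.normalize("NFKC", s) for s in td.stripped_strings]

# Illustrative split: the <b> tags hold the bracketed labels,
# everything else is the free text that follows them.
labels = [b.get_text(strip=True) for b in td.find_all("b")]
details = [p for p in parts if p not in labels]

for part in parts:
    print(part)
```

If the '\xa0' characters should be kept as-is, drop the normalize() call and use list(td.stripped_strings) directly.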

Quote:I don't know if this is what you're looking for, but it works

Yes, it is. Just couldn't wrap my mind around it this morning!
Thank you

(Oct-18-2023, 02:03 PM)Larz60+ Wrote:
Quote:I don't know if this is what you're looking for, but it works
Yes, it is. Just couldn't wrap my mind around it this morning!
Thank you

You're welcome

Larz60+

menator01
