Python Forum
how to parse with BeautifulSoup - Printable Version


how to parse with BeautifulSoup - Larz60+ - Oct-18-2023

This format is confusing me.

What is the best way to parse this into individual components with BeautifulSoup?

HTML:
Output:
<td class="small"> <b> [Amend] </b> <b> [Cover] </b> Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A)) <br/> Acc-no: 0001609177-23-000017 (34 Act) Size: 3 KB </td>
Desired results:
Output:
[Amend] [Cover] Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A)) Acc-no: 0001609177-23-000017 (34 Act) Size: 3 KB
I have tried:
td.get_text(strip=True).split('\n')
which results in a list of length 1:
Output:
['[Amend][Cover]Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A))Acc-no: 0001609177-23-000017\xa0(34 Act)\xa0 Size: 3 KB']
I have also tried numerous other methods, none of which gave the right result.
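
A note on why the attempt above returns a single element: with strip=True and no separator, get_text() strips each text node and joins the pieces with an empty string, so there is no '\n' left to split on. Below is a minimal sketch of two alternatives, assuming the cell really is on one line as posted (the variable names are only for illustration). Passing an explicit separator keeps one boundary per text node, and stripped_strings yields the same pieces directly.

```python
from bs4 import BeautifulSoup

# The cell from the question, kept on a single line as posted
html = ('<td class="small"> <b> [Amend] </b> <b> [Cover] </b> '
        'Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A)) <br/> '
        'Acc-no: 0001609177-23-000017 (34 Act) Size: 3 KB </td>')

td = BeautifulSoup(html, 'html.parser').find('td', class_='small')

# strip=True alone joins the stripped pieces with '', so no '\n' survives
joined = td.get_text(strip=True)

# An explicit separator keeps one boundary per text node
components = td.get_text(separator='\n', strip=True).split('\n')

# stripped_strings yields the same pieces without the join/split round trip
same_components = list(td.stripped_strings)

for part in components:
    print(part)
```

Neither variant depends on the markup containing line breaks, so it should behave the same on pretty-printed and single-line HTML.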


RE: how to parse with BeautifulSoup - menator01 - Oct-18-2023

I don't know if this is what you're looking for, but it works:
from bs4 import BeautifulSoup as bs
import io

some_text = io.StringIO('''
<td class="small">
    <b>
      [Amend]
    </b>
    <b>
      [Cover]
    </b>
     Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A))
    <br/>
     Acc-no: 0001609177-23-000017 (34 Act)  Size: 3 KB
  </td>
''')


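# BeautifulSoup accepts a file-like object as well as a string
soup = bs(some_text, 'html.parser')

# Split the cell text on the line breaks present in the markup,
# then strip each line and drop the blank ones
lines = soup.find('td', attrs={'class': 'small'}).text.split('\n')
data = [line.strip() for line in lines if line.strip()]

for details in data:
    print(details)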
Output:
[Amend]
[Cover]
Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A))
Acc-no: 0001609177-23-000017 (34 Act) Size: 3 KB
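
If the goal is named fields rather than lines, the pieces can be taken one step further. This is only a sketch: parse_cell, DETAILS_RE, and the dictionary keys are hypothetical names, and the regular expression assumes the last line always has the shape "Acc-no: <number> (<act>) Size: <size>" seen in this one sample.

```python
import re
from bs4 import BeautifulSoup

# The sample cell from the question
html = ('<td class="small"> <b> [Amend] </b> <b> [Cover] </b> '
        'Material Amendment to Form ATS-N (Rule 304(a)(2)(i)(A)) <br/> '
        'Acc-no: 0001609177-23-000017 (34 Act) Size: 3 KB </td>')

# Assumed layout of the details line; \s also matches non-breaking spaces
DETAILS_RE = re.compile(r'Acc-no:\s*(\S+)\s*\((.*?)\)\s*Size:\s*(.+)')


def parse_cell(td):
    """Hypothetical helper: turn one cell into a dict of named fields."""
    # The bracketed flags are the <b> elements
    flags = [b.get_text(strip=True) for b in td.find_all('b')]
    # Every other non-empty text piece, in document order
    rest = [s for s in td.stripped_strings if s not in flags]
    record = {'flags': flags, 'description': rest[0] if rest else ''}
    match = DETAILS_RE.search(rest[1]) if len(rest) > 1 else None
    if match:
        record['acc_no'], record['act'], record['size'] = match.groups()
    return record


td = BeautifulSoup(html, 'html.parser').find('td', class_='small')
record = parse_cell(td)
print(record)
```

Cells that do not match the assumed pattern simply come back without the acc_no, act, and size keys, so the caller can spot them.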



RE: how to parse with BeautifulSoup - Larz60+ - Oct-18-2023

Quote: I don't know if this is what you're looking for, but it works
Yes, it is. Just couldn't wrap my mind around it this morning!
Thank you


RE: how to parse with BeautifulSoup - menator01 - Oct-18-2023

You're welcome