Python Forum
BeautifulSoup
#1
Hello, guys!

Need a little help here...
I've tried many ways, but I couldn't hack it..

I need to extract a list of the titles from the HTML below.
The issue is that I don't know how to get those titles as a list from the <a> tags.

This way I could only get to the first title:
  tag = soup.a['title']
And this way reached all the strings, but couldn't extract the titles in each line:
tabela.find_all('a')
My list must look like this:
BV DOLAR CAMBIAL FIC DE FI
DOLAR FI CAMBIAL
OCCAM FUNDO DE INVESTIMENTO CAMBIAL
BV USD SHORT CAMBIAL FIC
RIO BRAVO CREDITO PRIVADO FIRF
HTML String:
Quote:[<a class="bt-detail" href="/fundos-de-investimento/bv-dolar-cambial-fic-de-fi" title="BV DOLAR CAMBIAL FIC DE FI">Detalhes</a>,
<a class="bt-detail" href="/fundos-de-investimento/dolar-fi-cambial" title="DOLAR FI CAMBIAL">Detalhes</a>,
<a class="bt-detail" href="/fundos-de-investimento/occam-fundo-de-investimento-cambial" title="OCCAM FUNDO DE INVESTIMENTO CAMBIAL">Detalhes</a>,
<a class="bt-detail" href="/fundos-de-investimento/bv-usd-short-cambial-fic" title="BV USD SHORT CAMBIAL FIC">Detalhes</a>,
<a class="bt-detail" href="/fundos-de-investimento/rio-bravo-credito-privado-firf" title="RIO BRAVO CREDITO PRIVADO FIRF">Detalhes</a>]

Thaaanks!!
Reply
#2
Please show full code.
andre_kadomoto likes this post
Reply
#3
(May-23-2021, 04:57 AM)andre_kadomoto Wrote: <a class="bt-detail" href="/fundos-de-investimento/dolar-fi-cambial" title="DOLAR FI CAMBIAL">Detalhes</a>
"title" is an attribute of <a>. This is what the official documentation says about accessing attributes:
Quote:Attributes

A tag may have any number of attributes. The tag <b id="boldest"> has an attribute “id” whose value is “boldest”. You can access a tag’s attributes by treating the tag like a dictionary:

tag = BeautifulSoup('<b id="boldest">bold</b>', 'html.parser').b
tag['id']
# 'boldest'
You can access that dictionary directly as .attrs:

tag.attrs
# {'id': 'boldest'}
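Applied to one of the <a> tags from the question, dictionary-style access might look like the sketch below. It assumes the built-in 'html.parser' backend, and the 'data-missing' attribute name is made up purely to show what happens when an attribute is absent.

```python
from bs4 import BeautifulSoup

# One of the anchors from the original post
html = ('<a class="bt-detail" href="/fundos-de-investimento/dolar-fi-cambial" '
        'title="DOLAR FI CAMBIAL">Detalhes</a>')

tag = BeautifulSoup(html, 'html.parser').a

# Square brackets raise KeyError if the attribute does not exist
title = tag['title']

# .get() returns None (or a default) instead of raising;
# 'data-missing' is a hypothetical attribute used only for illustration
missing = tag.get('data-missing')

# The visible link text is separate from the attributes
text = tag.get_text()

print(title)
print(missing)
print(text)
```

Note that the title lives in the attribute, while the text between the tags is only "Detalhes", which is why reading the tag's string does not give the fund name.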
andre_kadomoto likes this post
Reply
#4
As mentioned above, use attrs.
from bs4 import BeautifulSoup

html = '''\
<a class="bt-detail" href="/fundos-de-investimento/bv-dolar-cambial-fic-de-fi" title="BV DOLAR CAMBIAL FIC DE FI">Detalhes</a>
<a class="bt-detail" href="/fundos-de-investimento/dolar-fi-cambial" title="DOLAR FI CAMBIAL">Detalhes</a>'''

soup = BeautifulSoup(html, 'lxml')
for tag in soup.find_all('a'):
    print(tag.attrs.get('title', 'Not found'))
Output:
BV DOLAR CAMBIAL FIC DE FI
DOLAR FI CAMBIAL
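If the goal is an actual Python list rather than printed lines, one possible variation is a list comprehension over find_all(). This is only a sketch: it uses the five anchors from the first post, swaps in the built-in 'html.parser' so lxml does not need to be installed, and passes title=True so that only <a> tags that actually have a title attribute are matched.

```python
from bs4 import BeautifulSoup

# The five anchors from the original post
html = '''\
<a class="bt-detail" href="/fundos-de-investimento/bv-dolar-cambial-fic-de-fi" title="BV DOLAR CAMBIAL FIC DE FI">Detalhes</a>
<a class="bt-detail" href="/fundos-de-investimento/dolar-fi-cambial" title="DOLAR FI CAMBIAL">Detalhes</a>
<a class="bt-detail" href="/fundos-de-investimento/occam-fundo-de-investimento-cambial" title="OCCAM FUNDO DE INVESTIMENTO CAMBIAL">Detalhes</a>
<a class="bt-detail" href="/fundos-de-investimento/bv-usd-short-cambial-fic" title="BV USD SHORT CAMBIAL FIC">Detalhes</a>
<a class="bt-detail" href="/fundos-de-investimento/rio-bravo-credito-privado-firf" title="RIO BRAVO CREDITO PRIVADO FIRF">Detalhes</a>'''

soup = BeautifulSoup(html, 'html.parser')

# title=True keeps only <a> tags that carry a title attribute,
# so a['title'] cannot raise KeyError here
titles = [a['title'] for a in soup.find_all('a', title=True)]

for title in titles:
    print(title)
```

On a real page with other links, you would probably want to narrow the search further, for example with find_all('a', class_='bt-detail').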
andre_kadomoto likes this post
Reply
#5
thaaaanks, guys!!!
Reply
#6
Thanks for the info, guys!
Reply

