Python Forum

BeautifulSoup
Hello, guys!

Need a little help here...
I've tried many ways, but I couldn't crack it.

I need to extract a list of the titles from the HTML below.
The issue is that I don't know how to treat those <a> tags as a list and read the title from each one.

This way I could only get the first title:
tag = soup.a['title']
This way I got all the <a> tags, but I couldn't extract the title from each one:
tabela.find_all('a')
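A minimal sketch of why the two attempts behave differently, assuming the built-in 'html.parser' and two of the <a> tags from the question as sample input (the poster's `tabela` variable is replaced by `soup` here). `soup.a` returns only the first matching tag, while `find_all` returns a list-like ResultSet, so the title has to be read from each tag in turn:

```python
from bs4 import BeautifulSoup

# Sample input: two of the <a> tags quoted in the question
html = '''
<a class="bt-detail" href="/fundos-de-investimento/bv-dolar-cambial-fic-de-fi" title="BV DOLAR CAMBIAL FIC DE FI">Detalhes</a>
<a class="bt-detail" href="/fundos-de-investimento/dolar-fi-cambial" title="DOLAR FI CAMBIAL">Detalhes</a>
'''

soup = BeautifulSoup(html, 'html.parser')

# soup.a is shorthand for soup.find('a'): only the FIRST <a> tag
first = soup.a['title']

# find_all returns every <a> tag, so read the title attribute from each one
titles = [a['title'] for a in soup.find_all('a')]

print(first)
print(titles)
```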
My list must look like this:
BV DOLAR CAMBIAL FIC DE FI
DOLAR FI CAMBIAL
OCCAM FUNDO DE INVESTIMENTO CAMBIAL
BV USD SHORT CAMBIAL FIC
RIO BRAVO CREDITO PRIVADO FIRF
HTML string:
[<a class="bt-detail" href="/fundos-de-investimento/bv-dolar-cambial-fic-de-fi" title="BV DOLAR CAMBIAL FIC DE FI">Detalhes</a>,
<a class="bt-detail" href="/fundos-de-investimento/dolar-fi-cambial" title="DOLAR FI CAMBIAL">Detalhes</a>,
<a class="bt-detail" href="/fundos-de-investimento/occam-fundo-de-investimento-cambial" title="OCCAM FUNDO DE INVESTIMENTO CAMBIAL">Detalhes</a>,
<a class="bt-detail" href="/fundos-de-investimento/bv-usd-short-cambial-fic" title="BV USD SHORT CAMBIAL FIC">Detalhes</a>,
<a class="bt-detail" href="/fundos-de-investimento/rio-bravo-credito-privado-firf" title="RIO BRAVO CREDITO PRIVADO FIRF">Detalhes</a>]

Thaaanks!!
Please show full code.
(May-23-2021, 04:57 AM) andre_kadomoto wrote:
<a class="bt-detail" href="/fundos-de-investimento/dolar-fi-cambial" title="DOLAR FI CAMBIAL">Detalhes</a>
"title" is an attribute of <a>. This is what the official documentation says about accessing attributes:
Quote: Attributes

A tag may have any number of attributes. The tag <b id="boldest"> has an attribute “id” whose value is “boldest”. You can access a tag’s attributes by treating the tag like a dictionary:

tag = BeautifulSoup('<b id="boldest">bold</b>', 'html.parser').b
tag['id']
# 'boldest'
You can access that dictionary directly as .attrs:

tag.attrs
# {'id': 'boldest'}
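Because a tag behaves like a dictionary, the usual dictionary rules apply. A small sketch reusing the documentation's example (the `title` lookups are illustrative additions): `tag['title']` raises KeyError when the attribute is missing, while `tag.get()` returns None or a supplied default, which is safer when some <a> tags may lack a title:

```python
from bs4 import BeautifulSoup

tag = BeautifulSoup('<b id="boldest">bold</b>', 'html.parser').b

print(tag['id'])                      # dictionary-style access to an existing attribute
print(tag.get('title'))               # missing attribute: None instead of an exception
print(tag.get('title', 'Not found'))  # missing attribute: the supplied default

try:
    tag['title']                      # missing attribute with [] raises KeyError
except KeyError:
    print('KeyError: no title attribute')
```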
As mentioned above, use attrs:
from bs4 import BeautifulSoup

html = '''\
<a class="bt-detail" href="/fundos-de-investimento/bv-dolar-cambial-fic-de-fi" title="BV DOLAR CAMBIAL FIC DE FI">Detalhes</a>
<a class="bt-detail" href="/fundos-de-investimento/dolar-fi-cambial" title="DOLAR FI CAMBIAL">Detalhes</a>'''

soup = BeautifulSoup(html, 'lxml')  # needs the lxml package; 'html.parser' works without it
for tag in soup.find_all('a'):
    print(tag.attrs.get('title', 'Not found'))
Output:
BV DOLAR CAMBIAL FIC DE FI
DOLAR FI CAMBIAL
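To get the exact list asked for in the question, the same loop can collect the titles into a Python list. A sketch, assuming the five tags from the question plus one hypothetical extra link (added only to show the filtering): restricting `find_all` with `class_='bt-detail'` and `title=True` skips unrelated <a> tags that a real page would likely contain:

```python
from bs4 import BeautifulSoup

# The five tags from the question, plus a HYPOTHETICAL unrelated link
html = '''
<a class="bt-detail" href="/fundos-de-investimento/bv-dolar-cambial-fic-de-fi" title="BV DOLAR CAMBIAL FIC DE FI">Detalhes</a>
<a class="bt-detail" href="/fundos-de-investimento/dolar-fi-cambial" title="DOLAR FI CAMBIAL">Detalhes</a>
<a class="bt-detail" href="/fundos-de-investimento/occam-fundo-de-investimento-cambial" title="OCCAM FUNDO DE INVESTIMENTO CAMBIAL">Detalhes</a>
<a class="bt-detail" href="/fundos-de-investimento/bv-usd-short-cambial-fic" title="BV USD SHORT CAMBIAL FIC">Detalhes</a>
<a class="bt-detail" href="/fundos-de-investimento/rio-bravo-credito-privado-firf" title="RIO BRAVO CREDITO PRIVADO FIRF">Detalhes</a>
<a href="/outra-pagina">Outro link</a>
'''

soup = BeautifulSoup(html, 'html.parser')

# Only <a> tags with class "bt-detail" that also carry a title attribute
titles = [a['title'] for a in soup.find_all('a', class_='bt-detail', title=True)]

for title in titles:
    print(title)
```

Since `titles` is an ordinary list, it can be stored or processed further instead of just printed.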
thaaaanks, guys!!!
Thanks for the info, guys!