Soup('A')

Pedroski55 · Oct-13-2024, 06:19 AM

Say you already have tags:

type(tags)

Output:
<class 'bs4.element.ResultSet'>

Take, say, the first element:

s = str(tags[0])
print(s)

Now you have:

Output:
<a class="nav-logo" href="https://www.python.org/">
<img alt="Python logo" src="_static/py.svg"/>
</a>

Now you can get the actual link address using a regular expression:

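import re

# Capture everything between the double quotes that follow href=
# (a character class like [:/a-z\.] would miss digits, capitals, hyphens, etc.)
e = re.compile(r'href="([^"]+)"')
res = e.search(s)
print(res.group(1)) # https://www.python.org/

Output:
https://www.python.org/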

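For this simple case, that gives the same result as Beautiful Soup's tags[0]['href'] or tags[0].get('href').

Beautiful Soup itself does not rely on a regex, though. It hands the markup to a parser (html.parser, lxml or html5lib) and builds a tree of Tag objects, which is how it copes with single quotes, unquoted values, uppercase attribute names and so on.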
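
For comparison, here is a rough sketch of pulling the href out with a real HTML parser instead of a regex. It uses only the standard library's html.parser, which is one of the parsers Beautiful Soup can sit on top of. The names HrefCollector and extract_hrefs are made up for this example, and this is not Beautiful Soup's actual code, just an illustration of the parser-based idea.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names and strips the
        # quotes, so attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html):
    """Return the href values of all <a> tags found in html."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


s = '''<a class="nav-logo" href="https://www.python.org/">
<img alt="Python logo" src="_static/py.svg"/>
</a>'''

print(extract_hrefs(s))  # ['https://www.python.org/']
```

Because the parser deals with quoting and case for you, the same function also copes with something like <A HREF='https://docs.python.org/3/'>, where a pattern that only looks for href=" would find nothing.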