Oct-13-2024, 06:19 AM
Say you already have tags:
type(tags)
Output:<class 'bs4.element.ResultSet'>
Take, say, the first element:
s = str(tags[0])
print(s)
Now you have:
Output:<a class="nav-logo" href="https://www.python.org/">
<img alt="Python logo" src="_static/py.svg"/>
</a>
Now you can get the actual link address using a regex:
import re
e = re.compile(r'(href=")([:/a-z\.]+)')
res = e.search(s)
print(res.group(2))
Output:https://www.python.org/
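One caveat: the character class in that pattern only allows lowercase letters, ':', '/' and '.', so it would stop early on a URL containing digits, hyphens or a query string. A slightly more forgiving sketch, assuming the attribute value is always wrapped in double quotes (the names href_pattern and link are just for illustration):

```python
import re

# The same tag as above, written out as a plain string.
s = (
    '<a class="nav-logo" href="https://www.python.org/">\n'
    '<img alt="Python logo" src="_static/py.svg"/>\n'
    '</a>'
)

# Capture everything between href=" and the next double quote.
# Assumption: double-quoted attributes only; single-quoted or
# unquoted values would not match this pattern.
href_pattern = re.compile(r'href="([^"]*)"')

match = href_pattern.search(s)
link = match.group(1) if match else None
print(link)
```

This still breaks on plenty of real-world HTML, so treat it as a quick hack rather than a general solution.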
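That's more or less the result BeautifulSoup gives you! Under the hood, though, it doesn't rely on regexes: it hands the HTML to a real parser (html.parser, lxml or html5lib) and builds a tree of tags, which is how it caters for all possibilities.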
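For comparison, here is a rough sketch of getting the same link with an HTML parser instead of a regex. It uses html.parser from the standard library, which is one of the parsers BeautifulSoup can be told to use. HrefCollector is a made-up name for illustration, not anything from bs4, and this is not a copy of what BeautifulSoup does internally:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag just opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# The same tag as above, written out as a plain string.
s = (
    '<a class="nav-logo" href="https://www.python.org/">\n'
    '<img alt="Python logo" src="_static/py.svg"/>\n'
    '</a>'
)

collector = HrefCollector()
collector.feed(s)
print(collector.hrefs)
```

With BeautifulSoup itself you don't need any of this: tags[0]['href'] or tags[0].get('href') returns the link directly, without converting the tag to a string first.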