Python Forum
Soup('A')
#6
Say you already have tags:

type(tags)
Output:
<class 'bs4.element.ResultSet'>
Take, say, the first element and convert it to a string:

s = str(tags[0])
print(s)
Now you have:

Output:
<a class="nav-logo" href="https://www.python.org/"> <img alt="Python logo" src="_static/py.svg"/> </a>
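Now you can pull the link address out of that string with a regular expression:

import re

# Capture everything between href=" and the next double quote,
# so digits, hyphens and query strings in the URL are kept too.
e = re.compile(r'href="([^"]+)"')
res = e.search(s)
print(res.group(1))  # https://www.python.org/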
Output:
https://www.python.org/
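
How much of the URL you get depends on the character class in the pattern. The sketch below compares a class that only allows lowercase letters, ':', '/' and '.' with one that accepts anything up to the closing quote. The sample tag and the names narrow and broad are made up for illustration.

```python
import re

# Hypothetical tag string whose URL contains a digit.
s = '<a href="https://docs.python.org/3/library/re.html">re docs</a>'

# Only lowercase letters, ':', '/' and '.' are allowed in the URL.
narrow = re.compile(r'href="([:/a-z.]+)')
# Anything except a double quote is allowed in the URL.
broad = re.compile(r'href="([^"]+)"')

narrow_url = narrow.search(s).group(1)
broad_url = broad.search(s).group(1)

print(narrow_url)  # https://docs.python.org/
print(broad_url)   # https://docs.python.org/3/library/re.html
```

The narrow pattern stops at the first character outside its class (here the digit 3), so it silently returns a truncated address.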
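That gives you the same value Beautiful Soup would, but it is not how Beautiful Soup works internally.

Beautiful Soup does not run a regex over the markup. It hands the document to an HTML parser (html.parser, lxml or html5lib) and builds a tree of Tag objects, so on the original tag you can read the attribute directly with tags[0]['href'] or tags[0].get('href').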
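
For comparison, here is a minimal sketch of reading the same attribute through Beautiful Soup itself, with no regex involved. It assumes bs4 is installed and uses Python's built-in "html.parser" backend. The html string is just the tag from above, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = ('<a class="nav-logo" href="https://www.python.org/"> '
        '<img alt="Python logo" src="_static/py.svg"/> </a>')

# Parse the markup into a tree of Tag objects.
soup = BeautifulSoup(html, "html.parser")

# Calling the soup object is a shortcut for soup.find_all("a").
tags = soup("a")
first = tags[0]

href = first["href"]          # raises KeyError if the attribute is missing
href_safe = first.get("href") # returns None if the attribute is missing
img_src = first.img["src"]    # attributes of nested tags work the same way

print(href)
print(img_src)
```

Indexing with first["href"] is fine when every tag is known to carry the attribute. When some links may lack it, first.get("href") avoids the exception.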

Messages In This Thread
Soup('A') - by new_coder_231013 - Aug-31-2022, 12:09 PM
RE: Soup('A') - by Larz60+ - Aug-31-2022, 12:23 PM
RE: Soup('A') - by new_coder_231013 - Sep-12-2022, 12:02 PM
RE: Soup('A') - by Gaurav_Kumar - Aug-09-2023, 11:49 AM
RE: Soup('A') - by jenson - Sep-24-2024, 11:09 AM
RE: Soup('A') - by Pedroski55 - Oct-13-2024, 06:19 AM