Python Forum

Full Version: BS4 - How Can I Scrape These Links?
[Image: QJCy04MTQmGTa4oJnLQRSA.png]

I need to scrape all the URLs on the page, but as you can see, the <a> tags have no class or ID to hook into.

Is it possible to select all the h3 elements and then pull the hrefs out of the links inside them?
Try to post the HTML code; then it is easier to test it out.
from bs4 import BeautifulSoup

# Simulate a web page
html = '''\
<body>
  <h3 class="r">
    <a href='https://python-forum.io' ping='some url'>Learn Python</a>
  </h3>
</body>'''

soup = BeautifulSoup(html, 'lxml')
Use:
>>> r = soup.find(class_="r")
>>> r
<h3 class="r">
<a href="https://python-forum.io" ping="some url">Learn Python</a>
</h3>

>>> a = r.find('a')
>>> a
<a href="https://python-forum.io" ping="some url">Learn Python</a>

>>> a.get('href')
'https://python-forum.io'
>>> a.get('ping')
'some url'
>>> a.text
'Learn Python'
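
To answer the original question for the whole page rather than a single tag, something like the following might work. It is only a sketch: the sample HTML and its URLs are made up for illustration, it assumes the links of interest all sit inside h3 tags with class "r" as in the example above, and it uses the built-in 'html.parser' instead of 'lxml' so that nothing extra needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page (not the real site): several h3 headings,
# one without a link and one with a different class
html = '''\
<body>
  <h3 class="r"><a href="https://python-forum.io">Learn Python</a></h3>
  <h3 class="r"><a href="https://www.python.org">Python</a></h3>
  <h3 class="r">No link here</h3>
  <h3 class="other"><a href="https://example.com">Skipped</a></h3>
</body>'''

soup = BeautifulSoup(html, 'html.parser')

# Option 1: CSS selector, every <a> that has an href inside <h3 class="r">
links = [a['href'] for a in soup.select('h3.r a[href]')]

# Option 2: the same idea with find_all() and find()
links_loop = []
for h3 in soup.find_all('h3', class_='r'):
    a = h3.find('a', href=True)  # first <a> with an href, or None
    if a is not None:
        links_loop.append(a['href'])

print(links)
# ['https://python-forum.io', 'https://www.python.org']
```

Both options should give the same list here. The selector version is shorter, while the loop leaves room for extra checks on each h3 before the link is kept.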