I need to scrape all the URLs on the page, but as you can see the links have no class or ID to hook into.
Is it possible to scrape all the h3 elements and then pull out the hrefs inside them?
Try to post the HTML code; then it is easier to test it out.
from bs4 import BeautifulSoup
# Simulate a web page
html = '''\
<body>
<h3 class="r">
<a href='https://python-forum.io' ping='some url'>Learn Python</a>
</h3>
</body>'''
soup = BeautifulSoup(html, 'lxml')
Use:
>>> r = soup.find(class_="r")
>>> r
<h3 class="r">
<a href="https://python-forum.io" ping="some url">Learn Python</a>
</h3>
>>> a = r.find('a')
>>> a
<a href="https://python-forum.io" ping="some url">Learn Python</a>
>>> a.get('href')
'https://python-forum.io'
>>> a.get('ping')
'some url'
>>> a.text
'Learn Python'
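To get every URL rather than just the first one, the same idea can be put in a loop over all h3 elements. This is only a rough sketch: the second link and the h3 without a link are made-up test data, and it uses the built-in 'html.parser' instead of 'lxml' so it runs without extra installs.

```python
from bs4 import BeautifulSoup

# Simulated page: two h3 elements with a link, one without (made-up test data)
html = '''\
<body>
<h3 class="r"><a href="https://python-forum.io">Learn Python</a></h3>
<h3 class="r"><a href="https://www.python.org">Python.org</a></h3>
<h3>No link here</h3>
</body>'''

soup = BeautifulSoup(html, 'html.parser')

# Loop over every h3 and keep the href of the first <a> inside it, if any
urls = []
for h3 in soup.find_all('h3'):
    a = h3.find('a', href=True)  # href=True skips <a> tags without an href
    if a is not None:
        urls.append(a['href'])

print(urls)

# The same result with a CSS selector: every <a href=...> nested in an h3
css_urls = [a['href'] for a in soup.select('h3 a[href]')]
```

Note that the loop takes only the first link in each h3, while the selector version returns every link inside an h3. On a page like the one simulated here the two give the same list.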