Python Forum
BS4 - How Can I Scrape These Links?
#1
[Image: QJCy04MTQmGTa4oJnLQRSA.png]

I need to scrape all the URLs on the page, but as you can see, the links have no class or ID to hook into.

Is it possible to scrape all the h3 elements and then extract the hrefs inside them?
#2
Try to post the HTML code; then it is easier to test it out.
from bs4 import BeautifulSoup

# Simulate a web page
html = '''\
<body>
  <h3 class="r">
    <a href='https://python-forum.io' ping='some url'>Learn Python</a>
  </h3>
</body>'''

soup = BeautifulSoup(html, 'lxml')
Use:
>>> r = soup.find(class_="r")
>>> r
<h3 class="r">
<a href="https://python-forum.io" ping="some url">Learn Python</a>
</h3>

>>> a = r.find('a')
>>> a
<a href="https://python-forum.io" ping="some url">Learn Python</a>

>>> a.get('href')
'https://python-forum.io'
>>> a.get('ping')
'some url'
>>> a.text
'Learn Python'
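
To get every link on the page rather than just the first one, the same idea can be put in a loop. This is only a sketch: the HTML below is made up for testing (the real page may look different), and it uses 'html.parser' instead of 'lxml' so it runs without installing anything extra.

```python
from bs4 import BeautifulSoup

# Hypothetical page with several results; the real markup may differ
html = '''\
<body>
  <h3 class="r"><a href="https://python-forum.io">Learn Python</a></h3>
  <h3 class="r"><a href="https://www.python.org">Python.org</a></h3>
  <h3 class="r">No link here</h3>
</body>'''

# html.parser ships with Python, so lxml is not required for this sketch
soup = BeautifulSoup(html, 'html.parser')

# Loop over every h3 with class "r" and take the href of the link inside, if any
links = []
for h3 in soup.find_all('h3', class_='r'):
    a = h3.find('a', href=True)
    if a is not None:
        links.append(a['href'])

# The same selection written as a CSS selector
css_links = [a['href'] for a in soup.select('h3.r a[href]')]

print(links)
```

Both approaches should give the same list here. If the h3 tags on the real page have no class either, soup.find_all('h3') without the class_ argument would match all of them.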