Python Forum
Scraping a Website (HELP)
#1
Hi, I need to extract all the links from a website.

I want to build a crawler similar to Screaming Frog, but in Python.

This is my code:

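import urllib.request

from bs4 import BeautifulSoup

# Download the page and decode the response bytes to text
data = urllib.request.urlopen('https://consultarsimit.co').read().decode()

# Name the parser explicitly; otherwise bs4 guesses one and emits a warning
soup = BeautifulSoup(data, 'html.parser')

# Print the href of every <a> tag (None when a tag has no href)
for tag in soup('a'):
    print(tag.get('href'))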
How can I save the links in a database and query them with multi-threading?

Thanks!
#2
Use requests rather than urllib.request (it needs to be installed first: pip install requests).
Follow snippsat's tutorials here:
web scraping part 1
web scraping part 2

Then, to get a list of all links:
linklist = soup.find_all('a')
for link in linklist:
    print(link.get('href'))
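As for the database and multi-threading part of the question, here is a minimal sketch of one way it could be done, using only the standard library (sqlite3 and concurrent.futures). Everything named here is an assumption for illustration, not something from the tutorials: the table name links, the functions init_db, save_links and crawl, and the get_links argument, which stands for any function you write that takes a URL and returns the list of href values (for example the find_all('a') loop wrapped in a function). Pages are fetched in worker threads, but all database writes happen in the calling thread, because a sqlite3 connection should by default only be used from the thread that created it.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin


def init_db(path=":memory:"):
    """Open the database and create the (hypothetical) links table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        "page TEXT NOT NULL, href TEXT NOT NULL, UNIQUE(page, href))"
    )
    return conn


def save_links(conn, page, hrefs):
    """Store absolute URLs for one page, skipping empty and duplicate hrefs."""
    rows = [(page, urljoin(page, h)) for h in hrefs if h]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR IGNORE INTO links VALUES (?, ?)", rows)


def crawl(conn, pages, get_links, workers=4):
    """Fetch pages in worker threads; write to SQLite from this thread only."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as pages
        for page, hrefs in zip(pages, pool.map(get_links, pages)):
            save_links(conn, page, hrefs)
```

To use it, call init_db('links.db'), pass the connection, a list of page URLs and your own get_links function to crawl, and then query with something like conn.execute("SELECT href FROM links WHERE page = ?", (url,)). Note that if get_links raises an exception for one page, pool.map re-raises it when that result is reached, so you may want to catch errors inside get_links and return an empty list instead.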