Python Forum

Hi, I need to scrape out all the links on a website.

I want to crawl something like Screaming Frog but with Python.

This is my code:

import urllib.request
data = urllib.request.urlopen('https://consultarsimit.co').read().decode()

from bs4 import BeautifulSoup
soup =  BeautifulSoup(data)
tags = soup('a')
for tag in tags:
		print(tag.get('href'))

How can I save the links in a database and query them with multi-threading?

Thanks!

use requests rather than urllib.request (needs to be installed pip install requests)
follow snippsat's tutorial here:
web scraping part 1
web scraping part 2

then to get a list of all links:

linklist = soup.find_all('a')
for link in linklist:
    print(f"{link.get('href')}")

LearnPython2

Larz60+