Python Forum

Full Version: Scraping a Website (HELP)
Hi, I need to extract all the links from a website.

I want to crawl the site the way Screaming Frog does, but with Python.

This is my code:


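import urllib.request

from bs4 import BeautifulSoup

# Download the page and decode the bytes to text
data = urllib.request.urlopen('https://consultarsimit.co').read().decode()

# Name the parser explicitly; otherwise bs4 guesses one and emits a warning
soup = BeautifulSoup(data, 'html.parser')

# Print the href of every <a> tag (None if the tag has no href)
for tag in soup('a'):
    print(tag.get('href'))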


How can I save the links in a database and query them with multi-threading?

Thanks!
Use requests rather than urllib.request (it needs to be installed first: pip install requests).
Follow snippsat's tutorial here:
web scraping part 1
web scraping part 2

Then, to get a list of all links:
linklist = soup.find_all('a')
for link in linklist:
    print(link.get('href'))
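
For the database and multi-threading part of the question, here is a rough sketch rather than a definitive answer. It assumes SQLite (the standard-library sqlite3 module) is an acceptable database, and it uses a thread pool to download several pages in parallel. The names get_links and crawl_pages, the links table and the links.db file are made up for this example; none of them come from the thread or the tutorials.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin


def get_links(url):
    """Return the absolute URL of every <a href> found on one page."""
    # Third-party imports are kept local so the storage code below
    # can be tried without requests and bs4 installed.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    # href=True skips <a> tags that have no href attribute;
    # urljoin turns relative links into absolute ones.
    return [urljoin(url, a['href']) for a in soup.find_all('a', href=True)]


def crawl_pages(page_urls, fetch_links=get_links, db_path='links.db', workers=8):
    """Fetch pages in worker threads, then store their links in SQLite."""
    page_urls = list(page_urls)
    conn = sqlite3.connect(db_path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS links '
        '(page TEXT, href TEXT, UNIQUE (page, href))'
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Only the downloads run in threads. Every write happens here in
        # the main thread, because by default a sqlite3 connection should
        # not be shared between threads.
        results = pool.map(fetch_links, page_urls)
        for page, links in zip(page_urls, results):
            conn.executemany(
                'INSERT OR IGNORE INTO links (page, href) VALUES (?, ?)',
                [(page, href) for href in links],
            )
    conn.commit()
    return conn
```

Calling conn = crawl_pages(['https://consultarsimit.co']) and then conn.execute('SELECT href FROM links').fetchall() should give back the stored links. To cover a whole site, one option is to feed the stored links that are on the same domain back in as the next batch of page_urls until no new ones turn up.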