Scraping a Website (HELP) - Printable Version

Source: Python Forum (https://python-forum.io), Web Scraping & Web Development (https://python-forum.io/forum-13.html), thread "Scraping a Website (HELP)" (/thread-26643.html)

Scraping a Website (HELP) - LearnPython2 - May-08-2020

Hi, I need to scrape all the links from a website.

I want to build a crawler like Screaming Frog, but in Python.

This is my code:

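import urllib.request
from bs4 import BeautifulSoup

# Download the page and decode the raw bytes into a string
data = urllib.request.urlopen('https://consultarsimit.co').read().decode()

# Name the parser explicitly; without it bs4 warns and picks one itself
soup = BeautifulSoup(data, 'html.parser')
tags = soup('a')  # shorthand for soup.find_all('a')
for tag in tags:
    print(tag.get('href'))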

How can I save the links in a database and query them with multi-threading?

Thanks!
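A minimal sketch of the database and multi-threading part of the question, using only the standard library: sqlite3 for storage and concurrent.futures.ThreadPoolExecutor to request the saved links in parallel. It reads "query them with multi-threading" as "request each saved link concurrently and record its HTTP status", which is what a Screaming Frog-style crawl does; that interpretation is an assumption. The table layout (`links` with `url` and `status` columns), the helper names `save_links`, `fetch_status` and `check_links`, and the worker count are hypothetical choices for illustration.

```python
import sqlite3
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def save_links(conn: sqlite3.Connection, links) -> None:
    """Store links in a table; the PRIMARY KEY makes repeated URLs a no-op."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links (url TEXT PRIMARY KEY, status INTEGER)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO links (url) VALUES (?)", ((u,) for u in links)
    )
    conn.commit()


def fetch_status(url: str, timeout: float = 10.0):
    """Return the HTTP status code for url, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 404, 500, ... are still useful crawl results
    except (OSError, ValueError):
        # URLError, timeouts and connection errors are OSError subclasses;
        # urlopen raises ValueError for a malformed URL
        return None


def check_links(conn: sqlite3.Connection, fetch=fetch_status, workers: int = 8) -> None:
    """Fetch every unchecked URL on worker threads, then write the results back.

    Only the main thread touches the database, because a sqlite3 connection
    should not be shared across threads by default.
    """
    urls = [row[0] for row in conn.execute("SELECT url FROM links WHERE status IS NULL")]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(fetch, urls))  # results come back in input order
    conn.executemany("UPDATE links SET status = ? WHERE url = ?", zip(statuses, urls))
    conn.commit()
```

Usage: open a file-backed database with `conn = sqlite3.connect('links.db')`, call `save_links(conn, links)` with a list of absolute URLs, then `check_links(conn)`. Relative hrefs such as `/about` have to be made absolute first, because urlopen needs a full URL. A failed request leaves `status` as NULL, so running `check_links` again retries it.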

RE: Scraping a Website (HELP) - Larz60+ - May-08-2020

Use requests rather than urllib.request (it has to be installed first: pip install requests).
Then follow snippsat's tutorials here:
web scraping part 1
web scraping part 2
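A minimal sketch of the fetch step from the first post rewritten with requests, assuming the requests and beautifulsoup4 packages are installed. The helper names `fetch_html` and `hrefs` and the 10-second timeout are illustrative choices, not something from the thread; the timeout matters because requests otherwise waits indefinitely for a server that never answers.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with requests and return its body as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text  # decoded with the encoding requests infers for the response


def hrefs(html: str) -> list:
    """Return the href of every <a> tag that actually has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]
```

Usage: `for href in hrefs(fetch_html('https://consultarsimit.co')): print(href)`. Filtering with `href=True` skips anchors without an href, so no `None` values end up in the output.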

Then, to get a list of all links:

linklist = soup.find_all('a')  # every <a> tag in the parsed page
for link in linklist:
    print(link.get('href'))  # get() returns None when a tag has no href
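The hrefs printed this way are often relative paths or `#fragment` links, which cannot be fetched or stored as-is. A minimal sketch, assuming beautifulsoup4 is installed, that resolves them into absolute, de-duplicated URLs with the standard urllib.parse module; `absolute_links` and its `same_site_only` switch are hypothetical names, and keeping the crawl on a single host is an assumption about what a Screaming Frog-style crawl wants.

```python
from urllib.parse import urldefrag, urljoin, urlparse

from bs4 import BeautifulSoup


def absolute_links(html: str, base_url: str, same_site_only: bool = True) -> list:
    """Resolve every <a href> against base_url, dropping #fragments and duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    base_host = urlparse(base_url).netloc
    links = []
    for a in soup.find_all("a", href=True):
        url, _fragment = urldefrag(urljoin(base_url, a["href"]))
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel: and similar
        if same_site_only and parsed.netloc != base_host:
            continue  # keep the crawl on the starting host
        links.append(url)
    return list(dict.fromkeys(links))  # de-duplicate, keeping first-seen order
```

For example, `absolute_links(html, 'https://consultarsimit.co')` keeps only links on that host. Note that this sketch treats `www.` and bare hostnames as different sites.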