Jul-06-2019, 08:16 PM
I'm working on a Crawler application.
My goal is to write a broken link detection tool.
So I'm going to check all the links on a site.
When crawl() has been called about 966 times, it stops with this error:

RecursionError: maximum recursion depth exceeded in comparison
To work around it, I raised the recursion limit:

import sys
sys.setrecursionlimit(30000)

However, once the recursion depth reaches 3924, the Python application simply closes (crashes).
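For context, raising the limit with sys.setrecursionlimit only lifts Python's soft cap; every recursive call still consumes a real C stack frame, so a deep enough recursion will eventually crash the interpreter no matter how high the limit is set. A minimal demonstration of hitting the default limit (the function name recurse is just for illustration):

```python
import sys

# The default limit is usually 1000 frames; check it on this interpreter.
print(sys.getrecursionlimit())

def recurse(n):
    # Each call adds a stack frame; Python raises RecursionError
    # before the underlying C stack can overflow.
    return recurse(n + 1)

try:
    recurse(0)
except RecursionError as exc:
    print(type(exc).__name__)  # prints "RecursionError"
```

This is why an iterative design is safer than pushing the limit higher for a crawler that may visit thousands of pages.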
I'd like to solve this without recursion.
How can I write the web crawler with a loop (or some other approach) instead of the recursive method?
My web crawler code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import re

import requests
from urllib.parse import urljoin

target_links = []
url_Adres = "http://crummy.com"


def get_links(url):
    try:
        if "http://" in url or "https://" in url:
            response = requests.get(url)
        else:
            response = requests.get("http://" + url)
        return re.findall('(?:href=")(.*?)"', str(response.content))
    except (requests.exceptions.ConnectionError,
            requests.exceptions.InvalidSchema,
            requests.exceptions.InvalidURL,
            UnicodeError):
        pass


def crawl(url):
    href_links = get_links(url)
    if href_links:
        for link in href_links:
            link = urljoin(url, link)
            if "#" in link:
                link = link.split("#")[0]
            if url_Adres in link and link not in target_links:
                target_links.append(link)
                print("Crawler:" + link)
                crawl(link)


crawl(url_Adres)