Python Forum
Make a Web crawler without using the recursion method.
I'm working on a crawler application.
My goal is to write a broken-link detection tool, so I need to check every link on a site.

After crawl() has been called 966 times, it raises the following error:

Error:
RecursionError: maximum recursion depth exceeded in comparison
To work around the problem, I raised the recursion limit with the following code:

import sys
sys.setrecursionlimit(30000)
However, once the recursion count reaches 3924, the Python application closes.
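
For context, here is a minimal sketch of the mechanism behind the error. Every nested call adds a frame to the interpreter's stack, and Python raises RecursionError once the depth passes sys.getrecursionlimit(). The nest function is a hypothetical helper used only for illustration. Raising the limit very high can let deep recursion exhaust the underlying C stack instead, which would be consistent with the application closing, though that is an assumption about this particular crash.

```python
import sys


def nest(depth=0):
    """Hypothetical helper: call itself forever, one stack frame per call."""
    return nest(depth + 1)


print("Current recursion limit:", sys.getrecursionlimit())

try:
    nest()
    message = ""
except RecursionError as exc:
    # Python stops the runaway recursion with an exception rather than crashing.
    message = str(exc)

print("Caught:", message)
```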

I'd like to solve the problem another way, without recursion.
How can I write a web crawler with a loop or some other method instead of recursion?

My web crawler code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import re
from urllib.parse import urljoin

import requests

target_links = []  # every in-site link found so far
url_Adres = "http://crummy.com"


def get_links(url):
    """Return the href values found on the page, or None if the request fails."""
    # Prepend a scheme when the URL does not already start with one.
    if not url.startswith(("http://", "https://")):
        url = "http://" + url
    try:
        response = requests.get(url)
        return re.findall('(?:href=")(.*?)"', str(response.content))
    except (requests.exceptions.ConnectionError,
            requests.exceptions.InvalidSchema,
            requests.exceptions.InvalidURL,
            UnicodeError):
        return None


def crawl(url):
    href_links = get_links(url)
    if href_links:
        for link in href_links:
            link = urljoin(url, link)
            # Drop the fragment so page#a and page#b count as one page.
            if "#" in link:
                link = link.split("#")[0]
            if url_Adres in link and link not in target_links:
                target_links.append(link)
                print("Crawler:" + link)
                # Each new link adds one more stack frame; this is the call
                # that eventually raises RecursionError.
                crawl(link)


crawl(url_Adres)
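
One possible non-recursive approach, sketched under assumptions: replace the recursive call with an explicit queue of pages still to visit, so the call stack never grows with the number of links. The names crawl_iterative, fetch_links, and the get_links parameter are hypothetical and not part of the original code. The in-site check (start URL contained in the link) and the href regex are carried over from the code above. A set is used for visited links instead of a list so that membership tests stay fast, and a timeout is added to requests.get as a precaution.

```python
import re
from collections import deque
from urllib.parse import urldefrag, urljoin

HREF_RE = re.compile(r'href="(.*?)"')


def fetch_links(url):
    """Download a page and return its raw href values (empty list on failure)."""
    import requests  # imported lazily so the crawl loop can be tried offline

    try:
        response = requests.get(url, timeout=10)
    except requests.exceptions.RequestException:
        return []
    return HREF_RE.findall(response.text)


def crawl_iterative(start_url, get_links=fetch_links):
    """Visit every in-site page reachable from start_url using a loop."""
    seen = {start_url}          # links already queued or visited
    queue = deque([start_url])  # pages still waiting to be fetched
    order = []                  # pages in the order they were visited

    while queue:
        url = queue.popleft()
        order.append(url)
        for href in get_links(url):
            # Resolve relative links and strip any #fragment.
            link, _fragment = urldefrag(urljoin(url, href))
            if start_url in link and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


if __name__ == "__main__":
    for page in crawl_iterative("http://crummy.com"):
        print("Crawler:" + page)
```

Because pending pages live in the deque rather than on the call stack, the depth of the site no longer matters and sys.setrecursionlimit is not needed. Using queue.pop() instead of queue.popleft() would give a depth-first visiting order closer to the recursive version.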


Messages In This Thread
Make a Web crawler without using the recursion method. - by hibritusta - Jul-06-2019, 08:16 PM
