Web Crawler help

Mr_Mafia · (This post was last modified: Apr-04-2020, 07:21 PM by Mr_Mafia.)

Thank you, but I think I found out why it doesn't work and still need a little help.
Here is the code:

def __init__(self,project_name,base_url,domain_name):
        spider.project_name=project_name
        spider.base_url=base_url
        spider.domain_name=domain_name
        spider.queue_file=spider.project_name+'/queue.txt'#setting up the file path for the queue text file
        spider.crawled_file=spider.project_name+'/crawled.txt'
        self.boot()
        self.crawl_page('First spider',spider.base_url)#first spider is crawling the main page of website

def crawl_page(thread_name,page_url):#method displaying that the webpage is currently being crawled, so the user knows it's working
        if page_url not in spider.crawled:#using crawled set due to faster operations NOT the file
            print(thread_name+' currently crawling '+page_url)
            print('Queue ' + str(len(spider.queue)) + ' | crawled ' + str(len(spider.crawled)))#'spider.queue' and 'spider.crawled' are sets, so 'len()' is used to get how many items are in each, and 'str()' converts that count to a string so it can be concatenated
            spider.add_links_to_queue(spider.gather_links(page_url))#'spider.gather_links(page_url)' will connect to a web page and gather the links there. 'spider.add_links_to_queue' will take those gathered links and add them to the waiting list
            spider.queue.remove(page_url)#removing the page that was just crawled from the queue set
            spider.crawled.add(page_url)#adding that same page to the crawled set
            spider.update_files()

line 20, in __init__
self.crawl_page("First spider",spider.base_url)#first spider is crawling the main page of website
TypeError: crawl_page() takes 2 positional arguments but 3 were given

I don't see how I'm passing a third argument when I call crawl_page().
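
For anyone who runs into the same message: a likely explanation is that calling a method through an instance (self.crawl_page(...)) makes Python pass the instance itself as a hidden first argument, so a function defined with two parameters receives three. Below is a minimal sketch that reproduces this. The class DemoSpider and the method broken_crawl are hypothetical stand-ins, not the code from the post, and the @staticmethod fix shown is only one option (adding self as the first parameter would also work).

```python
class DemoSpider:
    """Hypothetical stand-in for the spider class in the post."""

    def __init__(self, base_url):
        # Stored on the class, mirroring the post's use of class attributes.
        DemoSpider.base_url = base_url

    # Defined inside the class without 'self' and without @staticmethod,
    # the same shape as crawl_page in the post.
    def broken_crawl(thread_name, page_url):
        return thread_name + ' currently crawling ' + page_url

    # One possible fix: @staticmethod tells Python not to pass the instance.
    @staticmethod
    def crawl_page(thread_name, page_url):
        return thread_name + ' currently crawling ' + page_url


s = DemoSpider('http://example.com')

# Called through the instance, Python passes (s, 'First spider', base_url),
# which is three arguments for a two-parameter function.
try:
    s.broken_crawl('First spider', DemoSpider.base_url)
except TypeError as err:
    error_message = str(err)

# The static version receives only the two arguments that were written out.
result = s.crawl_page('First spider', DemoSpider.base_url)
print(error_message)
print(result)
```

If the method never needs instance state, @staticmethod keeps the two-parameter signature. Otherwise, declaring it as def crawl_page(self, thread_name, page_url) is the more conventional choice.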
