Thank you, but I think I've found where it fails, and I still need a little help.
Here is the error:

    line 20, in __init__
        self.crawl_page("First spider", spider.base_url)
    TypeError: crawl_page() takes 2 positional arguments but 3 were given

I'm not seeing how I'm giving a third argument when I call crawl_page().
Here is the code:
    def __init__(self, project_name, base_url, domain_name):
        spider.project_name = project_name
        spider.base_url = base_url
        spider.domain_name = domain_name
        spider.queue_file = spider.project_name + '/queue.txt'  # file path for the queue text file
        spider.crawled_file = spider.project_name + '/crawled.txt'
        self.boot()
        self.crawl_page('First spider', spider.base_url)  # the first spider crawls the main page of the website
    def crawl_page(thread_name, page_url):
        # Report which page is currently being crawled, so the user knows it's working
        if page_url not in spider.crawled:  # check the crawled set (not the file) because set operations are faster
            print(thread_name + ' currently crawling ' + page_url)
            # spider.queue and spider.crawled are sets, so len() gives the item count and str() converts it for printing
            print('Queue ' + str(len(spider.queue)) + ' | crawled ' + str(len(spider.crawled)))
            # gather_links(page_url) connects to the web page and collects its links;
            # add_links_to_queue() adds those gathered links to the waiting list
            spider.add_links_to_queue(spider.gather_links(page_url))
            spider.queue.remove(page_url)  # remove the page from the queue set
            spider.crawled.add(page_url)  # add it to the crawled set
            spider.update_files()
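I also tried trimming everything else away, and even this stripped-down version (the class body and URL here are just placeholders, not my real code) raises the same error:

    class spider:
        def __init__(self):
            # same call style as above: self.crawl_page(...) with two arguments
            self.crawl_page('First spider', 'https://example.com')

        def crawl_page(thread_name, page_url):  # same two-parameter signature as above
            print(thread_name + ' currently crawling ' + page_url)

    spider()  # TypeError: crawl_page() takes 2 positional arguments but 3 were given

So it doesn't seem to be anything in the rest of the class.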