Python Forum
Web Crawler help
#3
Thank you. I think I found out why it doesn't work, but I still need a little help.
Here is the code:
    def __init__(self, project_name, base_url, domain_name):
        spider.project_name = project_name
        spider.base_url = base_url
        spider.domain_name = domain_name
        # File paths for the queue and crawled text files
        spider.queue_file = spider.project_name + '/queue.txt'
        spider.crawled_file = spider.project_name + '/crawled.txt'
        self.boot()
        # The first spider crawls the main page of the website
        self.crawl_page('First spider', spider.base_url)

    # Reports which page is currently being crawled, so the user knows it's working
    def crawl_page(thread_name, page_url):
        # Check the in-memory crawled set, NOT the file, because set lookups are faster
        if page_url not in spider.crawled:
            print(thread_name + ' currently crawling ' + page_url)
            # 'spider.queue' and 'spider.crawled' are sets: len() counts their items, str() converts the count to a string
            print('Queue ' + str(len(spider.queue)) + ' | crawled ' + str(len(spider.crawled)))
            # gather_links() connects to the page and collects its links; add_links_to_queue() adds them to the waiting list
            spider.add_links_to_queue(spider.gather_links(page_url))
            spider.queue.remove(page_url)  # remove the page from the queue set
            spider.crawled.add(page_url)  # add the removed page to the crawled set
            spider.update_files()
line 20, in __init__
self.crawl_page("First spider",spider.base_url)#first spider is crawling the main page of website
TypeError: crawl_page() takes 2 positional arguments but 3 were given

I don't see how I'm passing a third argument when I call crawl_page().
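
A likely explanation, assuming both functions are methods of the spider class: when a method is called through an instance, as in self.crawl_page(...), Python passes the instance itself as an implicit first argument. crawl_page is declared with only two parameters and no self, so the two explicit arguments plus the instance make three. The sketch below reproduces this with hypothetical stand-in classes (BrokenSpider, FixedSpider, and the example.com URL are illustrative names, not part of the original code) and shows one possible fix, marking the method with @staticmethod.

```python
class BrokenSpider:
    # Hypothetical stand-in for the class above: crawl_page has no 'self' parameter
    def crawl_page(thread_name, page_url):
        return thread_name + ' currently crawling ' + page_url


class FixedSpider:
    # One possible fix: a static method does not receive the instance
    @staticmethod
    def crawl_page(thread_name, page_url):
        return thread_name + ' currently crawling ' + page_url


def try_crawl(spider_obj):
    """Call crawl_page through an instance and report what happens."""
    try:
        # 'https://example.com' is a placeholder URL for illustration only
        return spider_obj.crawl_page('First spider', 'https://example.com')
    except TypeError as error:
        return 'TypeError: ' + str(error)


# The instance is passed implicitly, so this call supplies 3 arguments
broken_result = try_crawl(BrokenSpider())
# With @staticmethod only the 2 explicit arguments are passed
fixed_result = try_crawl(FixedSpider())

print(broken_result)
print(fixed_result)
```

Other options that should work equally well are adding self as the first parameter of crawl_page, or calling it through the class as spider.crawl_page('First spider', spider.base_url) instead of through self.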


Messages In This Thread
Web Crawler help - by Mr_Mafia - Apr-02-2020, 08:24 PM
RE: Web Crawler help - by Larz60+ - Apr-02-2020, 09:01 PM
RE: Web Crawler help - by Mr_Mafia - Apr-04-2020, 07:20 PM
