Jan-12-2020, 10:29 AM
The repetition comes from the Scanner.crawl() method, which calls itself recursively in line 56. When the first page contains a link to itself, the crawl starts again from the beginning. It does not have to be the first page: the same thing happens whenever a page links back to a page that has already been crawled.
I see the __init__() method initializes
self.target_links = []
which is not used. I would suggest making target_links a set instead of a list and using it to filter out links that have already been visited. Sets make this filtering easy with the set difference operator, for example: return urls - self.target_links
(this assumes urls is also a set).
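To illustrate the idea, here is a minimal sketch, not your actual code. I don't know what the rest of your Scanner looks like, so get_links, filter_new and the in-memory site dictionary are made-up stand-ins for your own link extraction and HTTP requests. Only target_links and crawl() are names taken from your class.

```python
class Scanner:
    def __init__(self, get_links):
        # get_links is a hypothetical callable: url -> iterable of linked urls
        self.get_links = get_links
        self.target_links = set()  # URLs that have already been visited

    def filter_new(self, urls):
        # Set difference keeps only the links not visited before
        return set(urls) - self.target_links

    def crawl(self, url):
        # Mark the page as visited before recursing, so a self-link is skipped
        self.target_links.add(url)
        for link in sorted(self.filter_new(self.get_links(url))):
            # Check again: an earlier recursive call may have visited it meanwhile
            if link not in self.target_links:
                self.crawl(link)


# Tiny in-memory "site" standing in for real HTTP requests (assumption)
site = {
    "/": ["/", "/a", "/b"],  # the first page links to itself
    "/a": ["/", "/b"],
    "/b": ["/a"],
}

scanner = Scanner(lambda url: site.get(url, []))
scanner.crawl("/")
print(sorted(scanner.target_links))  # ['/', '/a', '/b']
```

Each page is crawled exactly once here, even though the pages link back to each other. For a large site you may also want to replace the recursion with a loop over a queue, because Python has a recursion depth limit.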