Python Forum
Python - Why are the threads not working in this web crawler?
#1


Hi Team

I have a web crawler whose threads I want to run in parallel, but they do not seem to be working.

import requests
from queue import Queue
import time
from bs4 import BeautifulSoup
import threading

urlList = []
q = Queue()

def url_c(url):

    try:
        r = requests.get(url)
        htmldoc = r.content

        if r.status_code in [400, 404, 403, 408, 409, 501, 502, 503]:
            print(str(r.status_code) + "-->" + url)
        else:
            print("no problem in-->", url)

        soup = BeautifulSoup(htmldoc, 'html.parser')
        links = soup.find_all('a')

        for link in links:
            href = link.get('href')
            if href is not None and href not in urlList and len(href) > 10 and 'JavaScript' not in href:
                if 'http' not in href:
                    urlList.append(url + href)
                else:
                    urlList.append(href)
    except Exception:
        print("ERROR", url)


def threader():
    while True:
        url = q.get
        url_c(url)
        q.task_done()



# start 10 worker threads
for _ in range(10):
    t = threading.Thread(target=threader)

    # daemon threads are killed automatically when the main thread exits
    t.daemon = True

    # start the thread; daemon must be set before start() is called
    t.start()


def main():
    global end 
    global start

    end = 0
    start =1

    print('Enter the URL')
    url = input()
    
    url_c(url)
    end = len(urlList)
    
    while start != end:
        end = len(urlList)
        print(start)
        url_c(urlList[start])
        q.put(urlList[start])
        print(end)
        start +=1
        
    i =1
    for u in urlList:
        print('length is ->',len(u),'-',u)
        i +=1

    print('There are ',len(urlList),' links.')


main()
Please advise what to do.
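A likely explanation, judging only from the posted code: in `threader()` the line `url = q.get` has no parentheses, so it assigns the queue's `get` method itself instead of removing an item. Each worker then calls `url_c()` with that method object, `requests.get` fails, the exception handler prints "ERROR", and the following `q.task_done()` raises `ValueError` because nothing was ever taken from the queue, which ends that thread. This happens as soon as the module runs, before `main()` even reads the URL, so the workers are gone before any work is queued. Meanwhile `main()` calls `url_c()` itself, so the crawl runs sequentially and the URLs passed to `q.put()` are never consumed. The standard-library snippet below shows the difference between `q.get` and `q.get()`:

```python
from queue import Queue

q = Queue()
q.put("https://example.com/")

not_an_item = q.get            # no parentheses: this is the bound method itself
print(callable(not_an_item))   # True
print(q.qsize())               # 1: nothing was removed from the queue

item = q.get()                 # parentheses: removes and returns the item
print(item)                    # https://example.com/
q.task_done()                  # one task_done() per successful get()

# Calling task_done() without a matching get() raises ValueError,
# which is what kills each worker thread in the original threader().
try:
    q.task_done()
except ValueError as exc:
    print("ValueError:", exc)  # ValueError: task_done() called too many times
```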
#2
What are the specific issues?
What are the full, verbatim error tracebacks, if any?
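On the request for tracebacks: the exception handler in `url_c` prints only "ERROR" and the URL, so the actual exception never reaches the console. Below is a minimal sketch of keeping the full traceback with `traceback.format_exc()`. `call_and_report` and `broken_fetch` are hypothetical names; `broken_fetch` only stands in for a failing `url_c` call that receives `q.get` instead of a URL string.

```python
import traceback
from queue import Queue


def broken_fetch(url):
    # Hypothetical stand-in for url_c(): fails the way a non-string "URL" would.
    if not isinstance(url, str):
        raise TypeError(f"expected a URL string, got {type(url).__name__}")
    return url


def call_and_report(func, url):
    """Run func(url); on failure print the URL plus the full traceback.

    Returns the traceback text, or None if func(url) succeeded.
    """
    try:
        func(url)
    except Exception:  # unlike a bare except, Ctrl+C (KeyboardInterrupt) still stops the program
        tb = traceback.format_exc()
        print("ERROR", url)
        print(tb)
        return tb
    return None


q = Queue()
report = call_and_report(broken_fetch, q.get)  # q.get without (), as in threader()
```

Replacing `print(traceback.format_exc())` with `logging.exception(...)` inside the `except` block produces the same traceback through the `logging` module.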
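For comparison, here is a sketch of the same crawl where the worker threads actually do the work. It assumes a caller-supplied `fetch(url)` that returns HTML text, so it runs without network access. It swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` to stay dependency-free, resolves relative links with `urllib.parse.urljoin` instead of string concatenation, and guards the shared `seen` set with a lock. `crawl`, `LinkParser`, `fetch`, and `max_pages` are illustrative names and are not part of the original program.

```python
import threading
from html.parser import HTMLParser
from queue import Queue
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect href values from <a> tags (stdlib stand-in for soup.find_all('a'))."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch, num_threads=10, max_pages=100):
    """Crawl from start_url using worker threads; fetch(url) must return HTML text.

    Returns the set of URLs that were queued (at most max_pages of them).
    """
    q = Queue()
    seen = {start_url}
    seen_lock = threading.Lock()  # seen is shared by all workers

    def worker():
        while True:
            url = q.get()  # note the (): blocks until an item is available
            if url is None:  # sentinel: no more work, exit the thread
                q.task_done()
                return
            try:
                parser = LinkParser()
                parser.feed(fetch(url))
                parser.close()
                for href in parser.hrefs:
                    # resolve relative links against the page URL, drop #fragments
                    link, _ = urldefrag(urljoin(url, href))
                    if not link.startswith(("http://", "https://")):
                        continue  # skips javascript:, mailto:, etc.
                    with seen_lock:
                        if link in seen or len(seen) >= max_pages:
                            continue
                        seen.add(link)
                    q.put(link)  # queued before this task is marked done
            except Exception as exc:
                print("ERROR", url, exc)
            finally:
                q.task_done()  # exactly one task_done() per real item

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_threads)]
    for t in threads:
        t.start()

    q.put(start_url)
    q.join()  # returns once every queued URL has been processed
    for _ in threads:
        q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return seen
```

With network access, `fetch` could be something like `lambda u: requests.get(u, timeout=10).text`; the timeout keeps a slow server from blocking a worker indefinitely. The main thread only seeds the queue and waits on `q.join()`, and one `None` sentinel per worker shuts the threads down cleanly instead of leaving daemon threads looping forever.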