Python - Why multi threads are not working in this web crawler?
#1


Hi Team

I have a program in which I want the threads to run in parallel.

import requests
from queue import Queue
import time
from bs4 import BeautifulSoup
import threading

urlList = []
q = Queue()

def url_c(url):

    try:
        r = requests.get(url)
        htmldoc = r.content

        if r.status_code in [400, 404, 403, 408, 409, 501, 502, 503]:
            print(str(r.status_code) + "-" + str(r.status_code) + "-->" + url)
        else:
            print("no problem in-->", url)
        
        soup= BeautifulSoup(htmldoc,'html.parser')
        links = []
        links = soup.findAll('a')

        if len(links)>0:
            for link in links:
                if link.get('href') not in urlList and link.get('href') is not None and len(link.get('href'))>10 and 'JavaScript' not in link.get('href'):
                    if 'http' not in link.get('href'):
                        urlList.append(url + link.get('href'))
                    else:
                        urlList.append(link.get('href'))
    except:
        print("ERROR ",url)


def threader():
    while True:
        url = q.get
        url_c(url)
        q.task_done()



# how many threads are we going to allow for
for x in range(10):
     t = threading.Thread(target=threader)

     # classifying as a daemon, so they will die when the main dies
     t.daemon = True

     # begins, must come after daemon definition
     t.start()


    
def main():
    global end 
    global start

    end = 0
    start =1

    print('Enter the URL')
    url = input()
    
    url_c(url)
    end = len(urlList)
    
    while start != end:
        end = len(urlList)
        print(start)
        url_c(urlList[start])
        q.put(urlList[start])
        print(end)
        start +=1
        
    i =1
    for u in urlList:
        print('length is ->',len(u),'-',u)
        i +=1

    print('There are ',len(urlList),' links.')


main()
Please advise on what to do.
#2
What are the specific issues?
What are the full, verbatim error tracebacks, if any?
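
One specific issue does stand out in the posted code: in threader(), the line "url = q.get" only assigns a reference to the queue's get method; it never calls it, so every worker passes that method object to url_c() and the request fails immediately (the bare except then hides the traceback). The main loop also crawls each URL itself with url_c(urlList[start]) in addition to putting it on the queue, so the worker threads add no parallelism, and q.join() is never called.

Below is a minimal sketch of the producer/worker pattern the code appears to be aiming for. The names crawl, worker and NUM_WORKERS are illustrative only, and the per-URL work is reduced to a placeholder rather than the full BeautifulSoup link extraction:

import threading
from queue import Queue

import requests

q = Queue()
NUM_WORKERS = 10  # same thread count as the original post


def crawl(url):
    # Placeholder for the real per-URL work (requests + BeautifulSoup parsing).
    try:
        r = requests.get(url, timeout=10)
        print(r.status_code, '-->', url)
    except requests.RequestException as exc:
        print('ERROR', url, exc)


def worker():
    while True:
        url = q.get()          # note the parentheses: actually call get()
        try:
            crawl(url)
        finally:
            q.task_done()      # mark the item done even if crawl() raised


# Start the daemon workers once, before any URLs are queued.
for _ in range(NUM_WORKERS):
    threading.Thread(target=worker, daemon=True).start()


def main():
    url = input('Enter the URL: ')
    q.put(url)   # hand work to the pool instead of calling crawl() directly
    q.join()     # block until every queued URL has been processed


if __name__ == '__main__':
    main()

If the workers also discover new links and feed them back (appending to urlList and calling q.put), guard the shared list with a threading.Lock: the "not in urlList" check followed by append is not atomic across threads, even though a single list.append is thread-safe in CPython.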

