Multithreading issue with output
#1
My script works correctly when I use only one thread.
When I use 50 threads, the output in 'table' has missing entries in some rows. For example, the 'Average Volume' column is sometimes empty for some indexes. The issue is sporadic.
I am sure that each thread writes to a different index, i.e. the threads don't overwrite each other.

import urllib.error
from queue import Queue
from threading import Thread

import pandas as pd

table = pd.DataFrame(index=tickers, columns=some_columns)

def processData(q, table):
    while not q.empty():
        ticker = q.get()
        if bad_condition:
            q.task_done()
            continue

        try:
            if bad_condition:
                q.task_done()
                continue

            ####### A lot of code here

            # every thread writes into the same shared DataFrame
            table.loc[ticker, 'Price'] = lastPrice
            table.loc[ticker, 'Shares Outstanding'] = sharesOutstanding
            table.loc[ticker, 'Capital'] = Capital
            table.loc[ticker, 'Average Volume'] = averageVolume

        except urllib.error.HTTPError:
            print(ticker, 'doesnt exist on yahoo finance')
        except urllib.error.URLError:
            print(ticker, 'yahoo finance has issue')
        q.task_done()
    return True

num_threads = 50
q = Queue(maxsize=0)
for ticker in table.index:
    q.put(ticker)
for i in range(num_threads):
    worker = Thread(target=processData, args=(q, table))
    worker.daemon = True
    worker.start()
q.join()
table.to_csv('result.csv')
#2
Is there any regularity in the missing data points? Are they dispersed sporadically across the row or are they clustered toward the end of the row?

I'm not certain, but I doubt DataFrames are thread-safe, which means a write from one thread can interfere with a write from another. That could be the issue. Based on the snippet I'm seeing, I imagine such interference would mostly affect the fields that get filled in last, such as Average Volume, which is the last one set in processData().
#3
I think you are correct that DataFrames are not thread-safe.

Ref: https://pandas.pydata.org/pandas-docs/st...tchas.html

They stated:
Quote:As of pandas 0.11, pandas is not 100% thread safe. The known issues relate to the copy() method. If you are doing a lot of copying of DataFrame objects shared among threads, we recommend holding locks inside the threads where the data copying occurs.

As far as I can see, I don't use copy(). I only write to specific locations, and no other thread accesses those locations.
Is there any way to fix this issue?
#5
One way to fix it would be to wrap the DataFrame in a thread-safe wrapper. Look into threading.Lock objects in the standard library. You should be able to implement something along the lines of:

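import threading
from pandas import DataFrame

class LockingFrame(DataFrame):
    # note the call: threading.Lock() creates the lock object,
    # and it is shared by every LockingFrame instance
    lock = threading.Lock()

    def access(self):
        # the with block acquires the lock and always releases it,
        # even if the body raises an exception
        with self.lock:
            pass  # [do stuff]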
I'm sure someone else has encountered this issue too so there may be a "LockingFrame" out there already.
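
If subclassing DataFrame feels heavy, here is a minimal sketch of the same locking idea using a plain module-level lock around the writes. The tickers, the column names and the fetch() helper are hypothetical stand-ins for your real download code, and I can't promise this cures your missing values; it only guarantees that one thread at a time touches the table.

```python
import threading
from queue import Empty, Queue

import pandas as pd

tickers = ["AAA", "BBB", "CCC", "DDD"]  # hypothetical tickers
table = pd.DataFrame(index=tickers, columns=["Price", "Average Volume"])
table_lock = threading.Lock()  # guards every write to the shared table


def fetch(ticker):
    """Hypothetical stand-in for the slow download/parse step."""
    return {"Price": float(len(ticker)), "Average Volume": 1000.0}


def process_data(q, table):
    while True:
        try:
            # get_nowait() avoids blocking if another thread took the last item
            ticker = q.get_nowait()
        except Empty:
            return
        try:
            values = fetch(ticker)  # slow work stays outside the lock
            with table_lock:  # only one thread writes at a time
                for column, value in values.items():
                    table.loc[ticker, column] = value
        finally:
            q.task_done()  # always mark the item done, even on errors


q = Queue()
for ticker in table.index:
    q.put(ticker)

workers = [
    threading.Thread(target=process_data, args=(q, table), daemon=True)
    for _ in range(4)
]
for worker in workers:
    worker.start()
q.join()
```

The lock is held only for the few assignments, so the downloads still run in parallel.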

I have helped someone previously divide a DataFrame into several smaller DataFrames for processing and then bring them back together later. It's been a while and I do not have that code readily available.
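
From memory, the split-and-recombine approach looked roughly like the sketch below. This is not the code I wrote back then; fetch(), the tickers and the column names are made up for illustration. Each worker builds its own small DataFrame, so no DataFrame is ever shared between threads, and the pieces are joined once at the end in the main thread.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

tickers = ["AAA", "BBB", "CCC", "DDD", "EEE"]  # hypothetical tickers
columns = ["Price", "Average Volume"]


def fetch(ticker):
    """Hypothetical stand-in for the slow download/parse step."""
    return {"Price": float(len(ticker)), "Average Volume": 1000.0}


def process_chunk(chunk):
    """Build a private DataFrame for one slice of the tickers."""
    rows = {ticker: fetch(ticker) for ticker in chunk}
    return pd.DataFrame.from_dict(rows, orient="index")


n_workers = 2
# Deal the tickers out round-robin, one list per worker.
chunks = [tickers[i::n_workers] for i in range(n_workers)]

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    parts = list(pool.map(process_chunk, chunks))

# Recombine in the main thread and restore the original row/column order.
table = pd.concat(parts).reindex(index=tickers, columns=columns)
```

Since nothing is written concurrently, no lock is needed here at all.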

How many rows are in the data set? Is it feasible to perform the operation without multithreading? Or perhaps describe the project in more detail and provide the full code of processData().