Python Forum
multithreading issue with output
#1
My script works correctly when I use only one thread.
When I use 50 threads, the resulting 'table' has missing entries in some rows. For example, the 'Average Volume' column is sometimes missing results for some index values. The issue is sporadic.
I am sure that each thread writes to a different index, i.e. the threads don't overwrite each other.

import urllib.error
from queue import Queue
from threading import Thread

import pandas as pd

table = pd.DataFrame(index=tickers, columns=some_columns)

def processData(q, table):
    # Each worker pulls tickers from the queue until it is empty.
    while not q.empty():
        ticker = q.get()
        #
        if bad_condition:
            q.task_done()
            continue

        try:
            if bad_condition:
                q.task_done()
                continue

            ####### A lot of code here

            # Every thread writes only to the row of its own ticker.
            table.loc[ticker, 'Price'] = lastPrice
            table.loc[ticker, 'Shares Outstanding'] = sharesOutstanding
            table.loc[ticker, 'Capital'] = Capital
            table.loc[ticker, 'Average Volume'] = averageVolume

        except urllib.error.HTTPError:
            print(ticker, 'doesnt exist on yahoo finance')
        except urllib.error.URLError:
            print(ticker, 'yahoo finance has issue')
        q.task_done()
    return True

num_threads = 50
q = Queue(maxsize=0)
for ticker in table.index:
    q.put(ticker)
for i in range(num_threads):
    worker = Thread(target=processData, args=(q, table))
    worker.daemon = True
    worker.start()
q.join()
table.to_csv('result.csv')
#2
Is there any regularity in the missing data points? Are they dispersed sporadically across the row or are they clustered toward the end of the row?

I'm not certain, but I doubt DataFrames are thread-safe, which means one thread's write can be interrupted by another thread. That could be the issue. Based on the snippet I'm seeing, I imagine such an interruption would affect the later fields that get filled in, such as Average Volume, which is the last one written in processData().
#3
I think you are correct that DataFrames are not thread-safe.

Ref: https://pandas.pydata.org/pandas-docs/st...tchas.html

They state:
"As of pandas 0.11, pandas is not 100% thread safe. The known issues relate to the copy() method. If you are doing a lot of copying of DataFrame objects shared among threads, we recommend holding locks inside the threads where the data copying occurs."

As far as I can see, I don't use copy(). I only write to specific locations, and no other thread accesses those locations.
Is there any way to fix this issue?
#4
just wrong post that was deleted
#5
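One way to fix it would be to wrap the DataFrame in a thread-safe wrapper. Look into threading.Lock objects in the standard library. You should be able to implement something along the lines of:

import threading
from pandas import DataFrame

class LockingFrame(DataFrame):
    # One lock shared by all instances; hold it for every access.
    lock = threading.Lock()

    def access(self):
        # "with" releases the lock even if the body raises.
        with self.lock:
            ...  # [do stuff]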
I'm sure someone else has encountered this issue too so there may be a "LockingFrame" out there already.
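
A minimal sketch of the same lock idea applied to a worker function like the one in the first post, without subclassing DataFrame. The ticker names, the `fetch` placeholder, and the `table_lock` name are assumptions for illustration; the real download code would replace `fetch`. Whether this removes the missing entries in the original script is untested, since the full processData() was not posted.

```python
import threading
from queue import Queue, Empty

import pandas as pd

# Hypothetical stand-ins for the real tickers and columns.
tickers = ["AAA", "BBBB", "CCCCC", "DD"]
columns = ["Price", "Average Volume"]

table = pd.DataFrame(index=tickers, columns=columns)
table_lock = threading.Lock()  # guards every write to `table`


def fetch(ticker):
    """Placeholder for the slow network call; returns made-up numbers."""
    return {"Price": float(len(ticker)), "Average Volume": 1000.0}


def process_data(q, table):
    while True:
        try:
            # get_nowait() avoids blocking if another thread took the last item.
            ticker = q.get_nowait()
        except Empty:
            return
        try:
            result = fetch(ticker)  # slow work happens outside the lock
            with table_lock:  # only one thread writes to the frame at a time
                for column, value in result.items():
                    table.loc[ticker, column] = value
        finally:
            q.task_done()  # always mark the item done, even on error


q = Queue()
for ticker in table.index:
    q.put(ticker)

workers = [
    threading.Thread(target=process_data, args=(q, table), daemon=True)
    for _ in range(4)
]
for worker in workers:
    worker.start()
q.join()
```

Keeping the download outside the `with` block means the threads still overlap on the network calls and only take turns for the short write at the end.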

I have helped someone previously divide a DataFrame into several smaller DataFrames for processing and then bring them back together later. It's been a while and I do not have that code readily available.
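
This is not the code mentioned above, but a rough sketch of the same "process separately, combine later" idea: each worker returns its own row instead of writing into a shared DataFrame, and the rows are combined once all threads have finished. The ticker names and the `fetch_row` helper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical ticker list; the real script would use table.index.
tickers = ["AAA", "BBBB", "CCCCC", "DD"]


def fetch_row(ticker):
    """Placeholder for the per-ticker download; returns (ticker, row)."""
    row = {"Price": float(len(ticker)), "Average Volume": 1000.0}
    return ticker, row


# No shared object is written by the workers; map() returns results
# in the same order as the input list.
with ThreadPoolExecutor(max_workers=4) as pool:
    rows = dict(pool.map(fetch_row, tickers))
```

With pandas available, `pd.DataFrame.from_dict(rows, orient="index")` should turn `rows` into a table with one row per ticker, built in a single thread after the workers are done.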

How many rows are in the data set? Is it feasible to perform the operation without multithreading? Or perhaps describe the project in more detail and provide the full code of processData().