Python Forum
multithreading issue with output
My script works correctly when I use only one thread.
When I use 50 threads, the output in 'table' has missing entries in some rows. For example, the 'Average Volume' column is sometimes missing results for some index values. The issue is sporadic.
I am sure that each thread writes to a different index, i.e. the threads don't overwrite each other.

import urllib.error
from queue import Queue
from threading import Thread

import pandas as pd

table = pd.DataFrame(index=tickers, columns=some_columns)

def processData(q, table):
    # Each worker pulls tickers off the shared queue until it is empty.
    while not q.empty():
        ticker = q.get()
        try:
            if bad_condition:
                if bad_condition:
                    ####### A lot of code here

                    table.loc[ticker, 'Shares Outstanding'] = sharesOutstanding
                    table.loc[ticker, 'Average Volume'] = averageVolume
        except urllib.error.HTTPError:
            print(ticker, 'doesnt exist on yahoo finance')
        except urllib.error.URLError:
            print(ticker, 'yahoo finance has issue')
    return True

num_threads = 50
q = Queue(maxsize=0)
# Fill the queue with every ticker in the table.
for ticker in table.index:
    q.put(ticker)
# Start the worker threads.
for i in range(0, num_threads):
    worker = Thread(target=processData, args=(q, table))
    worker.start()
Is there any regularity in the missing data points? Are they dispersed sporadically across the row or are they clustered toward the end of the row?

I'm not certain, but I doubt DataFrames are thread-safe, which would mean one thread can interrupt another in the middle of an update. That could be the issue. Based on the snippet I'm seeing, I imagine that interruption would affect the later fields that get filled in, such as Average Volume, which is the last one listed in processData().
I think you are correct about DataFrame thread safety.

The pandas documentation states:
Quote:As of pandas 0.11, pandas is not 100% thread safe. The known issues relate to the copy() method. If you are doing a lot of copying of DataFrame objects shared among threads, we recommend holding locks inside the threads where the data copying occurs.

I can see that I don't use copy()! I only write to specific locations, and no other thread accesses those locations.
Is there any way to fix this issue?
One way to fix it would be to wrap the DataFrame in a threadsafe wrapper. Look into threading.Lock objects in the standard library. You should be able to implement something along the lines of:

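import threading
from pandas import DataFrame

class LockingFrame(DataFrame):
    lock = threading.Lock()  # create the lock; it is shared by all instances

    def access(self):
        with self.lock:  # only one thread at a time gets past this line
            ...  # [do stuff]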
I'm sure someone else has encountered this issue too so there may be a "LockingFrame" out there already.
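If subclassing DataFrame feels heavy, a plainer option is a single shared threading.Lock held around the writes. The following is only a minimal sketch under assumptions: fetch_stats, the ticker symbols and the numbers it returns are hypothetical stand-ins for the real download code, and process_data is a simplified version of processData. It also uses get_nowait() so a worker cannot block on a queue that another worker has just emptied, and it joins every worker before the table is read.

```python
import threading
from queue import Queue, Empty

import pandas as pd

tickers = ["AAA", "BBBB", "CC", "DDDDD"]  # hypothetical symbols
table = pd.DataFrame(index=tickers,
                     columns=["Shares Outstanding", "Average Volume"])
table_lock = threading.Lock()  # one lock guarding every write to `table`


def fetch_stats(ticker):
    # Hypothetical stand-in for the real download and parsing code.
    return len(ticker) * 1000, len(ticker) * 10


def process_data(q, table):
    while True:
        try:
            ticker = q.get_nowait()  # raises Empty instead of blocking
        except Empty:
            return
        shares, volume = fetch_stats(ticker)  # slow work, done without the lock
        with table_lock:  # only the writes are serialized
            table.loc[ticker, "Shares Outstanding"] = shares
            table.loc[ticker, "Average Volume"] = volume


q = Queue()
for ticker in tickers:
    q.put(ticker)

workers = [threading.Thread(target=process_data, args=(q, table))
           for _ in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()  # wait for every worker before reading the table
```

The lock is held only for the two assignments, so the downloads still overlap. Whether this removes the missing entries in the original script depends on the code that was not posted.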

I have helped someone previously divide a DataFrame into several smaller DataFrames for processing and then bring them back together later. It's been a while and I do not have that code readily available.
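This is not the code mentioned above, but the same idea can be sketched without sharing a DataFrame between threads at all: each worker returns its own small result and the main thread combines them afterwards. The sketch assumes a hypothetical fetch_stats function standing in for the real download code; the ticker symbols and numbers are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_stats(ticker):
    # Hypothetical stand-in for the real download and parsing code.
    return {"Shares Outstanding": len(ticker) * 1000,
            "Average Volume": len(ticker) * 10}


def collect(tickers, max_workers=8):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input tickers,
        # so zip pairs each ticker with its own stats.
        for ticker, stats in zip(tickers, pool.map(fetch_stats, tickers)):
            results[ticker] = stats  # written by the main thread only
    return results


results = collect(["AAA", "BBBB", "CC"])
```

The resulting dict can then be turned into a table in the main thread, for example with pd.DataFrame.from_dict(results, orient="index"), so no worker thread ever touches the DataFrame.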

How many rows are in the data set? Is it feasible to perform the operation without multithreading? Or perhaps describe the project in more detail and provide the full code of processData().
