Python Forum
Dataframe problem
Hello guys, I'm running into an issue I can't sort out.

I have 2 threads running.

Thread 1: pulls stock info into a DataFrame every minute.

Thread 2: runs logic on top of that DataFrame every 0.2 seconds.

If I pull stock info covering, let's say, one day, every minute (to capture the price changes each minute), my DataFrame fills up with (for example) 5000 rows. A minute later it gains another 5000 rows, and so on every minute.

I am appending the info to the DataFrame. Is there any way to replace the DataFrame outright with the new batch instead of appending? What ends up happening is that every few hours the DataFrame fills up with massive chunks of repeated data, which then distorts the logic.
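For the replace-instead-of-append part, rebinding the name to the new batch is enough. A minimal sketch, where `make_batch()` is a hypothetical stand-in for the real stock pull (which isn't shown in the thread) and the batches are tiny so the row counts are easy to check:

```python
import pandas as pd


def make_batch(minute: int, rows: int = 5) -> pd.DataFrame:
    """Hypothetical stand-in for one stock-data pull."""
    return pd.DataFrame({"minute": [minute] * rows,
                         "price": [100.0 + i for i in range(rows)]})


# Appending: every pull adds another full batch, so the frame keeps growing.
appended = make_batch(0)
for minute in range(1, 3):
    appended = pd.concat([appended, make_batch(minute)], ignore_index=True)

# Replacing: every pull simply rebinds the name to the newest batch.
replaced = make_batch(0)
for minute in range(1, 3):
    replaced = make_batch(minute)

print(len(appended), len(replaced))  # 15 5
```

With `pd.concat` the row count grows by a full batch on every pull; with plain reassignment it stays at one batch and only ever holds the latest data.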

I've tried calling clear() at the beginning of Thread 1's data pull, so it empties the DataFrame and then refills it. And that works fine... but even though that only happens once a minute, it's almost guaranteed that Thread 2 will eventually read the DataFrame at exactly the wrong moment, find nothing there, and cause an error in the program.

Any way to solve this problem?
If clear() works and solves your issue, and the only problem is that it can happen while the other thread is in the middle of something, then use a lock to make sure only one thread is interacting with the DataFrame at a time.

import threading

shared_lock = threading.Lock()

# thread 1
with shared_lock:
    # clear df and fill it with the new batch here
    pass

# thread 2
with shared_lock:
    # we now can guarantee the df won't be cleared while we're reading from it
    # ...process df
    pass
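Building on the lock idea, a variant that removes the empty window altogether is to build the new batch outside the lock and only swap the reference inside it, so Thread 2 sees either the previous batch or the new one, never an empty frame. This is a swapped-in alternative to clear()-then-refill, not the original code. A minimal runnable sketch, assuming a hypothetical `fetch_batch()` in place of the real stock pull and shortened intervals so it finishes quickly (the thread uses 1 minute and 0.2 seconds):

```python
import threading
import time

import pandas as pd

shared_lock = threading.Lock()
latest_df = pd.DataFrame()  # the frame both threads share


def fetch_batch(i: int) -> pd.DataFrame:
    """Hypothetical stand-in for the real stock pull."""
    return pd.DataFrame({"pull": [i] * 5000, "price": 100.0})


def writer(pulls: int, interval: float) -> None:
    global latest_df
    for i in range(pulls):
        new_df = fetch_batch(i)  # slow work happens outside the lock
        with shared_lock:
            latest_df = new_df   # swap in the whole batch at once
        time.sleep(interval)


def reader(stop: threading.Event, interval: float, seen_rows: list) -> None:
    while not stop.is_set():
        with shared_lock:
            df = latest_df       # take one reference for this whole pass
        if not df.empty:
            seen_rows.append(len(df))  # stand-in for the real logic
        time.sleep(interval)


stop = threading.Event()
seen_rows = []
w = threading.Thread(target=writer, args=(3, 0.05))
r = threading.Thread(target=reader, args=(stop, 0.01, seen_rows))
r.start()
w.start()
w.join()
stop.set()
r.join()
print(set(seen_rows))
```

Holding the lock only for the swap keeps it short, so Thread 1 never waits for Thread 2's logic to finish. This assumes Thread 2 treats the frame as read-only; if it modifies the DataFrame in place, it should work on `df.copy()` instead.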
Damn, that's pretty smart lol. Ty ty!
So I just learned that threads stop other threads from running while they themselves are running, and that they only give the appearance of running at the same time. Is that true?
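For reference, in CPython the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so CPU-heavy threads do take turns. A thread that is blocked in time.sleep() or waiting on network I/O releases the GIL, though, so the other thread keeps running in the meantime. A quick standard-library check of that:

```python
import threading
import time


def nap(seconds: float) -> None:
    time.sleep(seconds)  # sleeping releases the GIL, so the other thread can run


start = time.perf_counter()
threads = [threading.Thread(target=nap, args=(0.5,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The two sleeps overlap, so this is roughly 0.5 s rather than 1.0 s.
print(f"elapsed: {elapsed:.2f} s")
```

Two half-second sleeps in separate threads finish in about half a second in total, so a sleeping thread on its own shouldn't stall the other one.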

If this is true, it would explain why after about an hour my program just "forgets what it's doing"... but technically, I think the timers are getting out of sync, because each thread has a time.sleep in it, and each thread has to start EXACTLY 3 seconds after the minute.
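On the timers drifting: a loop that does its work and then calls time.sleep(60) starts a little later every minute, because the time spent working adds up. One common fix is to recompute the wait from the wall clock before each run. A minimal sketch; `seconds_until_next()` and `run_every_minute()` are hypothetical helpers, not part of the original program:

```python
import time
from datetime import datetime, timedelta
from typing import Callable, Optional


def seconds_until_next(offset: int = 3, now: Optional[datetime] = None) -> float:
    """Seconds from `now` until the next time that is `offset` seconds past a minute."""
    if now is None:
        now = datetime.now()
    target = now.replace(second=offset, microsecond=0)
    if target <= now:
        # This minute's slot has already passed, so aim for the next minute.
        target += timedelta(minutes=1)
    return (target - now).total_seconds()


def run_every_minute(job: Callable[[], None], offset: int = 3, runs: int = 1) -> None:
    """Run `job` at `offset` seconds past the minute, re-aligning to the clock each time."""
    for _ in range(runs):
        time.sleep(seconds_until_next(offset))
        job()


# Worked examples with fixed times, so nothing actually sleeps here:
print(seconds_until_next(3, datetime(2020, 9, 16, 5, 50, 0)))           # 3.0
print(seconds_until_next(3, datetime(2020, 9, 16, 5, 50, 10, 500000)))  # 52.5
```

Because each run measures from the clock rather than from the end of the previous run, a slow pull can delay one run but doesn't push every later run out of sync. time.sleep() can wake slightly late, so a job may start a few milliseconds after :03.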

I just sat down to implement the locks. It's been a busy week LOL. I'll see if this fixes the problem.
