Python Forum
Dataframe problem
#1
Hello guys, so I am running into an issue I can't sort out.

I have 2 threads running.

Thread 1: is pulling stock info into a dataframe every 1 min.

Thread 2: is running logic on top of that dataframe every 0.2 secs.

If I pull stock info across, let's say, 1 day at 1-minute intervals (to get the price change every minute), my dataframe fills up with, for example, 5000 rows. A minute later it fills up with another 5000 rows, and so on every minute.

I am appending the info to the dataframe. Is there any way, instead of appending, to just replace the dataframe outright with the new batch? What ends up happening is that the dataframe fills up with these massive chunks of repeated data every few hours, which then distorts the logic system.

I've tried calling clear() at the beginning of the Thread 1 data pull, so it clears anything in the dataframe and then refills it. That works fine, but even though it happens only once a minute, it's almost guaranteed that Thread 2 will eventually try to reference the dataframe at exactly the wrong moment, when nothing is there, causing an error in the program.

Any way to solve this problem?
#2
If clear() works and solves your issue, and the only problem is that it could happen while the other thread is trying to do something, then use a lock to make sure only one thread is interacting with it at a time.

import threading

# one lock shared by both threads
shared_lock = threading.Lock()

# thread 1
with shared_lock:
    df.clear()  # the clear step from the question
    # fill df with data

# thread 2
with shared_lock:
    # the df is guaranteed not to be cleared while we're reading from it
    ...  # process df
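Since the original question was whether the dataframe can be replaced outright instead of cleared and refilled, here is a minimal sketch of that idea combined with the lock. Thread 1 builds the new batch off to the side and swaps it in only once it is complete, so Thread 2 never sees an empty dataframe. `SnapshotHolder` and `fetch_batch` are hypothetical names, not anything from the thread, and plain lists of rows stand in for the dataframe so the sketch runs without pandas; in the real program the object passed to `replace` would presumably be a freshly built DataFrame.

```python
import threading


class SnapshotHolder:
    """Holds the latest complete batch; readers never see a half-filled one."""

    def __init__(self, initial=None):
        self._lock = threading.Lock()
        self._snapshot = initial

    def replace(self, new_snapshot):
        # Swap in a fully built batch. The lock is held only for the assignment.
        with self._lock:
            self._snapshot = new_snapshot

    def get(self):
        # Return a reference to the current batch for read-only use.
        with self._lock:
            return self._snapshot


def fetch_batch(minute):
    # Hypothetical stand-in for the real data pull; returns rows for one minute.
    return [{"minute": minute, "price": 100.0 + minute}]


holder = SnapshotHolder(initial=fetch_batch(0))

# Thread 1 would do this once a minute: build off to the side, then swap.
holder.replace(fetch_batch(1))

# Thread 2 would do this every 0.2 s: grab the current batch, then process it.
rows = holder.get()
latest_price = rows[-1]["price"]
```

Because the old batch is never modified in place, the lock only has to cover the swap itself, so the 0.2-second reader is never blocked for the duration of a data pull. This relies on Thread 2 treating the batch it gets as read-only.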
#3
Damn, that's pretty smart lol. Ty ty.
#4
So I just learned that threads will stop other threads from running while they themselves are running. So they just give the appearance of running at the same time?
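For context on that question: in CPython only one thread executes Python bytecode at a time, but a thread that is sleeping or waiting on I/O releases the interpreter so other threads can run. The small sketch below (not from the thread) shows two threads that each sleep 0.2 seconds finishing in roughly 0.2 seconds total rather than 0.4.

```python
import threading
import time


def sleeper(seconds):
    # time.sleep releases the GIL, so other threads can run during the wait.
    time.sleep(seconds)


start = time.perf_counter()
threads = [threading.Thread(target=sleeper, args=(0.2,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The two sleeps overlap, so elapsed is close to 0.2 s rather than 0.4 s.
print(f"elapsed: {elapsed:.2f} s")
```

So a thread that spends most of its time in time.sleep or in a network call should not starve the other one; CPU-heavy Python code is where the threads really do take turns.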

If this is true, it would explain why, after about an hour, my program just "forgets what it's doing". But technically I think the timers are getting out of sync, because each thread has some time.sleep in it, and each thread has to start EXACTLY 3 seconds after the minute.
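One common cause of timers drifting like this is sleeping for a fixed 60 seconds after each run, which adds the run's own duration to every cycle. A rough sketch of an alternative, assuming the goal is to fire 3 seconds past each wall-clock minute: recompute the wait from the current time before every sleep. `seconds_until_next_run` and `run_aligned` are hypothetical helpers, not code from the thread.

```python
import threading
import time


def seconds_until_next_run(now, offset=3.0, period=60.0):
    """Seconds from `now` until the next time that is `offset` past a period boundary."""
    next_run = (now - offset) // period * period + period + offset
    return next_run - now


def run_aligned(task, stop_event, offset=3.0, period=60.0):
    # Recompute the wait each cycle so the task's own runtime does not accumulate.
    while True:
        wait = seconds_until_next_run(time.time(), offset, period)
        # Event.wait returns True if stop_event was set during the wait.
        if stop_event.wait(timeout=wait):
            break
        task()
```

A thread could be started with something like `threading.Thread(target=run_aligned, args=(pull_data, stop_event))`, where `pull_data` is a placeholder for the real data pull and `stop_event` is a `threading.Event` used to shut the loop down.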

I just sat down to implement the locks. It's been a busy week LOL. I'll see if this fixes the problem.