Python Forum
Saving progress in a Python program to use later
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Saving progress in a Python program to use later
#1
I have a program that takes forty-five minutes to run. That makes it harder to debug.

It takes 40 minutes to get to the critical area. If I do not correctly debug, it takes
another forty-five minutes to get back to the critical area.

That is no good. I want to know two things:

1. Is it possible to write a dataframe frame to a csv file on the hard drive? I know how to read a csv file; I just do not know if a dataframe can be directly written to a csv file on the hard drive.

2. Since Python is so different from other languages, does it have a way to do this that is not as awkward a no 1? Anyway, that improves on 1 would be nice.

Any help appreciated. Thanks in advance.

R,

LZ
Reply
#2
1. I'm pretty sure you've already done this. When pandas writes a csv file it is going to be on the hard drive (unless you specify a path to a different storage device). Where else is pandas going to write the CSV file? Be careful that you don't write the row index while doing so or you are going to get unnamed columns when you load it back in. You can also save your dataframe in a json file.

2. The way to do this is not really Python specific, just good programming practice. Write some tests.

You should not test your program by running it. This is slow, unreliable, and it is difficult to test all the program features. Instead of running the main program you should write a bunch of small programs that exercise different parts of the main program. These are called tests. Because the test programs are short they don't take a long time to write or run. The tests can be designed to exercise parts of the software that are difficult to test through the interface. You can also design tests that work a problematic part of the program.

There are several tools in the Python community to help write these tests. The tools will even automatically run your tests and report which tests pass and which fail. This is very useful for regression testing when you want to verify that a recent change didn't break anything in your program.

For your particular problem right now, I suggest two tests to start. One that gets you to just before the crash and writes a CSV file. A second test that continues from that point, starting by loading the dataframe from the file. That way you can start your debugging right where the program crashes.
ndc85430 likes this post
Reply
#3
(Sep-08-2022, 06:16 PM)Led_Zeppelin Wrote: It takes 40 minutes to get to the critical area.
In addition to @deanhystad 's advice, I suggest that you cut the program in two or three different programs, each performing a part of the task that has to be done. A strict enforcement of separation of concerns will solve the issue.
Reply
#4
I am way ahead of both of you. The 40-minute program is the mini program not the main program. It just takes a long time to run. I will write it to a csv file and save the headache of running it for 40 minutes to get to the point where the debugging is to take place.

I was hoping there was unknown feature in python that could address this problem. The writing it to a csv file and reading later does seem a long way around.

I was looking for a shortcut.

Thanks for your help.

Respectfully,

LZ
Reply
#5
It sure would help if you posted the code.
that said pandas has a command for dataframe to csv, see: https://pandas.pydata.org/docs/reference...o_csv.html
Reply
#6
(Sep-08-2022, 06:16 PM)Led_Zeppelin Wrote: I have a program that takes forty-five minutes to run. That makes it harder to debug.

It takes 40 minutes to get to the critical area
I am not sure what exactly your problem is, but perhaps you can save the status of your variables after 39 minutes. Maybe with pickle, maybe with json.
Reply
#7
What is your program doing if it's taking so long to run? Maybe that's a reasonable time, but it's impossible to say without knowing any details.

On testing, there are different approaches and tools for doing this. Since your code likely has no tests right now, one approach for getting it under test is called approvals testing. Emily Bache gave a great talk on the idea: https://youtu.be/NOxThHYEX9I. Her examples are in Java, but unsurprisingly, approvals testing frameworks exist for Python (e.g. https://github.com/approvals/ApprovalTests.Python).
Reply
#8
There are serval way to speed,maybe it can be done in code(as you don't post a sample) or maybe not.
Dask is maybe the easiest.
Look at Dask DataFrame.
Dask’s API is 99% identical to Pandas, so you shouldn’t have any trouble switching.
Dask also run by default in parallel on all CPU cores.

Also mention Numba which has own engine="numba" in Pandas.
Reply
#9
Just another thought since we don't know what your program does, just that it uses Pandas - is your program suitable for execution in a Jupyter notebook (or similar)? Splitting the task into cells could solve this, as long as the cells in question are non-destructive of the data generated by the slow part (or you could copy the data before entering the questionable cells).

Just an idea.
Reply
#10
Quote:2. Since Python is so different from other languages, does it have a way to do this that is not as awkward a no 1?
Python is not that different to other langue's,it has borrowed a lot stuff from other languages,often Python dos in a simpler way.

Also to mention that Pandas is different from Python in a programming way.have to think about that DataFrame are vectorized.
One of the most common beginner mistake is in Pandas,is eg to use a for loop like would in Python.
So in Pandas should use a different approach(that in standard Python),as use of build in vectorization methods are just a lot faster that using a loop.
How To Make Your Pandas Loop 71803 Times Faster.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Progress bar bnadir55 1 1,843 Apr-11-2022, 01:52 PM
Last Post: deanhystad
  Saving the state of a program... bytecrunch 5 3,721 Mar-09-2022, 07:09 AM
Last Post: ndc85430
  saving and loading text from the clipboard with python program MaartenRo 2 1,668 Jan-22-2022, 05:04 AM
Last Post: MaartenRo
  Showing and saving the output of a python file run through bash Rim 3 2,485 Oct-06-2021, 10:48 AM
Last Post: gerpark
  Progress Indicator for Xmodem 0.4.6 KenHorse 1 1,986 Jan-30-2021, 07:12 PM
Last Post: bowlofred
  saving only one line of a figure as an image (python matplotlib) nitrochloric 0 2,032 Nov-23-2020, 01:41 PM
Last Post: nitrochloric
  How can I add a progress bar for my software? aquerci 8 3,809 Nov-16-2019, 04:20 PM
Last Post: aquerci
  New to Python, help with saving image from camera tantony 2 3,879 Sep-13-2019, 05:19 PM
Last Post: tantony
  wget progress bar anasrocks 1 4,746 Jun-06-2019, 03:12 PM
Last Post: heiner55
  Saving python cmd output into a text file with colours kapilan15 2 7,087 Jan-25-2019, 06:25 PM
Last Post: metulburr

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020