Python Forum
How to delete portion of file already processed?
#1
test_file is an Excel .csv file that has one word in each cell reading down in column A:
"this is a test to see if the present rows can be read then deleted"

This code returns NameError: name 'pop' is not defined

words = open(r"C:test_file.csv", "r")

for line in words:
  print(line)
  pop(line)

words.close()
Is there a way this may be done? I don't want to corrupt the original file on disk.
My thinking is that as long as I don't re-save the processed file (which, by the
end, is empty) back to disk, the original will remain unaffected.
Reply
#2
This works:

words = open(r"C:test_file.csv", "r")

words_list = words.readlines()
words.close()

while len(words_list) > 0:
    print(words_list[0], end="")
    words_list.pop(0)

print('Remaining file is now:  ', words_list)
Every word prints out on a separate line (with no blank lines in between) and when done, it says the remaining file is [].

I had trouble with a for loop because after applying .pop(0), every remaining element moves back one position while the iteration moves forward one, thereby skipping elements.

One consideration here is that .readlines() creates a list. This .csv file has 13 columns and 166,000 rows per year (and I have ~15 years of data). Will it bog down processing speed to put all of that in a list, parse each list element, delete the element, and move on?
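If that turns out to be a problem, I suppose the list could be avoided entirely by iterating the file object lazily, one row at a time. A rough sketch (the path and the per-row processing are placeholders):

```python
import csv

def process_rows(path):
    """Stream a .csv one row at a time instead of loading it all with .readlines()."""
    count = 0
    with open(path, newline="") as f:
        for row in csv.reader(f):  # yields one parsed row at a time
            count += 1             # replace with the real per-row processing
    return count
```

This way only one row is in memory at a time, no matter how many years of data the file holds.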
Reply
#3
A .csv is just a text file. The only way to remove part of a text file (and have it remain a useful text file) is to rewrite it with the later portions moved up.


One technique would be that as you are processing your loop you write all the lines that you want to keep to a new file. When complete, swap the new file into the old file's place.
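A minimal sketch of that technique, assuming a keep() predicate decides which lines survive (names here are just for illustration):

```python
import os
import tempfile

def drop_processed_lines(path, keep):
    """Rewrite path, keeping only the lines for which keep(line) is True."""
    # Write survivors to a temp file in the same directory so the final
    # os.replace() is a rename on the same filesystem.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "w") as tmp, open(path) as src:
        for line in src:
            if keep(line):
                tmp.write(line)
    os.replace(tmp_path, path)  # swap the new file into the old file's place
```

The original file is never left half-written: either the rename happens and the new file takes its place, or the old file is untouched.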
Reply
#4
Why do you need to delete lines if your goal is 'not to corrupt original file'?
Reply
#5
(Jan-21-2022, 04:04 PM)bowlofred Wrote: A .csv is just a text file. The only way to remove part of a text file (and have it remain a useful text file) is to rewrite it with the later portions moved up.


One technique would be that as you are processing your loop you write all the lines that you want to keep to a new file. When complete, swap the new file into the old file's place.

It's the opposite. I want to keep all the remaining lines that have not yet been processed. Once a line is processed, it is no longer needed for that particular backtest.
Reply
#6
(Jan-21-2022, 04:07 PM)perfringo Wrote: Why do you need to delete lines if your goal is 'not to corrupt original file'?

The original file is the source data for all backtests. When doing any single backtest, though, the program only needs to see each line once after which it is no longer needed.

Why delete? I (based on my beginner computer understanding) was thinking two things. First, without deleting it the program needs to iterate through an increasing number of unnecessary lines. If starting over at the top would be starting exactly where it needs to be, then I figured efficiency would increase dramatically. Second, I thought deleting already-processed lines would decrease the memory load on the computer and maybe benefit speed that way.
Reply
#7
You can do this with a real database. It can delete portions in the middle without touching later portions (and hopefully do it efficiently). You can't do that with a text file. The only way to delete is to rewrite the entire file.

Are you iterating over the file multiple times? If not why not just handle all the lines and then at the end you just erase the file (removing all the handled lines)?

If you are iterating over it multiple times, then why will some lines remain?
Reply
#8
(Jan-21-2022, 05:46 PM)bowlofred Wrote: You can do this with a real database. It can delete portions in the middle without touching later portions (and hopefully do it efficiently). You can't do that with a text file. The only way to delete is to rewrite the entire file.

Are you iterating over the file multiple times? If not why not just handle all the lines and then at the end you just erase the file (removing all the handled lines)?

If you are iterating over it multiple times, then why will some lines remain?

I know very little about databases. Perhaps that is the way to go.

With a database, can you go immediately to a particular date and start there? I'd have no need to delete anything, then, because there would be no worry about going through already-used rows that are no longer needed. The program would simply store a date for when the previous trade ended as the next trade will begin the very next day.

To answer your questions, as it stands right now I'm iterating through this file multiple times. I consider that a waste, though, since each row needs to be processed only once. That's why I raised the question in the first place.

As it stands, all the lines remain because I don't know of a way to delete them.

I did think about doing this as a dataframe. I could then keep track of the row number (index, which corresponds to date), but I got the sense using .iloc[] as an indexer might be slow/cumbersome because the program would still have to go through the file until it reached the stated index line. Would a database have any advantage in this respect?
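Something like this, maybe, with the standard-library sqlite3 module (the table and column names are just my invention for illustration) -- with an index on the date column, the query jumps straight to a date instead of scanning from the top:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a real file path for persistence
conn.execute("CREATE TABLE prices (trade_date TEXT, close REAL)")
conn.execute("CREATE INDEX idx_date ON prices (trade_date)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("2012-01-02", 100.0), ("2012-01-03", 101.5), ("2012-01-04", 99.8)],
)

# Start the next backtrade at a given date -- no iterating over earlier rows.
rows = conn.execute(
    "SELECT trade_date, close FROM prices "
    "WHERE trade_date >= ? ORDER BY trade_date",
    ("2012-01-03",),
).fetchall()
```

With ISO-formatted dates (YYYY-MM-DD), plain string comparison sorts chronologically, so the index works without a special date type.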
Reply
#9
(Jan-21-2022, 06:31 PM)Mark17 Wrote: To answer your questions, as it stands right now I'm iterating through this file multiple times. I consider that a waste, though, since each row needs to be processed only once. That's why I raised the question in the first place

Could you just delete the file after you finish? If you hit the end, haven't all the rows been processed?

Deleting everything is trivial compared to just deleting some portions.
Reply
#10
(Jan-21-2022, 09:00 PM)bowlofred Wrote:
(Jan-21-2022, 06:31 PM)Mark17 Wrote: To answer your questions, as it stands right now I'm iterating through this file multiple times. I consider that a waste, though, since each row needs to be processed only once. That's why I raised the question in the first place

Could you just delete the file after you finish? If you hit the end, haven't all the rows been processed?

Deleting everything is trivial compared to just deleting some portions.

The point of deleting along the way, in my mind, is to speed up the process. Imagine one row representing a day for 20 years and each backtrade lasting one calendar month. Imagine that the first trade is Jan 2002, second trade is Feb 2002, third trade is Mar 2002, etc. To run through each trade, the program must start from the top and iterate down until it finds the relevant dates. The farther into the backtest it goes, the more time is wasted. Ten years in, for example, begins around row 2500, which means the program has to iterate down 2500 rows until it hits the date it's looking for: 1/2/12 as the start of backtrade #121.

If I were able to delete rows along the way, then for every new trade the program would start near row 1 (maybe row 2 given a header) because all the rows just processed for the previous trade have been deleted since they are no longer needed for the current backtest. I thought this would represent a big savings of time and computational resources.
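Or, if there were a way to remember where the previous trade stopped, maybe f.tell() and f.seek() could jump straight past the already-processed rows without deleting anything? A rough sketch (the function and its shape are just my guess at how this could work):

```python
def read_one_trade(path, offset, n_lines):
    """Read n_lines starting at byte offset; return (lines, new_offset)."""
    with open(path) as f:
        f.seek(offset)             # jump straight past already-processed rows
        lines = [f.readline() for _ in range(n_lines)]
        return lines, f.tell()     # remember where to resume next time
```

The file on disk is never modified; only the saved offset changes between trades.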

I don't care what happens to the file once the backtest is complete. When I run the program again for a new backtest, though, it will start by reading in the original datafile from disk. The original datafile from disk must be preserved; the read file in memory may be deleted. This is my thinking, anyway, as a beginner.

Feel free to correct any and all misconceptions. :)
Reply


