Python Forum
How to delete portion of file already processed?
#1
test_file is an Excel .csv file that has one word in each cell reading down in column A:
"this is a test to see if the present rows can be read then deleted"

This code returns NameError: name 'pop' is not defined

words = open(r"C:test_file.csv", "r")

for line in words:
  print(line)
  pop(line)

words.close()
Is there a way this may be done? I don't want to corrupt the original file on disk.
My thinking is that as long as I don't re-save the processed file (which, by the
end, is empty) back to disk, the original will remain unaffected.
Reply
#2
This works:

words = open(r"C:test_file.csv", "r")

words_list = words.readlines()
words.close()

while len(words_list) > 0:
    print(words_list[0], end="")
    words_list.pop(0)

print('Remaining file is now:  ', words_list)
Every word prints out on a separate line (with no blank lines in between) and when done, it says the remaining file is [].

I had trouble with a for loop because after applying .pop(0), every remaining element moves back one position while the iteration moves forward one, thereby skipping elements.

One consideration here is that .readlines() creates a list. This .csv file has 13 columns and 166,000 rows per year (and I have ~15 years of data). Will it bog down processing speed to put all of that in a list, parse each list element, delete the element, and move on?
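If that turns out to be a problem, I suppose the list could be avoided entirely by iterating the file object lazily, one row at a time. A rough sketch (the path and the per-row processing are placeholders):

```python
import csv

def process_rows(path):
    """Stream a .csv one row at a time instead of loading it all with .readlines()."""
    count = 0
    with open(path, newline="") as f:
        for row in csv.reader(f):  # yields one parsed row at a time
            count += 1             # replace with the real per-row processing
    return count
```

This way only one row is in memory at a time, no matter how many years of data the file holds.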
Reply
#3
A .csv is just a text file. The only way to remove part of a text file (and have it remain a useful text file) is to rewrite it with the later portions moved up.


One technique would be that as you are processing your loop you write all the lines that you want to keep to a new file. When complete, swap the new file into the old file's place.
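A minimal sketch of that technique, assuming a keep() predicate decides which lines survive (names here are just for illustration):

```python
import os
import tempfile

def drop_processed_lines(path, keep):
    """Rewrite path, keeping only the lines for which keep(line) is True."""
    # Write survivors to a temp file in the same directory so the final
    # os.replace() is a rename on the same filesystem.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "w") as tmp, open(path) as src:
        for line in src:
            if keep(line):
                tmp.write(line)
    os.replace(tmp_path, path)  # swap the new file into the old file's place
```

The original file is never left half-written: either the rename happens and the new file takes its place, or the old file is untouched.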
Reply
#4
Why do you need to delete lines if your goal is 'not to corrupt original file'?
Reply
#5
(Jan-21-2022, 04:04 PM)bowlofred Wrote: A .csv is just a text file. The only way to remove part of a text file (and have it remain a useful text file) is to rewrite it with the later portions moved up.


One technique would be that as you are processing your loop you write all the lines that you want to keep to a new file. When complete, swap the new file into the old file's place.

It's the opposite. I want to keep all the remaining lines that have not yet been processed. Once a line is processed, it is no longer needed for that particular backtest.
Reply
#6
(Jan-21-2022, 04:07 PM)perfringo Wrote: Why do you need to delete lines if your goal is 'not to corrupt original file'?

The original file is the source data for all backtests. When doing any single backtest, though, the program only needs to see each line once after which it is no longer needed.

Why delete? I (based on my beginner computer understanding) was thinking two things. First, without deleting it the program needs to iterate through an increasing number of unnecessary lines. If starting over at the top would be starting exactly where it needs to be, then I figured efficiency would increase dramatically. Second, I thought deleting already-processed lines would decrease the memory load on the computer and maybe benefit speed that way.
Reply
#7
You can do this with a real database. It can delete portions in the middle without touching later portions (and hopefully do it efficiently). You can't do that with a text file. The only way to delete is to rewrite the entire file.

Are you iterating over the file multiple times? If not why not just handle all the lines and then at the end you just erase the file (removing all the handled lines)?

If you are iterating over it multiple times, then why will some lines remain?
Reply
#8
(Jan-21-2022, 05:46 PM)bowlofred Wrote: You can do this with a real database. It can delete portions in the middle without touching later portions (and hopefully do it efficiently). You can't do that with a text file. The only way to delete is to rewrite the entire file.

Are you iterating over the file multiple times? If not why not just handle all the lines and then at the end you just erase the file (removing all the handled lines)?

If you are iterating over it multiple times, then why will some lines remain?

I know very little about databases. Perhaps that is the way to go.

With a database, can you go immediately to a particular date and start there? I'd have no need to delete anything, then, because there would be no worry about going through already-used rows that are no longer needed. The program would simply store a date for when the previous trade ended as the next trade will begin the very next day.

To answer your questions, as it stands right now I'm iterating through this file multiple times. I consider that a waste, though, since each row needs to be processed only once. That's why I raised the question in the first place.

As it stands, all the lines remain because I don't know of a way to delete them.

I did think about doing this as a dataframe. I could then keep track of the row number (index, which corresponds to date), but I got the sense using .iloc[] as an indexer might be slow/cumbersome because the program would still have to go through the file until it reached the stated index line. Would a database have any advantage in this respect?
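Something like this, maybe, with the standard-library sqlite3 module (the table and column names are just my invention for illustration) -- with an index on the date column, the query jumps straight to a date instead of scanning from the top:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a real file path for persistence
conn.execute("CREATE TABLE prices (trade_date TEXT, close REAL)")
conn.execute("CREATE INDEX idx_date ON prices (trade_date)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("2012-01-02", 100.0), ("2012-01-03", 101.5), ("2012-01-04", 99.8)],
)

# Start the next backtrade at a given date -- no iterating over earlier rows.
rows = conn.execute(
    "SELECT trade_date, close FROM prices "
    "WHERE trade_date >= ? ORDER BY trade_date",
    ("2012-01-03",),
).fetchall()
```

With ISO-formatted dates (YYYY-MM-DD), plain string comparison sorts chronologically, so the index works without a special date type.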
Reply
#9
(Jan-21-2022, 06:31 PM)Mark17 Wrote: To answer your questions, as it stands right now I'm iterating through this file multiple times. I consider that a waste, though, since each row needs to be processed only once. That's why I raised the question in the first place

Could you just delete the file after you finish? If you hit the end, haven't all the rows been processed?

Deleting everything is trivial compared to just deleting some portions.
Reply
#10
(Jan-21-2022, 09:00 PM)bowlofred Wrote:
(Jan-21-2022, 06:31 PM)Mark17 Wrote: To answer your questions, as it stands right now I'm iterating through this file multiple times. I consider that a waste, though, since each row needs to be processed only once. That's why I raised the question in the first place

Could you just delete the file after you finish? If you hit the end, haven't all the rows been processed?

Deleting everything is trivial compared to just deleting some portions.

The point of deleting along the way, in my mind, is to speed up the process. Imagine one row representing a day for 20 years and each backtrade lasting one calendar month. Imagine that the first trade is Jan 2002, second trade is Feb 2002, third trade is Mar 2002, etc. To run through each trade, the program must start from the top and iterate down until it finds the relevant dates. The farther into the backtest it goes, the more time is wasted. Ten years in, for example, begins around row 2500, which means the program has to iterate down 2500 rows until it hits the date it's looking for: 1/2/12 as the start of backtrade #121.

If I were able to delete rows along the way, then for every new trade the program would start near row 1 (maybe row 2 given a header) because all the rows just processed for the previous trade have been deleted since they are no longer needed for the current backtest. I thought this would represent a big savings of time and computational resources.
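Or, if there were a way to remember where the previous trade stopped, maybe f.tell() and f.seek() could jump straight past the already-processed rows without deleting anything? A rough sketch (the function and its shape are just my guess at how this could work):

```python
def read_one_trade(path, offset, n_lines):
    """Read n_lines starting at byte offset; return (lines, new_offset)."""
    with open(path) as f:
        f.seek(offset)             # jump straight past already-processed rows
        lines = [f.readline() for _ in range(n_lines)]
        return lines, f.tell()     # remember where to resume next time
```

The file on disk is never modified; only the saved offset changes between trades.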

I don't care what happens to the file once the backtest is complete. When I run the program again for a new backtest, though, it will start by reading in the original datafile from disk. The original datafile from disk must be preserved; the read file in memory may be deleted. This is my thinking, anyway, as a beginner.

Feel free to correct any and all misconceptions. :)
Reply


