Posts: 283
Threads: 64
Joined: Apr 2019
Aug-04-2024, 08:57 PM
(This post was last modified: Aug-05-2024, 07:28 AM by paul18fr.)
Hi,
It's a general (funny?) question: how can we delete the first 10 lines of an ASCII file and save it, using a minimum of memory (readlines is therefore excluded)?
I was looking at vi/vim, but I'm not sure it can be used in console mode (my attempts failed). Any suggestions?
My code:
import os, subprocess, sys
Path = os.getcwd()
AsciiFile = 'Ascii.txt'
DeleteLines = subprocess.Popen(':1,10d\nwq\n',
                               shell = True,
                               stdin = None,
                               stdout = subprocess.PIPE)
VimRun = subprocess.Popen(['vi ', Path + '/' + AsciiFile],
                          stdin = DeleteLines.stdout)
Error:
Vim: Warning: Output is not to a terminal
Vim: Warning: Input is not from a terminal
...
;mVim: Error reading input, exiting...\nVim: Finished
...
Thanks
Paul
Posts: 4,675
Threads: 73
Joined: Jan 2018
Aug-05-2024, 06:41 AM
(This post was last modified: Aug-05-2024, 06:45 AM by Gribouillis.)
Here is some untested code, using only Python tools:
from collections import deque
import itertools as itt
import os
import shutil
def remove_nlines(filename, nlines):
    backup = filename + '.bak'
    # move original file
    os.rename(filename, backup)
    with open(backup) as src, open(filename, 'w') as dst:
        # consume nlines lines in source file
        deque(itt.islice(src, None, nlines), maxlen=0)
        # copy the rest of the file
        shutil.copyfileobj(src, dst)
    # remove backup file
    os.remove(backup)
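For readers who have not met the deque/islice idiom before, here is a minimal sketch of what the "consume" step does. The helper name skip_lines is hypothetical, and io.StringIO is only standing in for an open text file; this is an illustration, not part of the code above.

```python
from collections import deque
from itertools import islice
import io


def skip_lines(stream, nlines):
    """Advance `stream` past its first `nlines` lines without storing them."""
    # A deque with maxlen=0 discards every item it is fed, so the lines are
    # read one by one and thrown away immediately.
    deque(islice(stream, nlines), maxlen=0)


# io.StringIO stands in for an open text file (assumption for the demo)
src = io.StringIO("a\nb\nc\nd\n")
skip_lines(src, 2)
remaining = src.read()  # whatever is left after the skipped lines
print(repr(remaining))
```

Only the stream position changes, which is why the rest of the file can then be copied in large chunks.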
« We can solve any problem by introducing an extra level of indirection »
Posts: 283
Threads: 64
Joined: Apr 2019
Hi Gribouillis,
Thanks for your hint. I just found a way using vi/vim:
import os, subprocess, sys
Path = os.getcwd()
AsciiFile = 'Ascii.txt'
vimRun = subprocess.Popen('vi ' + Path + '/' + AsciiFile + ' +1,10d -c wq ', shell = True, stdin = None, stdout = None)
# "+1,10d" => equivalent to ":1,10d"
# "-c" <command> => the "c" stands for the command to be executed
# 'stdout = None' => vi keeps the terminal as its output, which avoids the warnings
Paul
Posts: 2,063
Threads: 9
Joined: May 2017
from pathlib import Path
from itertools import islice

def delete_first_lines(input_file, strip_lines=10):
    source = Path(input_file)
    # append .new to the full file name, keeping all existing suffixes
    # (with_suffix() only replaces the last suffix, which would repeat the others)
    target = source.with_name(source.name + ".new")
    # open source in binary read mode (no hassle with decoding errors)
    # open target in binary write mode
    with source.open("rb") as fd_in, target.open("wb") as fd_out:
        for line in islice(fd_in, strip_lines, None):
            fd_out.write(line)
    # delete the original file
    source.unlink()
    # rename the new file to the old filename (without .new)
    target.rename(source)

# example with test.f1.txt
delete_first_lines("test.f1.txt")
Tested on Windows. On Linux, it needs one line of code fewer. On Windows, you can't overwrite an existing file by renaming another file onto it, so the original file must be deleted first.
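As a possible variation on the delete-then-rename step, os.replace() from the standard library overwrites an existing destination on both Windows and Linux, so the explicit unlink is not needed. The sketch below assumes that variation; the function name delete_first_lines_replace and the demo file are hypothetical.

```python
import os
import tempfile
from itertools import islice
from pathlib import Path


def delete_first_lines_replace(input_file, strip_lines=10):
    """Hypothetical variant: same copy loop, but os.replace() swaps the files."""
    source = Path(input_file)
    target = source.with_name(source.name + ".new")
    with source.open("rb") as fd_in, target.open("wb") as fd_out:
        # writelines() pulls the lines lazily from the islice iterator
        fd_out.writelines(islice(fd_in, strip_lines, None))
    # overwrites `source` if it already exists, on Windows as well as Linux
    os.replace(target, source)


# small demo on a throwaway 15-line file
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.f1.txt"
    demo.write_bytes(b"".join(b"line %d\n" % i for i in range(1, 16)))
    delete_first_lines_replace(demo)
    result = demo.read_bytes()
```

A side benefit is that the original file is never missing from disk: it is either the old version or the new one.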
Posts: 7,221
Threads: 122
Joined: Sep 2016
So I rewrote my code from here.
Instead of showing the first and last lines of your choice, it now deletes those lines.
Example:
λ python del_lines.py contry.txt --head 5
Removed the first 5 lines and the last 0 lines from <contry.txt>
λ python del_lines.py contry.txt --tail 5
Removed the first 0 lines and the last 5 lines from <contry.txt>
Using both together:
G:\div_code\reader_env\delete_lines
λ python del_lines.py contry.txt --head 2 --tail 7
Removed the first 2 lines and the last 7 lines from <contry.txt>
It's a CLI application using Typer.
# del_lines.py
import typer

app = typer.Typer()

@app.command()
def headtail(filename: str, head: int = 0, tail: int = 0):
    try:
        with open(filename, 'r', encoding='utf-8', errors='ignore') as file:
            lines = file.readlines()
        # Calculate the remaining lines after removing head and tail
        # (clamped so that a tail larger than the file cannot become a negative index)
        remaining_lines = lines[head:max(head, len(lines) - tail)]
        # Write the remaining lines back to the file
        with open(filename, 'w', encoding='utf-8', errors='ignore') as file:
            file.writelines(remaining_lines)
        typer.echo(f"Removed the first {head} lines and the last {tail} lines from <{filename}>")
    except FileNotFoundError:
        typer.echo(f"Error: The file '{filename}' does not exist.", err=True)
    except Exception as e:
        typer.echo(f"An error occurred: {e}", err=True)

if __name__ == "__main__":
    app()
Posts: 1,006
Threads: 141
Joined: Jul 2017
All seems awfully complicated!
You could try it like this: skip 10 lines, then write the rest to another file.
path2text = 'temp/brown_fox.txt'
savepath = 'temp/short_brown_fox.txt'
# skip the first 10 lines
# ('w' rather than 'a', so that running the script twice does not append a second copy)
with open(path2text, 'r') as infile, open(savepath, 'w') as outfile:
    count = 0
    for line in infile:
        count += 1
        if count >= 11:
            outfile.write(line)
Posts: 283
Threads: 64
Joined: Apr 2019
Even if using vi is not a 100% Pythonic way, it remains the simplest one in my mind, and it is fast (no loop). It can be inserted directly in my Python code without using a bash script or another tool.
Everyone has used cp or mv from their code at some point; I have also used other common basic commands (grep, sed, vi, wc, etc.) to deal with huge files.
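In the same spirit of calling a basic command from Python, here is a rough sketch that swaps vi for tail, which streams the file and does not need a terminal. It assumes a POSIX tail is on the PATH (so it is not portable to a plain Windows install); the function name drop_head_with_tail and the ".tmp" suffix are hypothetical.

```python
import os
import shutil
import subprocess
import tempfile


def drop_head_with_tail(path, nlines=10):
    """Hypothetical helper: remove the first `nlines` lines using POSIX tail."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as out:
        # "tail -n +K" prints the file starting at line K.
        # Passing a list (no shell=True) avoids quoting problems in the path.
        subprocess.run(["tail", "-n", f"+{nlines + 1}", path],
                       stdout=out, check=True)
    os.replace(tmp_path, path)


# demo, only run when tail is actually available
result = None
if shutil.which("tail"):
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "Ascii.txt")
        with open(demo, "wb") as f:
            f.writelines(b"line %d\n" % i for i in range(1, 13))
        drop_head_with_tail(demo)
        with open(demo, "rb") as f:
            result = f.read()
```

Like the pure Python versions, this still writes a second file and then swaps it in; the memory use stays small because nothing is held beyond a buffer.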
Posts: 4,675
Threads: 73
Joined: Jan 2018
(Aug-07-2024, 06:51 PM)Pedroski55 Wrote: All seems awfully complicated!
I made every effort to avoid writing a for loop in my solution, for performance. That is why the obvious solutions are not necessarily the best ones: simple code is not necessarily efficient code.
That said, we'd need measurements to compare the performance of the various solutions.
Using an external Linux command is probably the most efficient on large files because these tools are highly optimized (but it is less portable than a pure Python script using the standard library).
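To get such measurements, something along these lines could serve as a starting point. It is only a sketch: the two functions are simplified stand-ins for the approaches in this thread (skip_then_copy, loop_copy and time_call are hypothetical names), the sample file is generated on the fly, and a single run on a 200,000-line file proves little, so real tests should use your own large files and several repetitions.

```python
import os
import shutil
import tempfile
import time
from collections import deque
from itertools import islice


def skip_then_copy(src_path, dst_path, nlines=10):
    """Skip the head without a Python-level loop, then copy in big chunks."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        deque(islice(src, nlines), maxlen=0)
        shutil.copyfileobj(src, dst)


def loop_copy(src_path, dst_path, nlines=10):
    """Plain for loop over every line of the file."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for lineno, line in enumerate(src):
            if lineno >= nlines:
                dst.write(line)


def time_call(func, *args):
    """Return the wall-clock duration of one call, in seconds."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "wb") as f:
        f.writelines(b"line %d\n" % i for i in range(200_000))
    out_a = os.path.join(tmp, "a.txt")
    out_b = os.path.join(tmp, "b.txt")
    timings = {
        "skip_then_copy": time_call(skip_then_copy, sample, out_a),
        "loop_copy": time_call(loop_copy, sample, out_b),
    }
    # sanity check: both approaches must produce the same file
    with open(out_a, "rb") as fa, open(out_b, "rb") as fb:
        same_output = fa.read() == fb.read()

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```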