Python Forum

More elegant way to remove time from text lines.
I have 10 text files and 10 mp3s

The text files have some lines with no time headers, but most lines look like this:

[11:29.53]Good morning, everyone.

I want all the time cues, like [11:29.53] gone, so I just have the text but no time cues.

I did it like this below, but I think it can be done more elegantly.

Any tips please?

#! /usr/bin/python3
# tidy up text copied from Topway English CD
import os

path = '/home/pedro/Documents/topway/'
files = os.listdir(path)
for f in files:
    print('Files are:', f)
    
file = input('What file are we looking for? Copy and paste 1 file here ... ')

textLoad = open(path + file)
textLoadData = textLoad.readlines()
textLoad.close()

newData = []

for line in textLoadData:
    if line[0] == '[':
        aLineCut = line[10:]
        newData.append(aLineCut)

preparedText = ''.join(newData)
newFile = open(path + file + '_timeless', 'w')
newFile.write(preparedText)
newFile.close()

print('ALL DONE! File saved as ' + path + file + '_timeless')
Some ways.
>>> s = '[11:29.53]Good morning, everyone'
>>> s.partition(']')[2]
'Good morning, everyone'
>>> import re
>>> 
>>> re.sub(r'\[.*]', '', s)
'Good morning, everyone'
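One caveat with the regex above: .* is greedy, so on a line that contains a second closing bracket it removes everything up to the last ]. Below is a minimal sketch of a stricter pattern, assuming every cue has the form [mm:ss.xx] and sits at the very start of the line. The names TIME_CUE and strip_cue and the second sample line are made up for illustration.

```python
import re

# Assumed cue format: [minutes:seconds.hundredths] at the very start of a line.
TIME_CUE = re.compile(r'^\[\d+:\d+\.\d+\]')

def strip_cue(line):
    """Return the line without a leading time cue; other lines pass through unchanged."""
    return TIME_CUE.sub('', line)

print(strip_cue('[11:29.53]Good morning, everyone.'))
print(strip_cue('[00:05.10]See exercise [2] below.'))

# For comparison, the greedy pattern also eats the text up to the last ']':
greedy = re.sub(r'\[.*]', '', '[00:05.10]See exercise [2] below.')
print(repr(greedy))
```

The anchored pattern leaves the second sample as "See exercise [2] below.", while the greedy one keeps only the part after the last bracket.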

textLoadData: a song at the right moment, The PEP 8 Song🎵 (PEP 8 recommends snake_case names such as text_load_data).

f-string
print('ALL DONE! File saved as ' + path + file + '_timeless')
print(f'ALL DONE! File saved as {path}{file}_timeless')
Thanks, that's much better!

I never heard of .partition, but I suppose .split(']') would do the same job. Didn't think of that!
Why use Python at all? Command line tools like sed are made for this sort of thing. Regular expressions (aka "regex") as shown above are also worth knowing something about. An example with sed:

Output:
$ cat test
[11:29.53]Good morning, everyone.
$ sed -i 's/\[.*\]//' test
$ cat test
Good morning, everyone.
To explain a bit:

sed allows you to edit lines of text with various commands. The command being used here is s, for "substitute". For each line in the file test, we substitute the thing between the first pair of / (that is, the string that matches the regular expression \[.*\]; square brackets are meaningful in regex, hence the need to escape them) with the thing between the second pair (i.e. the empty string). The -i option acts on the file in place. Without it, the results are just printed to standard output, so you could redirect them to another file if you wanted.
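For comparison, here is a rough Python counterpart of the in-place substitution, using the standard library's fileinput module. This is a sketch rather than a drop-in replacement for sed; the function name strip_cues_in_place is invented for the example.

```python
import fileinput
import re

def strip_cues_in_place(filename):
    """Rewrite filename in place, deleting the first [...] match on each line."""
    # With inplace=True, fileinput redirects print() output back into the file.
    with fileinput.input(files=[filename], inplace=True) as lines:
        for line in lines:
            # count=1 mirrors sed's s/// without the g flag: one substitution per line.
            print(re.sub(r'\[.*\]', '', line, count=1), end='')
```

Calling strip_cues_in_place('test') should have roughly the same effect on the file as the sed command above.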

The Grymoire has a sed tutorial.
(Apr-25-2021, 12:03 AM) Pedroski55 wrote: I never heard of .partition, but I suppose .split(']') would do the same job. Didn't think of that!
Yes, I only used partition to show it off, since it is more rarely used.
>>> s = '[11:29.53]Good morning, everyone'
>>> s.split(']')[-1]
'Good morning, everyone'
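The two methods are not quite interchangeable, though. A small sketch of where they differ, using made-up sample lines: one contains an extra bracket in the text, the other has no time cue at all.

```python
s = '[00:05.10]See exercise [2] below.'

# partition splits at the first ']' only, so the rest of the line survives.
after_partition = s.partition(']')[2]

# split(']')[-1] keeps only what follows the LAST ']'.
after_split = s.split(']')[-1]

# Limiting split to one cut behaves like partition on this line.
after_split_once = s.split(']', 1)[-1]

print(repr(after_partition))
print(repr(after_split))
print(repr(after_split_once))

# On a line without any ']', partition returns an empty tail,
# while split returns the whole line unchanged.
plain = 'No time cue on this line'
print(repr(plain.partition(']')[2]))
print(repr(plain.split(']', 1)[-1]))
```

If the files contain lines without cues that should be kept, split(']', 1)[-1] looks like the safer of the two.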
So a more elegant solution could look like this.
Note that neither readlines() nor close() is used.
Iterating over the file object reads it one line at a time, and with open() closes the files automatically.
So only the current line is held in memory, not the whole file.
with open("in.txt") as f, open('out.txt', 'w') as f_out:
    for line in f:
        line = line.split(']')[-1]
        f_out.write(line)
Or the same with a regular expression:
import re

with open("in.txt") as f, open('out.txt', 'w') as f_out:
    for line in f:
        line = re.sub(r'\[.*]', '', line)
        f_out.write(line)
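Since the original question involved ten text files, the same loop can be wrapped in a function and run over a whole folder. This is only a sketch under a few assumptions: the cues look like [mm:ss.xx] at the start of a line, the text files match *.txt, and the output keeps the _timeless suffix from the first post. The names strip_time_cues and strip_folder are invented here.

```python
import re
from pathlib import Path

# Assumed cue format: [mm:ss.xx] at the start of a line.
TIME_CUE = re.compile(r'^\[\d+:\d+\.\d+\]')

def strip_time_cues(src, dst):
    """Copy src to dst line by line, removing a leading time cue from each line."""
    with open(src) as f, open(dst, 'w') as f_out:
        for line in f:
            f_out.write(TIME_CUE.sub('', line))

def strip_folder(folder, pattern='*.txt', suffix='_timeless'):
    """Write a '<name><suffix>' copy next to every matching file in folder."""
    # sorted() builds the full list first, so new files do not affect the loop.
    for src in sorted(Path(folder).glob(pattern)):
        dst = src.with_name(src.name + suffix)
        strip_time_cues(src, dst)
        print(f'Saved {dst}')
```

With the path from the first post this would be called as strip_folder('/home/pedro/Documents/topway/'). Unlike the original script, lines without a time cue are copied unchanged rather than dropped.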
Thanks again!

@Da Bishop: I have seen sed in action, but it seems so cryptic that only robots can understand it! I am not R2D2!

But thanks for the link to sed, I will see if I can use it sometime, somewhere. I already have trouble with re!

@snippsat: Thank you, that looks the way I think it should look, but I couldn't have written it myself!!

Very grateful to you both!!
There is a Python-friendly version of sed: sd. With it, this can be written as:

> sd -p '\[.*\]' '' test
The -p flag means preview, i.e. it will not change the file, but you can see the result. If the result is as expected, omit the flag and the actual change will be made.