Hello everyone,
I hope my question is understandable.
I have to extract text from files in a repository. The files look like this:
"
titi
titi
titi
yes
ok
ok
ok
no
totot
tototo
tot
"
How can I extract the text between yes and no? I would like to get all of these 'ok' lines.
My output should be :
yes
ok
ok
ok
no
I am able to read these files with os, but I don't know how to extract the text correctly...
Thanks for your help!
Example with a generator:
- assign False to start_found
- iterate line by line, which here means word by word
- if the start word is found, change start_found to True
- yield the element if start_found is True
- return from the generator if the element is the end word. This also leaves the for-loop
- optional exceptions:
  - if the for-loop finished but start_found is still False, then the start word was not found
  - if the for-loop finished and start_found is True, then the end word was not found
word_start = "yes"
word_end = "no"
words = """yes
ok
ok
ok
no""".splitlines()
def split(sequence, start, end):
    start_found = False
    for element in sequence:
        # compare against the parameters, not the module-level names
        if element == start and not start_found:
            start_found = True
        elif element == end and start_found:
            return  # close generator
        elif start_found:
            yield element
    # this point is reached, if the start or end was not found
    if start_found:
        # seen start, but no end
        raise ValueError(f"The end word '{end}' was not found after '{start}'")
    else:
        # seen no start in the whole sequence
        raise ValueError(f"The start word '{start}' was not found in sequence")
oks = list(split(words, word_start, word_end))
print(oks)
Here is the version that includes the start word and the stop word.
It differs only slightly from the previous generator function.
word_start = "yes"
word_end = "no"
words = """yes
ok
ok
ok
no""".splitlines()
def split(sequence, start, end):
    start_found = False
    for element in sequence:
        # compare against the parameters, not the module-level names
        if element == start and not start_found:
            start_found = True
            yield element  # yield the start word
        elif element == end and start_found:
            yield element  # yield the end word
            return  # close generator
        elif start_found:
            yield element
    # this point is reached, if the start or end was not found
    if start_found:
        # seen start, but no end
        raise ValueError(f"The end word '{end}' was not found after '{start}'")
    else:
        # seen no start in the whole sequence
        raise ValueError(f"The start word '{start}' was not found in sequence")
oks = list(split(words, word_start, word_end))
print(oks)
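If the words are already in memory as a list, roughly the same result can be sketched with list.index instead of a generator. This is only an alternative sketch, not the generator approach above: the name slice_between and its inclusive flag are made up for illustration. list.index raises ValueError when a word is missing, which plays the role of the optional exceptions.

```python
def slice_between(lines, start, end, inclusive=False):
    """Return the items between the first `start` and the next `end`."""
    first = lines.index(start)          # ValueError if start is missing
    last = lines.index(end, first + 1)  # ValueError if end is missing after start
    if inclusive:
        return lines[first:last + 1]    # keep the start and end words
    return lines[first + 1:last]        # only the items in between


words = "titi\nyes\nok\nok\nok\nno\ntotot".splitlines()
print(slice_between(words, "yes", "no"))                  # ['ok', 'ok', 'ok']
print(slice_between(words, "yes", "no", inclusive=True))  # ['yes', 'ok', 'ok', 'ok', 'no']
```

Unlike the generator, this needs the whole sequence as a list, so it fits small files better than large ones.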
You could use the itertools module:
import io
import itertools as itt
file = io.StringIO("""\
titi
titi
titi
yes
ok
ok
ok
no
totot
tototo
tot
""")
def takeuntil(pred, seq):
    for elem in seq:
        yield elem
        if pred(elem):
            return
seq = itt.dropwhile((lambda line: line != 'yes\n'), file)
seq = takeuntil((lambda line: line == 'no\n'), seq)
print(''.join(seq), end='')
Output:
yes
ok
ok
ok
no
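Since the original question was about real files rather than a StringIO, here is a minimal sketch of how the same dropwhile/takeuntil idea might be applied to every file in a directory. The helper names extract_block and extract_all, the "*.txt" pattern and the UTF-8 encoding are assumptions for illustration, not something given in the thread.

```python
from itertools import dropwhile
from pathlib import Path


def takeuntil(pred, seq):
    """Yield items up to and including the first one for which pred is true."""
    for elem in seq:
        yield elem
        if pred(elem):
            return


def extract_block(path, start="yes", end="no"):
    """Return the lines from `start` to `end` (both included) of one file."""
    with open(path, encoding="utf-8") as file:
        # Strip the line ending so a last line without '\n' still compares equal.
        lines = (line.rstrip("\n") for line in file)
        lines = dropwhile(lambda line: line != start, lines)
        # Build the list while the file is still open.
        return list(takeuntil(lambda line: line == end, lines))


def extract_all(directory, pattern="*.txt"):
    """Map each matching file name in `directory` to its extracted block."""
    paths = sorted(Path(directory).glob(pattern))
    return {path.name: extract_block(path) for path in paths}
```

Calling extract_all on a directory (for example a hypothetical folder named data) would return a dict with one list of lines per file. A file without the start word simply gives an empty list here.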
An even more functional version
from functools import partial
import io
from itertools import dropwhile
import operator
import sys
def equal(x):
    return partial(operator.eq, x)
def not_equal(x):
    return partial(operator.ne, x)
def takeuntil(pred, seq):
    for elem in seq:
        yield elem
        if pred(elem):
            return
file = io.StringIO("""\
titi
titi
titi
yes
ok
ok
ok
no
totot
tototo
tot
""")
seq = takeuntil(equal('no\n'), dropwhile(not_equal('yes\n'), file))
sys.stdout.writelines(seq)
Output:
yes
ok
ok
ok
no
First of all, I learned many tricks, and then I was able to debug my code myself!
Thanks a lot, guys, these solutions work well.
Now imagine that my input contains a number, and that the number is different for each file.
For example, the first file is:
"""
titi
12 yes
ok
no
eee
"""
and the other file could look like this:
"""
titi
14 yes
ok
no
eee
"""
How can I specify that these numbers change? I mean, Python searches for the exact string. Is it possible to indicate that part of the string may differ?
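For varying text, you can use regular expressions:
import re
# reuses dropwhile, takeuntil, equal, sys and file from the previous example
# the negative lookahead matches every line that is NOT '<number> yes'
lines = dropwhile(re.compile(r'(?!^\d+\s+yes\s*$)').match, file)
lines = takeuntil(equal('no\n'), lines)
sys.stdout.writelines(lines)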
Output:
12 yes
ok
ok
ok
no
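The snippet above depends on helpers from the earlier post, so here is a self-contained sketch of the same idea that runs on its own. It uses re.fullmatch on the stripped line instead of a negative lookahead. The function name extract_numbered_block and the START and END patterns are assumptions chosen for this example.

```python
import io
import re

START = re.compile(r"\d+\s+yes")  # a number, whitespace, then "yes"
END = re.compile(r"no")


def extract_numbered_block(lines, start=START, end=END):
    """Yield the lines from the first `start` match to the next `end` match."""
    inside = False
    for line in lines:
        text = line.strip()  # ignore the trailing newline when matching
        if not inside:
            if start.fullmatch(text):
                inside = True
                yield line
        else:
            yield line
            if end.fullmatch(text):
                return


samples = ("titi\n12 yes\nok\nno\neee\n", "titi\n14 yes\nok\nno\neee\n")
for sample in samples:
    print(list(extract_numbered_block(io.StringIO(sample))))
# ['12 yes\n', 'ok\n', 'no\n']
# ['14 yes\n', 'ok\n', 'no\n']
```

Because the number is matched by \d+, both sample files give the same block apart from the number itself.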