Posts: 2
Threads: 1
Joined: Jun 2022
Jun-27-2022, 07:53 AM
(This post was last modified: Jun-27-2022, 07:53 AM by rektcol.)
Hello everyone,
I hope this is understandable.
I have to extract text from files in a directory. The files look like this:
"
titi
titi
titi
yes
ok
ok
ok
no
totot
tototo
tot
"
How can I extract the text between yes and no? I would like to get all of the 'ok' lines.
My output should be:
yes
ok
ok
ok
no
I'm able to read these files with os, but I don't know how to extract the text correctly.
Thanks for your help!
Posts: 2,128
Threads: 11
Joined: May 2017
Jun-27-2022, 08:19 AM
(This post was last modified: Jun-27-2022, 08:19 AM by DeaD_EyE.)
Example with a generator:
- assign False to start_found
- iterate over the input line by line (here each line holds one word)
- if the start word is found, set start_found to True
- yield the element if start_found is True
- return from the generator if the element is the end word. This also leaves the for-loop
- optional exceptions:
  - if the for-loop finished but start_found is still False, the start word was not found
  - if the for-loop finished and start_found is True, the end word was not found
word_start = "yes"
word_end = "no"
# Sample text from the question, one word per line.
words = "titi\ntiti\ntiti\nyes\nok\nok\nok\nno\ntotot\ntototo\ntot".splitlines()


def split(sequence, start, end):
    """Yield the elements strictly between start and end."""
    start_found = False
    for element in sequence:
        if element == start and not start_found:
            start_found = True
        elif element == end and start_found:
            return
        elif start_found:
            yield element
    # The loop ended without reaching the end word.
    if start_found:
        raise ValueError(f"The end word '{end}' was not found in sequence")
    else:
        raise ValueError(f"The start word '{start}' was not found in sequence")


oks = list(split(words, word_start, word_end))
print(oks)
Here is the version that includes the start word and the end word in the result.
It differs only slightly from the previous generator function.
word_start = "yes"
word_end = "no"
# Sample text from the question, one word per line.
words = "titi\ntiti\ntiti\nyes\nok\nok\nok\nno\ntotot\ntototo\ntot".splitlines()


def split(sequence, start, end):
    """Yield the elements from start through end, both included."""
    start_found = False
    for element in sequence:
        if element == start and not start_found:
            start_found = True
            yield element
        elif element == end and start_found:
            yield element
            return
        elif start_found:
            yield element
    # The loop ended without reaching the end word.
    if start_found:
        raise ValueError(f"The end word '{end}' was not found in sequence")
    else:
        raise ValueError(f"The start word '{start}' was not found in sequence")


oks = list(split(words, word_start, word_end))
print(oks)
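Since the question mentions reading several files, here is a minimal sketch of how a generator like the one above might be applied to every file in a directory. The names extract_block and extract_from_directory are hypothetical, the sample file is created in a temporary directory only for the demonstration, and this variant silently returns what it found instead of raising when a marker is missing.

```python
from pathlib import Path
import tempfile


def extract_block(lines, start, end):
    """Yield stripped lines from the first `start` line through the next `end` line."""
    found = False
    for line in lines:
        word = line.strip()
        if not found and word == start:
            found = True
        if found:
            yield word
            if word == end:
                return


def extract_from_directory(directory, start="yes", end="no"):
    """Map each file name in `directory` to the block extracted from that file."""
    results = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            with path.open(encoding="utf-8") as handle:
                results[path.name] = list(extract_block(handle, start, end))
    return results


# Demonstration with a throwaway directory holding one sample file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sample.txt").write_text("titi\nyes\nok\nok\nno\ntot\n", encoding="utf-8")
    demo = extract_from_directory(tmp)

print(demo)  # {'sample.txt': ['yes', 'ok', 'ok', 'no']}
```

Iterating over the open file handle keeps memory use low, because each file is read line by line and reading stops as soon as the end word is reached.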
Posts: 4,802
Threads: 77
Joined: Jan 2018
You could use the itertools module:
import io
import itertools as itt

# Sample text from the question.
file = io.StringIO("titi\ntiti\ntiti\nyes\nok\nok\nok\nno\ntotot\ntototo\ntot\n")


def takeuntil(pred, seq):
    """Yield elements up to and including the first one that satisfies pred."""
    for elem in seq:
        yield elem
        if pred(elem):
            return


seq = itt.dropwhile((lambda line: line != 'yes\n'), file)
seq = takeuntil((lambda line: line == 'no\n'), seq)
print(''.join(seq), end='')
Output:
yes
ok
ok
ok
no
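A custom takeuntil is used here, presumably because itertools.takewhile stops before the first element that fails its predicate, so the closing 'no' line would be lost. The short comparison below is only an illustration on a made-up list of lines, not part of the original answer.

```python
from itertools import dropwhile, takewhile

# Hypothetical sample lines, shaped like the lines read from a file.
lines = ["titi\n", "yes\n", "ok\n", "no\n", "tot\n"]


def takeuntil(pred, seq):
    """Yield elements up to and including the first one that satisfies pred."""
    for elem in seq:
        yield elem
        if pred(elem):
            return


# dropwhile skips everything before the first 'yes' line and keeps the rest.
after_start = list(dropwhile(lambda line: line != "yes\n", lines))

# takewhile stops BEFORE the 'no' line, so the end marker is missing.
with_takewhile = list(takewhile(lambda line: line != "no\n", after_start))

# takeuntil yields the 'no' line first and only then stops.
with_takeuntil = list(takeuntil(lambda line: line == "no\n", after_start))

print(after_start)     # ['yes\n', 'ok\n', 'no\n', 'tot\n']
print(with_takewhile)  # ['yes\n', 'ok\n']
print(with_takeuntil)  # ['yes\n', 'ok\n', 'no\n']
```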
Posts: 4,802
Threads: 77
Joined: Jan 2018
Jun-27-2022, 08:21 PM
(This post was last modified: Jun-27-2022, 08:21 PM by Gribouillis.)
An even more functional version
from functools import partial
import io
from itertools import dropwhile
import operator
import sys


def equal(x):
    """Return a predicate that tests whether its argument equals x."""
    return partial(operator.eq, x)


def not_equal(x):
    """Return a predicate that tests whether its argument differs from x."""
    return partial(operator.ne, x)


def takeuntil(pred, seq):
    """Yield elements up to and including the first one that satisfies pred."""
    for elem in seq:
        yield elem
        if pred(elem):
            return


# Sample text from the question.
file = io.StringIO("titi\ntiti\ntiti\nyes\nok\nok\nok\nno\ntotot\ntototo\ntot\n")
seq = takeuntil(equal('no\n'), dropwhile(not_equal('yes\n'), file))
sys.stdout.writelines(seq)
Output:
yes
ok
ok
ok
no
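For readers unfamiliar with functools.partial, the small sketch below shows what the equal and not_equal helpers return: a callable with the first argument of operator.eq or operator.ne already filled in. The variable names and sample lines are made up for illustration.

```python
from functools import partial
import operator

# partial(operator.eq, "no\n") behaves like: lambda line: "no\n" == line
is_no = partial(operator.eq, "no\n")
# partial(operator.ne, "yes\n") behaves like: lambda line: "yes\n" != line
is_not_yes = partial(operator.ne, "yes\n")

samples = ["yes\n", "ok\n", "no\n"]
results_no = [is_no(line) for line in samples]
results_not_yes = [is_not_yes(line) for line in samples]

print(results_no)       # [False, False, True]
print(results_not_yes)  # [False, True, True]
```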
Posts: 2
Threads: 1
Joined: Jun 2022
Jun-28-2022, 07:38 AM
(This post was last modified: Jun-28-2022, 07:38 AM by rektcol.)
First, I learned many tricks, and then I was able to debug my code myself!
Thanks a lot, guys, these solutions work well.
Now imagine that my input contains a number, and the number is different in each file.
For example, the first file is:
"""
titi
12 yes
ok
no
eee
"""
and another file could look like this:
"""
titi
14 yes
ok
no
eee
"""
How can I specify that this number changes? At the moment Python searches for the exact string. Is it possible to indicate that part of the string may differ?
Posts: 4,802
Threads: 77
Joined: Jan 2018
Jun-28-2022, 08:57 AM
(This post was last modified: Jun-28-2022, 08:57 AM by Gribouillis.)
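For varying text, you can use regular expressions:
import re

# Reuses dropwhile, takeuntil, equal, sys and file from the previous snippet.
# The negative lookahead matches every line that is NOT "<digits> yes".
lines = dropwhile(re.compile(r'(?!^\d+\s+yes\s*$)').match, file)
lines = takeuntil(equal('no\n'), lines)
sys.stdout.writelines(lines)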
Output:
12 yes
ok
ok
ok
no
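The same idea can also be written without a negative lookahead, which some readers may find easier to follow. The sketch below is a self-contained variant that uses re.fullmatch on a positive pattern and negates the result; the function name extract and the constant START are hypothetical, and the pattern assumes the start line is exactly a number, whitespace, and the word yes.

```python
import io
import re
from itertools import dropwhile

# Assumed shape of the start line: digits, whitespace, "yes", optional trailing whitespace.
START = re.compile(r"\d+\s+yes\s*")


def takeuntil(pred, seq):
    """Yield elements up to and including the first one that satisfies pred."""
    for elem in seq:
        yield elem
        if pred(elem):
            return


def extract(text):
    """Return the lines from the '<number> yes' line through the 'no' line."""
    file = io.StringIO(text)
    # Drop lines until one fully matches the start pattern.
    lines = dropwhile(lambda line: START.fullmatch(line) is None, file)
    return list(takeuntil(lambda line: line == "no\n", lines))


first = extract("titi\n12 yes\nok\nno\neee\n")
second = extract("titi\n14 yes\nok\nno\neee\n")
print("".join(first), end="")
print("".join(second), end="")
```

With the two sample files from the question, first is ['12 yes\n', 'ok\n', 'no\n'] and second is ['14 yes\n', 'ok\n', 'no\n'].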