Extract text

rektcol · (This post was last modified: Jun-27-2022, 07:53 AM by rektcol.)

Hello everyone,
I hope this is understandable.

I have to extract text from files in a directory. The files look like this:

"
titi
titi
titi
yes
ok
ok
ok
no
totot
tototo
tot
"

How can I extract the text from yes to no? I would like to get all of these 'ok' lines.

My output should be :

yes
ok
ok
ok
no

I am able to read these files with os, but I don't know how to extract the text correctly.

Thanks for your help!

DeaD_EyE · (This post was last modified: Jun-27-2022, 08:19 AM by DeaD_EyE.)

Example with a generator:

Assign False to start_found.
Iterate over the sequence element by element (here line by line, where each line holds one word).
When the start word is found, set start_found to True.
Yield each following element while start_found is True.
Return from the generator when the element is the end word. This also leaves the for-loop.
Optional exceptions:
- If the for-loop finishes and start_found is still False, the start word was not found.
- If the for-loop finishes and start_found is True, the end word was not found.

word_start = "yes"
word_end = "no"
 
words = """yes
ok
ok
ok
no""".splitlines()
 
def split(sequence, start, end):
    start_found = False
 
    for element in sequence:
        if element == start and not start_found:
            start_found = True
        elif element == end and start_found:
            return
            # close generator
        elif start_found:
            yield element
 
    # this point is reached, if the start or end was not found
    if start_found:
        # seen start, but no end
        raise ValueError(f"The end word '{end}' was not found in sequence")
    else:
        # seen no start in the whole sequence
        raise ValueError(f"The start_word '{start}' was not found in sequence")
 
oks = list(split(words, word_start, word_end))
print(oks)
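
One detail worth noting about the generator above: because it is a generator, the ValueError is only raised while the result is being consumed, so some elements may already have been yielded before the error appears. A minimal sketch of handling that, using a condensed copy of the generator (the name `between` and the sample list are hypothetical, chosen only for this illustration):

```python
def between(sequence, start, end):
    """Yield the elements strictly between the first `start` and the next `end`."""
    start_found = False
    for element in sequence:
        if element == start and not start_found:
            start_found = True
        elif element == end and start_found:
            return
        elif start_found:
            yield element
    # Only reached when the loop ran to the end without seeing `end`.
    if start_found:
        raise ValueError(f"end word {end!r} was not found after {start!r}")
    raise ValueError(f"start word {start!r} was not found")


# Hypothetical input in which the end word "no" is missing.
broken = ["titi", "yes", "ok", "ok"]

collected = []
message = None
try:
    for word in between(broken, "yes", "no"):
        collected.append(word)
except ValueError as error:
    message = str(error)
    print("incomplete block:", message)

# The elements seen before the error are still in `collected`.
print(collected)
```

If partial results are not acceptable, wrapping the call in list(...) inside the try block and discarding the list on error is one way to get all-or-nothing behaviour.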


Here is the version that includes the start word and the end word.
It differs only slightly from the previous generator function.

word_start = "yes"
word_end = "no"
 
words = """yes
ok
ok
ok
no""".splitlines()
 
def split(sequence, start, end):
    start_found = False
 
    for element in sequence:
        if element == start and not start_found:
            start_found = True
            yield element
        elif element == end and start_found:
            yield element # yield the word_end
            return
            # close generator
        elif start_found:
            yield element
 
    # this point is reached, if the start or end was not found
    if start_found:
        # seen start, but no end
        raise ValueError(f"The end word '{end}' was not found in sequence")
    else:
        # seen no start in the whole sequence
        raise ValueError(f"The start_word '{start}' was not found in sequence")
 
oks = list(split(words, word_start, word_end))
print(oks)
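
To apply the same idea to real files, as in the original question, one option is to strip the line endings and loop over the files with pathlib. This is only a sketch: the helper names, the "*.txt" pattern and the temporary directory are assumptions made for the demo, and unlike the generator above this variant silently returns whatever it found instead of raising when a marker is missing.

```python
from pathlib import Path
import tempfile


def extract_block(lines, start, end):
    """Yield lines from `start` to `end` inclusive, ignoring line endings."""
    inside = False
    for line in lines:
        word = line.rstrip("\r\n")
        if not inside and word == start:
            inside = True
        if inside:
            yield word
            if word == end:
                return


def extract_from_directory(directory, start="yes", end="no", pattern="*.txt"):
    """Map each matching file name to the block extracted from that file."""
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(encoding="utf-8") as handle:
            results[path.name] = list(extract_block(handle, start, end))
    return results


# Demo with a temporary directory standing in for the real one.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sample.txt").write_text(
        "titi\ntiti\nyes\nok\nok\nok\nno\ntotot\n", encoding="utf-8"
    )
    found = extract_from_directory(tmp)

print(found)  # {'sample.txt': ['yes', 'ok', 'ok', 'ok', 'no']}
```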


**Gribouillis** · Jun-27-2022, 08:48 AM

You could use the itertools module:

import io
import itertools as itt
 
file = io.StringIO("""\
titi
titi
titi
yes
ok
ok
ok
no
totot
tototo
tot
""")
 
def takeuntil(pred, seq):
    for elem in seq:
        yield elem
        if pred(elem):
            return
 
seq = itt.dropwhile((lambda line: line != 'yes\n'), file)
seq = takeuntil((lambda line: line == 'no\n'), seq)
 
print(''.join(seq), end='')


Output:
yes
ok
ok
ok
no
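
Note that the comparisons with 'yes\n' and 'no\n' rely on every line ending in exactly one newline. A last line without a trailing newline, or a line with trailing spaces, would not match. A possible variation, shown here only as a sketch with made-up sample data, strips each line before comparing:

```python
import io
import itertools as itt


def takeuntil(pred, seq):
    """Yield items up to and including the first one for which pred is true."""
    for elem in seq:
        yield elem
        if pred(elem):
            return


# Sample data: trailing spaces after "yes" and no final newline after "no".
file = io.StringIO("titi\nyes  \nok\nok\nok\nno")

# Strip surrounding whitespace lazily, one line at a time.
stripped = (line.strip() for line in file)
seq = itt.dropwhile(lambda line: line != "yes", stripped)
block = list(takeuntil(lambda line: line == "no", seq))

print("\n".join(block))
```

The trade-off is that the original whitespace of each line is lost, which is fine for one-word lines like these but may matter for other data.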

ibreeden · Jun-27-2022, 11:21 AM

Also have a look at this thread: Extract a string between 2 words from a text file.

**Gribouillis** · (This post was last modified: Jun-27-2022, 08:21 PM by Gribouillis.)

An even more functional version:

from functools import partial
import io
from itertools import dropwhile
import operator
import sys
 
def equal(x):
    return partial(operator.eq, x)
 
def not_equal(x):
    return partial(operator.ne, x)
 
def takeuntil(pred, seq):
    for elem in seq:
        yield elem
        if pred(elem):
            return
 
file = io.StringIO("""\
titi
titi
titi
yes
ok
ok
ok
no
totot
tototo
tot
""")
 
seq = takeuntil(equal('no\n'), dropwhile(not_equal('yes\n'), file))
 
sys.stdout.writelines(seq)


Output:
yes
ok
ok
ok
no
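
For readers unfamiliar with functools.partial: equal(x) returns a callable with x frozen in as the first argument of operator.eq, so it behaves like a small lambda. A short sketch of what the two helpers do (the names is_no and is_not_yes are only for this illustration):

```python
from functools import partial
import operator


def equal(x):
    return partial(operator.eq, x)


def not_equal(x):
    return partial(operator.ne, x)


is_no = equal("no\n")            # behaves like: lambda line: "no\n" == line
is_not_yes = not_equal("yes\n")  # behaves like: lambda line: "yes\n" != line

print(is_no("no\n"), is_no("ok\n"))               # True False
print(is_not_yes("titi\n"), is_not_yes("yes\n"))  # True False
```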

rektcol · (This post was last modified: Jun-28-2022, 07:38 AM by rektcol.)

I learned many tricks from this and then managed to debug my code myself!
Thanks a lot, everyone, these solutions work well.

Now imagine that my input contains a number, and that the number is different in each file.
For example, the first file is:
"""
titi
12 yes
ok
no
eee
"""

and another file could look like this:

"""
titi
14 yes
ok
no
eee
"""

How can I specify that these numbers change? Right now, Python searches for the exact string. Is it possible to indicate that part of the string may vary?

**Gribouillis** · (This post was last modified: Jun-28-2022, 08:57 AM by Gribouillis.)

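For varying text, you can use regular expressions:

import re

# Assumes dropwhile, takeuntil, equal and sys from the previous example,
# and `file` opened on one of the new inputs.
# The negative lookahead matches every line that is NOT "<digits> yes",
# so dropwhile skips lines until the first "<digits> yes" line.
lines = dropwhile(re.compile(r'(?!^\d+\s+yes\s*$)').match, file)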
lines = takeuntil(equal('no\n'), lines)
sys.stdout.writelines(lines)

Output:
12 yes
ok
ok
ok
no
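
The same idea can also be written with a positive pattern and fullmatch, which some may find easier to read than the negative lookahead. This is a self-contained sketch, not a replacement for the code above; the names START and is_start and the sample text are assumptions for the demo:

```python
import io
import re
from itertools import dropwhile

# Digits, whitespace, "yes", then optional trailing whitespace (incl. newline).
START = re.compile(r"\d+\s+yes\s*")


def is_start(line):
    """True if the whole line is a number followed by 'yes'."""
    return START.fullmatch(line) is not None


def takeuntil(pred, seq):
    """Yield items up to and including the first one for which pred is true."""
    for elem in seq:
        yield elem
        if pred(elem):
            return


file = io.StringIO("titi\n14 yes\nok\nno\neee\n")

lines = dropwhile(lambda line: not is_start(line), file)
lines = takeuntil(lambda line: line == "no\n", lines)
result = list(lines)

print("".join(result), end="")
```

Because the number is matched by \d+, the same code handles "12 yes", "14 yes" or any other number without changes.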
