re.sub not working

Ryokousha · Oct-02-2022, 04:18 AM

I have a simple Mad Libs generator and a template text. When I try to replace the parts of speech, it just does nothing and returns the unchanged string. I don't know what the problem is, and I don't get any error message.

import os
import random
import re

def generator(text:str) -> str:
    '''find needed words in text and ask user enter them, then print result'''
    parts_of_speech: list = re.findall('_+\s?\([^\)]+\)', text)
    words: dict = {}
    for part in parts_of_speech:
        text_to_user = part.replace('_', '').replace('(', '').replace(')', '').replace('\n', ' ')
        words[part] = input(f'Please input a(n) {text_to_user}\n')
        os.system('cls' if os.name == 'nt' else 'clear')
    
    for regex, replacable_world in words.items():
        text = re.sub(regex, replacable_world, text, 1)
    print(text)
    
def get_template(witch: str = None) -> str:
    '''just get a template from text file'''  
    if witch:
        return open(f'./templates/{witch}.txt', 'r').read()
    template: str = random.choice(os.listdir('./templates'))
    return open(f'./templates/{template}', 'r').read() 

def main() -> None:
    generator(get_template('The Monkey King!'))

if __name__ == '__main__':
    main()

"
The day I saw the Monkey King __________(verb) was one of the most

interesting days of the year.

After he did that, the king played chess on his brother's

__________(noun) and then combed his __________ (adjective) hair with a

comb made out of old fish bones. Later that same day, I saw the

Monkey King dance __________ (adverb)

in front of an audience of kangaroos and wombats.
"

**deanhystad** · Oct-02-2022, 04:50 AM

Some characters have special meanings in regular expressions, such as (). You obviously know this because you used backslashes to remove their special meaning in this:

re.findall('_+\s?\([^\)]+\)', text)

You could process the regex strings to add in the backslashes, but it is easier to use str.replace().
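
A minimal sketch of both options. The one-line template and the word 'jumped' are made up for illustration; re.escape() is the standard library function that adds the backslashes for you.

```python
import re

# Hypothetical one-line template, shortened from the one in the question.
text = 'The Monkey King __________(verb) today.'
placeholder = '__________(verb)'

# Used directly as a pattern, "(verb)" is a group that matches the letters
# "verb", not literal parentheses, so nothing in the text matches.
unchanged = re.sub(placeholder, 'jumped', text, count=1)

# Option 1: re.escape() backslash-escapes the special characters.
escaped = re.sub(re.escape(placeholder), 'jumped', text, count=1)

# Option 2: str.replace() treats the placeholder as plain text.
replaced = text.replace(placeholder, 'jumped', 1)

print(unchanged)
print(escaped)
print(replaced)
```

The first call returns the text unchanged, which is the behavior described in the question. The other two produce the filled-in sentence.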

Coricoco_fr · Oct-02-2022, 05:42 AM

Hello,

(Oct-02-2022, 04:50 AM)deanhystad Wrote: Some characters have special meanings in regular expressions, such as (). You obviously know this because you used backslashes to remove their special meaning in this:
re.findall('_+\s?\([^\)]+\)', text)
You could process the regex strings to add in the backslashes, but it is easier to use str.replace().

We can use a raw string (r-string)...

Pedroski55 · (This post was last modified: Oct-02-2022, 05:59 AM by Pedroski55.)

I make gapped texts for classroom use. Starting with a text, I make a list of the words I want to extract and their line numbers, as the answer key (AK).

The AK looks like this:

Quote:
3,woman
4,dog
5,monkey

Just loop through the AK, replacing the words. Later the gapped text gets saved as a .docx file, with the missing words in a table.

number = 1
for word in AKlist:
    # word looks like 3,woman\n
    splitword = word.split(',')
    # the part before the comma is the line number
    line = splitword[0]
    # minus 1 because the list string1 starts at 0
    linenum = int(line) -1
    print('line number is', linenum)        
    newword = splitword[1].replace('\n', '')
    print('newword is', newword)
    sentence = string1[linenum]
    print('sentence', linenum, sentence)        
    repl = f'{number}. ___________'
    # another re command
    # sentence = re.sub(r"\b{}\b".format(word), newword, sentences)
    sentence = re.sub(newword, repl, sentence, count=1)
    print(sentence)
    string1[linenum] = sentence
    number +=1
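
A self-contained sketch of the same loop, in case anyone wants to run it. The sample lines and answer key are made up and stand in for string1 and AKlist. The re.escape() call and the \b word boundaries are my additions, so that an answer such as "dog" cannot be read as a pattern or match inside "dogs".

```python
import re

# Hypothetical sample data standing in for string1 and AKlist.
lines = [
    'Title',
    'Intro line',
    'The woman walked home.',
    'Her dog followed her.',
    'A monkey watched the dogs.',
]
answer_key = ['3,woman\n', '4,dog\n', '5,monkey\n']

for number, entry in enumerate(answer_key, start=1):
    line_text, word = entry.strip().split(',', 1)
    index = int(line_text) - 1  # the answer key is 1-based, the list is 0-based
    # Escape the word and require word boundaries on both sides.
    pattern = r'\b' + re.escape(word) + r'\b'
    lines[index] = re.sub(pattern, f'{number}. ___________', lines[index], count=1)

print('\n'.join(lines))
```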

Ryokousha · Oct-02-2022, 07:04 AM

(Oct-02-2022, 04:50 AM)deanhystad Wrote: Some characters have special meanings in regular expressions, such as (). You obviously know this because you used backslashes to remove their special meaning in this:
re.findall('_+\s?\([^\)]+\)', text)
You could process the regex strings to add in the backslashes, but it is easier to use str.replace().

Oh, that was such a stupid mistake. Thank you!

Ryokousha · Oct-02-2022, 07:06 AM

(Oct-02-2022, 05:42 AM)Coricoco_fr Wrote: Hello,
(Oct-02-2022, 04:50 AM)deanhystad Wrote: Some characters have special meanings in regular expressions, such as (). You obviously know this because you used backslashes to remove their special meaning in this:
re.findall('_+\s?\([^\)]+\)', text)
You could process the regex strings to add in the backslashes, but it is easier to use str.replace().
We can use a raw string (r-string)...

Hello, do you know how we can convert a string to a raw string? I've googled it, but what I found did not work.

***snippsat*** · Oct-02-2022, 11:34 AM

(Oct-02-2022, 07:06 AM)Ryokousha Wrote: Hello, do you know how we can convert a string to a raw string? I've googled it, but what I found did not work.

Add the r prefix (raw string) to the regex pattern. Always do this as a habit, or you can run into problems.
For example, adding car after a newline (\n):

>>> import re
>>> 
>>> s = ' hello world\n'
>>> re.sub('(\n)', '\1car', s)
' hello world\x01car'

So it fails. Now add r and it works.

>>> import re
>>> 
>>> s = ' hello world\n'
>>> re.sub(r'(\n)', r'\1car', s)
' hello world\ncar'

From the re module documentation:

Quote:
The solution is to use Python’s raw string notation for regular expression patterns;
backslashes are not handled in any special way in a string literal prefixed with 'r'.
So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline.
Usually patterns will be expressed in Python code using this raw string notation.

**deanhystad** · (This post was last modified: Oct-02-2022, 03:00 PM by deanhystad.)

This is not an issue of raw strings. Raw strings are used to prevent backslashes from being treated as escape sequences. An r"string" has no effect on parentheses. As you know, your problem was that re.sub() interpreted the parentheses in your regex strings as grouping characters instead of literal parentheses.

When a Python program is compiled (converted to bytecode), every string literal becomes an ordinary str object. Escape sequences in a normal literal are replaced with the character(s) they stand for, while a literal with an "r" prefix skips that processing and keeps its backslashes exactly as written. "Raw" describes only how the literal is typed in the source code, not the resulting object. Since every string is just a str when your program runs, there is no way, and no need, to convert a str to a "raw" str.
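
A small sketch of that point, using a fragment of the pattern from this thread written both ways (the variable names are only for illustration):

```python
# The same pattern written as a raw literal and as an ordinary literal
# with doubled backslashes.
raw_literal = r'_+\s?\('
plain_literal = '_+\\s?\\('

print(type(raw_literal) is str)      # a raw literal still produces a plain str
print(raw_literal == plain_literal)  # the two spellings give the same string
print(len(r'\n'), len('\n'))         # backslash + n versus a single newline
```

Both literals produce the same str, so by the time re.sub() sees the pattern it cannot tell which spelling was used in the source.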

You don't want two loops in generator(). Replace the words as they are encountered.

def generator(text: str) -> None:
    '''Find the placeholders in text, ask the user for each word, then print the result.'''
    for placeholder in re.findall(r'_+\s?\([^\)]+\)', text):  # good spot for a raw string
        word_type = placeholder.replace('_', '').replace('(', '').replace(')', '').replace('\n', ' ')
        text = text.replace(placeholder, input(f'Please input a(n) {word_type}\n'), 1)
    print(text)

This is a wonderful example of simpler is better. The dictionary in your solution limits your madlib to one noun, one verb, one adjective, etc. I suppose you could have verb2 and noun3, but that either looks clunky (Enter a verb2) or requires extra processing to remove the extra sequence number. It is much easier to replace the placeholders as they are encountered. With no dictionary you don't have to worry about uniqueness. Your madlib can have 10 nouns, because your program only knows about the next noun.
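
A sketch of that last point with two nouns. The fill_template name, the ask parameter, and the canned answers are hypothetical, added only so the example runs without typing anything. The replacement logic is the same as in generator() above.

```python
import re

def fill_template(text: str, ask=input) -> str:
    '''Replace each placeholder in order, asking for one word at a time.'''
    for placeholder in re.findall(r'_+\s?\([^)]+\)', text):
        # Strip underscores and parentheses to get the word type for the prompt.
        word_type = re.sub(r'[_()]', '', placeholder).strip()
        text = text.replace(placeholder, ask(f'Please input a(n) {word_type}\n'), 1)
    return text

# Canned answers stand in for the user typing at the prompt.
answers = iter(['cat', 'hat'])
result = fill_template('A __(noun) in a __(noun).', ask=lambda prompt: next(answers))
print(result)
```

Both placeholders are the identical string '__(noun)', yet each one gets its own word because only the first remaining occurrence is replaced on each pass.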
