Python Forum

Full Version: regular expression question
i want to match part of a string against a regular expression. normally i would just slice the string and pass that part to re. but my code is not in control of the calls to re methods and only the whole strings are used. the only thing my code gets to control is the pattern. i have a small string of about 10 to 20 characters and i want this to be compared in only the first 80 characters of each string being matched with this pattern. i think i could do this if i needed to skip the first 80 and match the rest. but i need to match the first 80 to this smaller string i put in the pattern, and ignore (.*) everything after 80 (variable amount). there may be strings to match that are shorter than 80 in which the whole string is to be matched. how can i make a regular expression do this?
Do you only have access to the built-in re module for matching, or can you use other things like the third-party regex module?

With only re matching, this seems difficult. Lookbehind could be useful, but in re it has to be fixed-width, so it can't stand in for a variable amount of preceding text. Can you get the length of the string before the match? If so, you could build the pattern based on that. Here's an example. I asked it to look for matches only in the first 12 characters. It correctly finds the word inside that limit, and does not find anything that is straddling or beyond the limit. It does this by forcing at least n characters to follow the match, where n is the string length minus the limit, so the match has to end inside the required portion.

So there is code here, but it doesn't modify the search string at all, just calculates the pattern.

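import re

source_text = "A long string that has no evidence of the word apple early in the string"

match_before = 12  # only accept matches that end within this many characters
targets = ["long", "apple", "string"]

# Number of characters that must follow a match so that it ends inside the limit.
ignore_portion = max(len(source_text) - match_before, 0)

for target in targets:
    print(f"Looking for {target}...", end="")
    # re.escape keeps special characters in the target literal;
    # DOTALL lets "." count newlines, and \Z anchors at the very end of the string.
    pattern = re.compile(fr"{re.escape(target)}.{{{ignore_portion},}}\Z", re.DOTALL)
    if pattern.search(source_text):
        print(" Found")
    else:
        print(" No match")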
Output:
Looking for long... Found
Looking for apple... No match
Looking for string... No match
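The pattern above depends on the length of the string being searched, so it has to be rebuilt for every string. If only the pattern can be controlled, one possible alternative (a sketch, not something tested against the actual program) is a fixed-width negative lookbehind placed after the target: it rejects any match that has more than the allowed number of characters in front of its end position. The helper name within_first and its limit parameter are made up for this illustration.

```python
import re

def within_first(inner: str, limit: int = 80) -> str:
    """Build a pattern that matches `inner` only if the match ends
    within the first `limit` characters of the searched string.
    (Hypothetical helper, named here only for illustration.)"""
    # The negative lookbehind fails when limit + 1 characters precede the
    # current position, i.e. when the match would end past the limit.
    # [\s\S] is used instead of "." so that newlines count as characters.
    return rf"(?:{inner})(?<![\s\S]{{{limit + 1}}})"

# Same limit as the example above: only the first 12 characters count.
pattern = within_first(re.escape("apple"), limit=12)

for text in ["An apple a day", "A long string with an apple", "apple"]:
    found = re.search(pattern, text) is not None
    print(f"{text!r}: {'Found' if found else 'No match'}")
```

This does not need the string length, and strings shorter than the limit are searched in full. If the unseen code calls match or fullmatch instead of search, the pattern would presumably need `[\s\S]*?` in front and `[\s\S]*` after it. One caveat: if the inner pattern is variable-length (say `\d+`), backtracking can make it accept a shortened match that stops right at the limit, so a number straddling the boundary may be matched in part.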
the pattern will be matched against many strings of likely varying length. that and what i need to match could be several things together, like 2 decimal numbers in the midst of alphabetic names (if 1 number or 3 numbers then it is not a match). the unseen code will be doing all these match tries and processing the matched strings.

it would be nice if it let me pass a function to be called for each string but it wants me to pass a pattern for re. i'm thinking that i need to not use that program and just implement my own.
Skaperen Wrote: i'm thinking that i need to not use that program
Is it an open-source program? Do you have a link?
(Aug-22-2021, 10:05 AM) Gribouillis Wrote: Is it an open-source program? Do you have a link?

it's a pluggable proprietary API. i do get to see the source, but not in advance. i don't get to post the source.