Python Forum
Understanding Regex Groups
#1
Hello,

I'm trying to get a better understanding of regex capturing groups, because my Python script is not behaving the way I expect based on my understanding of how regex works. I am using re.compile to set up the pattern, and then pattern.findall to search the input.

I am trying to capture the following phrases from user input to then send to At to set an At job:

1 minute
2 hours
4 days
etc.


for
r'(\b\d+\sminute|hour|day|week|month|year)'
1 minute works, but 1 hour does not. The capture with 1 hour is 'hour', omitting the preceding 1. But, if I change the capture group to something like
r'(\b\d+\s(minute|hour|day|week|month|year))'
, I end up with a tuple, which causes other problems. My understanding is that if you have a group within a group, then regex matching should only match the whole group, but after going around in circles with different little changes to the regex syntax, I am stumped.

Am I stuck with a tuple that I then have to pull apart and deal with, or do I have to set up the capture so it makes a list of individual characters, like [1, ' ', m, i, n, u, t, e], and then flatten the list? I am thinking there must be a more elegant way of capturing something as simple as 1 minute vs 2 hours without a whole bunch of post-processing workarounds.

I should note, too, that I originally had an optional (s?) as part of the pattern, but I've backburnered that smaller problem until I can get a better grip on regex capturing groups. Perhaps my issue is not in the pattern setup but in the .findall method?
Reply
#2
Can you make a standalone script that has a sample input and shows the capturing you're trying?
Reply
#3
(Dec-16-2020, 07:16 PM)matt_the_hall Wrote: for
r'(\b\d+\sminute|hour|day|week|month|year)'
1 minute works, but 1 hour does not. The capture with 1 hour is 'hour', omitting the preceding 1.

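Right. The | alternation has the lowest precedence, so that pattern means "\b\d+\sminute" OR "hour" OR "day" and so on. The \d+\s prefix belongs only to the minute alternative, which is why a search() on "1 hour" captures just 'hour'. If you are doing a match() rather than a search(), it will fail outright because the minute part doesn't match, and none of the other alternatives match at the beginning.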
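A quick way to see this for yourself, as a minimal sketch using the two patterns from the first post (the variable names here are made up for illustration):

```python
import re

# First pattern from the original post. The alternation splits the whole group,
# so the digits and whitespace are only required in front of "minute".
loose = re.compile(r'(\b\d+\sminute|hour|day|week|month|year)')

# Second pattern from the original post. The inner parentheses confine the
# alternation to the unit name, so the digits apply to every unit.
grouped = re.compile(r'(\b\d+\s(minute|hour|day|week|month|year))')

loose_match = loose.match("1 hour")                  # no alternative matches at position 0
loose_hour = loose.search("1 hour").group(1)         # only the bare unit is found
loose_minute = loose.search("1 minute").group(1)     # the first alternative matches fully
grouped_hour = grouped.search("1 hour").group(1)     # number and unit together

print(loose_match, loose_hour, loose_minute, grouped_hour, sep=" | ")
```

So the extra parentheses are needed for correctness, not just for capturing.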

Quote:But, if I change the capture group to something like
r'(\b\d+\s(minute|hour|day|week|month|year))'
, I end up with a tuple,

What is a tuple now? Changing the pattern should not change the type of the values. m.group() is always a string, m.group(1,2) is always a tuple, m.groups() is always a tuple. That's why I wanted to see your code.
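For what it's worth, if the script is calling findall (as the first post hints), that would explain the tuple: findall returns a list of strings when the pattern has at most one group, and a list of tuples when it has more than one. A small sketch, assuming a made-up sample string:

```python
import re

text = "1 minute, 1 hour"  # hypothetical sample input

one_group = re.compile(r'(\b\d+\sminute|hour|day|week|month|year)')
two_groups = re.compile(r'(\b\d+\s(minute|hour|day|week|month|year))')

# One group in the pattern: findall gives a flat list of strings.
flat = one_group.findall(text)

# Two groups in the pattern: findall gives one tuple per match,
# ordered (group 1, group 2).
pairs = two_groups.findall(text)

# finditer yields match objects instead, so a single group can be
# picked out without unpacking tuples.
outer_only = [m.group(1) for m in two_groups.finditer(text)]

print(flat)
print(pairs)
print(outer_only)
```

Using finditer (or search) with m.group(1) is one way to keep the nested pattern and still end up with plain strings.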

Quote:which causes other problems. My understanding is that if you have a group within a group, then regex matching should only match the whole group,

Inside or outside doesn't matter. Groups are numbered by the position of their opening parenthesis, counting from the left: the leftmost open parenthesis starts the first group, and the next open parenthesis starts the next group. In your example the first group will be the whole match.

import re
text = "1 hour"
pattern = r'(\b\d+\s(minute|hour|day|week|month|year))'

m = re.search(pattern, text)
# The outer group starts first, so it is m.group(1) or m.groups()[0]
# The inner group starts next, so it is m.group(2) or m.groups()[1]
print(type(m.group())) # none of these are tuples
print(f"outer (first) match is -> {m.groups()[0]} / {m.group(1)}")
print(f"inner (second) match is -> {m.groups()[1]} / {m.group(2)}")
Output:
<class 'str'>
outer (first) match is -> 1 hour / 1 hour
inner (second) match is -> hour / hour
However, if you just need the parentheses to group and not to capture, you can do that as well (so there's only a single group). For this case I wouldn't bother, but if you make the second group start with (?:, instead of just an open parenthesis, then it won't be a capture group.
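As a rough sketch of that non-capturing version: the trailing s? is an assumption based on the optional plural mentioned in the first post, and the sample sentence is invented.

```python
import re

# (?: ... ) groups the unit names for the alternation without capturing them,
# so the outer parentheses are the only capture group left.
pattern = re.compile(r'(\b\d+\s(?:minute|hour|day|week|month|year)s?)')

text = "1 minute, 2 hours and 4 days"  # hypothetical sample input

# With a single capture group, findall returns plain strings again.
found = pattern.findall(text)

m = pattern.search("2 hours")
only_groups = m.groups()  # a one-element tuple, since only one group captures

print(found)
print(only_groups)
```

With this pattern there is nothing to pull apart afterwards, whichever of findall or search is used.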
Reply
#4
@bowlofred -- So sorry for the late reply, this last month or so has been a seriously tough one for me and my family.

In any case, your solution worked great. Using the group method was the correct way to go, instead of the findall method.

As for tuples, my understanding is that a tuple is a fixed size (2) list. Have I got that right? And how do most people pronounce that word? I say "toople," but a friend and actual programmer (I am just a hobbyist) pronounced it "tuhple" like Tupperware.

Yay, now my python script functions as expected and I can use it to remind myself of a thread. Thank you so much!
Reply
#5
(Jan-10-2021, 12:54 AM)matt_the_hall Wrote: As for tuples, my understanding is that a tuple is a fixed size (2) list.

Not quite: a tuple can hold any number of elements, not just two. Tuples are similar to lists (both are ordered sequences that you can index and loop over). The main difference is that lists can be changed after creation and tuples cannot, so a tuple's length and contents are fixed once it exists. If you want a slightly different tuple, you have to copy the data into a new, immutable object. For someone reading the information from a tuple rather than creating one, there's almost no difference and you can treat it as a special kind of list.
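A short illustration of the difference, with example values that are made up for the purpose:

```python
# A tuple like the one m.groups() returned earlier in the thread.
units = ("1 hour", "hour")

# Assigning to an element of a tuple raises TypeError.
try:
    units[1] = "day"
    changed = True
except TypeError:
    changed = False

# To get a "slightly different" tuple, build a new one from the old data.
new_units = units[:1] + ("day",)

# A list made from the same data can be modified in place.
as_list = list(units)
as_list[1] = "day"

# Tuples are not limited to two elements.
triple = (1, "hour", 3600)

print(changed, new_units, as_list, len(triple))
```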

The name comes from the shared ending of the words for larger multiples, like quintuple or sextuple. "n-tuple", or just "tuple", generalizes that to a sequence with n elements.
Reply
#6
Thank you very much! (so much to learn!).
Reply

