seeking simple|clean|pythonic way to capture {1,} numeric clusters

NetPCDoc · (This post was last modified: Jun-05-2021, 11:19 PM by NetPCDoc.)

seeking simple|clean|pythonic way (expression | pattern) to capture {1,} numeric clusters (i.e. "\d+") …
… associated with a single "key_phrase"?

(I am having great difficulty with finding proper search terms for any posted examples.)

goal:

'Match 1 0-c "key_phrase [22 1 333 ...]"'
'Group <int> 12-14 "22"'
'Group <int> 15-16 "1"'
'Group <int> 17-20 "333"'
…
'Group <int> a-b "\d+"'
- where:
  - 20 < a <= b < c == b + 1

using (for testing purposes):

'Python 2.7 "flavor"' at regex101
test_string = 'key_phrase [22 1 333]'

attempts (with partial success):

regex = 'r"key_phrase \[(\d+)+(?: |\])"gm'
- results:
  - 'Match 1 0-15 "key_phrase [22 "'
  - 'Group 1 12-14 "22"'

regex = 'r"key_phrase \[(?:(?: )*(\d+))+\]"gm'
- results:
  - 'Match 1 0-21 "key_phrase [22 1 333]"'
  - 'Group 1 17-20 "333"'

regex = 'r"key_phrase \[((?:(?: )*(?:\d+))+)\]"gm'
- results:
  - 'Match 1 0-21 "key_phrase [22 1 333]"'
  - 'Group 1 12-20 "22 1 333"'

attempt (with complicated|messy|unpythonic, but successful?, result):

regex = 'r"key_phrase \[|(?: )*(\d+)|\]"gm'
- results:
  - 'Match 1 0-12 "key_phrase ["'
  - 'Match 2 12-14 "22"'
  - 'Group 1 12-14 "22"'
  - 'Match 3 14-16 " 1"'
  - 'Group 1 15-16 "1"'
  - 'Match 4 16-20 " 333"'
  - 'Group 1 17-20 "333"'
  - 'Match 5 20-21 "]"'
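
For reference, a minimal sketch of how the alternation attempt above behaves in Python's own re module (shown in Python 3 syntax rather than the regex101 "Python 2.7 flavor"; the names pattern, numbers and stray are only illustrative). Each alternative yields a separate match, so the numbers have to be collected across matches with re.finditer. It also shows a caveat: the \d+ alternative is not actually tied to key_phrase.

```python
import re

test_string = 'key_phrase [22 1 333]'
pattern = re.compile(r"key_phrase \[|(?: )*(\d+)|\]")

# Every alternative produces its own match. Group 1 is None for the
# matches that only consumed "key_phrase [" or "]", so skip those.
numbers = [m.group(1) for m in pattern.finditer(test_string)
           if m.group(1) is not None]
print(numbers)

# Caveat: digits in text that has nothing to do with key_phrase
# are captured as well, because the alternatives are independent.
stray = [m.group(1) for m in pattern.finditer('other 7')
         if m.group(1) is not None]
print(stray)
```

So the pattern does collect every number, but only because it matches any run of digits anywhere in the input.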

Notes:

neither "(?:(?<=\[| )(\d+))+" nor "(\?:(\d+)(?= |\])+" are satisfactory as capture-group expressions.

'{1,} numeric clusters' is my way of stating :
- want to ignore any "key_phrase []" potential matches, while capturing an unknown number of \d+ integers
- (i.e. there could be only one or a multiple number - up to, at least, as many as can fit on a single line - \d+ groups).

the '22 1 333' in test_string was selected as a sample of what MIGHT be encountered.

**Gribouillis** · Jun-06-2021, 05:55 AM

I don't think you can capture a variable number of groups in the same match. Each captured group corresponds to a single group in the regex. For example, the regex r"key_phrase \[(?:(?: )*(\d+))+\]" visibly has a single capturing group (\d+), hence there will be only a single group in the match.
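
A small sketch of that point, assuming the standard re module (the variable names are only for illustration): the number of groups is fixed when the pattern is compiled, and a group inside a repetition keeps only its last capture.

```python
import re

pattern = re.compile(r"key_phrase \[(?:(?: )*(\d+))+\]")
match = pattern.search('key_phrase [22 1 333]')

# Number of capturing groups, fixed by the pattern itself.
print(pattern.groups)

# One slot per group; a repeated group holds only its last repetition.
print(match.groups())
```

Here pattern.groups is 1 and match.groups() is ('333',), which agrees with the regex101 result for the same expression.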

NetPCDoc · Jun-06-2021, 09:07 AM

(Jun-06-2021, 05:55 AM)Gribouillis Wrote: I don't think you can capture a variable number of groups in the same match. Each captured group corresponds to a single group in the regex. For example, the regex r"key_phrase \[(?:(?: )*(\d+))+\]" visibly has a single capturing group (\d+), hence there will be only a single group in the match.

While you might be technically correct, I expect python to do better than that!

r"key_phrase \[|(?: )*(\d+)|\]" has the same single visible capture group, and while in that brief form does include multiple matches, it also returns multiple instances of Group 1; and relies on the proper placement of the two | "alternative" designators to return ALL the desired captures.

Alongside the fact that the expression r"key_phrase \[((?:(?: )*(?:\d+))+)\]" does eliminate the extra matches, while failing to separate out the individual \d+ instances … it seems to me that, if I REALLY knew what I was doing, these two expressions contain most of the clues on how to proceed to a simpler | cleaner | more pythonic expression that does achieve the desired goal.

I am somewhat clueless as to how acceptable, pythonically speaking, multiple instances of the same group are, versus the incrementing group number that would result if one took the time to type in some thousand (or so) optional-capture-group sub-expressions. And I still expect that python should have a short, elegant way of coding an expression to make multiple captures in such situations.

**Gribouillis** · (This post was last modified: Jun-06-2021, 11:01 AM by Gribouillis.)

Tim Peters Wrote: Simple is better than complex.

Rather than rewriting a 'better' re module, you could simply use two lines: the first, a regex, could capture the whole substring "22 1 333" in a group; the second would split this substring into a sequence of numbers. It is a very elegant solution to this problem.

NetPCDoc · (This post was last modified: Jun-06-2021, 10:00 PM by NetPCDoc.)

(Jun-06-2021, 10:59 AM)Gribouillis Wrote: Rather than rewriting a 'better' re module, you could simply use two lines: the first, a regex, could capture the whole substring "22 1 333" in a group; the second would split this substring into a sequence of numbers. It is a very elegant solution to this problem.

YES!
I DO agree!

I believe this is what is on the other side of the brick wall I have been banging my head against!

To date - my preferred path to a solution:

Given:

regex = r"key_phrase \[((?:(?: )*(?:\d+))+)\]"gm
- returns:
  - Match 1 0-21 key_phrase [22 1 333]
  - Group 1 12-20 22 1 333
thus:
- ((?:(?: )*(?:\d+))+)
  - isolates the \d+ info (i.e. 22 1 333) from the key_phrase [] wrapper.

Any recommendations on how I might proceed?

find a way to rewrite ((?:(?: )*(?:\d+))+) in place?
find a way to reprocess Group 1?
find a way to convert Group 1 to Match 2; and then process Match 2?
any better ideas?

So, Gribouillis, if I am understanding you correctly - you are recommending that I proceed with option 2 (or maybe 3?) - from the above?

Which leaves me scratching my head on how to put two regex expressions into a single argument that can be passed to either re.search or re.sub?
… depending on whether: one just wants to find them? or one wants to actually replace them?

Any suggestions on 'search terms' for finding an example of how to make the above mentioned 'itch' go away?

**Gribouillis** · Jun-07-2021, 05:57 AM

>>> import re
>>> test_string = 'key_phrase [22 1 333]'
>>> match = re.match(r'key_phrase \[([\d ]*)\]', test_string)
>>> result = []
>>> if match:
...     result = match.group(1).strip().split()
... 
>>> result
['22', '1', '333']

NetPCDoc · Jun-10-2021, 05:14 PM

Thanks, Gribouillis. At first glance that looks very elegant …

But:

The re.match that you provided catches the null case test_string = 'key_phrase []', that I did not want to catch?
I was thinking more along the lines of either editor.replace(…) or editor.search(…)?
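
One possible way around the null case, as a sketch only: require at least one \d+ inside the brackets, so that 'key_phrase []' no longer matches. The pattern below and the helper name numbers_for are assumptions made for illustration; neither comes from the posts above.

```python
import re

# Hypothetical pattern: at least one \d+ is required between the
# brackets; further numbers are separated by one or more spaces.
pattern = re.compile(r"key_phrase \[ *(\d+(?: +\d+)*) *\]")

def numbers_for(text):
    """Return the list of number strings, or [] when nothing matches."""
    match = pattern.search(text)
    return match.group(1).split() if match else []

print(numbers_for('key_phrase [22 1 333]'))
print(numbers_for('key_phrase [7]'))
print(numbers_for('key_phrase []'))
```

With this pattern the empty-bracket string produces no match at all, so the caller gets an empty list instead of a match with an empty group.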

Then again:

Given match == 22 1 333,
I could use local_arg = match_obj.group(1).split() in a handler-function,
before printing, listing or substituting?
and then, for replacing, re-string the substitution result(s) into a return_arg?
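
The handler-function idea can be sketched with the standard re.sub, which accepts a function as the replacement argument and calls it once per match. This is an assumption-laden illustration: the pattern, the name handler and the "double each number" transformation are made up here, and whether an editor-specific replace accepts a function the same way is not something this thread confirms.

```python
import re

# Hypothetical pattern: requires at least one number in the brackets.
pattern = re.compile(r"key_phrase \[ *(\d+(?: +\d+)*) *\]")

def handler(match_obj):
    # local_arg: the individual number strings taken from group 1.
    local_arg = match_obj.group(1).split()
    # Illustrative transformation only: double each number.
    doubled = [str(int(n) * 2) for n in local_arg]
    # return_arg: the text that replaces the whole match.
    return_arg = "key_phrase [" + " ".join(doubled) + "]"
    return return_arg

text = "a key_phrase [22 1 333] b key_phrase [] c"
result = pattern.sub(handler, text)
print(result)
```

Only the first occurrence is rewritten here; the empty 'key_phrase []' is left untouched because the pattern does not match it.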
