Python Forum
seeking simple|clean|pythonic way to capture {1,} numeric clusters - Printable Version




seeking simple|clean|pythonic way to capture {1,} numeric clusters - NetPCDoc - Jun-05-2021

Seeking a simple|clean|pythonic way (expression | pattern) to capture {1,} numeric clusters (i.e. "\d+") associated with a single "key_phrase".

(I am having great difficulty finding the proper search terms to locate any posted examples.)

goal:
  • 'Match 1 0-c "key_phrase [22 1 333 ...]"'
  • 'Group <int> 12-14 "22"'
  • 'Group <int> 15-16 "1"'
  • 'Group <int> 17-20 "333"'
  • 'Group <int> a-b "\d+"'
    • where:
      • 20 < a <= b < c == b + 1

using (for testing purposes):
  • 'Python 2.7 "flavor"' at regex101
  • test_string = 'key_phrase [22 1 333]'
attempts (with partial success):
  • regex = 'r"key_phrase \[(\d+)+(?: |\])"gm'
    • results:
      • 'Match 1 0-15 "key_phrase [22 "'
      • 'Group 1 12-14 "22"'
  • regex = 'r"key_phrase \[(?:(?: )*(\d+))+\]"gm'
    • results:
      • 'Match 1 0-21 "key_phrase [22 1 333]"'
      • 'Group 1 17-20 "333"'
  • regex = 'r"key_phrase \[((?:(?: )*(?:\d+))+)\]"gm'
    • results:
      • 'Match 1 0-21 "key_phrase [22 1 333]"'
      • 'Group 1 12-20 "22 1 333"'
attempt (with complicated|messy|unpythonic, but successful?, result):
  • regex = 'r"key_phrase \[|(?: )*(\d+)|\]"gm'
    • results:
      • 'Match 1 0-12 "key_phrase ["'
      • 'Match 2 12-14 "22"'
      • 'Group 1 12-14 "22"'
      • 'Match 3 14-16 " 1"'
      • 'Group 1 15-16 "1"'
      • 'Match 4 16-20 " 333"'
      • 'Group 1 17-20 "333"'
      • 'Match 5 20-21 "]"'
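For reference, the alternation attempt above can be reproduced with Python's own re module. This is only a sketch: it assumes re.finditer stands in for regex101's g flag, and the second input string is a made-up example (not from the original post). It suggests one drawback of this approach: nothing ties the (\d+) branch to key_phrase, so numbers outside the brackets are captured as well.

```python
import re

test_string = 'key_phrase [22 1 333]'
pattern = re.compile(r'key_phrase \[|(?: )*(\d+)|\]')

# Each alternative produces its own match; group 1 is None unless the
# middle (\d+) branch was the one that matched.
matches = [(m.span(), m.group(1)) for m in pattern.finditer(test_string)]
for span, group in matches:
    print(span, group)

# Keep only the matches in which group 1 took part.
numbers = [group for _, group in matches if group is not None]
print(numbers)

# Hypothetical counter-example: a number outside the brackets is captured too.
stray = [m.group(1)
         for m in pattern.finditer('other 7 key_phrase [22]')
         if m.group(1) is not None]
print(stray)
```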

Notes:
  • neither "(?:(?<=\[| )(\d+))+" nor "(?:(\d+)(?= |\]))+" is satisfactory as a capture-group expression.
  • '{1,} numeric clusters' is my way of stating:
    • I want to ignore any "key_phrase []" potential matches, while capturing an unknown number of \d+ integers
    • (i.e. there could be only one \d+ group or several, up to at least as many as can fit on a single line).
  • the '22 1 333' in test_string was selected as a sample of what MIGHT be encountered.



RE: seeking simple|clean|pythonic way to capture {1,} numeric clusters - Gribouillis - Jun-06-2021

I don't think you can capture a variable number of groups in the same match. Each captured group corresponds to a single group in the regex. For example, the regex r"key_phrase \[(?:(?: )*(\d+))+\]" visibly has a single capturing group, (\d+), hence there will be only a single group in the match.
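A short sketch of that behaviour in Python's re module, using the test string from the first post: a compiled pattern reports a fixed number of groups, and a group that sits inside a repetition keeps only the text from its last iteration.

```python
import re

test_string = 'key_phrase [22 1 333]'
pattern = re.compile(r'key_phrase \[(?:(?: )*(\d+))+\]')

# The number of capturing groups is fixed when the pattern is compiled.
group_count = pattern.groups
print(group_count)

match = pattern.search(test_string)
# The group was matched three times, but only the last iteration is kept.
captured = match.groups()
last_span = match.span(1)
print(captured, last_span)
```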


RE: seeking simple|clean|pythonic way to capture {1,} numeric clusters - NetPCDoc - Jun-06-2021

(Jun-06-2021, 05:55 AM)Gribouillis Wrote: I don't think you can capture a variable number of groups in the same match. Each captured group corresponds to a single group in the regex. For example, the regex r"key_phrase \[(?:(?: )*(\d+))+\]" visibly has a single capturing group, (\d+), hence there will be only a single group in the match.
While you might be technically correct, I expect Python to do better than that!

r"key_phrase \[|(?: )*(\d+)|\]" has the same single visible capture group. In that brief form it does produce multiple matches, but it also returns multiple instances of Group 1, and it relies on the proper placement of the two | "alternative" designators to return ALL the desired captures.

Alongside that, the expression r"key_phrase \[((?:(?: )*(?:\d+))+)\]" does eliminate the extra matches, while failing to separate out the individual \d+ instances … it seems to me that, if I REALLY knew what I was doing, these two expressions contain most of the clues on how to proceed to a simpler | cleaner | more pythonic expression that does achieve the desired goal.

I am somewhat clueless as to how acceptable, pythonically speaking, multiple instances of the same group are, versus the incrementing group number that would result if one took the time to type in some thousand (or so) optional-capture-group sub-expressions. And I still expect that Python should have a short, elegant way of coding an expression to make multiple captures in such situations.


RE: seeking simple|clean|pythonic way to capture {1,} numeric clusters - Gribouillis - Jun-06-2021

Tim Peters Wrote: Simple is better than complex.
Rather than rewriting a 'better' re module, you could simply use two lines: the first would use a regex to capture the whole substring "22 1 333" in a group, and the second would split this substring into a sequence of numbers. It is a very elegant solution to this problem.


RE: seeking simple|clean|pythonic way to capture {1,} numeric clusters - NetPCDoc - Jun-06-2021

(Jun-06-2021, 10:59 AM)Gribouillis Wrote: Rather than rewriting a 'better' re module, you could simply use two lines: the first would use a regex to capture the whole substring "22 1 333" in a group, and the second would split this substring into a sequence of numbers. It is a very elegant solution to this problem.
YES!
I DO agree!

I believe this is what is on the other side of the brick wall I have been banging my head against!

To date - my preferred path to a solution:

Given:
  • regex = r"key_phrase \[((?:(?: )*(?:\d+))+)\]"gm
    • returns:
      • Match 1 0-21 key_phrase [22 1 333]
      • Group 1 12-20 22 1 333
  • thus:
    • ((?:(?: )*(?:\d+))+)
      • isolates the \d+ info (i.e. 22 1 333) from the key_phrase [] wrapper.
Any recommendations on how I might proceed?
  1. find a way to rewrite ((?:(?: )*(?:\d+))+) in place?
  2. find a way to reprocess Group 1?
  3. find a way to convert Group 1 to Match 2; and then process Match 2?
  4. any better ideas?

So, Gribouillis, if I am understanding you correctly - you are recommending that I proceed with option 2 (or maybe 3?) - from the above?

Which leaves me scratching my head about how to put two regular expressions into a single argument that can be passed to either re.search or re.replace, depending on whether one just wants to find the numbers or actually replace them.

Any suggestions on 'search terms' for finding an example of how to make the above mentioned 'itch' go away?
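One possible reading of option 2 in plain Python: the standard re module has no re.replace; substitution is done with re.sub, which accepts a function as the replacement. That function receives each match object, so the second step (splitting Group 1) can run inside it, and the two steps do not need to be squeezed into one expression. The sketch below uses the regex from this post; the helper name double_numbers, the sample text, and the doubling itself are purely illustrative assumptions.

```python
import re

outer = re.compile(r'key_phrase \[((?:(?: )*(?:\d+))+)\]')

def double_numbers(match):
    """Hypothetical handler: rebuild the bracketed list with each number doubled."""
    tokens = match.group(1).split()   # second step: split Group 1
    doubled = ' '.join(str(2 * int(tok)) for tok in tokens)
    return 'key_phrase [' + doubled + ']'

text = 'a key_phrase [22 1 333] b key_phrase [] c'

# Find: pull the individual numbers out of the first match.
found = outer.search(text).group(1).split()
print(found)

# Replace: re.sub calls double_numbers once per match;
# the empty 'key_phrase []' does not match and is left untouched.
replaced = outer.sub(double_numbers, text)
print(replaced)
```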


RE: seeking simple|clean|pythonic way to capture {1,} numeric clusters - Gribouillis - Jun-07-2021

>>> import re
>>> test_string = 'key_phrase [22 1 333]'
>>> match = re.match(r'key_phrase \[([\d ]*)\]', test_string)
>>> result = []
>>> if match:
...     result = match.group(1).strip().split()
... 
>>> result
['22', '1', '333']



RE: seeking simple|clean|pythonic way to capture {1,} numeric clusters - NetPCDoc - Jun-10-2021

Thanks, Gribouillis. At first glance that looks very elegant …

But:
  • The re.match that you provided also matches the null case, test_string = 'key_phrase []', which I did not want to catch.
  • I was thinking more along the lines of either editor.replace(…) or editor.search(…)?
Then again:
  • Given match_obj.group(1) == '22 1 333',
  • I could use local_arg = match_obj.group(1).split() in a handler function,
  • before printing, listing or substituting?
  • and then, for replacing, re-string the substitution result(s) into a return_arg?
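A minimal sketch of how the null case might be excluded while keeping the two-step idea. The tightened pattern and the names NUMBERS_IN_BRACKETS and extract_numbers are assumptions for illustration, not something posted in the thread: the pattern requires at least one \d+ inside the brackets, so 'key_phrase []' (or brackets holding only spaces) produces no match at all.

```python
import re

# Hypothetical tightened pattern: one number, then zero or more
# space-separated numbers; optional spaces next to the brackets.
NUMBERS_IN_BRACKETS = re.compile(r'key_phrase \[ *(\d+(?: +\d+)*) *\]')

def extract_numbers(line):
    """Return the bracketed integers, or [] when the line does not match."""
    match_obj = NUMBERS_IN_BRACKETS.search(line)
    if match_obj is None:
        return []
    return [int(token) for token in match_obj.group(1).split()]

samples = ['key_phrase [22 1 333]', 'key_phrase [7]',
           'key_phrase []', 'key_phrase [ ]']
results = [extract_numbers(sample) for sample in samples]
for sample, numbers in zip(samples, results):
    print(repr(sample), numbers)
```

Writing the repetition as \d+(?: +\d+)* rather than nesting \d+ directly inside a + group keeps each digit run unambiguous, which should avoid heavy backtracking on long lines that fail to match.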