Python Forum
seeking simple|clean|pythonic way to capture {1,} numeric clusters
#1
I am seeking a simple | clean | pythonic way (expression | pattern) to capture {1,} numeric clusters (i.e. "\d+") associated with a single "key_phrase".

(I am having great difficulty with finding proper search terms for any posted examples.)

goal:
  • 'Match 1 0-c "key_phrase [22 1 333 ...]"'
  • 'Group <int> 12-14 "22"'
  • 'Group <int> 15-16 "1"'
  • 'Group <int> 17-20 "333"'
  • 'Group <int> a-b "\d+"'
    • where:
      • 20 < a <= b < c == b + 1

using (for testing purposes):
  • 'Python 2.7 "flavor"' at regex101
  • test_string = 'key_phrase [22 1 333]'
attempts (with partial success):
  • regex = 'r"key_phrase \[(\d+)+(?: |\])"gm'
    • results:
      • 'Match 1 0-15 "key_phrase [22 "'
      • 'Group 1 12-14 "22"'
  • regex = 'r"key_phrase \[(?:(?: )*(\d+))+\]"gm'
    • results:
      • 'Match 1 0-21 "key_phrase [22 1 333]"'
      • 'Group 1 17-20 "333"'
  • regex = 'r"key_phrase \[((?:(?: )*(?:\d+))+)\]"gm'
    • results:
      • 'Match 1 0-21 "key_phrase [22 1 333]"'
      • 'Group 1 12-20 "22 1 333"'
attempt (with complicated|messy|unpythonic, but successful?, result):
  • regex = 'r"key_phrase \[|(?: )*(\d+)|\]"gm'
    • results:
      • 'Match 1 0-12 "key_phrase ["'
      • 'Match 2 12-14 "22"'
      • 'Group 1 12-14 "22"'
      • 'Match 3 14-16 " 1"'
      • 'Group 1 15-16 "1"'
      • 'Match 4 16-20 " 333"'
      • 'Group 1 17-20 "333"'
      • 'Match 5 20-21 "]"'
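
For readers who want to reproduce the regex101 listing above in Python itself, here is a minimal sketch using re.finditer from the standard library. The variable names (pattern, numbers, spans) are illustrative and not from the thread; the filter on Group 1 is an assumption about which matches are wanted.

```python
import re

test_string = 'key_phrase [22 1 333]'

# Same alternation as the attempt above; only the middle branch has a capture group.
pattern = re.compile(r"key_phrase \[|(?: )*(\d+)|\]")

# Keep only the matches in which Group 1 actually participated.
hits = [m for m in pattern.finditer(test_string) if m.group(1) is not None]
numbers = [m.group(1) for m in hits]
spans = [m.span(1) for m in hits]

print(numbers)  # ['22', '1', '333']
print(spans)    # [(12, 14), (15, 16), (17, 20)]
```

One caveat with this form: the \d+ branch is not tied to the key phrase, so digits elsewhere in the text would be reported as well.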

Notes:
  • neither "(?:(?<=\[| )(\d+))+" nor "(?:(\d+)(?= |\]))+" is satisfactory as a capture-group expression.
  • '{1,} numeric clusters' is my way of stating:
    • I want to ignore any potential "key_phrase []" matches, while capturing an unknown number of \d+ integers
    • (i.e. there could be only one \d+ group or several, up to at least as many as can fit on a single line).
  • the '22 1 333' in test_string was selected as a sample of what MIGHT be encountered.
#2
I don't think you can capture a variable number of groups in the same match. Each captured group corresponds to a single group in the regex. For example, the regex r"key_phrase \[(?:(?: )*(\d+))+\]" visibly has a single capturing group (\d+), hence there will be only a single group in the match.
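
A short sketch of this behaviour with the standard re module, using the test string from post #1: when a capturing group sits inside a repetition, the match object keeps only the last text that group matched. The variable names are illustrative.

```python
import re

test_string = 'key_phrase [22 1 333]'
pattern = re.compile(r"key_phrase \[(?:(?: )*(\d+))+\]")

m = pattern.search(test_string)

# The pattern defines exactly one capturing group ...
print(pattern.groups)  # 1
# ... so the match exposes one value: the last repetition's capture.
print(m.groups())      # ('333',)
print(m.span(1))       # (17, 20)
```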
#3
(Jun-06-2021, 05:55 AM)Gribouillis Wrote: I don't think you can capture a variable number of groups in the same match. Each captured group corresponds to a single group in the regex. For example, the regex r"key_phrase \[(?:(?: )*(\d+))+\]" visibly has a single capturing group (\d+), hence there will be only a single group in the match.
While you might be technically correct, I expect python to do better than that!

r"key_phrase \[|(?: )*(\d+)|\]" has the same single visible capture group. In that brief form it does produce multiple matches, but it also returns multiple instances of Group 1, and it relies on the proper placement of the two | "alternative" designators to return ALL the desired captures.

Alongside the fact that the expression r"key_phrase \[((?:(?: )*(?:\d+))+)\]" does eliminate the extra matches while failing to separate out the individual \d+ instances … it seems to me that, if I REALLY knew what I was doing, these two expressions would contain most of the clues on how to proceed to a simpler | cleaner | more pythonic expression that does achieve the desired goal.

I am somewhat clueless as to how acceptable, pythonically speaking, multiple instances of the same group are, versus the incrementing group numbers that would result if one took the time to type in some thousand (or so) optional capture-group sub-expressions. And I still expect that python should have a short, elegant way of coding an expression to make multiple captures in such situations.
#4
Tim Peters Wrote: Simple is better than complex.
Rather than rewriting a 'better' re module, you could simply use two lines: the first regex captures the whole substring "22 1 333" in a group, and the second line splits this substring into a sequence of numbers. It is a very elegant solution to this problem.
#5
(Jun-06-2021, 10:59 AM)Gribouillis Wrote: Rather than rewriting a 'better' re module, you could simply use two lines: the first regex captures the whole substring "22 1 333" in a group, and the second line splits this substring into a sequence of numbers. It is a very elegant solution to this problem.
YES!
I DO agree!

I believe this is what is on the other side of the brick wall I have been banging my head against!

To date - my preferred path to a solution:

Given:
  • regex = r"key_phrase \[((?:(?: )*(?:\d+))+)\]"gm
    • returns:
      • Match 1 0-21 key_phrase [22 1 333]
      • Group 1 12-20 22 1 333
  • thus:
    • ((?:(?: )*(?:\d+))+)
      • isolates the \d+ info (i.e. 22 1 333) from the key_phrase [] wrapper.
Any recommendations on how I might proceed?
  1. find a way to rewrite ((?:(?: )*(?:\d+))+) in place?
  2. find a way to reprocess Group 1?
  3. find a way to convert Group 1 to Match 2; and then process Match 2?
  4. any better ideas?

So, Gribouillis, if I am understanding you correctly - you are recommending that I proceed with option 2 (or maybe 3?) - from the above?

Which leaves me scratching my head on how to put two regular expressions into a single argument that can be passed to either re.search or re.sub, depending on whether one just wants to find the numbers or actually replace them?

Any suggestions on 'search terms' for finding an example of how to make the above mentioned 'itch' go away?
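
One possible direction, sketched under assumptions: re.sub accepts a function as its replacement argument, so the second step (splitting Group 1) can live inside that function instead of a second pattern. The handler name double_numbers and the doubling itself are hypothetical, chosen only to make the substitution visible; the extra groups around the brackets are added here so the wrapper can be re-emitted.

```python
import re

# Group 1: the opening wrapper, Group 2: the numeric run, Group 3: the closing bracket.
PATTERN = re.compile(r"(key_phrase \[)((?:(?: )*(?:\d+))+)(\])")

def double_numbers(match):
    """Hypothetical handler: split the numeric run, transform it, rebuild the text."""
    numbers = [int(token) for token in match.group(2).split()]
    doubled = " ".join(str(n * 2) for n in numbers)
    return match.group(1) + doubled + match.group(3)

test_string = 'key_phrase [22 1 333]'

# Finding: one search, then a plain split of the captured run.
found = PATTERN.search(test_string)
numbers = found.group(2).split() if found else []

# Replacing: the same pattern, with the handler doing the per-number work.
replaced = PATTERN.sub(double_numbers, test_string)

print(numbers)   # ['22', '1', '333']
print(replaced)  # key_phrase [44 2 666]
```

Useful search terms for this technique are "re.sub replacement function" and "re.sub callback".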
#6
>>> import re
>>> test_string = 'key_phrase [22 1 333]'
>>> match = re.match(r'key_phrase \[([\d ]*)\]', test_string)
>>> result = []
>>> if match:
...     result = match.group(1).strip().split()
... 
>>> result
['22', '1', '333']
#7
Thanks, Gribouillis. At first glance that looks very elegant …

But:
  • The re.match that you provided also catches the null case test_string = 'key_phrase []', which I did not want to catch.
  • I was thinking more along the lines of either editor.replace(…) or editor.search(…)?
Then again:
  • Given match == 22 1 333,
  • I could use local_arg = match_obj.group(1).split() in a handler-function,
  • before printing, listing or substituting?
  • and then, for replacing, re-string the substitution result(s) into a return_arg?
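
A minimal sketch of one way to address the null case raised above, assuming the numbers are separated by spaces only: require at least one \d+ inside the brackets, so 'key_phrase []' produces no match at all. The pattern and the helper name numbers_after_key are illustrative, not from the thread.

```python
import re

# At least one integer is required; further integers are separated by one or more spaces.
PATTERN = re.compile(r"key_phrase \[ *(\d+(?: +\d+)*) *\]")

def numbers_after_key(text):
    """Return the integers (as strings) inside the brackets, or [] if there are none."""
    match = PATTERN.search(text)
    return match.group(1).split() if match else []

print(numbers_after_key('key_phrase [22 1 333]'))  # ['22', '1', '333']
print(numbers_after_key('key_phrase [7]'))         # ['7']
print(numbers_after_key('key_phrase []'))          # []
```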