Python Forum
seeking simple|clean|pythonic way to capture {1,} numeric clusters
#1
seeking simple|clean|pythonic way (expression | pattern) to capture {1,} numeric clusters (i.e. "\d+") …
… associated with a single "key_phrase"?

(I am having great difficulty with finding proper search terms for any posted examples.)

goal:
  • 'Match 1 0-c "key_phrase [22 1 333 ...]"'
  • 'Group <int> 12-14 "22"'
  • 'Group <int> 15-16 "1"'
  • 'Group <int> 17-20 "333"'
  • 'Group <int> a-b "\d+"'
    • where:
      • 20 < a <= b < c == b + 1

using (for testing purposes):
  • 'Python 2.7 "flavor"' at regex101
  • test_string = 'key_phrase [22 1 333]'
attempts (with partial success):
  • regex = 'r"key_phrase \[(\d+)+(?: |\])"gm'
    • results:
      • 'Match 1 0-15 "key_phrase [22 "'
      • 'Group 1 12-14 "22"'
  • regex = 'r"key_phrase \[(?:(?: )*(\d+))+\]"gm'
    • results:
      • 'Match 1 0-21 "key_phrase [22 1 333]"'
      • 'Group 1 17-20 "333"'
  • regex = 'r"key_phrase \[((?:(?: )*(?:\d+))+)\]"gm'
    • results:
      • 'Match 1 0-21 "key_phrase [22 1 333]"'
      • 'Group 1 12-20 "22 1 333"'
attempt (with complicated|messy|unpythonic, but successful?, result):
  • regex = 'r"key_phrase \[|(?: )*(\d+)|\]"gm'
    • results:
      • 'Match 1 0-12 "key_phrase ["'
      • 'Match 2 12-14 "22"'
      • 'Group 1 12-14 "22"'
      • 'Match 3 14-16 " 1"'
      • 'Group 1 15-16 "1"'
      • 'Match 4 16-20 " 333"'
      • 'Group 1 17-20 "333"'
      • 'Match 5 20-21 "]"'

Notes:
  • neither "(?:(?<=\[| )(\d+))+" nor "(?:(\d+)(?= |\]))+" is satisfactory as a capture-group expression.
  • '{1,} numeric clusters' is my way of stating :
    • want to ignore any "key_phrase []" potential matches, while capturing an unknown number of \d+ integers
    • (i.e. there could be only one or a multiple number - up to, at least, as many as can fit on a single line - \d+ groups).
  • the '22 1 333' in test_string was selected as a sample of what MIGHT be encountered.
#2
I don't think you can capture a variable number of groups in the same match. Each captured group corresponds to a single group in the regex. For example, the regex r"key_phrase \[(?:(?: )*(\d+))+\]" visibly has a single capturing group (\d+), hence there will be only a single group in the match.
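
A minimal sketch of that behaviour with the standard re module, assuming Python 3 and the test_string from post #1. When a capturing group sits inside a repetition, the match object keeps only the last text the group matched:

```python
import re

test_string = 'key_phrase [22 1 333]'

# One capturing group inside a repeated non-capturing group.
m = re.search(r"key_phrase \[(?:(?: )*(\d+))+\]", test_string)

print(m.group(0))   # the whole match
print(m.groups())   # only the last repetition of group 1 survives
print(m.span(1))    # offsets of that last repetition
```

This agrees with the regex101 result in post #1, where Group 1 is reported as 17-20 "333".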
#3
(Jun-06-2021, 05:55 AM)Gribouillis Wrote: I don't think you can capture a variable number of groups in the same match. Each captured group corresponds to a single group in the regex. For example, the regex r"key_phrase \[(?:(?: )*(\d+))+\]" visibly has a single capturing group (\d+), hence there will be only a single group in the match.
While you might be technically correct, I expect python to do better than that!

r"key_phrase \[|(?: )*(\d+)|\]" has the same single visible capture group. That brief form does produce multiple matches, but it also returns multiple instances of Group 1, and it relies on the proper placement of the two | "alternative" designators to return ALL the desired captures.

Alongside that, the expression r"key_phrase \[((?:(?: )*(?:\d+))+)\]" does eliminate the extra matches, while failing to separate out the individual \d+ instances … it seems to me that, if I REALLY knew what I was doing, these two expressions contain most of the clues on how to proceed to a simpler | cleaner | more pythonic expression that does achieve the desired goal.

I am somewhat clueless as to how acceptable, pythonically speaking, multiple instances of the same group are, versus the incrementing group numbers that would result if one took the time to type in some thousand (or so) optional-capture-group sub-expressions. And I still expect that python should have a short, elegant way of coding an expression to make multiple captures in such situations.
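
For reference, a rough sketch of how the alternation pattern behaves under re.finditer, assuming Python 3. The second string is a made-up example (not from the thread) showing that the three alternatives match independently, so digits outside the brackets are picked up as well:

```python
import re

# The alternation pattern from post #1: each alternative is its own match.
pattern = re.compile(r"key_phrase \[|(?: )*(\d+)|\]")

def group1_captures(text):
    # Group 1 is None for matches that came from the other two alternatives.
    return [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]

test_string = 'key_phrase [22 1 333]'
captures = group1_captures(test_string)
print(captures)

# Hypothetical input: the "7" is not tied to key_phrase, yet it is captured.
stray = group1_captures('other 7 key_phrase [22]')
print(stray)
```

So the multiple instances of Group 1 are really separate matches, and nothing in the pattern ties the numbers to the key_phrase [ ... ] wrapper.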
#4
Tim Peters Wrote:Simple is better than complex
Rather than rewriting a 'better' re module, you could simply use two lines: the first, a regex, would capture the whole substring "22 1 333" in a group; the second would split this substring into a sequence of numbers. It is a very elegant solution to this problem.
#5
(Jun-06-2021, 10:59 AM)Gribouillis Wrote: Rather than rewriting a 'better' re module, you could simply use two lines: the first, a regex, would capture the whole substring "22 1 333" in a group; the second would split this substring into a sequence of numbers. It is a very elegant solution to this problem.
YES!
I DO agree!

I believe this is what is on the other side of the brick wall I have been banging my head against!

To date - my preferred path to a solution:

Given:
  • regex = r"key_phrase \[((?:(?: )*(?:\d+))+)\]"gm
    • returns:
      • Match 1 0-21 key_phrase [22 1 333]
      • Group 1 12-20 22 1 333
  • thus:
    • ((?:(?: )*(?:\d+))+)
      • isolates the \d+ info (i.e. 22 1 333) from the key_phrase [] wrapper.
Any recommendations on how I might proceed?
  1. find a way to rewrite ((?:(?: )*(?:\d+))+) in place?
  2. find a way to reprocess Group 1?
  3. find a way to convert Group 1 to Match 2; and then process Match 2?
  4. any better ideas?

So, Gribouillis, if I am understanding you correctly - you are recommending that I proceed with option 2 (or maybe 3?) - from the above?

Which leaves me scratching my head on how to put two regex expressions into a single argument that can be passed to either re.search or re.sub?
… depending on whether: one just wants to find them? or one wants to actually replace them?

Any suggestions on 'search terms' for finding an example of how to make the above mentioned 'itch' go away?
#6
>>> import re
>>> test_string = 'key_phrase [22 1 333]'
>>> match = re.match(r'key_phrase \[([\d ]*)\]', test_string)
>>> result = []
>>> if match:
...     result = match.group(1).strip().split()
... 
>>> result
['22', '1', '333']
#7
Thanks, Gribouillis. At first glance that looks very elegant …

But:
  • The re.match that you provided catches the null case test_string = 'key_phrase []', that I did not want to catch?
  • I was thinking more along the lines of either editor.replace(…) or editor.search(…)?
Then again:
  • Given match_obj.group(1) == '22 1 333',
  • I could use local_arg = match_obj.group(1).split() in a handler-function,
  • before printing, listing or substituting?
  • and then, for replacing, re-string the substitution result(s) into a return_arg?
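
A possible sketch of that handler idea using only the standard re module, assuming Python 3. The names PATTERN, extract_numbers and double_numbers are hypothetical, and doubling each number is just a placeholder transformation. The pattern requires at least one \d+ cluster, so 'key_phrase []' is not matched. re.sub accepts a function as the replacement, which covers the "replace" case without squeezing two regexes into one argument:

```python
import re

# Assumed pattern: one or more integers separated by spaces, no empty brackets.
PATTERN = re.compile(r'key_phrase \[(\d+(?: +\d+)*)\]')

def extract_numbers(text):
    """Return the integers inside the brackets, or [] if there is no match."""
    match_obj = PATTERN.search(text)
    return [int(n) for n in match_obj.group(1).split()] if match_obj else []

def double_numbers(match_obj):
    """Handler for re.sub: rebuild the matched text with each number doubled."""
    numbers = [int(n) for n in match_obj.group(1).split()]
    return 'key_phrase [' + ' '.join(str(2 * n) for n in numbers) + ']'

found = extract_numbers('key_phrase [22 1 333]')
empty = extract_numbers('key_phrase []')
replaced = PATTERN.sub(double_numbers, 'key_phrase [22 1 333]')

print(found)
print(empty)
print(replaced)
```

Whether an editor's own replace function accepts a handler like this depends on that editor's API, which is not shown in this thread.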