Python Forum
'|' character within Regex returns a tuple?
#1
Hi,
Using the '|' character within a Regex is giving me an undesirable result that I have been unable to avoid. For example, consider a 2-page file with the following text in each page:

Page 1:
111A111 #red.

Page 2:
AAA1AAA #green.

for i in range(0,2):
    text = doc.getPage(i).extract_text()

    color_re = re.compile(r'#\w+\.')
    color = color_re.findall(text)
    print(color)
Output:
['red.'] ['green.']
    pattern_re = re.compile(r'(\w+\d+\w+)|(\d+\w+\d+)')
    pattern = pattern_re.findall(text)
    print(pattern)
Output:
('', 'AAA1AAA') ('111A111', '')

If I do:
color =[item.strip('.') for item in color]
I get rid of '.' so, all is good.

But if I do:
pattern = [item.strip(' , ') for item in pattern]
I get the error:
Output:
AttributeError: 'tuple' object has no attribute 'strip'
Is there a way to avoid this error? I need to get rid of the spaces and commas in 'pattern'.
Thanks and apologies in advance if the question is not properly formulated. I'm a beginner.
#2
Hi pprod,

You might be getting that error because of the spaces inside the quotes around the comma. After item.strip, try :-

pattern = [item.strip(',') for item in pattern]
Best Regards

Eddie Winch
#3
(Feb-19-2021, 04:33 PM)eddywinch82 Wrote: Hi pprod,

You might be getting that error because of the spaces inside the quotes around the comma. After item.strip, try :-

pattern = [item.strip(',') for item in pattern]
Best Regards

Eddie Winch

Thanks, Eddie. I've tried your suggestion, but it still doesn't work. Please note that I updated the post and amended the output of print(pattern). Apologies that I can't provide the full code or the file I'm using, as they are confidential.
#4
For removing the spaces try :-

pattern = [item.replace(" ", "") for item in pattern]
I hope that works for you.

Regards

Eddie Winch
#5
For the removal of commas, maybe try :-

pattern =[item.strip(',') for item in pattern]
Regards

Eddie Winch
#6
And for the spaces removal, if the following doesn't work :-

pattern = [item.replace(" ", "") for item in pattern]
Try :-

pattern =[item.replace(" ", "") for item in pattern]
Eddie Winch
#7
Still no luck. I keep getting the error:
Output:
AttributeError: 'tuple' object has no attribute 'strip'
Output:
AttributeError: 'tuple' object has no attribute 'replace'
I suspect it has to do with the character '|' within the Regex as I don't get this error for the variable 'color'. Maybe if I convert the tuple to a list then I can use strip() or replace()? Thanks for your time.
#8
Oops. double posted.
#9
When you set up a capturing regex, it numbers the capturing parentheses from left to right. So in a pattern like this:
>>> re.findall(r"(\w+\d+\w+)|(\d+\w+\d+)", "AAA1AAA")
[('AAA1AAA', '')]
you get a tuple with each element being the capture from each capture group.

It's not the pipe character, it's the parentheses.

Even if only one of them can match, the groups are still numbered and filled from left to right, and a group that did not take part in the match comes back as an empty string. So each item findall returns is a tuple holding all the capture groups. To get at the text, you can either loop through the elements of the tuple, or rewrite the regex so that it has at most one capture group.
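As a rough sketch of the "loop through the tuple" option: the helper name first_nonempty below is made up for illustration, and the sample text is taken from the question rather than from a real PDF. It just keeps whichever group actually captured something.

```python
import re

# Same two-group pattern as in the question.
pattern_re = re.compile(r"(\w+\d+\w+)|(\d+\w+\d+)")

def first_nonempty(groups):
    """Return the first non-empty capture from one findall() tuple.

    Hypothetical helper, not part of the re module.
    """
    return next((g for g in groups if g), "")

# findall() returns one tuple per match, with one slot per capture group.
matches = pattern_re.findall("AAA1AAA #green.")

# Collapse each tuple to a plain string so str methods such as strip() work.
flat = [first_nonempty(groups) for groups in matches]

print(matches)
print(flat)
```

Once the items are plain strings, item.strip() and item.replace() work the same way they do for 'color'.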

If a group starts with ?: right after the opening parenthesis, it is a non-capturing group. With no capture groups in the pattern, findall goes back to returning "the entire match" as a string, and you don't have a tuple any longer.

>>> re.findall(r"(?:\w+\d+\w+)|(?:\d+\w+\d+)", "AAA1AAA")
['AAA1AAA']
>>> re.findall(r"(?:\w+\d+\w+)|(?:\d+\w+\d+)", "111A111")
['111A111']
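Putting that into the loop from the question, a minimal sketch might look like the following. The pages list is only a stand-in for the text extracted from each PDF page (an assumption, since the real file isn't available). The last lines show another option I believe works if you want to keep the capturing groups: re.finditer with m.group(0), which is always the whole match.

```python
import re

# Stand-in for the per-page text from the question (assumed, not the real file).
pages = ["111A111 #red.", "AAA1AAA #green."]

# Non-capturing groups: findall() now returns plain strings, not tuples.
pattern_re = re.compile(r"(?:\w+\d+\w+)|(?:\d+\w+\d+)")

results = []
for text in pages:
    found = pattern_re.findall(text)
    # Each item is a str, so strip() no longer raises AttributeError.
    results.append([item.strip(" ,") for item in found])

print(results)

# Alternative: keep the capturing groups but ask each match object
# for group(0), which is the entire matched text.
capturing_re = re.compile(r"(\w+\d+\w+)|(\d+\w+\d+)")
whole = [m.group(0) for m in capturing_re.finditer(pages[0])]
print(whole)
```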
#10
(Feb-19-2021, 05:15 PM)bowlofred Wrote: Oops. double posted.

Thanks, bowlofred. That worked fine. I don't think I'd have figured that out any time soon.
Thanks guys!