Hi,
Using the '|' character within a regex is giving me an undesirable result that I have been unable to avoid. For example, consider a 2-page file with the following text on each page:
Page 1:
111A111 #red.
Page 2:
AAA1AAA #green.
for i in range(0, 2):
    text = doc.getPage(i).extract_text()
    color_re = re.compile(r'#\w+\.')
    color = color_re.findall(text)
    print(color)
Output:
['red.']
['green.']
pattern_re = re.compile(r'(\w+\d+\w+)|(\d+\w+\d+)')
pattern = pattern_re.findall(text)
print(pattern)
Output:
('', 'AAA1AAA')
('111A111', '')
If I do:
color = [item.strip('.') for item in color]
I get rid of the '.', so all is good.
But if I do:
pattern = [item.strip(' , ') for item in pattern]
I get the error:
Output:
AttributeError: 'tuple' object has no attribute 'strip'
Is there a way to avoid this error? I need to get rid of the spaces and commas in 'pattern'. Thanks and apologies in advance if the question is not properly formulated. I'm a beginner.
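For reference, a minimal sketch of one possible way around the error. When a pattern contains more than one capturing group, re.findall returns a list of tuples (one slot per group, with '' for the group that did not take part in the match), which is why item.strip fails. The sketch below keeps only the non-empty group from each tuple before stripping. The pages list is a stand-in for the text extracted from the PDF, and flatten_matches is a hypothetical helper name, not part of any library.

```python
import re

# Stand-ins for the text extracted from each page (copied from the example above).
pages = ["111A111 #red.", "AAA1AAA #green."]

pattern_re = re.compile(r'(\w+\d+\w+)|(\d+\w+\d+)')


def flatten_matches(matches):
    """Keep the non-empty group from each tuple and strip spaces and commas.

    `matches` is the list of tuples that findall returns when the pattern
    has several capturing groups. (Hypothetical helper for illustration.)
    """
    return [group.strip(' ,') for match in matches for group in match if group]


for text in pages:
    pattern = flatten_matches(pattern_re.findall(text))
    print(pattern)
```

Another option, if the separate groups are not needed, is to drop the parentheses or make them non-capturing, for example r'(?:\w+\d+\w+)|(?:\d+\w+\d+)'. With no capturing groups, findall returns plain strings, so the original list comprehension with strip works unchanged.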