Hi,
Using the '|' character within a Regex is giving me an undesirable result that I have been unable to avoid. For example, consider a 2-page file with the following text in each page:
Page 1:
111A111 #red.
Page 2:
AAA1AAA #green.
    for i in range(0, 2):
        text = doc.getPage(i).extract_text()
        color_re = re.compile(r'#\w+\.')
        color = color_re.findall(text)
        print(color)
Output:
['red.']
['green.']
    pattern_re = re.compile(r'(\w+\d+\w+)|(\d+\w+\d+)')
    pattern = pattern_re.findall(text)
    print(pattern)
Output:
('', 'AAA1AAA')
('111A111', '')
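For context, `re.findall` returns a list of tuples whenever the pattern contains more than one capturing group, with an empty string for each group that did not take part in the match. The sketch below compares the two-group pattern with a variant that uses a single non-capturing group `(?:...)`. It uses hard-coded sample strings in place of the text extracted from the PDF, which is an assumption made only to keep the example self-contained.

```python
import re

# Sample strings stand in for the text extracted from each page (assumption).
pages = ["111A111 #red.", "AAA1AAA #green."]

# Two capturing groups: findall yields one tuple per match.
grouped_re = re.compile(r'(\w+\d+\w+)|(\d+\w+\d+)')

# One non-capturing group: findall yields plain strings instead.
plain_re = re.compile(r'(?:\w+\d+\w+|\d+\w+\d+)')

grouped = [grouped_re.findall(text) for text in pages]
plain = [plain_re.findall(text) for text in pages]

print(grouped)  # each match is a 2-tuple
print(plain)    # each match is a str
```

With the non-capturing form, each match is an ordinary string, so `str` methods such as `strip` can be applied to it directly.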
If I do:
    color = [item.strip('.') for item in color]
I get rid of the '.', so all is good.
But if I do:
    pattern = [item.strip(' , ') for item in pattern]
I get the error:
AttributeError: 'tuple' object has no attribute 'strip'
Is there a way to avoid this error? I need to get rid of the spaces and commas in 'pattern'. Thanks and apologies in advance if the question is not properly formulated. I'm a beginner.
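One possible way around the error, sketched under the assumption that `pattern` holds the tuples shown in the output above: the spaces and commas are part of how Python prints a tuple, not characters inside the matched strings, so the fix is to pull the non-empty strings out of each tuple rather than to strip them. The helper name `flatten_matches` is hypothetical and used only for illustration.

```python
# Tuples copied from the output reported above.
pattern = [('', 'AAA1AAA'), ('111A111', '')]

def flatten_matches(matches):
    """Return the non-empty strings from a list of findall tuples."""
    # Each tuple has one slot per capturing group; unmatched groups are ''.
    return [s.strip(' ,') for tup in matches for s in tup if s]

cleaned = flatten_matches(pattern)
print(cleaned)  # ['AAA1AAA', '111A111']
```

The `strip(' ,')` call is kept only to mirror the original attempt; it is a no-op for these strings, since they contain no leading or trailing spaces or commas.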