Python Forum

Unexpected (?) result with regular expressions
Hi!

I'm running Python 3 on Manjaro Linux 18.1.5 and I want to "emulate" sed. Here's an example in Bash using sed; this is the result I want:
Output:
~ $ echo "libreoffice-still 6.2.8-4" | sed -r 's/[^0-9]*([0-9\.\-]*)/LibreOffice \1/'
LibreOffice 6.2.8-4
I thought this would work in Python, but obviously it doesn't:
import re
print(re.sub(r'[^0-9]*([0-9\.\-]*)', r'LibreOffice \1', "libreoffice-still 6.2.8-4"))
Output:
LibreOffice 6.2.8-4LibreOffice
As you can see, I used exactly the same regular expression, but the result is different, and I don't understand why or what to do about it.
What am I doing wrong that makes "LibreOffice" appear at the end of the result? I'm obviously missing something here…
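One way to see what is happening is to list every match that re.sub works through. A minimal sketch, assuming Python 3.7 or later (since 3.7, re.sub also replaces an empty match that directly follows a non-empty one): the pattern first matches the whole string, then matches the empty string at the very end, and each match gets its own "LibreOffice " replacement.

```python
import re

pattern = r'[^0-9]*([0-9\.\-]*)'
text = 'libreoffice-still 6.2.8-4'

# Collect the span of each match and what group 1 captured.
matches = [(m.span(), m.group(1)) for m in re.finditer(pattern, text)]
for span, group in matches:
    print(span, repr(group))
# Prints:
# (0, 25) '6.2.8-4'
# (25, 25) ''
```

The second, empty match at position 25 (the end of the string) is where the trailing "LibreOffice " comes from; its \1 is empty, so only the literal text is inserted.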
Try this regex:
re.sub(r'libreoffice-still [^0-9]*([0-9\.\-]*)', r'LibreOffice \1', "libreoffice-still 6.2.8-4")
The regex now begins with libreoffice-still.
I found that if I add just one or more characters of the original string to the beginning of the pattern, I get the desired output:
import re
print(re.sub(r'l[^0-9]*([0-9\.\-]*)', r'LibreOffice \1', 'libreoffice-still 6.2.8-4'))
Output:
LibreOffice 6.2.8-4
But why?
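A quick check of why the extra "l" helps (a sketch; the variable names are only for illustration): every element of the original pattern uses *, so the whole pattern can match an empty string. A required literal character in front removes that possibility, so once the first match has consumed the whole line, nothing else can match.

```python
import re

text = 'libreoffice-still 6.2.8-4'
original = r'[^0-9]*([0-9\.\-]*)'
with_l = r'l[^0-9]*([0-9\.\-]*)'

# Can each pattern match an empty string?
original_matches_empty = re.fullmatch(original, '') is not None
with_l_matches_empty = re.fullmatch(with_l, '') is not None
print(original_matches_empty)  # True
print(with_l_matches_empty)    # False

# With the required 'l', the search finds only one match in the line.
with_l_count = len(re.findall(with_l, text))
print(with_l_count)  # 1
```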

(Jan-18-2020, 02:33 PM) DeaD_EyE wrote: Try this regex:
re.sub(r'libreoffice-still [^0-9]*([0-9\.\-]*)', r'LibreOffice \1', "libreoffice-still 6.2.8-4")
The regex now begins with libreoffice-still.

Yes, I found that out too, but you were a bit faster. It also works if I add just the first letter:
re.sub(r'l[^0-9]*([0-9\.\-]*)', r'LibreOffice \1', "libreoffice-still 6.2.8-4")
I just don't see why. "[^0-9]*" should take care of everything before the first digit, shouldn't it?
I read it as "zero or more non-digits, followed by zero or more digits, periods or dashes, and remember those digits, periods and dashes", and that seems to be how sed reads it as well.
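"[^0-9]*" does consume everything before the first digit, but only in the first match. The real difference from sed is that sed's s/// command without the g flag replaces only the first match on a line, while re.sub replaces every match unless told otherwise. A minimal sketch of the sed-like behaviour, using re.sub's count argument (assuming Python 3.7 or later for the second result):

```python
import re

pattern = r'[^0-9]*([0-9\.\-]*)'
text = 'libreoffice-still 6.2.8-4'

# sed's s/// without the g flag replaces only the first match on each line;
# count=1 asks re.sub for the same behaviour.
first_only = re.sub(pattern, r'LibreOffice \1', text, count=1)

# With the default count=0, re.sub replaces every match,
# including the empty one at the end of the string.
every_match = re.sub(pattern, r'LibreOffice \1', text)

print(first_only)         # LibreOffice 6.2.8-4
print(repr(every_match))  # 'LibreOffice 6.2.8-4LibreOffice '
```

This keeps the original pattern unchanged, so it does not depend on the input starting with a particular letter.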