Python Forum

Hi!

I'm running Python 3 on Manjaro Linux 18.1.5 and I want to "emulate" sed. Here's an example in Bash using sed; this is the result I want:

Output:
~ $ echo "libreoffice-still 6.2.8-4" | sed -r 's/[^0-9]*([0-9\.\-]*)/LibreOffice \1/'
LibreOffice 6.2.8-4

I thought this would work in Python, but obviously it doesn't:

import re
# Raw string (r'...') so the backslashes reach the regex engine unchanged
print(re.sub(r'[^0-9]*([0-9\.\-]*)', r'LibreOffice \1', "libreoffice-still 6.2.8-4"))

Output:
LibreOffice 6.2.8-4LibreOffice

As you can see, I used exactly the same regular expression, but the result is different, and I don't understand why or what to do about it.
What am I doing wrong that makes "LibreOffice" appear at the end of the result? I'm obviously missing something here…
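A quick way to see what is going on (a sketch, assuming Python 3.7 or later, where re.sub() also replaces an empty match that directly follows a non-empty one) is to list every match the pattern finds. The variable names are just for illustration:

```python
import re

pattern = r'[^0-9]*([0-9\.\-]*)'
text = "libreoffice-still 6.2.8-4"

# re.finditer() lists the non-overlapping matches; on Python 3.7+
# re.sub() replaces each of these, not just the first one.
matches = [(m.span(), m.group(0)) for m in re.finditer(pattern, text)]
for span, matched in matches:
    print(span, repr(matched))
# (0, 25) 'libreoffice-still 6.2.8-4'
# (25, 25) ''
```

The second match is empty and sits at the very end of the string. It is replaced by "LibreOffice " plus an empty \1, which is where the extra "LibreOffice" (with a trailing space) comes from.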

Try this regex:

re.sub(r'libreoffice-still [^0-9]*([0-9\.\-]*)', r'LibreOffice \1', "libreoffice-still 6.2.8-4")

The regex now begins with libreoffice-still.

I found that if I add one or more characters of the original string to the beginning of the search pattern, I get the output I want:

import re
print(re.sub(r'l[^0-9]*([0-9\.\-]*)', r'LibreOffice \1', 'libreoffice-still 6.2.8-4'))

Output:
LibreOffice 6.2.8-4

But why?
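The short answer to "why": both quantifiers in the original pattern are *, so the whole pattern can match an empty string, and after the first match has consumed the text, there is still one empty match left at the end. With a literal l in front, every match has to consume at least that one character, so an empty match is impossible. A small sketch to check this (variable names are illustrative):

```python
import re

text = "libreoffice-still 6.2.8-4"
original = r'[^0-9]*([0-9\.\-]*)'
with_l = r'l[^0-9]*([0-9\.\-]*)'

# Can each pattern match a completely empty string?
original_matches_empty = re.fullmatch(original, '') is not None
with_l_matches_empty = re.fullmatch(with_l, '') is not None
print(original_matches_empty)  # True
print(with_l_matches_empty)    # False

# The leading-'l' pattern therefore finds only one match in the text.
with_l_matches = [m.group(0) for m in re.finditer(with_l, text)]
print(with_l_matches)  # ['libreoffice-still 6.2.8-4']
```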

(Jan-18-2020, 02:33 PM) DeaD_EyE wrote: Try this regex:
re.sub(r'libreoffice-still [^0-9]*([0-9\.\-]*)', r'LibreOffice \1', "libreoffice-still 6.2.8-4")
The regex now begins with libreoffice-still.

Yes, I found that out too, but you were a bit faster. It also works when I add just the first letter:

re.sub(r'l[^0-9]*([0-9\.\-]*)', r'LibreOffice \1', "libreoffice-still 6.2.8-4")

I just don't see why. "[^0-9]*" should take care of everything before the first digit, shouldn't it?
I read it as "zero or more non-digits, followed by zero or more digits, periods or dashes, and remember those digits, periods and dashes", and that seems to be how sed reads it as well.
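That reading of the pattern is correct; the difference is in the substitution, not the regex. sed's s command replaces only the first match on each line unless you add the g flag, while re.sub() replaces every match by default. A minimal sketch of two ways to get the sed result (the names are illustrative): limit re.sub() to one replacement with count=1, or change the last * to + so the pattern can no longer match an empty string:

```python
import re

text = "libreoffice-still 6.2.8-4"
pattern = r'[^0-9]*([0-9\.\-]*)'

# Like sed's s/// without the g flag: replace only the first match.
first_only = re.sub(pattern, r'LibreOffice \1', text, count=1)

# Alternative: require at least one version character with '+',
# so an empty match at the end of the string is impossible.
non_empty = re.sub(r'[^0-9]*([0-9\.\-]+)', r'LibreOffice \1', text)

print(first_only)  # LibreOffice 6.2.8-4
print(non_empty)   # LibreOffice 6.2.8-4
```

count=1 is the closest match to the sed command as written; the + version also avoids producing a bare "LibreOffice " when the input contains no version number at all.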

guraknugen

DeaD_EyE

guraknugen