Python Forum

Having trouble with regular expressions
Hi, I'm a new Python learner and I'm having some difficulty with regular expressions.

import re

x='trash bag 19th of July 1.456 3x times 20 juice'

y=re.findall('[0-9]+[0-9]',x)

print(y)
Here is the result: ['19','456','20']

I was doing some testing to understand regular expressions. I wrote the code above thinking it would not retrieve any numbers, so why does it retrieve them when I run it? The way I wrote it, I thought that if a number is not followed by another number, it wouldn't be retrieved.

Do I understand this correctly: "Is 1 a number? Yes. Is 9 a number? Yes. Is "t" a number? No -> no match"?

I also don't understand why it retrieves "19" but does not retrieve "1".

Sorry if this is a stupid question; I'm really new at this.
It retrieves all non-overlapping substrings that consist of:
  • One or more digits followed by,
  • exactly one digit.
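Put differently, "one or more digits followed by exactly one digit" is the same requirement as "at least two digits in a row". Here's a minimal sketch comparing your pattern with a {2,} quantifier version (the {2,} rewrite is just my own illustration, not something from your code), run on your sample string:

```python
import re

x = 'trash bag 19th of July 1.456 3x times 20 juice'

# Your pattern: one or more digits, then exactly one more digit
original = re.findall('[0-9]+[0-9]', x)

# Same requirement written with a quantifier: two or more digits in a row
two_or_more = re.findall('[0-9]{2,}', x)

print(original)     # ['19', '456', '20']
print(two_or_more)  # ['19', '456', '20']
```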

The strings in your result ('19', '456' and '20') are all the substrings of your original string that meet those conditions.

't' isn't a digit, so it isn't part of the returned match. But '19' already satisfies all the conditions before the 't' is reached, and the pattern says nothing about what comes after the last digit, so '19' is returned.
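You can see this by wrapping the two parts of the pattern in capture groups (the groups are added here only for illustration; they don't change what matches). The greedy [0-9]+ first grabs both digits of "19", then gives the '9' back so the final [0-9] can match, and the match ends before the 't' is ever checked:

```python
import re

# Capture groups added only to show which part matched what
pattern = re.compile('([0-9]+)([0-9])')

m = pattern.search('19th')
print(m.group())   # 19  -> the whole match stops before 't'
print(m.groups())  # ('1', '9')  -> [0-9]+ gave back the '9'
print(m.span())    # (0, 2)

m = pattern.search('1.456')
print(m.group())   # 456  -> '1' alone can't match, so the search moves on
print(m.groups())  # ('45', '6')
```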

'1' on its own isn't a match because your pattern requires at least two adjacent digits.
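A quick check on your sample string: with your pattern the lone '1' (from "1.456") and '3' (from "3x") are skipped. For comparison, dropping the second [0-9] and using plain [0-9]+ (my own variation, in case you do want single digits) returns them too:

```python
import re

x = 'trash bag 19th of July 1.456 3x times 20 juice'

# Needs at least two adjacent digits, so the lone '1' and '3' are skipped
at_least_two = re.findall('[0-9]+[0-9]', x)

# One or more digits: single digits are matched as well
one_or_more = re.findall('[0-9]+', x)

print(at_least_two)  # ['19', '456', '20']
print(one_or_more)   # ['19', '1', '456', '3', '20']
```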
(Mar-16-2021, 01:40 AM) mikla Wrote: Do I understand this correctly: "is 1 a number? yes. is 9 a number? yes. is "t" a number"? no -> no match
Yes, [0-9] matches a single digit in the range 0 to 9.
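Used on its own, [0-9] consumes exactly one character per match, so findall returns each digit separately (the input string is just a slice of your example):

```python
import re

# A single [0-9] matches one digit at a time; 't' and 'h' are skipped
digits = re.findall('[0-9]', '19th of July')
print(digits)  # ['1', '9']
```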
(Mar-16-2021, 01:40 AM) mikla Wrote: I also don't understand it retrieves the "19" but does not retrieve "1".
Because your pattern has two [0-9] parts, it needs at least two digits and will not match a single one.
The + makes the first [0-9] match one or more consecutive digits.
It's more common to write [0-9] as \d.
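One caveat worth knowing: for str patterns in Python 3, \d matches any Unicode decimal digit, not only 0-9. Passing re.ASCII (or sticking with [0-9]) limits it to ASCII digits. A small sketch, using Arabic-Indic digits as a made-up example input:

```python
import re

# ASCII '12', then "34" written with Arabic-Indic digits (U+0663, U+0664)
s = '12 \u0663\u0664'

unicode_digits = re.findall(r'\d+', s)              # \d matches both runs
ascii_only = re.findall(r'\d+', s, flags=re.ASCII)  # \d limited to 0-9
bracket = re.findall(r'[0-9]+', s)                  # [0-9] is always ASCII 0-9

print(unicode_digits == ['12', '\u0663\u0664'])  # True
print(ascii_only)  # ['12']
print(bracket)     # ['12']
```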

Here are some examples:
>>> import re
>>> 
>>> x = 'trash bag 19th of July 1.4569999912 3x times 20 juice'
>>> # Equivalent to your regex; + is greedy (takes as many digits as it can)
>>> re.findall(r'\d\d+', x)
['19', '4569999912', '20']
>>> 
>>> # All numbers
>>> re.findall(r'[\d\.\d]+', x)
['19', '1.4569999912', '3', '20']
>>> 
>>> # Only the float number
>>> re.findall(r'\d+\.\d+', x)
['1.4569999912']
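If you want integers and decimals from one pattern without a character class, one option (my own suggestion, not from the posts above) is an optional non-capturing group for the fractional part. It's only a sketch: it doesn't handle signs, exponents or thousands separators:

```python
import re

x = 'trash bag 19th of July 1.4569999912 3x times 20 juice'

# Digits, optionally followed by a dot and more digits.
# (?:...) is non-capturing, so findall returns the whole match, not the group.
numbers = re.findall(r'\d+(?:\.\d+)?', x)
print(numbers)  # ['19', '1.4569999912', '3', '20']

# Convert to float if you need actual numeric values
values = [float(n) for n in numbers]
print(values)  # [19.0, 1.4569999912, 3.0, 20.0]
```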
(Mar-16-2021, 03:14 AM) snippsat Wrote:
>>> # All numbers
>>> re.findall(r'[\d\.\d]+', x)
['19', '1.4569999912', '3', '20']

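You shouldn't duplicate elements in a character class. A class is a set of single characters, so [\d\.\d] matches exactly what [\d.] matches; repeating \d only makes it seem like order matters.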
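To check that repetition and order don't matter, compare the class with deduplicated and reordered versions (inside brackets . is a literal dot, so the backslash is optional). One thing to watch with any of these: because '.' is in the set, a stray full stop matches on its own. The second test string is made up for illustration:

```python
import re

x = 'trash bag 19th of July 1.4569999912 3x times 20 juice'

# A character class is a set: duplicates and order make no difference
same = re.findall(r'[\d\.\d]+', x) == re.findall(r'[\d.]+', x) == re.findall(r'[.\d]+', x)
print(same)  # True

# Caveat: the dot is in the set, so a sentence-ending '.' is matched by itself
stray = re.findall(r'[\d.]+', 'Version 1.2 is out.')
print(stray)  # ['1.2', '.']
```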