Having trouble with regular expressions

mikla · Mar-16-2021, 01:40 AM

Hi, I am new to Python and am having some difficulty with regular expressions.

import re

x='trash bag 19th of July 1.456 3x times 20 juice'

y=re.findall('[0-9]+[0-9]',x)

print(y)

Here is the result: ['19','456','20']

I was doing some testing to understand regular expressions. I wrote this code expecting that it would not retrieve any numbers, so why does it retrieve them when I run it? The way I wrote it, I thought that a number not followed by another number wouldn't be retrieved.

Do I understand this correctly: "Is 1 a number? Yes. Is 9 a number? Yes. Is 't' a number? No, so no match."

I also don't understand why it retrieves "19" but does not retrieve "1".

Sorry if this is a stupid question; I'm really new at this.

bowlofred · Mar-16-2021, 03:01 AM

It retrieves all strings that consist of:

One or more digits, followed by
exactly one digit.

The strings in your result, '19', '456', and '20', are all the substrings of your original string that meet those conditions.

't' isn't a digit, so 't' isn't part of the matched string that is returned. But '19' satisfies all the conditions, so it is returned.

'1' isn't a valid answer because your match requires a minimum of 2 digits that are adjacent to each other.
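
To see the two-digit minimum in action, here is a small sketch that runs the original string through three patterns. The variable names are only for illustration and are not from the original post; `[0-9]{2,}` is an alternative spelling of "at least two digits", not something the asker wrote.

```python
import re

text = 'trash bag 19th of July 1.456 3x times 20 juice'

# [0-9]+ on its own: one or more digits, so lone digits match too.
one_or_more = re.findall(r'[0-9]+', text)

# [0-9]+[0-9]: the + first takes every digit it can, then gives one back
# so the trailing [0-9] has something to match. A lone digit can never
# satisfy both parts, which is why '1' and '3' are missing.
two_or_more = re.findall(r'[0-9]+[0-9]', text)

# [0-9]{2,} states "two or more digits" directly.
explicit = re.findall(r'[0-9]{2,}', text)

print(one_or_more)   # ['19', '1', '456', '3', '20']
print(two_or_more)   # ['19', '456', '20']
print(explicit)      # ['19', '456', '20']
```

Note that the regex never "fails" on 't'. It simply stops the match before it and returns the digits it already has.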

snippsat · (This post was last modified: Mar-16-2021, 03:14 AM by snippsat.)

(Mar-16-2021, 01:40 AM)mikla Wrote: Do I understand this correctly: "Is 1 a number? Yes. Is 9 a number? Yes. Is 't' a number? No, so no match."

Yes, [0-9] matches a single digit in the range 0 to 9.

(Mar-16-2021, 01:40 AM)mikla Wrote: I also don't understand why it retrieves "19" but does not retrieve "1".

Because the pattern needs at least [0-9][0-9], it matches two digits and never just one.
The + also lets the first [0-9] match one or more consecutive digits.
It's more common to write [0-9] as \d.

Here are some examples.

>>> import re
>>> 
>>> x = 'trash bag 19th of July 1.4569999912 3x times 20 juice'
>>> # Matches the same as your regex; + is greedy
>>> re.findall(r'\d\d+', x)
['19', '4569999912', '20']
>>> 
>>> # All numbers
>>> re.findall(r'[\d\.\d]+', x)
['19', '1.4569999912', '3', '20']
>>> 
>>> # Only the float number
>>> re.findall(r'\d+\.\d+', x)
['1.4569999912']
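
One caveat on swapping [0-9] for \d: for Python 3 str patterns, \d matches any Unicode decimal digit, not only 0 to 9, unless the re.ASCII flag is set. A minimal sketch, where the sample string `mixed` is made up for illustration:

```python
import re

# '42' in ASCII digits, then 42 again in Arabic-Indic digits (U+0664, U+0662)
mixed = 'room 42 and \u0664\u0662'

unicode_digits = re.findall(r'\d+', mixed)                # Unicode-aware by default
ascii_digits = re.findall(r'\d+', mixed, flags=re.ASCII)  # restricted to 0-9
bracket_digits = re.findall(r'[0-9]+', mixed)             # always 0-9 only

print(len(unicode_digits))  # 2
print(ascii_digits)         # ['42']
print(bracket_digits)       # ['42']
```

For plain ASCII input like the string in this thread, the two spellings give the same result.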

bowlofred · Mar-16-2021, 03:44 PM

(Mar-16-2021, 03:14 AM)snippsat Wrote: >>> # All numbers
>>> re.findall(r'[\d\.\d]+', x)
['19', '1.4569999912', '3', '20']

You shouldn't duplicate elements in a character class. It makes it seem like order matters.
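
A quick check, as a sketch using the same string: a character class is just a set of allowed characters, so dropping the duplicate or reordering the members changes nothing. The `loose` example is an added illustration, not from the posts above, showing that such a class also accepts things that are not numbers.

```python
import re

x = 'trash bag 19th of July 1.4569999912 3x times 20 juice'

original = re.findall(r'[\d\.\d]+', x)   # pattern from the quoted post
simplified = re.findall(r'[\d.]+', x)    # duplicate \d removed; . needs no escape in a class
reordered = re.findall(r'[.\d]+', x)     # order inside [] is irrelevant

print(original == simplified == reordered)  # True
print(simplified)  # ['19', '1.4569999912', '3', '20']

# The class matches any run of digits and dots, numeric or not.
loose = re.findall(r'[\d.]+', 'v1.2.3 ... 7')
print(loose)  # ['1.2.3', '...', '7']
```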
