Python Forum
Having trouble with regular expressions
#1
Hi, I'm new to Python and am having some difficulty with regular expressions.

import re

x='trash bag 19th of July 1.456 3x times 20 juice'

y=re.findall('[0-9]+[0-9]',x)

print(y)
Here is the result: ['19','456','20']

I was testing this to understand regular expressions, and I thought the code would not retrieve any numbers. When I run it, why does it retrieve them? The way I wrote it, I thought that a number would not be retrieved unless it was followed by another number.

Do I understand this correctly: "Is 1 a number? Yes. Is 9 a number? Yes. Is 't' a number? No -> no match."

I also don't understand why it retrieves "19" but does not retrieve "1".

Sorry if this is a stupid question, I'm really new at this.
Reply
#2
It retrieves all strings that consist of:
  • one or more digits, followed by
  • exactly one digit.

The strings in your result, '19', '456', and '20', are all the substrings in your original that meet those conditions.

't' isn't a digit, so it ends the match rather than cancelling it: 't' is left out of the returned string, and '19', which already satisfies all the conditions, is returned.

'1' isn't a valid answer because your match requires a minimum of 2 digits that are adjacent to each other.
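
A quick way to check this reading is to run the same pattern against a few short strings. This is only a sketch; the test strings are made up for illustration and the name pattern is not from the original post.

```python
import re

# Same pattern as in the question: one or more digits, then exactly one more digit.
pattern = re.compile(r'[0-9]+[0-9]')

# A single digit cannot match: '+' needs at least one digit
# and the final [0-9] needs another one after it.
print(pattern.findall('1'))      # []

# A non-digit just ends the match; the digits before it are still returned.
print(pattern.findall('19th'))   # ['19']

# The lone '1' before the dot is skipped, the three digits after it match.
print(pattern.findall('1.456'))  # ['456']

# '+' is greedy, but it gives back one digit so the trailing [0-9] can match.
m = pattern.search('456')
print(m.group())                 # 456
```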
Reply
#3
(Mar-16-2021, 01:40 AM)mikla Wrote: Do I understand this correctly: "Is 1 a number? Yes. Is 9 a number? Yes. Is 't' a number? No -> no match."
Yes, [0-9] matches a single digit in the range 0 to 9. Reaching a non-digit only ends the match, though; the digits matched so far are still returned.
(Mar-16-2021, 01:40 AM)mikla Wrote: I also don't understand why it retrieves "19" but does not retrieve "1".
Because the pattern is [0-9]+[0-9], it needs at least two digits, so it matches "19" but not a lone "1".
The + makes the preceding [0-9] match one or more consecutive digits.
It's more common to write [0-9] as \d.

Here are some examples.
>>> import re
>>> 
>>> x = 'trash bag 19th of July 1.4569999912 3x times 20 juice'
>>> # Equivalent to your regex; + is greedy
>>> re.findall(r'\d\d+', x)
['19', '4569999912', '20']
>>> 
>>> # All numbers
>>> re.findall(r'[\d\.\d]+', x)
['19', '1.4569999912', '3', '20']
>>> 
>>> # Only the float number
>>> re.findall(r'\d+\.\d+', x)
['1.4569999912']
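
One caveat about swapping [0-9] for \d, shown as a small sketch: with a str pattern in Python 3, \d matches any Unicode decimal digit unless the re.ASCII flag is passed, while [0-9] only matches the ten ASCII digits. The sample text and variable names below are made up for illustration.

```python
import re

# Hypothetical sample text: two Arabic-Indic digits (U+0663, U+0664) and two ASCII digits.
text = 'room \u0663\u0664 and 12'

# \d on a str pattern matches any Unicode decimal digit.
unicode_digits = re.findall(r'\d+', text)

# The explicit range only matches ASCII 0-9.
ascii_class = re.findall(r'[0-9]+', text)

# re.ASCII restricts \d to [0-9].
ascii_flag = re.findall(r'\d+', text, flags=re.ASCII)

print(len(unicode_digits))  # 2
print(ascii_class)          # ['12']
print(ascii_flag)           # ['12']
```

For plain ASCII input like the string in this thread, the two spellings behave the same.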
Reply
#4
(Mar-16-2021, 03:14 AM)snippsat Wrote: >>> # All numbers
>>> re.findall(r'[\d\.\d]+', x)
['19', '1.4569999912', '3', '20']

You shouldn't duplicate elements in a character class: [\d\.\d] is the same set as [\d.]. Repeating \d makes it seem like order matters, and it doesn't.
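
To illustrate the point, here is a minimal sketch comparing the duplicated class with the simplified one on the string from the earlier post. The last pattern is only one possible stricter alternative, not something proposed in this thread, and the variable names are made up.

```python
import re

x = 'trash bag 19th of July 1.4569999912 3x times 20 juice'

# A character class is a set: repeating \d and escaping the dot change nothing.
dup = re.findall(r'[\d\.\d]+', x)
simple = re.findall(r'[\d.]+', x)
print(dup == simple)  # True

# Because it is a set, the class also accepts dots in any position or number.
print(re.findall(r'[\d.]+', 'v1..2 end.'))  # ['1..2', '.']

# Assumed alternative: digits with an optional single fractional part.
strict = re.findall(r'\d+(?:\.\d+)?', x)
print(strict)  # ['19', '1.4569999912', '3', '20']
```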
Reply

