Python Forum
regex findall() returning weird result
#1
EDIT: You can likely just skip to the "update" at the bottom of this post.

What I am trying to do is test out regular expression usage to find a phone number in a string (accounting for multiple written formats). Here is my program:
import re

phoneNumRegex = re.compile(r'(\+\d{1,3}( )?)?(\d\d\d|\(\d\d\d\))(-| )\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('my numbers are +1 (515) 444-4446, 333-234-8655.')
moo = phoneNumRegex.findall('my numbers are +1 (515) 444-4446, 333-234-8655.')
print(mo.group())
print(moo)
Output:
+1 (515) 444-4446
[('+1 ', ' ', '(515)', ' '), ('', '', '333', '-')]
So search() is working perfectly. It finds and returns the phone number. But findall() is far from the desired result. Why is it doing this? I would expect it to have the same behaviour, but this time return both phone numbers in a list.

When I greatly simplify the expression in re.compile findall() seems to work:
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')
print(phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000'))
Output:
[('415', '555', '9999'), ('212', '555', '0000')]
So what is happening here? What is the difference? I must be misunderstanding how findall() actually works.
____________________________________________________

UPDATE: I've been looking through some threads of people having a similar issue on this forum, and one of the fixes was to put brackets around the parts you actually want to keep. So here is my updated compile line:
phoneNumRegex = re.compile(r'(\+\d{1,3} ?)?(\d\d\d|\(\d\d\d\))(-| )(\d\d\d)-(\d\d\d\d)')
Output:
[('+1 ', '(515)', ' ', '444', '4446'), ('', '333', '-', '234', '8655')]
So it does capture the ending section of the phone number now, and I've eliminated a few of the returned dashes and spaces. But I still have some extra junk being returned, and I'm not sure how to alter my expression to be able to handle all those exceptions and variability in what it might encounter without putting those sections into brackets...

The sections I'm still trying to eliminate from the output are:
- the (-| ) section
- the brackets around the phone number's area code
- the space after +1
#2
Quote: So search() is working perfectly. It finds and returns the phone number. But findall() is far from the desired result. Why is it doing this?
When you use () you create a capturing group, and once a pattern contains capturing groups, findall() returns only those groups instead of the whole match.
>>> import re
>>> s = 'Red car 99'
>>> re.findall(r'\w.*\s\d{2}', s)
['Red car 99']
# Add a group
>>> re.findall(r'\w.*\s(\d{2})', s)
['99']
So when you add a group (), findall() returns only what that group matched. With re.search() the same text is available as group 1, while group 0 is still the whole match.
>>> r = re.search(r'\w.*\s(\d{2})', s)
>>> r.group(0)
'Red car 99'
>>> r.group(1)
'99'
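Applied to the pattern from the first post, one possible fix (a sketch, not the only way) is to turn every group into a non-capturing group with (?:...). With no capturing groups left, findall() goes back to returning the whole match. The names text and phone_re are just illustrative.

```python
import re

text = 'my numbers are +1 (515) 444-4446, 333-234-8655.'

# Same pattern as in the first post, but every group is non-capturing (?:...),
# so findall() has no groups to report and returns each full match as a string.
phone_re = re.compile(
    r'(?:\+\d{1,3} ?)?(?:\d\d\d|\(\d\d\d\))(?:-| )\d\d\d-\d\d\d\d'
)

numbers = phone_re.findall(text)
print(numbers)
```

This should print ['+1 (515) 444-4446', '333-234-8655'], which is the list of both phone numbers the first post was expecting.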
I would write it like this if you need to match the whole phone number.
import re

phone_numbers = [
    "123 numbers are +47 (515) 444-4446, 333-234-8655",
    "my 1277 numbers are +1 (515) 444-4446, 987-654-3210",
    "my numbers are +452 +8 (22) 444-4446, 555-888-7777"
]

pattern = r'\+[\d\s()-]+,\s?[\d-]+'
for phone_number in phone_numbers:
    matches = re.findall(pattern, phone_number)
    for match in matches:
        print(match)
Output:
+47 (515) 444-4446, 333-234-8655
+1 (515) 444-4446, 987-654-3210
+8 (22) 444-4446, 555-888-7777
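Note that the pattern above returns both numbers on a line as one string. If you want each number separately and also want the parts without the separator, the parentheses around the area code, or the space after +1 (the three items listed in the first post), a rough alternative is re.finditer() with named groups. This is only a sketch: the group names country, area, prefix and line are made up for the example, and the parentheses are removed afterwards with str.strip() rather than by the regex itself.

```python
import re

text = 'my numbers are +1 (515) 444-4446, 333-234-8655.'

phone_re = re.compile(
    r'(?:\+(?P<country>\d{1,3}) ?)?'   # optional country code; "+" and space stay outside the group
    r'(?P<area>\d{3}|\(\d{3}\))'       # area code, with or without parentheses
    r'[- ]'                            # separator is matched but not captured
    r'(?P<prefix>\d{3})-(?P<line>\d{4})'
)

results = []
for m in phone_re.finditer(text):
    results.append({
        'full': m.group(0),                   # the whole phone number as written
        'country': m.group('country'),        # None when there is no +NN prefix
        'area': m.group('area').strip('()'),  # drop the parentheses if present
        'prefix': m.group('prefix'),
        'line': m.group('line'),
    })

for r in results:
    print(r)
```

Each match object still gives the complete number through group(0), so capturing groups and the full match are both available at the same time, which findall() alone cannot do.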