Python Forum

Full Version: Find numbers using Regex
Looks like you want to match one or more digits followed by one or more letters. I also decided to catch one or more letters followed by one or more digits.
from io import StringIO
import re

test_text = StringIO(
"""25 - not this number1
the cow just over the moon and the sun is in 1the sky
26 - not this number
5one day is soon and soon is near take 5529care, 30over and out
59 - not this number
The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve."""
)

pattern = re.compile(r"[0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+")
for line in test_text:
    matches = re.findall(pattern, line)
    if matches:
        print(f"{line}Matches = {matches}\n")
    else:
        print(f"{line}No matches\n")
Output:
25 - not this number1
Matches = ['number1']

the cow just over the moon and the sun is in 1the sky
Matches = ['1the']

26 - not this number
No matches

5one day is soon and soon is near take 5529care, 30over and out
Matches = ['5one', '5529care', '30over']

59 - not this number
No matches

The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve. Matches = ['59closet', '9second']
(Jul-25-2022, 04:26 AM)deanhystad Wrote: [ -> ]Looks like you want to match one or more numbers followed by one or more letters. I also decided to catch one or more letters followed by one or more numbers

Quote:Thanks for the help. At this time I am looking for a way to find only the numbers attached to the words. Instead of matching the whole thing, can it be modified to find only the numbers, for example in 5one, 5529care, 30over, etc.? Thanks
(Jul-25-2022, 12:26 AM)Pedroski55 Wrote: [ -> ]Not too sure what you want to do now, maybe this??

import re

mylist = [15, '15', '1the', '5one', '5529care', '30over', '55more66', 25, '25']

pattern1 = re.compile(r'\D+') # matches non-digits
for s in mylist:
    # convert to str first so that the int entries do not raise a TypeError
    match = re.search(pattern1, str(s))
    if match:
        # re.findall returns a list
        result = re.findall('[0-9]+', str(s))
        print(s)
        print(result)

Quote:Thanks. I tried to pass the whole list but for some reason I was receiving blank print statements. For reference and ease I have included a link https://regex101.com/r/fhDBF6/2 to the format of the data. Thanks
(Jul-24-2022, 11:03 PM)rob101 Wrote: [ -> ]
(Jul-24-2022, 08:32 PM)giddyhead Wrote: [ -> ]I see. Thanks for the information and help. Unfortunately, I will not be able to modify it to suit; however, I appreciate what you have done. May I ask for help with the existing code posted? Thanks

Sure; ask whatever you want to.

Quote:Copy. Thanks. Yeah looking for a way to get rid of the numbers attached to words using the code from web scrapping.
(Jul-25-2022, 05:27 AM)giddyhead Wrote: [ -> ]Yeah looking for a way to get rid of the numbers attached to words using the code from web scrapping.

Okay. That code from your web scrapping, as you call it, is not in a style that I work with, as you may gather from what I've posted.

As for getting rid of the numbers attached to words, that is precisely the objective of my posted script, given that said number is prefixed.

Maybe this is a 'Language barrier' issue. Do you fully understand English?

edit for p.s

I've updated my code, as I was unhappy with some of the object names I used (it was developed on the fly), and as I'll be adding this code to my notes, I've cleaned it up.

#!/usr/bin/python3

import re

def f_number(num):
    """Return the text after the last digit in num, or None if num has no digit."""
    if re.search(r'\d', num):
        return re.split(r'\d', num)[-1]

input_string = "" # put your text in this string object

string_list = input_string.split(' ')
p_string = ''

for word in string_list:
    number = f_number(word)
    if not number:
        # no digit found, or nothing follows the last digit: keep the word as-is
        p_string += word+' '
    else:
        p_string += number+' '

print(p_string)
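To see how the script above behaves, here is a minimal sketch that wraps the same split-on-digit idea in a function and runs it on two phrases from the sample text used in this thread. The name strip_prefixed_digits is made up for illustration; it is not part of rob101's script.

```python
import re

def strip_prefixed_digits(text):
    """Hypothetical helper: drop digits that prefix a word, one word at a time."""
    cleaned = []
    for word in text.split(' '):
        # Text after the last digit; an empty string when the word ends in a digit.
        tail = re.split(r'\d', word)[-1]
        # Keep the original word when nothing follows the last digit.
        cleaned.append(tail or word)
    return ' '.join(cleaned)

print(strip_prefixed_digits("take 5529care, 30over and out"))
print(strip_prefixed_digits("25 - not this number1"))
```

The second call leaves number1 untouched, which matches the caveat that this approach only handles numbers that prefix a word.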
When you know what you want to replace, it is easy to remove the digits. This uses a comprehension to strip the digits from each match:
from io import StringIO
import re


test_text = StringIO(
"""25 - not this number1
the cow just over the moon and the sun is in 1the sky
26 - not this number
5one day is soon and soon is near take 5529care, 30over and out
59 - not this number
The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve."""
)

pattern = re.compile(r"[0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+")
for line in test_text:
    matches = re.findall(pattern, line)
    if matches:
        for match in matches:
            line = line.replace(match, "".join([c for c in match if c not in '0123456789']))
    print(line.rstrip())
Output:
25 - not this number
the cow just over the moon and the sun is in the sky
26 - not this number
one day is soon and soon is near take care, over and out
59 - not this number
The covers at near the back of the closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the second shelve.
And this uses another regex, stripper, which matches only the letters in each match.
stripper = re.compile(r"[a-zA-Z]+")
        for match in matches:
            line = line.replace(match, re.findall(stripper, match)[0])
But it is better to use re.sub(). Write a function that returns a digit-less version of the matching string. This function is the repl argument to the re.sub(pattern, repl, string) call.
from io import StringIO
import re

test_text = StringIO(
"""25 - not this number1
the cow just over the moon and the sun is in 1the sky
26 - not this number
5one day is soon and soon is near take 5529care, 30over and out
59 - not this number
The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve."""
)

stripper = re.compile(r"[a-zA-Z]+")
finder = re.compile(r"[0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+")

def strip_digits(match):
    """This is the repl function used by re.sub()"""
    return re.findall(stripper, match.group())[0]

for line in test_text:
    print(re.sub(finder, strip_digits, line).rstrip())
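As a possible alternative to a repl function, the digits can be removed directly with lookaround assertions, so that only the digits are matched and the replacement is an empty string. This is a sketch, not the code posted above; the names attached_digits and strip_attached_digits are assumptions made for the example.

```python
import re

# Digit runs directly before a letter, or directly after one.
attached_digits = re.compile(r"[0-9]+(?=[a-zA-Z])|(?<=[a-zA-Z])[0-9]+")

def strip_attached_digits(text):
    """Hypothetical helper: delete digit runs that touch a letter."""
    return attached_digits.sub("", text)

print(strip_attached_digits("5one day is soon and soon is near take 5529care, 30over and out"))
print(strip_attached_digits("25 - not this number1"))
```

One difference to be aware of: for a word with digits on both sides, such as 55more66, this sketch removes both runs, whereas the finder pattern matches only 55more and leaves the trailing 66 in place.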
(Jul-25-2022, 05:15 AM)giddyhead Wrote: [ -> ]Instead of finding the whole thing can it be modified to only find the numbers only for example 5one, 5529care, 30over, etc? Thanks
If you add a group () around each digit part of deanhystad's pattern, then re.findall() will return only the numbers.
from io import StringIO
import re

test_text = StringIO(
"""25 - not this number1
the cow just over the moon and the sun is in 1the sky
26 - not this number
5one day is soon and soon is near take 5529care, 30over and out
59 - not this number
The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve."""
)

pattern = re.compile(r"([0-9]+)[a-zA-Z]+|[a-zA-Z]+([0-9]+)")
for line in test_text:
    matches = re.findall(pattern, line)
    if matches:
        print(f"{line}Matches = {matches}\n")
    else:
        print(f"{line}No matches\n") 
Output:
25 - not this number1
Matches = [('', '1')]

the cow just over the moon and the sun is in 1the sky
Matches = [('1', '')]

26 - not this number
No matches

5one day is soon and soon is near take 5529care, 30over and out
Matches = [('5', ''), ('5529', ''), ('30', '')]

59 - not this number
No matches

The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve.Matches = [('59', ''), ('9', '')]
Clean up.
>>> Matches = [('5', ''), ('5529', ''), ('30', '')]
>>> [i[0] for i in Matches]
['5', '5529', '30']
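Note that i[0] picks only the first group, so a trailing number such as the one in number1, which comes back as ('', '1'), would be reduced to an empty string. A small sketch of a clean-up that takes whichever group is non-empty follows; the function name attached_numbers is an assumption for illustration.

```python
import re

pattern = re.compile(r"([0-9]+)[a-zA-Z]+|[a-zA-Z]+([0-9]+)")

def attached_numbers(line):
    """Hypothetical helper: return the digit runs attached to words in line."""
    # Each findall result is a (leading, trailing) tuple; exactly one side is non-empty.
    return [leading or trailing for leading, trailing in pattern.findall(line)]

print(attached_numbers("25 - not this number1"))
print(attached_numbers("5one day is soon and soon is near take 5529care, 30over and out"))
```

The results are still strings; wrap each one in int() if numeric values are needed.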
(Jul-25-2022, 07:32 PM)snippsat Wrote: [ -> ]
If you add a group () to deanhystad's code, you will get only the numbers.

Got it. Thank you!
Sorry I did not get to post until now. Completed! Thank you all for your help, time, and information.