Python Forum
Find numbers using Regex
#11
Looks like you want to match one or more digits followed by one or more letters. I also decided to catch one or more letters followed by one or more digits.
from io import StringIO
import re

test_text = StringIO(
"""25 - not this number1
the cow just over the moon and the sun is in 1the sky
26 - not this number
5one day is soon and soon is near take 5529care, 30over and out
59 - not this number
The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve."""
)

pattern = re.compile(r"[0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+")
for line in test_text:
    matches = re.findall(pattern, line)
    if matches:
        print(f"{line}Matches = {matches}\n")
    else:
        print(f"{line}No matches\n")
Output:
25 - not this number1
Matches = ['number1']

the cow just over the moon and the sun is in 1the sky
Matches = ['1the']

26 - not this number
No matches

5one day is soon and soon is near take 5529care, 30over and out
Matches = ['5one', '5529care', '30over']

59 - not this number
No matches

The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve. Matches = ['59closet', '9second']
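To see which tokens the alternation picks up, here is a minimal sketch that runs the same pattern on a few standalone strings. The sample strings are only illustrative; 55more66 is borrowed from Pedroski55's list elsewhere in this thread.

```python
import re

# Same pattern as above: digits then letters, or letters then digits.
pattern = re.compile(r"[0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+")

# Illustrative inputs (assumed for this sketch, not the full test text).
samples = [
    "26 - not this number",
    "not this number1",
    "take 5529care, 30over",
    "55more66",
]

# Map each sample to the list of tokens the pattern finds in it.
results = {s: pattern.findall(s) for s in samples}
for s, found in results.items():
    print(f"{s!r} -> {found}")
```

A token with digits on both sides, such as 55more66, is matched only as 55more: the first alternative consumes the leading digits and the letters, and the trailing 66 no longer has letters next to it to match against.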
Reply
#12
(Jul-25-2022, 04:26 AM)deanhystad Wrote: Looks like you want to match one or more numbers followed by one or more letters. I also decided to catch one or more letters followed by one or more numbers
Quote: Thanks for the help. At this time I am looking for a way to find only the numbers attached to the words. Instead of matching the whole thing, can it be modified to find only the numbers in, for example, 5one, 5529care, 30over, etc.? Thanks
Reply
#13
(Jul-25-2022, 12:26 AM)Pedroski55 Wrote: Not too sure what you want to do now, maybe this??

import re

mylist = [15, '15', '1the', '5one', '5529care', '30over', '55more66', 25, '25']

pattern1 = re.compile(r'\D+') # matches non-numbers
# re.findall returns a list
for s in mylist:
    match = re.search(pattern1, str(s))    
    if match:     
        result = re.findall('[0-9]+', s)
        print(s)
        print(result)

Quote: Thanks. I tried to pass the whole list, but for some reason I was getting blank output from the print statements. For reference, I have included a link, https://regex101.com/r/fhDBF6/2, that shows the format of the data. Thanks
Reply
#14
(Jul-24-2022, 11:03 PM)rob101 Wrote:
(Jul-24-2022, 08:32 PM)giddyhead Wrote: I see. thanks for the information and help Unfortunately I will not be able to modify it to suit however I appreciate what you have done. may I ask help with the existing code posted? Thanks

Sure; ask whatever you want to.

Quote:Copy. Thanks. Yeah looking for a way to get rid of the numbers attached to words using the code from web scrapping.
Reply
#15
(Jul-25-2022, 05:27 AM)giddyhead Wrote: Yeah looking for a way to get rid of the numbers attached to words using the code from web scrapping.

Okay. That code from your web scrapping, as you call it, is not in a style that I work with, as you may gather from what I've posted.

As for getting rid of the numbers attached to words, that is precisely the objective of my posted script, given that said number is prefixed.

Maybe this is a 'Language barrier' issue. Do you fully understand English?

edit for p.s

I've updated my code, as I was unhappy with some of the object names that I used (it was developed on the fly), and as I'll be adding this code to my notes, I've cleaned it up.

#!/usr/bin/python3

import re

def f_number(num):
    """Return the text after the last digit in num, or None if num has no digit."""
    # Raw strings keep '\d' from being read as an invalid string escape.
    if re.search(r'\d', num):
        return re.split(r'\d', num)[-1]

input_string = "" # put your text in this string object

string_list = input_string.split(' ')
p_string = ''

for word in string_list:
    stripped = f_number(word)
    # None or '' means no digits, or no text after the last digit: keep the word.
    if not stripped:
        p_string += word+' '
    else:
        p_string += stripped+' '

print(p_string)
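A quick way to check what f_number() does with different shapes of word is sketched below. The function is restated so the snippet runs on its own, and the sample words are assumptions chosen for illustration.

```python
import re

def f_number(num):
    # Same logic as the script above: text after the last digit, or None.
    if re.search(r'\d', num):
        return re.split(r'\d', num)[-1]

# Illustrative words: prefixed number, suffixed number, no digit, digits only.
words = ["5one", "5529care", "number1", "sky", "25"]
checks = {w: f_number(w) for w in words}
for word, result in checks.items():
    print(f"{word!r} -> {result!r}")
```

A prefixed number is stripped, while a suffixed number (number1) or a bare number (25) yields an empty string, so the main loop keeps those words unchanged.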
Reply
#16
When you know what you want to replace, it is easy to remove the digits. This uses a comprehension to strip the digits:
from io import StringIO
import re


test_text = StringIO(
"""25 - not this number1
the cow just over the moon and the sun is in 1the sky
26 - not this number
5one day is soon and soon is near take 5529care, 30over and out
59 - not this number
The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve."""
)

pattern = re.compile(r"[0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+")
for line in test_text:
    matches = re.findall(pattern, line)
    if matches:
        for match in matches:
            line = line.replace(match, "".join([c for c in match if c not in '0123456789']))
    print(line.rstrip())
Output:
25 - not this number
the cow just over the moon and the sun is in the sky
26 - not this number
one day is soon and soon is near take care, over and out
59 - not this number
The covers at near the back of the closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the second shelve.
And this uses another regex, stripper = re.compile(r"[a-zA-Z]+"), to pull the letters out of each match.
        for match in matches:
            line = line.replace(match, re.findall(stripper, match)[0])
But it is better to use re.sub(). Write a function that returns a digit-less version of the matching string. This function is the repl argument to the re.sub(pattern, repl, string) call.
from io import StringIO
import re

test_text = StringIO(
"""25 - not this number1
the cow just over the moon and the sun is in 1the sky
26 - not this number
5one day is soon and soon is near take 5529care, 30over and out
59 - not this number
The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve."""
)

stripper = re.compile(r"[a-zA-Z]+")
finder = re.compile(r"[0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+")

def strip_digits(match):
    """This is the repl function used by re.sub()"""
    return re.findall(stripper, match.group())[0]

for line in test_text:
    print(re.sub(finder, strip_digits, line).rstrip())
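As an alternative sketch (not taken from the posts in this thread), lookarounds let re.sub() delete only the digits, without a repl function. The pattern and the strip_attached_digits name are assumptions made for illustration.

```python
import re

# Digits that directly follow a letter, or that directly precede one.
attached_digits = re.compile(r"(?<=[a-zA-Z])[0-9]+|[0-9]+(?=[a-zA-Z])")

def strip_attached_digits(text):
    """Remove digit runs that touch a letter; standalone numbers are kept."""
    return attached_digits.sub("", text)

print(strip_attached_digits("25 - not this number1"))
print(strip_attached_digits("5one day is soon and soon is near take 5529care, 30over and out"))
```

Unlike the finder pattern, this version also strips both ends of a token such as 55more66, because each digit run is checked against its own neighbouring letter.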
Reply
#17
(Jul-25-2022, 05:15 AM)giddyhead Wrote: Instead of finding the whole thing can it be modified to only find the numbers only for example 5one, 5529care, 30over, etc? Thanks
If you add a capture group () to deanhystad's code, you will get only the numbers.
from io import StringIO
import re

test_text = StringIO(
"""25 - not this number1
the cow just over the moon and the sun is in 1the sky
26 - not this number
5one day is soon and soon is near take 5529care, 30over and out
59 - not this number
The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve."""
)

pattern = re.compile(r"([0-9]+)[a-zA-Z]+|[a-zA-Z]+([0-9]+)")
for line in test_text:
    matches = re.findall(pattern, line)
    if matches:
        print(f"{line}Matches = {matches}\n")
    else:
        print(f"{line}No matches\n") 
Output:
25 - not this number1
Matches = [('', '1')]

the cow just over the moon and the sun is in 1the sky
Matches = [('1', '')]

26 - not this number
No matches

5one day is soon and soon is near take 5529care, 30over and out
Matches = [('5', ''), ('5529', ''), ('30', '')]

59 - not this number
No matches

The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve.Matches = [('59', ''), ('9', '')]
Clean up.
>>> Matches = [('5', ''), ('5529', ''), ('30', '')]
>>> [i[0] for i in Matches]
['5', '5529', '30']
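Note that i[0] reads only the first group, so a number that follows a word, as in number1, comes back as an empty string. A minimal sketch that takes whichever group matched is below; the attached_numbers name is an assumption for illustration.

```python
import re

# Same two-group pattern as above.
pattern = re.compile(r"([0-9]+)[a-zA-Z]+|[a-zA-Z]+([0-9]+)")

def attached_numbers(text):
    """Return the digits of each match, whether they lead or trail the letters."""
    # findall() yields (prefix, suffix) tuples; the group that did not match is ''.
    return [prefix or suffix for prefix, suffix in pattern.findall(text)]

print(attached_numbers("25 - not this number1"))
print(attached_numbers("5one day is soon and soon is near take 5529care, 30over and out"))
```

Because exactly one of the two groups is non-empty for every match, prefix or suffix picks the digits in either position.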
Reply
#18
(Jul-25-2022, 07:32 PM)snippsat Wrote:
(Jul-25-2022, 05:15 AM)giddyhead Wrote: Instead of finding the whole thing can it be modified to only find the numbers only for example 5one, 5529care, 30over, etc? Thanks
If add a group () to deanhystad code then will get only numbers.

Got it Thank you!
Reply
#19
Sorry did not get to post until now. Completed! Thank you all for your help, time and information.
Reply