Find numbers using Regex

giddyhead · Jul-24-2022, 04:41 AM

Hello,

I have a regex issue: I am trying to find only the numbers that are attached to words.
For example:
1the
5one
5529care
30over

The following regex

\d+([A-Za-z])

matches the number together with the first letter of the word. What do I need to change so that it matches only the numbers attached to the words? Thanks.

A sample text for reference:

25 - not this number

the cow just over the moon and the sun is in 1the sky

26 - not this number

5one day is soon and soon is near take 5529care, 30over and out
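
A minimal sketch of why the pattern above also returns the first letter, and of one possible modification: moving the parentheses so that they wrap the digits instead of the letter. The names `sample`, `original` and `digits_only` are made up for illustration and are not from the thread.

```python
import re

sample = "the sun is in 1the sky, take 5529care, 30over and out"

# Original pattern: the letter is part of the overall match.
original = [m.group(0) for m in re.finditer(r"\d+([A-Za-z])", sample)]
print(original)      # ['1t', '5529c', '30o']

# Parentheses moved to the digits: with one group, findall returns that group only.
digits_only = re.findall(r"(\d+)[A-Za-z]", sample)
print(digits_only)   # ['1', '5529', '30']
```

The letter is still consumed by the second pattern; it is simply left out of what `re.findall` reports.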

rob101 · (This post was last modified: Jul-24-2022, 06:26 AM by rob101.)

Hi.

Does this do what you want?

import re

string = '1the 5one 5529care 30over'

for i in range(len(string)):
    digit = re.search('[0-9]', string[i])
    if digit:
        print(f'Found digit {string[i]} at position {i}')

Output:
Found digit 1 at position 0
Found digit 5 at position 5
Found digit 5 at position 10
Found digit 5 at position 11
Found digit 2 at position 12
Found digit 9 at position 13
Found digit 3 at position 19
Found digit 0 at position 20

{edit to remove my debug line of code}

Pedroski55 · Jul-24-2022, 07:29 AM

Maybe like this?

import re
mylist = ['1the', '5one', '5529care', '30over', '55more66']
# re.findall returns a list
for s in mylist:
    result = re.findall('[0-9]+', s)
    print(s)
    print(result)

giddyhead

(Jul-24-2022, 06:26 AM)rob101 Wrote: Hi.

Does this do what you want?

import re

string = '1the 5one 5529care 30over'

for i in range(len(string)):
    digit = re.search('[0-9]', string[i])
    if digit:
        print(f'Found digit {string[i]} at position {i}')

Output:
Found digit 1 at position 0
Found digit 5 at position 5
Found digit 5 at position 10
Found digit 5 at position 11
Found digit 2 at position 12
Found digit 9 at position 13
Found digit 3 at position 19
Found digit 0 at position 20

{edit to remove my debug line of code}

Hi,

Thanks for the reply and the help. Unfortunately it finds all the numbers in the text. The numbers in bold below are the ones I am looking to get rid of. The following is only a sample, but it has the same format as the text in one of my lists, which contain numbers throughout:
Hope that clarifies. Thanks in advance.

25 - not this number

the cow just over the moon and the sun is in 1the sky

26 - not this number

5one day is soon and soon is near take 5529care, 30over and out

59 - not this number

The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve.

(Jul-24-2022, 07:29 AM)Pedroski55 Wrote: Maybe like this?

import re
mylist = ['1the', '5one', '5529care', '30over', '55more66']
# re.findall returns a list
for s in mylist:
    result = re.findall('[0-9]+', s)
    print(s)
    print(result)

Hey,
Thanks for the help. Unfortunately, because numbers are spread throughout each list, it finds all of them. I am looking to find only the numbers attached to words (in bold). I have included a sample list for reference.
Thanks

25 - not this number

the cow just over the moon and the sun is in 1the sky

26 - not this number

5one day is soon and soon is near take 5529care, 30over and out

59 - not this number

The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve.
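
One way to express the requirement described above (skip the standalone numbers, keep the ones glued to a word) is a lookahead. This is only a sketch run against the sample text from this post; the variable names `text` and `attached` are assumptions for illustration.

```python
import re

text = """25 - not this number

the cow just over the moon and the sun is in 1the sky

26 - not this number

5one day is soon and soon is near take 5529care, 30over and out

59 - not this number

The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve.
"""

# \d+ matches a run of digits; (?=[A-Za-z]) requires a letter right after it
# without making that letter part of the match.
attached = re.findall(r"\d+(?=[A-Za-z])", text)
print(attached)  # ['1', '5', '5529', '30', '59', '9']
```

The standalone 25, 26 and 59 are skipped because they are followed by a space rather than a letter.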

rob101

No worries. The point of my post (sorry that I wasn't clear on this) is that once you know the positions of the digits, you can use that as a first step towards building the rest of your script and producing whatever output you like, so it's more a proof of concept really.

What code do you have so far?

Maybe a better way would be to have two functions (one to find the digits and one to find everything else) and have them work together, in a 'for loop' or a 'while loop', with 'if/else' branches to process the results.

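import re

def fdigit(d):
    # returns 1 if the string contains at least one digit, otherwise None
    digit = re.search(r'\d', d)
    if digit:
        return 1

def fchar(c):
    # returns 1 if the string contains at least one non-digit character, otherwise None
    char = re.search(r'\D', c)
    if char:
        return 1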

I've no idea what your skill level is. Is this something that you're going to be able to do, or will you need guidance?

edit: in fact one function will suffice: if the digit test fails, then there's no need for the other test.
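
The single-function idea could look something like the sketch below. The name `split_number_prefix` and the exact pattern are assumptions for illustration, not code from this thread: the function reports the leading digits and the rest of a word, or `None` when the word is not "digits followed by a letter".

```python
import re

def split_number_prefix(word):
    """Return (digits, rest) if word is digits followed by a letter, else None."""
    match = re.fullmatch(r"(\d+)([A-Za-z].*)", word)
    if match:
        return match.group(1), match.group(2)
    return None

# Try it on a few space-separated words.
for word in "25 - take 5529care, 30over and out".split():
    print(word, split_number_prefix(word))
```

A plain number such as `25` gives `None`, so a calling loop can leave it untouched and only act on words where a tuple comes back.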

giddyhead · Jul-24-2022, 06:22 PM

Ahh, I see. Got it. Thanks for the information. I am still learning and in need of guidance. The following code is what I have so far; it looks for the numbers attached to words, drops them, and joins the text back together:
# assumes `import re` and a BeautifulSoup object named `soup` defined earlier
for cf in soup.findAll('div', {'class':'flex flex-auto flex-col bg-white shadow-md'}):
    txt = re.sub(r'\[(.*?)\]', '', cf.text)  # Remove anything between square brackets
    # Split on digits followed by a letter; the captured letter stays in the list
    txt2 = re.split(r'\d+([A-Za-z])', txt)
    cmp = ''.join(txt2)  # Join back without the attached numbers
    print('cf text', cmp)  # print the cleaned text rather than the original txt
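
A possible alternative to splitting and re-joining is a single `re.sub` with a lookahead. This is a hedged sketch rather than a drop-in fix: `strip_attached_numbers` is a made-up helper name, and it works on a plain string instead of the `soup` elements used above.

```python
import re

def strip_attached_numbers(text):
    # Remove bracketed passages first, as in the loop above.
    text = re.sub(r"\[.*?\]", "", text)
    # Delete digits only when a letter follows immediately; the letter is kept
    # because a lookahead does not consume it.
    return re.sub(r"\d+(?=[A-Za-z])", "", text)

print(strip_attached_numbers("26 - not this number"))
print(strip_attached_numbers("the sun is in 1the sky"))
```

Standalone numbers such as `26` are followed by a space, so they are left in place.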

rob101 · (This post was last modified: Jul-24-2022, 06:36 PM by rob101.)

I think I may have cracked this for you. If it's not quite right, I'm sure you can make any adjustments, and if you can't, I'm more than happy to help.

Try it by pasting your text into the string.

#!/usr/bin/python3

import re

def fdigit(d):
    # If the word contains a digit, return whatever follows its last digit
    digit = re.search(r'\d', d)
    if digit:
        lst_string = re.split(r'\d', d)
        return lst_string[-1]

string = "" # put your text in this string object

lst_string = string.split(' ')
pstring = ''

for word in lst_string:
    check = fdigit(word)
    if not check:
        # no digit, or nothing after the last digit: keep the word as it is
        pstring += word+' '
    else:
        pstring += check+' '

print(pstring)

I'm not 100% happy with my function name, now that it's doing a slightly different job to the one that it was conceived for, but it'll do.

{edit}

I think we posted at almost the same time there. If this is any use to you, then cool; if not, maybe someone else can learn something from it.

giddyhead · Jul-24-2022, 08:32 PM

I see. Thanks for the information and help. Unfortunately I will not be able to modify it to suit, but I appreciate what you have done. May I ask for help with the existing code I posted? Thanks.

rob101 · Jul-24-2022, 11:03 PM

(Jul-24-2022, 08:32 PM)giddyhead Wrote: I see. Thanks for the information and help. Unfortunately I will not be able to modify it to suit, but I appreciate what you have done. May I ask for help with the existing code I posted? Thanks.

Sure; ask whatever you want to.

Pedroski55 · Jul-25-2022, 12:26 AM

I'm not too sure what you want to do now. Maybe this?

import re

mylist = [15, '15', '1the', '5one', '5529care', '30over', '55more66', 25, '25']

pattern1 = re.compile(r'\D+') # matches runs of non-digit characters
# re.findall returns a list
for s in mylist:
    text = str(s) # the list mixes ints and strings
    match = pattern1.search(text)
    if match:
        result = re.findall('[0-9]+', text)
        print(s)
        print(result)
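
For comparison, here is a sketch of the same loop using a lookahead, so that only digits immediately followed by a letter are reported. The names `attached` and `results` are assumptions for illustration; note that the trailing `66` in `'55more66'` is not reported, because no letter follows it.

```python
import re

mylist = [15, '15', '1the', '5one', '5529care', '30over', '55more66', 25, '25']

attached = re.compile(r'\d+(?=[A-Za-z])')  # digits directly followed by a letter

# str() lets the same pattern run on the integer items as well.
results = [(s, attached.findall(str(s))) for s in mylist]

for item, found in results:
    print(item, found)
```

Items that are plain numbers, whether stored as `int` or `str`, come back with an empty list.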
