Python Forum
Find numbers using Regex
#1
Hello,

I have a regex issue: I am trying to find only the numbers that are attached to words.
For example:
1the
5one
5529care
30over

The following regex
\d+([A-Za-z])
matches the number but also includes the first letter of the word. What modifications do I need so that it matches only the numbers attached to words? Thanks
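One way to do this is a lookahead: `(?=[A-Za-z])` checks that a letter follows the digits without making that letter part of the match. A minimal sketch, assuming that "attached" means digits immediately followed by an ASCII letter:

```python
import re

# Digits immediately followed by an ASCII letter. The lookahead (?=...)
# requires the letter but does not consume it, so only the digits are matched.
ATTACHED_NUMBER = re.compile(r'\d+(?=[A-Za-z])')

sample = (
    "25 - not this number\n\n"
    "the cow just over the moon and the sun is in 1the sky\n\n"
    "26 - not this number\n\n"
    "5one day is soon and soon is near take 5529care, 30over and out"
)

print(ATTACHED_NUMBER.findall(sample))  # ['1', '5', '5529', '30']
```

Alternatively, moving the parentheses in the original pattern from the letter onto the digits, `(\d+)[A-Za-z]`, makes `re.findall` return just the captured digits.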

A sample text for reference:

25 - not this number

the cow just over the moon and the sun is in 1the sky

26 - not this number

5one day is soon and soon is near take 5529care, 30over and out
#2
Hi.

Does this do what you want?

import re

string = '1the 5one 5529care 30over'

for i, ch in enumerate(string):
    digit = re.search('[0-9]', ch)
    if digit:
        print(f'Found digit {ch} at position {i}')
Output:
Found digit 1 at position 0
Found digit 5 at position 5
Found digit 5 at position 10
Found digit 5 at position 11
Found digit 2 at position 12
Found digit 9 at position 13
Found digit 3 at position 19
Found digit 0 at position 20
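If positions are what you are after, `re.finditer` can report them for whole runs of digits in one pass instead of testing one character at a time. A sketch, assuming "attached" means digits directly followed by a letter:

```python
import re

string = '1the 5one 5529care 30over'

# finditer yields match objects; .start() is the index of the first digit
positions = []
for m in re.finditer(r'\d+(?=[A-Za-z])', string):
    positions.append((m.group(), m.start()))
    print(f'Found number {m.group()} at position {m.start()}')
```

This reports each number once (5529 at position 10) rather than each digit separately.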
Sig:
>>> import this

The UNIX philosophy: "Do one thing, and do it well."

"The danger of computers becoming like humans is not as great as the danger of humans becoming like computers." :~ Konrad Zuse

"Everything should be made as simple as possible, but not simpler." :~ Albert Einstein
#3
Maybe like this?

import re
mylist = ['1the', '5one', '5529care', '30over', '55more66']
# re.findall returns a list
for s in mylist:
    result = re.findall('[0-9]+', s)
    print(s)
    print(result)
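On the full text, `[0-9]+` cannot tell a standalone number from one attached to a word, because it only looks at digits. A sketch comparing it with a lookahead pattern on the same kind of list (the lookahead is a swapped-in technique, not part of the code above):

```python
import re

mylist = ['1the', '5one', '5529care', '30over', '55more66', '25']

results = {}
for s in mylist:
    every_number = re.findall('[0-9]+', s)        # every run of digits
    attached = re.findall(r'\d+(?=[A-Za-z])', s)  # only digits followed by a letter
    results[s] = (every_number, attached)
    print(s, every_number, attached)
```

For '55more66' the first pattern returns both 55 and 66, while the lookahead returns only 55; for a bare '25' the lookahead returns nothing.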
#4
(Jul-24-2022, 06:26 AM) rob101 Wrote: Does this do what you want?

Hi,

Thanks for the reply, the information and the help. Unfortunately, it finds every number in the text. The numbers attached to words are the ones I am looking to get rid of. The following is only a sample, but it has the same format as the text in one of my lists, which contains numbers throughout:
I hope that clarifies it. Thanks in advance.

25 - not this number

the cow just over the moon and the sun is in 1the sky

26 - not this number

5one day is soon and soon is near take 5529care, 30over and out

59 - not this number

The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve.
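Since the goal is to get rid of those numbers, `re.sub` with a lookahead can replace each attached number with an empty string while leaving standalone numbers such as 25 and 59 untouched. A minimal sketch on a shortened copy of the sample text:

```python
import re

text = (
    "25 - not this number\n\n"
    "the cow just over the moon and the sun is in 1the sky\n\n"
    "59 - not this number\n\n"
    "the back of the 59closet, use the ones on the 9second shelve."
)

# Replace digits that are immediately followed by a letter with nothing
cleaned = re.sub(r'\d+(?=[A-Za-z])', '', text)
print(cleaned)
```

A token like `abc123def` would also lose its digits with this pattern. Adding `\b` in front (`\b\d+(?=[A-Za-z])`) restricts it to numbers at the start of a word.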
(Jul-24-2022, 07:29 AM) Pedroski55 Wrote: Maybe like this?

Hey,
Thanks for the help. Unfortunately, because numbers are spread throughout each list, it finds all of them. I am looking to find only the numbers attached to words; the sample text above shows the format.
Thanks
#5
(Jul-24-2022, 12:11 PM) giddyhead Wrote: Unfortunately it finds all the numbers within.

No worries. The point of my post (sorry I wasn't clear about this) is that once you know the positions of the digits, you can use that as the first step in building the rest of your script and produce whatever output you like. It was more of a proof of concept, really.

What code do you have so far?

Maybe a better way would be to have two functions (one to find the digits and one to find everything else) and have them work together, in a 'for loop' or a 'while loop', with 'if/else' branches to process the results.

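import re

def fdigit(d):
    # True if the string contains at least one digit
    return re.search(r'\d', d) is not None

def fchar(c):
    # True if the string contains at least one non-digit character
    return re.search(r'\D', c) is not None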
I've no idea what your skill level is. Is this something you'll be able to do, or will you need guidance?

edit: in fact one function will suffice: if the digit test fails, then there's no need for the other test.
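Following the one-function idea, here is a sketch that works on a single word using only string methods: it strips leading digits, but only when what remains starts with a letter. The name `strip_attached_number` is hypothetical:

```python
def strip_attached_number(word):
    """Return word without its leading digits if they are attached to a letter.

    '5529care,' becomes 'care,'; a standalone number such as '25' is kept.
    """
    rest = word.lstrip('0123456789')
    # Only strip when there were leading digits and a letter follows them
    if rest != word and rest[:1].isalpha():
        return rest
    return word

words = '5one day is soon and soon is near take 5529care, 30over and out'.split()
result = ' '.join(strip_attached_number(w) for w in words)
print(result)
```

Note that `str.isalpha` also accepts non-ASCII letters such as 'é', which a `[A-Za-z]` regex would not.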
#6
(Jul-24-2022, 02:52 PM) rob101 Wrote: What code do you have so far?
Ahh, I see, got it. Thanks for the information. I am still learning and need guidance. The following is the code I have: it searches for the numbers attached to words, removes them, and joins the text back together:
import re

# soup is assumed to be the BeautifulSoup object created earlier in the script
for cf in soup.find_all('div', {'class': 'flex flex-auto flex-col bg-white shadow-md'}):
    txt = re.sub(r'\[(.*?)\]', '', cf.text)  # remove anything between square brackets
    txt2 = re.split(r'\d+([A-Za-z])', txt)  # split on digits attached to a word (the captured letter is kept)
    cmp = ''.join(txt2)  # join back without the attached digits
    print('cf text', cmp)
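The split-and-join trick works because `re.split` also returns whatever the capture group `([A-Za-z])` matched, so the first letter of each word comes back when the pieces are joined. A single `re.sub` with a lookahead gives the same result. A sketch of just the text-cleaning step, leaving BeautifulSoup out; `clean_text` and `clean_text_split` are hypothetical names:

```python
import re

def clean_text(txt):
    # Remove anything between square brackets, e.g. '[12]'
    txt = re.sub(r'\[(.*?)\]', '', txt)
    # Remove digits that are immediately followed by a letter
    return re.sub(r'\d+(?=[A-Za-z])', '', txt)

def clean_text_split(txt):
    # Same idea as the post: split on digits+letter; re.split returns the
    # captured letter as well, so joining the pieces puts it back
    txt = re.sub(r'\[(.*?)\]', '', txt)
    return ''.join(re.split(r'\d+([A-Za-z])', txt))

sample = 'take 5529care, 30over [12] and out'
print(clean_text(sample))
print(clean_text_split(sample))
```

Inside the loop, `clean_text(cf.text)` would then replace the three separate steps.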
#7
I think I may have cracked this for you. If not, I'm sure you can make any adjustments, and if you can't, I'm more than happy to help.

Try it by pasting your text into the string below.

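#!/usr/bin/python3

import re

def fdigit(d):
    # If the word starts with digits that are directly followed by a letter,
    # return the word without those digits ('5529care,' -> 'care,');
    # otherwise return None so the caller keeps the word unchanged
    match = re.match(r'\d+(?=[A-Za-z])', d)
    if match:
        return d[match.end():]

string = ""  # put your text in this string object

# split() with no argument splits on any whitespace, including line breaks,
# so words separated by newlines are not glued into one token
lst_string = string.split()
words = []

for word in lst_string:
    check = fdigit(word)
    if not check:
        words.append(word)
    else:
        words.append(check)

pstring = ' '.join(words)  # no trailing space at the end
print(pstring)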
I'm not 100% happy with my function name, now that it's doing a slightly different job to the one that it was conceived for, but it'll do.
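Splitting the text into words and gluing them back together with spaces can change its layout, for example by losing the blank lines between paragraphs. If the line breaks matter, one option is `re.split` with a capturing group around the whitespace, which keeps the separators in the list so they can be joined back unchanged. A sketch; `fix_word` is a hypothetical helper in the spirit of `fdigit`:

```python
import re

def fix_word(word):
    # Drop leading digits only when they are directly followed by a letter
    return re.sub(r'^\d+(?=[A-Za-z])', '', word)

text = "26 - not this number\n\n5one day is soon"

# The parentheses make re.split keep the whitespace runs in the result
tokens = re.split(r'(\s+)', text)
result = ''.join(fix_word(t) for t in tokens)
print(result)
```

Whitespace tokens pass through `fix_word` untouched, so the original spacing and line breaks survive.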

Edit: I think we posted at almost the same time. If this is any use to you, then cool; if not, maybe someone else can learn something from it. :)
#8
(Jul-24-2022, 06:36 PM) rob101 Wrote: I think I may have cracked this for you.

I see, thanks for the information and help. Unfortunately, I will not be able to modify it to suit my needs, but I appreciate what you have done. May I ask for help with the existing code I posted? Thanks
#9
(Jul-24-2022, 08:32 PM)giddyhead Wrote: I see. thanks for the information and help Unfortunately I will not be able to modify it to suit however I appreciate what you have done. may I ask help with the existing code posted? Thanks

Sure; ask whatever you want to.
#10
Not too sure what you want to do now; maybe this?

import re

mylist = [15, '15', '1the', '5one', '5529care', '30over', '55more66', 25, '25']

pattern1 = re.compile(r'\D+')  # matches one or more non-digit characters
# Skip items that are digits only; re.findall returns a list of the digit runs
for s in mylist:
    match = pattern1.search(str(s))  # str() because the list mixes ints and strings
    if match:
        result = re.findall('[0-9]+', str(s))
        print(s)
        print(result)
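If the aim is to sort items into standalone numbers and numbers attached to words, `re.fullmatch` gives a direct test for the first case and `re.match` for the second. A sketch; the category labels and the name `classify` are made up for illustration:

```python
import re

mylist = [15, '15', '1the', '5one', '5529care', '30over', '55more66', 25, '25']

def classify(item):
    s = str(item)  # the list mixes ints and strings
    if re.fullmatch(r'\d+', s):
        return 'standalone number'
    m = re.match(r'(\d+)[A-Za-z]', s)
    if m:
        return f'attached number {m.group(1)}'
    return 'no attached number'

for item in mylist:
    print(repr(item), '->', classify(item))
```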