Posts: 58
Threads: 19
Joined: Jan 2021
Hello,
I have a regex issue: I am trying to find only the numbers that are attached to words.
For example:
1the
5one
5529care
30over
The following regex, \d+([A-Za-z]), finds the number but also includes the first letter of the word. What modifications do I need so that it finds only the numbers attached to the words? Thanks.
A sample text for reference:
25 - not this number
the cow just over the moon and the sun is in 1the sky
26 - not this number
5one day is soon and soon is near take 5529care, 30over and out
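For reference, one possible way to match only the digits is a lookahead, which checks for the following letter without consuming it. This is a minimal sketch run against the sample text above; the names ATTACHED_NUMBER, sample and attached are made up for illustration and are not from the original post.

```python
import re

# Digits that are immediately followed by a letter. The lookahead
# (?=[A-Za-z]) checks for the letter but leaves it out of the match.
ATTACHED_NUMBER = re.compile(r"\d+(?=[A-Za-z])")

sample = """25 - not this number
the cow just over the moon and the sun is in 1the sky
26 - not this number
5one day is soon and soon is near take 5529care, 30over and out"""

attached = ATTACHED_NUMBER.findall(sample)
print(attached)  # ['1', '5', '5529', '30']
```

The standalone 25 and 26 are skipped because they are followed by a space rather than a letter.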
Posts: 453
Threads: 16
Joined: Jun 2022
Jul-24-2022, 06:26 AM
(This post was last modified: Jul-24-2022, 06:26 AM by rob101.)
Hi.
Does this do what you want?
import re

string = '1the 5one 5529care 30over'
for i in range(len(string)):
    digit = re.search('[0-9]', string[i])
    if digit:
        print(f'Found digit {string[i]} at position {i}')

Output:
Found digit 1 at position 0
Found digit 5 at position 5
Found digit 5 at position 10
Found digit 5 at position 11
Found digit 2 at position 12
Found digit 9 at position 13
Found digit 3 at position 19
Found digit 0 at position 20
{edit to remove my debug line of code}
Sig:
>>> import this
The UNIX philosophy: "Do one thing, and do it well."
"The danger of computers becoming like humans is not as great as the danger of humans becoming like computers." :~ Konrad Zuse
"Everything should be made as simple as possible, but not simpler." :~ Albert Einstein
Posts: 1,094
Threads: 143
Joined: Jul 2017
Maybe like this?
import re

mylist = ['1the', '5one', '5529care', '30over', '55more66']
# re.findall returns a list
for s in mylist:
    result = re.findall('[0-9]+', s)
    print(s)
    print(result)
Posts: 58
Threads: 19
Joined: Jan 2021
Jul-24-2022, 12:11 PM
(This post was last modified: Jul-24-2022, 02:16 PM by Yoriz. Edit Reason: Formatting)
(Jul-24-2022, 06:26 AM)rob101 Wrote: Hi.
Does this do what you want?
Hi,
Thanks for the reply, the information, and the help. Unfortunately, it finds all the numbers in the text. The numbers attached to words (1the, 5one, 5529care, 30over, 59closet, 9second) are what I am looking to get rid of. The following is only a sample, but it has the same format as the text in one of my lists, which contain numbers throughout:
I hope that clarifies it. Thanks in advance.
25 - not this number
the cow just over the moon and the sun is in 1the sky
26 - not this number
5one day is soon and soon is near take 5529care, 30over and out
59 - not this number
The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve.
(Jul-24-2022, 07:29 AM)Pedroski55 Wrote: Maybe like this?
Hey,
Thanks for the help. Unfortunately, because numbers are spread throughout each list, it finds all of them. I am looking to find only the numbers attached to words. I have included a sample list for reference.
Thanks
Posts: 453
Threads: 16
Joined: Jun 2022
Jul-24-2022, 02:52 PM
(This post was last modified: Jul-24-2022, 02:52 PM by rob101. Edit Reason: Formatting)
No worries. The point of my post (sorry that I wasn't clear on this) is that once you know the positions of the digits, you can use that as a first step towards building the rest of your script and producing whatever output you like, so it was more a proof of concept really.
What code do you have so far?
Maybe a better way would be to have two functions (one to find the digits and one to find everything else) and have them work together in a 'for' loop or a 'while' loop, with 'if/else' branches to process the results.

import re

def fdigit(d):
    # Return 1 if d contains a digit, otherwise None
    digit = re.search(r'\d', d)
    if digit:
        return 1

def fchar(c):
    # Return 1 if c contains a non-digit character, otherwise None
    char = re.search(r'\D', c)
    if char:
        return 1

I've no idea what your skill level is. Is this something that you're going to be able to do, or will you need guidance?
edit: in fact one function will suffice: if the digit test fails, then there's no need for the other test.
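One way to read the "one function" idea is a single pass over the characters that collects each run of digits and keeps it only when the next character is a letter. This is a rough sketch without regex, not the code from the post; attached_numbers is a made-up name, and str.isdigit() and str.isalpha() are assumed to be good enough for plain ASCII text like the sample.

```python
def attached_numbers(text):
    """Return the digit runs in text that are immediately followed by a letter."""
    found = []
    i = 0
    while i < len(text):
        if text[i].isdigit():
            start = i
            # Walk to the end of this run of digits.
            while i < len(text) and text[i].isdigit():
                i += 1
            # Keep the run only if a letter comes straight after it.
            if i < len(text) and text[i].isalpha():
                found.append(text[start:i])
        else:
            i += 1
    return found


print(attached_numbers("take 5529care, 30over and out"))  # ['5529', '30']
print(attached_numbers("25 - not this number"))           # []
```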
Posts: 58
Threads: 19
Joined: Jan 2021
Ahh, I see; got it. Thanks for the information. I am still learning and in need of guidance. The following code is what I have. It searches for the numbers attached to words and joins the text back together:

# soup is a BeautifulSoup object; re is assumed to be imported
for cf in soup.findAll('div', {'class': 'flex flex-auto flex-col bg-white shadow-md'}):
    txt = re.sub(r'\[(.*?)\]', '', cf.text)  # Remove anything between brackets
    txt2 = re.split(r'\d+([A-Za-z])', txt)  # Split on digits attached to words, keeping the letter
    cmp = ''.join(txt2)  # Join back without the attached numbers
    print('cf text', txt)
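Note that the snippet above prints txt (the text before the split) rather than cmp (the joined result), so the numbers would still show in the output even if the split works. The sketch below assumes plain strings instead of the BeautifulSoup elements and compares the split-and-join approach with a re.sub that uses a lookahead. The function names and the sample string are made up for illustration.

```python
import re


def strip_with_split(text):
    # Because the pattern has a capture group, re.split keeps the captured
    # letter in the result list, so joining the pieces drops only the digits.
    return "".join(re.split(r"\d+([A-Za-z])", text))


def strip_with_sub(text):
    # The lookahead matches only the digits, so nothing has to be put back.
    return re.sub(r"\d+(?=[A-Za-z])", "", text)


txt = "59 - not this number\nThe covers at near the back of the 59closet"
cleaned = strip_with_sub(txt)
print(cleaned)
```

Both functions should give the same result on text like this. The re.sub version simply avoids the split and join steps.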
Posts: 453
Threads: 16
Joined: Jun 2022
Jul-24-2022, 06:36 PM
(This post was last modified: Jul-24-2022, 06:36 PM by rob101.)
I think I may have cracked this for you. If not, then I'm sure you can make any adjustments. If not, then I'm more than happy to help you.
Try it by coding in your text.
#!/usr/bin/python3
import re

def fdigit(d):
    # If d contains a digit, return whatever follows the last digit
    digit = re.search(r'\d', d)
    if digit:
        lst_string = re.split(r'\d', d)
        return lst_string[-1]

string = ""  # put your text in this string object
lst_string = string.split(' ')
pstring = ''
for get_word in range(len(lst_string)):
    word = lst_string[get_word]
    check = fdigit(word)
    if not check:
        pstring += word + ' '
    else:
        pstring += check + ' '
print(pstring)

I'm not 100% happy with my function name, now that it's doing a slightly different job to the one that it was conceived for, but it'll do.
{edit}
I think we posted at almost the same time there. If this is any use to you, then cool; if not, maybe someone else can learn something from it.
Posts: 58
Threads: 19
Joined: Jan 2021
I see. Thanks for the information and help. Unfortunately, I will not be able to modify it to suit my needs, but I appreciate what you have done. May I ask for help with the existing code I posted? Thanks.
Posts: 453
Threads: 16
Joined: Jun 2022
(Jul-24-2022, 08:32 PM)giddyhead Wrote: I see. Thanks for the information and help. Unfortunately, I will not be able to modify it to suit my needs, but I appreciate what you have done. May I ask for help with the existing code I posted? Thanks.
Sure; ask whatever you want to.
Posts: 1,094
Threads: 143
Joined: Jul 2017
Not too sure what you want to do now, maybe this??
import re

mylist = [15, '15', '1the', '5one', '5529care', '30over', '55more66', 25, '25']
pattern1 = re.compile(r'\D+')  # matches non-numbers
# re.findall returns a list
for s in mylist:
    match = re.search(pattern1, str(s))
    if match:
        result = re.findall('[0-9]+', s)
        print(s)
        print(result)
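If the goal is only the number at the start of an item that runs straight into a word, a capture group with re.match might be enough. This is a sketch under that assumption; LEADING and leading_number are made-up names, and the list is the one from the code above.

```python
import re

# One or more digits at the start of the item, followed directly by a letter.
# Only the digits are captured.
LEADING = re.compile(r"(\d+)[A-Za-z]")


def leading_number(item):
    """Return the leading digits if a letter follows them, otherwise None."""
    m = LEADING.match(str(item))
    return m.group(1) if m else None


mylist = [15, '15', '1the', '5one', '5529care', '30over', '55more66', 25, '25']
for s in mylist:
    print(repr(s), leading_number(s))
```

Plain numbers such as 15 or '25' give None, and '55more66' gives only the leading '55'.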