Posts: 6,780
Threads: 20
Joined: Feb 2020
Looks like you want to match one or more digits followed by one or more letters. I also decided to catch one or more letters followed by one or more digits.
from io import StringIO
import re

test_text = StringIO(
"""25 - not this number1
the cow just over the moon and the sun is in 1the sky
26 - not this number
5one day is soon and soon is near take 5529care, 30over and out
59 - not this number
The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve."""
)
pattern = re.compile(r"[0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+")
for line in test_text:
    matches = re.findall(pattern, line)
    if matches:
        print(f"{line}Matches = {matches}\n")
    else:
        print(f"{line}No matches\n")
Output:
25 - not this number1
Matches = ['number1']
the cow just over the moon and the sun is in 1the sky
Matches = ['1the']
26 - not this number
No matches
5one day is soon and soon is near take 5529care, 30over and out
Matches = ['5one', '5529care', '30over']
59 - not this number
No matches
The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve.
Matches = ['59closet', '9second']
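If you also want to know where each hit sits in the line, not just its text, the same alternation can be run with re.finditer(), which yields match objects with start() and group(). A minimal sketch; the helper name find_mixed_tokens is hypothetical:

```python
import re

# Same alternation as above: digits then letters, or letters then digits.
MIXED = re.compile(r"[0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+")

def find_mixed_tokens(text):
    """Hypothetical helper: return (start_index, matched_text) for each mixed token."""
    return [(m.start(), m.group()) for m in MIXED.finditer(text)]

print(find_mixed_tokens("take 5529care, 30over and out"))
# [(5, '5529care'), (15, '30over')]
```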
Posts: 58
Threads: 19
Joined: Jan 2021
Jul-25-2022, 05:15 AM
(This post was last modified: Jul-25-2022, 05:15 AM by giddyhead.)
(Jul-25-2022, 04:26 AM)deanhystad Wrote: Looks like you want to match one or more numbers followed by one or more letters. I also decided to catch one or more letters followed by one or more numbers
Quote:Thanks for the help. At this time I am looking for a way to find only the numbers attached to the words. Instead of finding the whole thing, can it be modified to find only the numbers, for example the 5 in 5one, the 5529 in 5529care, the 30 in 30over, etc.? Thanks
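One way to get only the digits directly, without capture-group tuples, is to put the letter requirement inside lookarounds, so it is checked but not included in the match. A sketch, assuming ASCII letters and digits as in the patterns above; digits_next_to_letters is a hypothetical name:

```python
import re

# [0-9]+ followed by a letter (lookahead), or [0-9]+ preceded by a letter (lookbehind).
# Lookarounds are zero-width, so only the digits end up in each match.
DIGITS_NEXT_TO_LETTERS = re.compile(r"[0-9]+(?=[a-zA-Z])|(?<=[a-zA-Z])[0-9]+")

def digits_next_to_letters(text):
    """Hypothetical helper: return the digit runs that touch a letter."""
    return DIGITS_NEXT_TO_LETTERS.findall(text)

print(digits_next_to_letters("5one day is soon and soon is near take 5529care, 30over and out"))
# ['5', '5529', '30']
```

Because the pattern has no capturing groups, findall() returns plain strings rather than tuples.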
Posts: 58
Threads: 19
Joined: Jan 2021
(Jul-25-2022, 12:26 AM)Pedroski55 Wrote: Not too sure what you want to do now, maybe this??
import re

mylist = [15, '15', '1the', '5one', '5529care', '30over', '55more66', 25, '25']
pattern1 = re.compile(r'\D+')  # matches one or more non-digit characters
# re.findall returns a list
for s in mylist:
    match = re.search(pattern1, str(s))
    if match:
        result = re.findall('[0-9]+', s)
        print(s)
        print(result)
Quote:Thanks. I tried to pass the whole list, but for some reason I was getting blank print statements. For reference, https://regex101.com/r/fhDBF6/2 shows the format of the data. Thanks
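Since Pedroski55's mylist mixes ints and strings, it may help to convert each item with str() once and collect the results instead of printing them. (In the original loop, re.findall() is only ever reached for string items, because the ints 15 and 25 contain no non-digits; calling it on an int would raise TypeError.) A sketch under that assumption; the function name is hypothetical:

```python
import re

NON_DIGITS = re.compile(r"\D+")  # one or more non-digit characters
DIGITS = re.compile(r"[0-9]+")   # one or more ASCII digits

def digits_in_mixed_items(items):
    """Hypothetical helper: for each item containing a non-digit, return (text, digit runs)."""
    results = []
    for item in items:
        text = str(item)  # ints such as 15 become '15'
        if NON_DIGITS.search(text):
            results.append((text, DIGITS.findall(text)))
    return results

mylist = [15, '15', '1the', '5one', '5529care', '30over', '55more66', 25, '25']
for text, digits in digits_in_mixed_items(mylist):
    print(text, digits)
```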
Posts: 58
Threads: 19
Joined: Jan 2021
(Jul-24-2022, 11:03 PM)rob101 Wrote: (Jul-24-2022, 08:32 PM)giddyhead Wrote: I see. Thanks for the information and help. Unfortunately, I will not be able to modify it to suit; however, I appreciate what you have done. May I ask for help with the existing code posted? Thanks
Sure; ask whatever you want to.
Quote:Copy. Thanks. Yeah looking for a way to get rid of the numbers attached to words using the code from web scrapping.
Posts: 453
Threads: 16
Joined: Jun 2022
Jul-25-2022, 06:07 AM
(This post was last modified: Jul-25-2022, 06:15 AM by rob101.)
(Jul-25-2022, 05:27 AM)giddyhead Wrote: Yeah looking for a way to get rid of the numbers attached to words using the code from web scrapping.
Okay. That code from your web scrapping, as you call it, is not in a style that I work with, as you may gather from what I've posted.
As for getting rid of the numbers attached to words, that is precisely the objective of my posted script, given that said number is prefixed.
Maybe this is a 'Language barrier' issue. Do you fully understand English?
Edit (P.S.):
I've updated my code, as I was unhappy with some of the object names I used (it was developed on the fly), and as I'll be adding this code to my notes, I've cleaned it up.
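#!/usr/bin/python3
import re

def f_number(num):
    # If the word contains a digit, return the text after its last digit
    # (an empty string for words such as 'number1'); otherwise return None.
    # Raw strings avoid the invalid-escape warning for '\d'.
    number = re.search(r'\d', num)
    if number:
        num_list = re.split(r'\d', num)
        return num_list[-1]

input_string = ""  # put your text in this string object
string_list = input_string.split(' ')
p_string = ''
for get_word in range(len(string_list)):
    word = string_list[get_word]
    number = f_number(word)
    if not number:
        # no digit, or nothing after the last digit: keep the word as-is
        p_string += word + ' '
    else:
        p_string += number + ' '
print(p_string)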
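For the prefixed-number case rob101 describes, a single re.sub() call can do a similar job without splitting on spaces. This is a swapped-in regex technique, not rob101's method, and it differs in one respect: it removes only a digit run at the start of a word, whereas f_number() keeps whatever follows the last digit anywhere in the word (so 'ab1cd' becomes 'cd' there but is left alone here). The function name is hypothetical:

```python
import re

# \b requires the digits to start a word; the lookahead requires a letter right after them.
LEADING_DIGITS = re.compile(r"\b[0-9]+(?=[a-zA-Z])")

def strip_prefixed_numbers(text):
    """Hypothetical helper: drop digit runs glued to the front of a word."""
    return LEADING_DIGITS.sub("", text)

print(strip_prefixed_numbers("5one day is soon and soon is near take 5529care, 30over and out"))
# one day is soon and soon is near take care, over and out
```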
Posts: 6,780
Threads: 20
Joined: Feb 2020
When you know what you want to replace, it is easy to remove the digits. This uses a comprehension to strip the digits:
from io import StringIO
import re

test_text = StringIO(
"""25 - not this number1
the cow just over the moon and the sun is in 1the sky
26 - not this number
5one day is soon and soon is near take 5529care, 30over and out
59 - not this number
The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve."""
)
pattern = re.compile(r"[0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+")
for line in test_text:
    matches = re.findall(pattern, line)
    if matches:
        for match in matches:
            line = line.replace(match, "".join([c for c in match if c not in '0123456789']))
    print(line.rstrip())
Output:
25 - not this number
the cow just over the moon and the sun is in the sky
26 - not this number
one day is soon and soon is near take care, over and out
59 - not this number
The covers at near the back of the closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the second shelve.
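Another way to delete the digits from a matched token, instead of the list comprehension, is str.translate() with a table from str.maketrans(), whose third argument lists the characters to delete. A small sketch; remove_digits is a hypothetical name:

```python
import string

# maketrans("", "", chars) maps every character in chars to None, i.e. deletes it.
DELETE_DIGITS = str.maketrans("", "", string.digits)

def remove_digits(token):
    """Hypothetical helper: return token with the ASCII digits 0-9 removed."""
    return token.translate(DELETE_DIGITS)

print(remove_digits("5529care"))  # care
print(remove_digits("number1"))   # number
```

Inside the loop above, this would be used as line = line.replace(match, remove_digits(match)).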
And this uses another regex:
stripper = re.compile(r"[a-zA-Z]+")  # matches the letters only
for match in matches:
    line = line.replace(match, re.findall(stripper, match)[0])
But it is better to use re.sub(). Write a function that returns a digit-less version of the matching string. This function is the repl argument to the re.sub(pattern, repl, string) call.
from io import StringIO
import re

test_text = StringIO(
"""25 - not this number1
the cow just over the moon and the sun is in 1the sky
26 - not this number
5one day is soon and soon is near take 5529care, 30over and out
59 - not this number
The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve."""
)
stripper = re.compile(r"[a-zA-Z]+")
finder = re.compile(r"[0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+")

def strip_digits(match):
    """This is the repl function used by re.sub()"""
    return re.findall(stripper, match.group())[0]

for line in test_text:
    print(re.sub(finder, strip_digits, line).rstrip())
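If the finder pattern captures the letters itself, the repl function does not need a second regex: it can return whichever group took part in the match. A sketch of that variation (the names are hypothetical); m.group(n) returns None for a group in the alternative that did not match:

```python
import re

# Group 1: letters after leading digits; group 2: letters before trailing digits.
FINDER = re.compile(r"[0-9]+([a-zA-Z]+)|([a-zA-Z]+)[0-9]+")

def keep_letters(m):
    """repl function: exactly one of the two groups is non-None for each match."""
    return m.group(1) or m.group(2)

def strip_attached_digits(line):
    return FINDER.sub(keep_letters, line)

print(strip_attached_digits("take 5529care, 30over and out"))  # take care, over and out
print(strip_attached_digits("25 - not this number1"))          # 25 - not this number
```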
Posts: 7,313
Threads: 123
Joined: Sep 2016
(Jul-25-2022, 05:15 AM)giddyhead Wrote: Instead of finding the whole thing can it be modified to only find the numbers only for example 5one, 5529care, 30over, etc? Thanks
If you add capture groups () to deanhystad's code, you will get only the numbers.
from io import StringIO
import re

test_text = StringIO(
"""25 - not this number1
the cow just over the moon and the sun is in 1the sky
26 - not this number
5one day is soon and soon is near take 5529care, 30over and out
59 - not this number
The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve."""
)
pattern = re.compile(r"([0-9]+)[a-zA-Z]+|[a-zA-Z]+([0-9]+)")
for line in test_text:
    matches = re.findall(pattern, line)
    if matches:
        print(f"{line}Matches = {matches}\n")
    else:
        print(f"{line}No matches\n")
Output:
25 - not this number1
Matches = [('', '1')]
the cow just over the moon and the sun is in 1the sky
Matches = [('1', '')]
26 - not this number
No matches
5one day is soon and soon is near take 5529care, 30over and out
Matches = [('5', ''), ('5529', ''), ('30', '')]
59 - not this number
No matches
The covers at near the back of the 59closet, and when found have them place on the each of the beds. However you see the pillow cases use the ones on the 9second shelve.Matches = [('59', ''), ('9', '')]
Clean up.
>>> Matches = [('5', ''), ('5529', ''), ('30', '')]
>>> [i[0] for i in Matches]
['5', '5529', '30']
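Taking i[0] works for that line because every hit has the digits in front. For a line such as "25 - not this number1", where the digits follow the letters, findall() puts the number in the second slot, ('', '1'), so one option is to take whichever element is non-empty. A sketch using the same pattern; attached_numbers is a hypothetical name:

```python
import re

pattern = re.compile(r"([0-9]+)[a-zA-Z]+|[a-zA-Z]+([0-9]+)")

def attached_numbers(line):
    """Return only the digit runs; findall gives '' for the group that did not match."""
    return [before or after for before, after in pattern.findall(line)]

print(attached_numbers("25 - not this number1"))          # ['1']
print(attached_numbers("take 5529care, 30over and out"))  # ['5529', '30']
```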
Posts: 58
Threads: 19
Joined: Jan 2021
(Jul-25-2022, 07:32 PM)snippsat Wrote: (Jul-25-2022, 05:15 AM)giddyhead Wrote: Instead of finding the whole thing can it be modified to only find the numbers only for example 5one, 5529care, 30over, etc? Thanks If add a group () to deanhystad code then will get only numbers.
Got it. Thank you!
Posts: 58
Threads: 19
Joined: Jan 2021
Sorry, I did not get to post until now. Completed! Thank you all for your help, time and information.