Python Forum

Full Version: Python regex to get only numbers
I have the following script

import re

w = 'KTPA 081653Z 00000KT 10SM BKN022TCU BKN040 OVC001RMK 28/23 A2990 FEW070 RMK AO2 SLP125 TCU SCT080 NW-NE T02830228'
k = re.findall(r"FEW\d+|SCT\d+|BKN\d+|OVC\d+", w)
print(k)
When I run it, I get
Quote:['BKN022', 'BKN040', 'OVC001', 'FEW070', 'SCT080']
but I only want the integers, so that it looks like this:
Quote:['022', '040', '001', '070', '080']
Once I have the integers, I can sort them to get the lowest number.

How would I do that?
You'll want something more like this:

regex = re.compile(r"(?:FEW|BKN|OVC|SCT)(\d{3})")
The digits will be in a capture group for each match. Review the Python documentation for regex groups to access them.
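A minimal sketch of how the compiled pattern might then be applied, using the string from the question. The names `heights` and `heights_again` are illustrative and not from the original posts.

```python
import re

w = 'KTPA 081653Z 00000KT 10SM BKN022TCU BKN040 OVC001RMK 28/23 A2990 FEW070 RMK AO2 SLP125 TCU SCT080 NW-NE T02830228'

# Compiling only builds a pattern object; nothing is searched yet.
regex = re.compile(r"(?:FEW|BKN|OVC|SCT)(\d{3})")

# With exactly one capturing group, findall returns that group's text per match.
heights = regex.findall(w)
print(heights)

# finditer yields match objects; group(1) is the captured digits.
heights_again = [m.group(1) for m in regex.finditer(w)]
print(heights_again)
```

Both calls should give the same list of three-digit strings; `finditer` is the more flexible choice if the match position or the full match is also needed.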
I tried
re.compile(r"(?:FEW|BKN|OVC|SCT)(\d{3})")
and got
Quote:re.compile('(?:FEW|BKN|OVC|SCT)(\\d{3})')

Then I tried
re.findall(r"(FEW|SCT|BKN|OVC)(\d{3})", w)
and now I'm getting
Quote:[('BKN', '022'), ('BKN', '040'), ('OVC', '001'), ('FEW', '070'), ('SCT', '080')]

How can I get only the integers from this?
(Oct-09-2019, 06:57 PM)tantony Wrote: How can I get only the integers from this?
import re

w = 'KTPA 081653Z 00000KT 10SM BKN022TCU BKN040 OVC001RMK 28/23 A2990 FEW070 RMK AO2 SLP125 TCU SCT080 NW-NE T02830228'
k = re.findall(r"(FEW|SCT|BKN|OVC)(\d{3})", w)
lst = [int(i[1]) for i in k]
print(lst)
Output:
[22, 40, 1, 70, 80]
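Since the stated goal is the lowest number, here is a short sketch of one way to get it without sorting. The names `pairs`, `lowest`, `code`, and `digits` are illustrative and not from the original posts.

```python
import re

w = 'KTPA 081653Z 00000KT 10SM BKN022TCU BKN040 OVC001RMK 28/23 A2990 FEW070 RMK AO2 SLP125 TCU SCT080 NW-NE T02830228'

# Two capturing groups, so findall returns (code, digits) tuples.
pairs = re.findall(r"(FEW|SCT|BKN|OVC)(\d{3})", w)

# min() over the converted integers gives the lowest value directly.
lowest = min(int(digits) for _, digits in pairs)
print(lowest)  # 1

# Keeping the tuple shows which code the lowest value belongs to.
code, digits = min(pairs, key=lambda pair: int(pair[1]))
print(code, digits)  # OVC 001
```

Using `min()` avoids sorting the whole list when only the smallest value is needed.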
@snippsat, thanks, that worked. Just to make sure: is there no way to get just the integers from my original regex?
k = re.findall(r"FEW\d+|SCT\d+|BKN\d+|OVC\d+", w)
With my original regex, I was getting
Quote:['BKN022', 'BKN040', 'OVC001', 'FEW070', 'SCT080']
The "?:" at the start of a group makes it a non-capturing group. In "(?:FEW|SCT|BKN|OVC)(\d{3})" the only capturing group is the one around the digits, so re.findall returns only the numbers. From what you posted, it looks like you compiled the regex using my code but never used it to match anything.
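To make the effect of the groups concrete, here is a small sketch of what `re.findall` returns with zero, one, and two capturing groups. The shortened sample string `s` and the variable names are made up for illustration. The last line shows a lookbehind variant that keeps the pattern group-free; it relies on every alternative in the lookbehind having the same length, which Python's `re` module requires.

```python
import re

s = 'BKN022 OVC001'  # shortened sample, not the full string from the question

# No capturing groups: findall returns the whole match.
no_groups = re.findall(r"(?:BKN|OVC)\d{3}", s)
print(no_groups)   # ['BKN022', 'OVC001']

# One capturing group: findall returns only that group.
one_group = re.findall(r"(?:BKN|OVC)(\d{3})", s)
print(one_group)   # ['022', '001']

# Two capturing groups: findall returns a tuple per match.
two_groups = re.findall(r"(BKN|OVC)(\d{3})", s)
print(two_groups)  # [('BKN', '022'), ('OVC', '001')]

# Lookbehind: the prefix must be present but is not part of the match.
lookbehind = re.findall(r"(?<=BKN|OVC)\d+", s)
print(lookbehind)  # ['022', '001']
```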
(Oct-09-2019, 07:45 PM)tantony Wrote: @snippsat, thanks that worked. So just to make sure, there's no way to get just the integers from my original regex?
k = re.findall(r"FEW\d+|SCT\d+|BKN\d+|OVC\d+", w)
With my original regex, I was getting
Quote:['BKN022', 'BKN040', 'OVC001', 'FEW070', 'SCT080']

(Oct-09-2019, 10:27 PM)stullis Wrote: The "?:" in the capture group changes it to a non-capturing group. So "(?:FEW|SCT|BKN|OVC)(\d{3})" would result in only the numbers. From what you posted, it looks like you compiled the regex using my code but didn't use it to match anything.

Hi!

I think that newbies like myself sometimes don't immediately grasp the answers and advice that the experienced programmers here kindly and unselfishly provide.

If you are also a newbie, you may not have realized that stullis was pointing you to another solution, although you had to make the necessary adjustments yourself. Here is what I think he meant (regex1), compared with what you had before (k):

import re
 
w = 'KTPA 081653Z 00000KT 10SM BKN022TCU BKN040 OVC001RMK 28/23 A2990 FEW070 RMK AO2 SLP125 TCU SCT080 NW-NE T02830228'
k = re.findall(r"FEW\d+|SCT\d+|BKN\d+|OVC\d+", w)
regex1 = re.findall(r"(?:FEW|BKN|OVC|SCT)(\d{3})", w)
print(k)
print(regex1)
and that produces the following output:
Output:
['BKN022', 'BKN040', 'OVC001', 'FEW070', 'SCT080']
['022', '040', '001', '070', '080']
All the best,