Python Forum
Extract continuous numeric characters from a string in Python
#1
I am interested in extracting a number that appears after a set of characters ('AA='). However, the issue is that I (i) don't know how long the number is, and (ii) don't know what appears right after the number (it could be a blank space or ANY character except 0-9; I do not know what these characters could be, but they are definitely not 0-9).

Given below are a few of the many inputs I can have.

Line 1: 123 NUBA AA=1.2345 $BB=1234.55
Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55
Line 3: 123 NUBA RRNJH AA=1.2#ALPHA
...
The result should be 1.2345, 1.2345678 and 1.2 for the respective lines above.

PS: I know how to use .find to get the starting position of AA=, but that does not help with the two conditions above. I also understand that one way would be to loop through each character after AA= and break when a blank space or anything other than 0-9 is seen, but that is clumsy and takes up unnecessary space in my code. I am looking for a neater way of doing this.
#2
What you want to do is pick up each character after 'AA=' as long as it's a digit or a decimal point, combine those into a string, and then convert it to a float. Here is one way to go about that:

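data = ['Line 1: 123 NUBA AA=1.2345 $BB=1234.55',
        'Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55',
        'Line 3: 123 NUBA RRNJH AA=1.2#ALPHA']

# Characters allowed in the number (note that '0' must be included)
ACCEPTABLE = '0123456789.'
aa_numbers = []

for line in data:
    temp_number_string = ''
    # Start reading just past the 'AA=' marker (raises ValueError if absent)
    marker = line.index('AA=') + 3
    # Stop at the first non-numeric character or at the end of the line
    while marker < len(line) and line[marker] in ACCEPTABLE:
        temp_number_string += line[marker]
        marker += 1
    aa_numbers.append(float(temp_number_string))

print(aa_numbers)  # [1.2345, 1.2345678, 1.2]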
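The same character-by-character idea can be written more compactly with itertools.takewhile, which keeps yielding characters only while a condition holds. This is a minimal sketch, assuming the number contains only digits and '.' (no sign or exponent); extract_aa is a hypothetical helper name used for illustration.

```python
from itertools import takewhile

# Characters that may appear in the number (assumption: no sign or exponent)
ACCEPTABLE = set('0123456789.')


def extract_aa(line, key='AA='):
    """Return the float right after `key`, reading only digits and '.'.

    extract_aa is a hypothetical helper; raises ValueError if `key` is missing.
    """
    start = line.index(key) + len(key)
    # takewhile stops at the first unacceptable character or at end of string
    digits = ''.join(takewhile(lambda ch: ch in ACCEPTABLE, line[start:]))
    return float(digits)


data = ['Line 1: 123 NUBA AA=1.2345 $BB=1234.55',
        'Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55',
        'Line 3: 123 NUBA RRNJH AA=1.2#ALPHA']

print([extract_aa(line) for line in data])  # [1.2345, 1.2345678, 1.2]
```

Because the slice line[start:] simply ends at the end of the string, no explicit bounds check is needed when the number is the last thing on the line.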
#3
With that description I would usually think of regex; nice way of doing it without regex, BashBedlam.
So something like this. Compiling the pattern once with re.compile and scanning with finditer (which yields matches one at a time) can make it faster when iterating over a large amount of data.
import re

data = '''\
Line 1: 123 NUBA AA=1.2345 $BB=1234.55
Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55
Line 3: 123 NUBA RRNJH AA=1.2#ALPHA'''

pattern =  re.compile(r"AA=([+-]?([0-9]*[.])?[0-9]+)")
for match in pattern.finditer(data):
    print(float(match.group(1)))

Output:
1.2345
1.2345678
1.2
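If the data arrives line by line and some lines may not contain 'AA=' at all, the same pattern can be applied per line with search, returning None when there is no match. This is a sketch under that assumption; find_aa and AA_PATTERN are hypothetical names, and the pattern is the one above rewritten with \d and a non-capturing group.

```python
import re

# Optional sign, optional integer part plus '.', then at least one digit
AA_PATTERN = re.compile(r"AA=([+-]?(?:\d*\.)?\d+)")


def find_aa(line):
    """Return the number after 'AA=' as a float, or None if the line has none.

    find_aa is a hypothetical helper used for illustration.
    """
    match = AA_PATTERN.search(line)
    return float(match.group(1)) if match else None


lines = ['Line 1: 123 NUBA AA=1.2345 $BB=1234.55',
         'Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55',
         'Line 3: 123 NUBA RRNJH AA=1.2#ALPHA',
         'Line 4: no marker here']

for line in lines:
    print(find_aa(line))  # 1.2345, 1.2345678, 1.2, then None
```

Returning None instead of raising lets the caller decide how to treat lines without the marker.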