Extract continuous numeric characters from a string in Python

Robotguy · (This post was last modified: Jan-15-2021, 10:41 PM by Robotguy.)

I want to extract a number that appears after a set of characters ('AA='). The problem is that I (i) don't know how long the number is, and (ii) don't know what comes right after the number (it could be a blank space or ANY character except 0-9; I don't know in advance what these characters could be, only that they are definitely not 0-9).

Below are a few of the many inputs I can have.

Line 1: 123 NUBA AA=1.2345 $BB=1234.55
Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55
Line 3: 123 NUBA RRNJH AA=1.2#ALPHA
...

The result should be: 1.2345 1.2345678 1.2 for each respective line above.

PS: I know how to use .find to get the starting location of AA=, but that is not very helpful given the two conditions above. I also understand that one way would be to loop through each character after AA= and break when a blank space or anything other than 0-9 is seen, but that is clumsy and takes up unnecessary space in my code. I am looking for a neater way of doing this.

BashBedlam · (This post was last modified: Jan-15-2021, 11:38 PM by BashBedlam.)

What you want to do is pick up each character after 'AA=' as long as it's a digit or a decimal point. Combine those into a string and then convert it to a float. Here is one way to go about that:

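data = ['Line 1: 123 NUBA AA=1.2345 $BB=1234.55',
	'Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55',
	'Line 3: 123 NUBA RRNJH AA=1.2#ALPHA']

# Characters that may be part of the number (note that '0' is included).
ACCEPTABLE = '0123456789.'
aa_numbers = []

for line in data :
	temp_number_string = ''
	# Position of the first character after 'AA='.
	marker = line.index ('AA=') + 3
	# Stop at the end of the line or at the first character that is not part of the number.
	while marker < len (line) and line [marker] in ACCEPTABLE :
		temp_number_string += line [marker]
		marker += 1
	aa_numbers.append (float (temp_number_string))

print (aa_numbers)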
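
For anyone who wants the same character-by-character idea without tracking an index by hand, here is a minimal sketch using str.partition and itertools.takewhile. The helper name extract_after and its prefix parameter are made up for illustration and are not part of the post above. It assumes the number consists only of digits and a decimal point, and it returns None when the prefix is missing or no valid number follows it.

```python
from itertools import takewhile

# Characters allowed inside the number (assumption: no sign, no exponent).
DIGITS_AND_POINT = set("0123456789.")

def extract_after(line, prefix="AA="):
    """Return the float that follows `prefix` in `line`, or None."""
    # partition splits on the first occurrence of prefix;
    # `found` is empty when the prefix is not present.
    _, found, rest = line.partition(prefix)
    if not found:
        return None
    # takewhile stops at the first character that is not a digit or '.',
    # and also stops cleanly at the end of the string.
    number = "".join(takewhile(DIGITS_AND_POINT.__contains__, rest))
    try:
        return float(number)
    except ValueError:
        # Nothing numeric followed the prefix, or the text was malformed (e.g. '1.2.3').
        return None

data = ['Line 1: 123 NUBA AA=1.2345 $BB=1234.55',
        'Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55',
        'Line 3: 123 NUBA RRNJH AA=1.2#ALPHA']

print([extract_after(line) for line in data])
```

Because takewhile ends at the end of the string on its own, a number that is the last thing on the line needs no special handling.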

snippsat · Jan-16-2021, 12:44 AM

I would usually think of regex with that description; nice way of doing it without regex by BashBedlam.
So something like this, where combining compile and finditer makes it faster when iterating over a large amount of data.

import re

data = '''\
Line 1: 123 NUBA AA=1.2345 $BB=1234.55
Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55
Line 3: 123 NUBA RRNJH AA=1.2#ALPHA'''

pattern = re.compile(r"AA=([+-]?([0-9]*[.])?[0-9]+)")
for match in pattern.finditer(data):
    print(float(match.group(1)))

Output:
1.2345
1.2345678
1.2
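
As a rough illustration of what this kind of pattern accepts beyond the three sample lines, the sketch below wraps it in a small helper. The name aa_values and the test string are invented for this example, and the inner group is written as non-capturing, which does not change what is matched. An optional sign and a number starting with a decimal point are both accepted, and an 'AA=' that is not followed by a number is simply skipped.

```python
import re

# Same idea as the pattern above: optional sign, optional "digits and a point", then digits.
# (?: ... ) is a non-capturing group, so group(1) is the whole number.
AA_NUMBER = re.compile(r"AA=([+-]?(?:[0-9]*[.])?[0-9]+)")

def aa_values(text):
    """Return every number that directly follows 'AA=' in text, as floats."""
    return [float(match.group(1)) for match in AA_NUMBER.finditer(text)]

# Made-up input covering a negative value, an integer, a leading decimal point,
# and an 'AA=' with no number after it.
sample = "AA=-0.5;AA=42 AA=.75x no match here: AA=abc"
print(aa_values(sample))
```

One thing to be aware of: a value written with a trailing point, such as 'AA=1.', is read as 1.0 because the pattern requires at least one digit after the point.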
