Pandas/Dataframes, Strings and Regular Expressions...

Stephan · Nov-25-2020, 08:08 AM

Hi all

I am new to the Python world (20 years ago I did some C/C++). For a newcomer, I have been able to achieve quite a lot so far. I managed to avoid regular expressions all these years, but that seems to be changing now...

I have not found a solution to this problem so far, and I suspect I have not fully understood the indexing/selecting mechanism yet.

I have a data frame 'data_total' (what a thrilling name...) with the column INFO. It contains strings like 'X-Z-34567A' or 'X-Y-123456'.
I'd like to extract the numbers into a new column INFO_NR. A trailing letter should be replaced with a '0'.
In the end, the data should read '345670' and '123456'.

First I tried a slightly different approach: I extracted the number part, converted it to int and multiplied by 10.

See the following code snippet:

import numpy as np  # needed for np.int64 below

# this processes the X-Z-34567A correctly, fills the fields of the other rows with nan
# (raw string so that \w and \d reach the regex engine unchanged; [A-Z] covers every capital letter)
data_total['INFO_NR'] = data_total['INFO'].str.extract(r'^X-\w-(\d*)[A-Z]$', expand=False).str.strip()
data_total['INFO_NR'] = data_total['INFO_NR'].fillna('0')
data_total['INFO_NR'] = data_total['INFO_NR'].astype(np.int64)*10

# this processes the X-Y-123456 correctly, but fills the previously processed fields with nan!!
# (the whole column is assigned again, so the results from the first step are overwritten)
data_total['INFO_NR'] = data_total['INFO'].str.extract(r'^X-\w-(\d*)$', expand=False).str.strip()
data_total['INFO_NR'] = data_total['INFO_NR'].astype(np.int64)*10

Both regexes work, but the second overwrites the results of the first. How can I apply the second regex only to the rows where INFO_NR == 0, without losing the first results?
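
One way the selective update could look is sketched below. It is only a sketch under assumptions: the sample frame is made up to mirror the strings described above, unmatched rows are left as NaN instead of being filled with 0, and the values stay strings. A boolean mask with .loc restricts the second assignment to the rows the first regex did not match.

```python
import pandas as pd

# Hypothetical sample data mirroring the strings described in the post.
data_total = pd.DataFrame({"INFO": ["X-Z-34567A", "X-Y-123456", "something else"]})

# Step 1: digits followed by exactly one trailing capital letter.
# Rows that do not match come back as NaN.
first = data_total["INFO"].str.extract(r"^X-\w-(\d+)[A-Z]$", expand=False)

# Put a '0' where the letter was; NaN rows stay NaN.
data_total["INFO_NR"] = first + "0"

# Step 2: boolean mask of the rows that are still empty.
missing = data_total["INFO_NR"].isna()

# Apply the second regex to those rows only; .loc leaves all other rows untouched.
data_total.loc[missing, "INFO_NR"] = (
    data_total.loc[missing, "INFO"].str.extract(r"^X-\w-(\d+)$", expand=False)
)

print(data_total)
```

The key point is that the left-hand side is data_total.loc[missing, 'INFO_NR'] and not the whole column, so the rows filled in step 1 are never reassigned.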

And from what I have learned about Python so far, there should be a much more elegant solution out there :)
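
A more compact variant might use a single regex with an optional trailing letter, so no second pass is needed. Again this is a sketch under assumptions, not a definitive answer: the sample data is invented, the trailing letter is assumed to be a single capital A-Z, and the nullable 'Int64' dtype is chosen here so that any non-matching rows could stay missing.

```python
import pandas as pd

# Hypothetical sample data mirroring the strings described in the post.
data_total = pd.DataFrame({"INFO": ["X-Z-34567A", "X-Y-123456"]})

# Group 0 captures the digits, group 1 the optional trailing letter
# (an empty string when there is no letter).
parts = data_total["INFO"].str.extract(r"^X-\w-(\d+)([A-Z]?)$")

# Replace the letter, if any, with '0' and glue it back onto the digits.
as_text = parts[0] + parts[1].str.replace(r"[A-Z]", "0", regex=True)

# Convert to integers; rows that did not match at all would become <NA>.
data_total["INFO_NR"] = pd.to_numeric(as_text, errors="coerce").astype("Int64")

print(data_total)
```

If the column should stay a string, as in '345670', the last conversion step can simply be dropped and as_text assigned directly.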

Looking forward to your inputs
Thank you
Stephan
