Python Forum
Pandas/Dataframes, Strings and Regular Expressions...
#1
Hi all

I am new to Python (20 years ago I did some C/C++). For a newcomer, I have achieved quite a lot so far. I successfully avoided regular expressions all these years, but that seems to be changing now...

I haven't found a solution to this problem yet, and I don't think I have fully understood the indexing/selection mechanism.

I have a DataFrame 'data_total' (what a thrilling name...) with the column INFO. It contains strings like 'X-Z-34567A' or 'X-Y-123456'.
I'd like to extract the numbers into a new column INFO_NR, replacing a trailing letter (if there is one) with a '0'.
In the end, the values should read '345670' and '123456'.
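A minimal sketch of that transformation as two regex replacements. It assumes every INFO value starts with 'X-', one word character and a dash, and that at most one uppercase letter sits at the end. The sample DataFrame is hypothetical:

```python
import pandas as pd

# Hypothetical sample data standing in for the real data_total.
data_total = pd.DataFrame({'INFO': ['X-Z-34567A', 'X-Y-123456']})

# Step 1: strip the 'X-<char>-' prefix.
# Step 2: replace a single trailing uppercase letter, if present, with '0'.
data_total['INFO_NR'] = (
    data_total['INFO']
    .str.replace(r'^X-\w-', '', regex=True)
    .str.replace(r'[A-Z]$', '0', regex=True)
)
print(data_total)
```

The result is still a string column. Append `.astype('int64')` if numbers are needed.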

First I tried a slightly different approach: I extracted the number part, converted it to int and multiplied it by 10.

See the following code snippet:

import numpy as np
import pandas as pd

# this processes the X-Z-34567A correctly, fills the fields of the other rows with nan
data_total['INFO_NR'] = data_total['INFO'].str.extract(r'^X-\w-(\d*)[A-Z]$', expand=False).str.strip()
data_total['INFO_NR'] = data_total['INFO_NR'].fillna('0')
data_total['INFO_NR'] = data_total['INFO_NR'].astype(np.int64) * 10

# this processes the X-Y-123456 correctly, but fills the previously processed fields with nan!!
# (assigning to data_total['INFO_NR'] replaces the entire column, not just the matching rows)
data_total['INFO_NR'] = data_total['INFO'].str.extract(r'^X-\w-(\d*)$', expand=False).str.strip()
Both regexes work, but the second one overwrites the results of the first. How can I apply the second regex only to the rows where INFO_NR == 0, without losing the first results?
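A sketch of exactly that, assuming every row matches one of the two formats and that a genuine result of 0 never occurs. The idea is to build a boolean mask from the first pass and write the second pass through `.loc`, so that only the masked rows change. The sample data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real data_total.
data_total = pd.DataFrame({'INFO': ['X-Z-34567A', 'X-Y-123456']})

# First pass, as in the post: trailing-letter case, times 10, 0 where no match.
first = data_total['INFO'].str.extract(r'^X-\w-(\d+)[A-Z]$', expand=False)
data_total['INFO_NR'] = first.fillna('0').astype(np.int64) * 10

# Boolean mask of the rows the first pass did not handle.
todo = data_total['INFO_NR'] == 0

# Second pass only on the masked rows; .loc writes back to those rows alone.
# No "* 10" here, since 'X-Y-123456' should become 123456.
second = data_total.loc[todo, 'INFO'].str.extract(r'^X-\w-(\d+)$', expand=False)
data_total.loc[todo, 'INFO_NR'] = second.astype(np.int64)
print(data_total)
```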

From what I have seen of Python so far, there should be a much more elegant solution out there :)
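One possibly more compact alternative is a single pattern with a second, optional capture group for the letter, followed by `numpy.where` to multiply by 10 only where a letter was found. This sketch makes the same assumption that every value looks like `X-<char>-<digits>` with an optional trailing uppercase letter. The sample data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real data_total.
data_total = pd.DataFrame({'INFO': ['X-Z-34567A', 'X-Y-123456']})

# Column 0: the digits; column 1: the optional trailing letter (may be empty).
parts = data_total['INFO'].str.extract(r'^X-\w-(\d+)([A-Z]?)$')
numbers = parts[0].astype(np.int64)

# fillna('') guards against the optional group coming back as NaN instead of ''.
has_letter = parts[1].fillna('') != ''

# "Replace the letter with 0" is the same as multiplying the number by 10.
data_total['INFO_NR'] = np.where(has_letter, numbers * 10, numbers)
print(data_total)
```

Converting to integers drops any leading zeros in the digit part. If those matter, the string-based route is the safer choice.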

Looking forward to your input
Thank you
Stephan