Nov-25-2020, 08:08 AM
Hi all
I am new to the Python world (20 years ago I did some C/C++). For a beginner, I have been able to achieve quite a lot so far. I have successfully avoided regex all these years, but that seems to be changing now...
I have not found a solution to this problem so far, and I suspect I have not fully understood the indexing/selection mechanism.
I have a data frame 'data_total' (what a thrilling name...) with the column INFO. It contains strings like 'X-Z-34567A' or 'X-Y-123456'.
I'd like to extract the numbers into a new column INFO_NR. A trailing letter should be replaced with a '0'.
In the end, the data should read '345670' and '123456'.
First I tried a slightly different approach: I extracted the number part, converted it to int and multiplied it by 10.
See the following code snippet:
```python
# This processes X-Z-34567A correctly, but fills the fields of the other rows with NaN
data_total['INFO_NR'] = data_total['INFO'].str.extract(r'^X-\w-(\d*)[ABCDEFGHILKMNOPQRSTUVWXYZ]$', expand=False).str.strip()
data_total['INFO_NR'] = data_total['INFO_NR'].fillna('0')
data_total['INFO_NR'] = data_total['INFO_NR'].astype(np.int64) * 10

# This processes X-Y-123456 correctly, but fills the previously processed fields with NaN!!
data_total['INFO_NR'] = data_total['INFO'].str.extract(r'^X-\w-(\d*)$', expand=False).str.strip()
data_total['INFO_NR'] = data_total['INFO_NR'].astype(np.int64) * 10
```
Both regexes work on their own, but the second one overwrites the results of the first. How can I apply the second regex only to the rows with INFO_NR == 0, without deleting the first results?
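One possible way to handle both cases in a single pass, offered as a sketch rather than a definitive answer: match the digits and an optional trailing letter with one regex, then append a '0' only when a letter was present. The names `INFO_PATTERN` and `info_to_number` are made up for illustration, and the pattern assumes the trailing character is a single capital letter A-Z and that every value has the shape shown in the post.

```python
import re

# Assumed format: 'X-', one word character, '-', digits, optional trailing capital letter.
INFO_PATTERN = re.compile(r"X-\w-(\d+)([A-Z]?)")


def info_to_number(info):
    """Return the digits of an INFO string, with a trailing letter replaced by '0'.

    Returns None when the string does not match the assumed format.
    """
    match = INFO_PATTERN.fullmatch(info)
    if match is None:
        return None
    digits, letter = match.groups()
    # A trailing letter becomes '0'; otherwise the digits are returned unchanged.
    return digits + "0" if letter else digits


for sample in ["X-Z-34567A", "X-Y-123456"]:
    print(sample, "->", info_to_number(sample))
```

With the data frame from the post, this could presumably be applied as `data_total['INFO_NR'] = data_total['INFO'].map(info_to_number, na_action='ignore')`, which leaves missing values alone. If the two-step approach is preferred instead, a boolean mask such as `data_total['INFO_NR'] == 0` used with `data_total.loc[mask, 'INFO_NR']` would restrict the second assignment to the rows the first regex did not match.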