Nov-25-2020, 08:08 AM
Hi all
I am new to the Python world (20 years ago I did some C/C++). For a newcomer, I have managed to achieve quite a lot so far. I successfully avoided RegEx all these years, but that seems to be changing now...
I haven't found a solution to this problem yet, and I suspect I haven't fully understood pandas' indexing/selection mechanism.
I have a data frame 'data_total' (what a thrilling name...) with the column INFO. It contains strings like 'X-Z-34567A' or 'X-Y-123456'.
I'd like to extract the numbers into a new column INFO_NR. A trailing letter should be replaced with a '0'.
In the end, the values should read '345670' and '123456'.
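The transformation described above (keep the digits, turn a trailing letter into a 0) can also be done in a single pass: one regex captures the digits and, as an optional second group, the trailing letter; rows where that letter is present get multiplied by 10. This is a minimal sketch, assuming every INFO value matches `X-<one character>-<digits><optional capital letter>`; the sample `data_total` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real 'data_total'.
data_total = pd.DataFrame({"INFO": ["X-Z-34567A", "X-Y-123456"]})

# Group 0: the digits. Group 1: an optional trailing capital letter
# (an empty string when the value has no letter).
parts = data_total["INFO"].str.extract(r"^X-\w-(\d+)([A-Z]?)$")

# Multiply by 10 (i.e. append a '0') only where a letter was present.
digits = parts[0].astype(np.int64)
factor = np.where(parts[1] == "", 1, 10)
data_total["INFO_NR"] = digits * factor

print(data_total)
```

If some rows might not match the pattern at all, `extract` returns NaN for them and `astype(np.int64)` will raise, so those rows would need filtering or a `fillna` first.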
First I tried a slightly different approach: I extracted the number part, converted it to int and multiplied it by 10.
See the following code snippet:
```python
import numpy as np
import pandas as pd

# this processes the X-Z-34567A correctly, fills the fields of the other rows with nan
data_total['INFO_NR'] = data_total['INFO'].str.extract(r'^X-\w-(\d*)[A-Z]$', expand=False).str.strip()
data_total['INFO_NR'] = data_total['INFO_NR'].fillna('0')
data_total['INFO_NR'] = data_total['INFO_NR'].astype(np.int64)*10

# this processes the X-Y-123456 correctly, but fills the previously processed fields with nan!!
data_total['INFO_NR'] = data_total['INFO'].str.extract(r'^X-\w-(\d*)$', expand=False).str.strip()
data_total['INFO_NR'] = data_total['INFO_NR'].astype(np.int64)
```

Both regexes work, but the second one overwrites the results of the first. How can I apply the second regex only to the rows where INFO_NR == 0, without deleting the first results?
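To address the direct question: one option is to build a boolean mask of the rows the first pass left at 0 and assign the second extract only to those rows via `.loc[mask, 'INFO_NR']`, so the first results are never touched. A minimal sketch with hypothetical sample data, assuming every row not matched by the first pattern does match the second one (otherwise `astype` fails on the resulting NaN).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real 'data_total'.
data_total = pd.DataFrame({"INFO": ["X-Z-34567A", "X-Y-123456"]})

# Pass 1: values ending in a letter -> number * 10; every other row becomes 0.
data_total["INFO_NR"] = (
    data_total["INFO"]
    .str.extract(r"^X-\w-(\d+)[A-Z]$", expand=False)
    .fillna("0")
    .astype(np.int64)
    * 10
)

# Pass 2: select only the rows that pass 1 left at 0 and write to those rows only.
mask = data_total["INFO_NR"] == 0
data_total.loc[mask, "INFO_NR"] = (
    data_total.loc[mask, "INFO"]
    .str.extract(r"^X-\w-(\d+)$", expand=False)
    .astype(np.int64)
)

print(data_total)
```

The assignment aligns on the index, which is why the unmasked rows keep their pass-1 values. An alternative that avoids the 0 placeholder is to keep the NaNs from the first extract and fill them from the second with `first.fillna(second)`, then convert to integers at the end.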
And from what I've learned about Python so far, there should be a much more elegant solution out there.
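For the "more elegant" route, one possibility is to skip the integer arithmetic entirely and work on the strings: strip the `X-?-` prefix with one `str.replace`, then turn a trailing capital letter into '0' with a second one. A minimal sketch, assuming the values always start with `X-<one character>-`; rows that don't match are left unchanged rather than becoming NaN, which differs from `str.extract`. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the real 'data_total'.
data_total = pd.DataFrame({"INFO": ["X-Z-34567A", "X-Y-123456"]})

data_total["INFO_NR"] = (
    data_total["INFO"]
    .str.replace(r"^X-\w-", "", regex=True)   # drop the 'X-?-' prefix
    .str.replace(r"[A-Z]$", "0", regex=True)  # trailing letter -> '0'
)

print(data_total)
```

This yields the strings '345670' and '123456'; appending `.astype("int64")` to the chain gives an integer column instead.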

Looking forward to your inputs
Thank you
Stephan