Hi,
I have the DataFrame below, and I want to find a specified substring and take the numeric value to its left.
My data:
import pandas as pd
df2 = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp','CD']})
df2[['df2','allele']] = df2['ids'].str.split('-',expand=True)
df2['rank'] = df2['allele']
print(df2)
Output:
ids df2 allele rank
0 AA-120amp AA 120amp 120amp
1 BA+250A-52amp BA+250A 52amp 52amp
2 AA-5623amp AA 5623amp 5623amp
3 CD CD None None
My desired output is:
ids df2 allele rank
0 AA-120amp AA 120amp 120
1 BA+250A-52amp BA+250A 52amp 52
2 AA-5623amp AA 5623amp 5623
I want to split the column at amp and get the numeric value to the left of amp. If amp does not occur in a row, just return None.
I tried the code above, but I could not get what I want in a single line.
That was an interesting challenge.
First of all, you need a RegEx - and for that findall may work, but there's a better option in the form of the extractall method, which returns a DataFrame with a column for each RegEx group.
The next problem - the result has a multi-index, so you have to drop one level. The final step - instead of assigning columns, just merge the extracted data with the original DataFrame.
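To make the multi-index step concrete, here is a minimal sketch of what extractall returns for this data and what dropping the second level does. The names extracted and flat are only illustrative, and DataFrame.droplevel assumes a reasonably recent pandas.

```python
import pandas as pd

df2 = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp', 'CD']})

# One column per named group; rows without a match ('CD') are absent.
extracted = df2['ids'].str.extractall(
    r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)')

# The index has two levels: the original row label and the match number.
print(extracted.index.nlevels)

# Dropping the second level leaves only the original row labels,
# so the result can be aligned with df2 by index.
flat = extracted.droplevel(1)
print(flat)
```

Once the index is flat, the extracted columns line up with the original rows, which is what makes the index-based merge possible.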
The bottom line - in one line (pun intended) it looks convoluted, but it is doable.
df2 = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp','CD']})
df2 = df2.merge(df2['ids'].str.extractall(r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)')
.set_index(res.index.droplevel(1)),
'inner', left_index=True, right_index=True)
And the result
Output:
ids df2 allele rank
0 AA-120amp AA 120amp 120
1 BA+250A-52amp BA+250A 52amp 52
2 AA-5623amp AA 5623amp 5623
I encounter an error:
NameError Traceback (most recent call last)
<ipython-input-4-e50857fb9961> in <module>()
3 df2 = pd.DataFramedf = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp','CD']})
4 df2 = df2.merge(df2['ids'].str.extractall(r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)')
----> 5 .set_index(res.index.droplevel(1)),'inner', left_index=True, right_index=True)
6
NameError: name 'res' is not defined
(Jun-11-2018, 11:50 AM) SriRajesh wrote: I encounter an error: NameError: name 'res' is not defined
OK, sorry - I merged the code incorrectly. Here it is in three lines:
df2 = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp','CD']})
# one column per named group; the index gains a second "match" level
extracted = df2['ids'].str.extractall(r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)')
# drop the match level so the extracted rows align with df2's index
df2 = df2.merge(extracted.set_index(extracted.index.droplevel(1)),
                how='inner', left_index=True, right_index=True)
The result is
Output:
ids df2 allele rank
0 AA-120amp AA 120amp 120
1 BA+250A-52amp BA+250A 52amp 52
2 AA-5623amp AA 5623amp 5623
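Note that the inner merge drops the 'CD' row, because the pattern does not match it. If that row should stay with a missing rank, as the original question asked, one possible variant is str.extract, which returns one row per input row. This is only a sketch and not the code from the posts above; anchoring on the literal 'amp' is my assumption about what the number should sit next to.

```python
import pandas as pd

df2 = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp', 'CD']})

# str.extract keeps the original index and yields NaN where nothing matches.
# Assumption: the wanted number is the run of digits directly before 'amp'.
df2['rank'] = df2['ids'].str.extract(r'(\d+)amp', expand=False)
print(df2)
```

The extracted values are strings; converting them to numbers, if needed, would be a separate step.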
It works, many thanks.
But could you explain the main trick (the logic)? I want to learn it so I can handle this in the future.
(Jun-11-2018, 01:00 PM) SriRajesh wrote: But could you explain the main trick (the logic)? I want to learn it so I can handle this in the future.
Just read the docs: Python re, pandas merge and extractall.
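As a rough illustration of the RegEx part, the pattern used with extractall can be tried on single strings with the standard re module. The variable names pattern and m are only for this sketch.

```python
import re

# The pattern from the extractall call, compiled with the standard re module.
pattern = re.compile(r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)')

m = pattern.search('BA+250A-52amp')
# [^-]+ takes everything up to the first '-',
# the nested group 'rank' takes the digits that follow,
# and 'allele' spans those digits plus the rest of the string.
print(m.groupdict())

# No '-' in the string, so there is no match at all.
print(pattern.search('CD'))
```

Each named group becomes a column name in the extractall result, which is why the groups are called df2, allele and rank.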
I did not have the answer - I was curious enough to try it out and learn in the process. I use a free Azure Notebook to experiment (not the only tool I use, but one of my favorites).