Python Forum
Thread Rating:
  • 1 Vote(s) - 2 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How to split string
#1
Hi,

I have below dataFrame and I want search and take the numeric value left side to specified substring.

My data:
df2 = pd.DataFramedf = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp','CD']})

df2[['df2','rank']] = df2['ids'].str.split('-',expand=True)

print (df2: (output)
            ids      df2   allele     rank
0      AA-120amp       AA   120amp   120amp
1  BA+250A-52amp  BA+250A    52amp    52amp
2     AA-5623amp       AA  5623amp  5623amp
3             CD       CD     None     None

my desired output is:

            ids      df2   allele     rank
0      AA-120amp       AA   120amp   120
1  BA+250A-52amp  BA+250A    52amp   52
2     AA-5623amp       AA  5623amp   5623
I want to split the column at amp, and print numerical value left side to amp. If no amp exists in any row, just print None.
I tried above, but I could not be able to get in a single line what I want.
Reply
#2
That was an interesting challenge.

First of all, you need a RegEx - and for that findall may work, but there's a better option in the form of extractall method that returns DataFrame with column for each RegEx group.

The next problem - it's multi-index, so you have to drop one level. The final step - instead of assigning columns, just merge the extracted data with the original DataFrame

The bottom line - in one line Dance (pun intended Tongue ) it looks convoluted, but it is doable.

df2 = pd.DataFramedf = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp','CD']})
df2 = df2.merge(df2['ids'].str.extractall(r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)')
                .set_index(res.index.droplevel(1)), 
                'inner', left_index=True, right_index=True)
And the result
Output:
ids df2 allele rank 0 AA-120amp AA 120amp 120 1 BA+250A-52amp BA+250A 52amp 52 2 AA-5623amp AA 5623amp 5623
Test everything in a Python shell (iPython, Azure Notebook, etc.)
  • Someone gave you an advice you liked? Test it - maybe the advice was actually bad.
  • Someone gave you an advice you think is bad? Test it before arguing - maybe it was good.
  • You posted a claim that something you did not test works? Be prepared to eat your hat.
Reply
#3
I encounetr errr:
NameError Traceback (most recent call last)
<ipython-input-4-e50857fb9961> in <module>()
3 df2 = pd.DataFramedf = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp','CD']})
4 df2 = df2.merge(df2['ids'].str.extractall(r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)')
----> 5 .set_index(res.index.droplevel(1)),'inner', left_index=True, right_index=True)
6

NameError: name 'res' is not defined
Reply
#4
(Jun-11-2018, 11:50 AM)SriRajesh Wrote: I encounetr errr:
NameError Traceback (most recent call last)
<ipython-input-4-e50857fb9961> in <module>()
3 df2 = pd.DataFramedf = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp','CD']})
4 df2 = df2.merge(df2['ids'].str.extractall(r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)')
----> 5 .set_index(res.index.droplevel(1)),'inner', left_index=True, right_index=True)
6

NameError: name 'res' is not defined

OK, sorry - wrong merge of code. 3 strings
df2 = pd.DataFramedf = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp','CD']})
extracted = df2['ids'].str.extractall(r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)')
df2 = df2.merge(extracted.set_index(extracted.index.droplevel(1)), 
                'inner', left_index=True, right_index=True)
The result is
Output:
ids df2 allele rank 0 AA-120amp AA 120amp 120 1 BA+250A-52amp BA+250A 52amp 52 2 AA-5623amp AA 5623amp 5623
Test everything in a Python shell (iPython, Azure Notebook, etc.)
  • Someone gave you an advice you liked? Test it - maybe the advice was actually bad.
  • Someone gave you an advice you think is bad? Test it before arguing - maybe it was good.
  • You posted a claim that something you did not test works? Be prepared to eat your hat.
Reply
#5
it, works, Many many thanks,

But Sir can it be possible to explain the main tricky(logic) I just want to learn for future handling?
Reply
#6
(Jun-11-2018, 01:00 PM)SriRajesh Wrote: But Sir can it be possible to explain the main tricky(logic) I just want to learn for future handling?

Just read the docs. Python RE, pandas merge and extractall.

I did not have the answer - I was curious enough to try it out and learn in the process. I use free Azure Notebook to experiment (not only it, but this is one of my favorite).
Test everything in a Python shell (iPython, Azure Notebook, etc.)
  • Someone gave you an advice you liked? Test it - maybe the advice was actually bad.
  • Someone gave you an advice you think is bad? Test it before arguing - maybe it was good.
  • You posted a claim that something you did not test works? Be prepared to eat your hat.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Split string hext01 3 2,880 Aug-16-2018, 09:51 AM
Last Post: hext01

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020