Python Forum
How to split string - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: How to split string (/thread-10738.html)



How to split string - SriRajesh - Jun-04-2018

Hi,

I have below dataFrame and I want search and take the numeric value left side to specified substring.

My data:
df2 = pd.DataFramedf = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp','CD']})

df2[['df2','rank']] = df2['ids'].str.split('-',expand=True)

print (df2: (output)
            ids      df2   allele     rank
0      AA-120amp       AA   120amp   120amp
1  BA+250A-52amp  BA+250A    52amp    52amp
2     AA-5623amp       AA  5623amp  5623amp
3             CD       CD     None     None

my desired output is:

            ids      df2   allele     rank
0      AA-120amp       AA   120amp   120
1  BA+250A-52amp  BA+250A    52amp   52
2     AA-5623amp       AA  5623amp   5623
I want to split the column at amp, and print numerical value left side to amp. If no amp exists in any row, just print None.
I tried above, but I could not be able to get in a single line what I want.


RE: How to split string - volcano63 - Jun-04-2018

That was an interesting challenge.

First of all, you need a RegEx - and for that findall may work, but there's a better option in the form of extractall method that returns DataFrame with column for each RegEx group.

The next problem - it's multi-index, so you have to drop one level. The final step - instead of assigning columns, just merge the extracted data with the original DataFrame

The bottom line - in one line Dance (pun intended Tongue ) it looks convoluted, but it is doable.

df2 = pd.DataFramedf = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp','CD']})
df2 = df2.merge(df2['ids'].str.extractall(r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)')
                .set_index(res.index.droplevel(1)), 
                'inner', left_index=True, right_index=True)
And the result
Output:
ids df2 allele rank 0 AA-120amp AA 120amp 120 1 BA+250A-52amp BA+250A 52amp 52 2 AA-5623amp AA 5623amp 5623



RE: How to split string - SriRajesh - Jun-11-2018

I encounetr errr:
NameError Traceback (most recent call last)
<ipython-input-4-e50857fb9961> in <module>()
3 df2 = pd.DataFramedf = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp','CD']})
4 df2 = df2.merge(df2['ids'].str.extractall(r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)')
----> 5 .set_index(res.index.droplevel(1)),'inner', left_index=True, right_index=True)
6

NameError: name 'res' is not defined


RE: How to split string - volcano63 - Jun-11-2018

(Jun-11-2018, 11:50 AM)SriRajesh Wrote: I encounetr errr:
NameError Traceback (most recent call last)
<ipython-input-4-e50857fb9961> in <module>()
3 df2 = pd.DataFramedf = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp','CD']})
4 df2 = df2.merge(df2['ids'].str.extractall(r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)')
----> 5 .set_index(res.index.droplevel(1)),'inner', left_index=True, right_index=True)
6

NameError: name 'res' is not defined

OK, sorry - wrong merge of code. 3 strings
df2 = pd.DataFramedf = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp','CD']})
extracted = df2['ids'].str.extractall(r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)')
df2 = df2.merge(extracted.set_index(extracted.index.droplevel(1)), 
                'inner', left_index=True, right_index=True)
The result is
Output:
ids df2 allele rank 0 AA-120amp AA 120amp 120 1 BA+250A-52amp BA+250A 52amp 52 2 AA-5623amp AA 5623amp 5623



RE: How to split string - SriRajesh - Jun-11-2018

it, works, Many many thanks,

But Sir can it be possible to explain the main tricky(logic) I just want to learn for future handling?


RE: How to split string - volcano63 - Jun-11-2018

(Jun-11-2018, 01:00 PM)SriRajesh Wrote: But Sir can it be possible to explain the main tricky(logic) I just want to learn for future handling?

Just read the docs. Python RE, pandas merge and extractall.

I did not have the answer - I was curious enough to try it out and learn in the process. I use free Azure Notebook to experiment (not only it, but this is one of my favorite).