How to split string

Python Forum (https://python-forum.io), Forum: Data Science (https://python-forum.io/forum-44.html), Thread: How to split string (/thread-10738.html)
How to split string - SriRajesh - Jun-04-2018

Hi, I have the DataFrame below, and I want to find a specified substring and take the numeric value to its left. My data:

    import pandas as pd

    df2 = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp', 'CD']})
    df2[['df2', 'rank']] = df2['ids'].str.split('-', expand=True)
    print(df2)

Output:

                 ids      df2   allele     rank
    0      AA-120amp       AA   120amp   120amp
    1  BA+250A-52amp  BA+250A    52amp    52amp
    2     AA-5623amp       AA  5623amp  5623amp
    3             CD       CD     None     None

My desired output is:

                 ids      df2   allele  rank
    0      AA-120amp       AA   120amp   120
    1  BA+250A-52amp  BA+250A    52amp    52
    2     AA-5623amp       AA  5623amp  5623

I want to split the column at "amp" and print the numeric value to the left of "amp". If a row contains no "amp", just print None. I tried the code above, but I could not get what I want in a single line.

RE: How to split string - volcano63 - Jun-04-2018

That was an interesting challenge. First of all, you need a RegEx. findall may work for that, but there is a better option: the extractall method, which returns a DataFrame with a column for each RegEx group. The next problem is that the result has a multi-index, so you have to drop one level. The final step: instead of assigning columns, just merge the extracted data with the original DataFrame.

The bottom line: in one line (pun intended) it looks convoluted, but it is doable.

    df2 = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp', 'CD']})
    df2 = df2.merge(df2['ids'].str.extractall(r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)')
                    .set_index(res.index.droplevel(1)),
                    'inner', left_index=True, right_index=True)
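To make the multi-index step in the reply above concrete, here is a minimal sketch of what extractall returns for this data. The names `pattern`, `extracted` and `flat` are only illustrative, and `DataFrame.droplevel` is used here as one way to drop the extra level (it assumes pandas 0.24 or later); the thread itself uses `set_index` with `index.droplevel(1)`.

```python
import pandas as pd

df = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp', 'CD']})
pattern = r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)'

# extractall returns one row per match, so its index has two levels:
# the original row label and a 'match' counter.
extracted = df['ids'].str.extractall(pattern)
print(extracted.index.nlevels)   # 2
print(list(extracted.columns))   # ['df2', 'allele', 'rank']

# Dropping the 'match' level leaves the original row labels,
# which is what makes an index-based merge possible afterwards.
flat = extracted.droplevel('match')
print(list(flat.index))          # [0, 1, 2]  ('CD' has no match, so row 3 is absent)
```

Each named group becomes a column, and rows that do not match the pattern simply do not appear in the extracted frame.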
RE: How to split string - SriRajesh - Jun-11-2018

I encounter an error:

    NameError                                 Traceback (most recent call last)
    <ipython-input-4-e50857fb9961> in <module>()
          3 df2 = pd.DataFramedf = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp','CD']})
          4 df2 = df2.merge(df2['ids'].str.extractall(r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)')
    ----> 5 .set_index(res.index.droplevel(1)),'inner', left_index=True, right_index=True)
          6
    NameError: name 'res' is not defined

RE: How to split string - volcano63 - Jun-11-2018

(Jun-11-2018, 11:50 AM) SriRajesh Wrote: I encounter an error:

OK, sorry, I merged the code incorrectly. Here it is in three lines:

    df2 = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp', 'CD']})
    extracted = df2['ids'].str.extractall(r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)')
    df2 = df2.merge(extracted.set_index(extracted.index.droplevel(1)),
                    how='inner', left_index=True, right_index=True)
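Note that an inner merge keeps only the rows that matched, so the 'CD' row disappears. If that row should stay with missing values, as the original question asks, one possible variation is a left merge. This is only a sketch, not the code from the thread: `result` and the `rank_num` column are hypothetical names, and the conversion with `pd.to_numeric` is an added step for the case where numbers rather than strings are wanted.

```python
import pandas as pd

df2 = pd.DataFrame({'ids': ['AA-120amp', 'BA+250A-52amp', 'AA-5623amp', 'CD']})
extracted = df2['ids'].str.extractall(r'(?P<df2>[^-]+)-(?P<allele>(?P<rank>\d+).+)')

# Drop the 'match' level so the index lines up with df2's row labels.
extracted = extracted.droplevel('match')

# how='left' keeps every row of df2; rows without a match get missing values.
result = df2.merge(extracted, how='left', left_index=True, right_index=True)

# The extracted groups are strings; convert if numeric values are needed.
# Missing entries stay missing (NaN), so the column becomes floating point.
result['rank_num'] = pd.to_numeric(result['rank'])

print(result)
```

With the left merge the 'CD' row is kept, but its `df2`, `allele` and `rank` values are all missing, unlike the split-based output in the question where the `df2` column held 'CD'.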
RE: How to split string - SriRajesh - Jun-11-2018

It works, many thanks. But could you explain the main trick (the logic)? I want to learn it for future use.

RE: How to split string - volcano63 - Jun-11-2018

(Jun-11-2018, 01:00 PM) SriRajesh Wrote: But could you explain the main trick (the logic)? I want to learn it for future use.

Just read the docs: Python re, pandas merge and extractall. I did not have the answer; I was curious enough to try it out and learn in the process. I use the free Azure Notebooks to experiment (not only that, but it is one of my favorites).
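As a starting point for reading those docs, the regular expression from the thread can be tried on a single string with the standard `re` module. This is a small sketch for illustration; the variable names `pattern` and `m` are not from the thread.

```python
import re

pattern = re.compile(
    r'(?P<df2>[^-]+)'      # group "df2": one or more characters that are not '-'
    r'-'                   # the literal separator
    r'(?P<allele>'         # group "allele": everything after the separator...
    r'(?P<rank>\d+)'       # ...which starts with group "rank": one or more digits
    r'.+)'                 # ...followed by at least one more character (e.g. "amp")
)

m = pattern.search('BA+250A-52amp')
print(m.groupdict())
# {'df2': 'BA+250A', 'allele': '52amp', 'rank': '52'}

# A string without '-' followed by digits does not match at all.
print(pattern.search('CD'))   # None
```

The `rank` group is nested inside `allele`, which is why one match yields both "52amp" and "52". In pandas, each named group becomes a column of the DataFrame returned by extractall.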