How to extract data between two strings - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: How to extract data between two strings (/thread-20381.html) |
How to extract data between two strings - SriMekala - Aug-08-2019 Hi, I have input data as below, input.xlsx: Group Name Rank Group1 ABC_YJK_02_S_2019-08-01 2 ABC_YMK_5_S_2019-08-01 5 ABC_JKL_S_2019-08-04 10 Group2 BCA_POL_S_2019-08-01 4 BCA_PAL_S_2019-08-01 5 BCA_TYP_S_2019-08-01 50 BCA_DIST_S_2019-08-01 23 BCA_STA_S_2019-08-01 3I name column, I only want to delete everything before the first _(underline) including underline, and remove everything after _S including _S Then I want to write the output into out.xlsx I use the below code, but it is not working. giving below error: TypeError: expected string or bytes-like object import pandas as pd import re df = pd.read_excel('D:\pivotdata2.xlsx',sheetname='merge') #df['Group']=df['Group'].fillna(method='ffill') df.to_excel('D:\writepivotdata.xlsx',index=False) result=[] for index, row in df.iterrows(): result_tmp=re.search('_ (.*?)_',row) result.append(result_tmp) RE: How to extract data between two strings - paul18fr - Aug-08-2019 you can use Regular Expressions (regex), but I'm not as skilled as I would like to be. Alternatively you can use the following line = "Group1 ABC_YJK_02_S_2019-08-01 2" beginning, end = line.find('_'), line.find('_S_') result = line[beginning+1 : end] print(result)or table = [ "Group1 ABC_YJK_02_S_2019-08-01 2", " ABC_YMK_5_S_2019-08-01 5", " ABC_JKL_S_2019-08-04 10", "Group2 BCA_POL_S_2019-08-01 4", " BCA_PAL_S_2019-08-01 5", " BCA_TYP_S_2019-08-01 50", " BCA_DIST_S_2019-08-01 23", " BCA_STA_S_2019-08-01 3"] n = len(table) resultsTable = [] for i in range(n): beginning, end = table[i].find('_'), table[i].find('_S_') result = table[i][beginning+1 : end] resultsTable.append(result)Paul RE: How to extract data between two strings - SriMekala - Aug-08-2019 I want to replace resultsTable with Group Name Rank Group1 YJK_02 2 YMK_5 5 JKL 10 Group2 POL 4 PAL 5 TYP 50 DIST 23 STA 3I use below code: table = [ "Group1 ABC_YJK_02_S_2019-08-01 2", " ABC_YMK_5_S_2019-08-01 5", " ABC_JKL_S_2019-08-04 10", "Group2 BCA_POL_S_2019-08-01 4", " BCA_PAL_S_2019-08-01 5", " BCA_TYP_S_2019-08-01 50", " BCA_DIST_S_2019-08-01 23", " BCA_STA_S_2019-08-01 3"] n = len(table) resultsTable = [] for i in range(n): beginning, end = table[i].find('_'), table[i].find('_S_') result = table[i][beginning+1 : end] resultsTable.append(result) import pandas as pd resultsTable['final_name']=pd.DataFrame(resultsTable) writer = pd.ExcelWriter('output.xlsx') df.to_excel(writer,'Sheet1') df.to_excel(writer,'Sheet2') writer.save()Getting below error: TypeError: list indices must be integers or slices, not str |