How to extract data between two strings

How to extract data between two strings - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: How to extract data between two strings (/thread-20381.html)

How to extract data between two strings - SriMekala - Aug-08-2019

Hi,
I have input data as below,
input.xlsx:

Group	 Name	                  Rank
Group1	 ABC_YJK_02_S_2019-08-01	2
	     ABC_YMK_5_S_2019-08-01	    5
	     ABC_JKL_S_2019-08-04	    10
Group2	 BCA_POL_S_2019-08-01	    4
	     BCA_PAL_S_2019-08-01	    5
	     BCA_TYP_S_2019-08-01	    50
	     BCA_DIST_S_2019-08-01	    23
	     BCA_STA_S_2019-08-01	    3

I name column, I only want to delete everything before the first _(underline) including underline, and remove everything after _S including _S

Then I want to write the output into out.xlsx

I use the below code, but it is not working. giving below error:
TypeError: expected string or bytes-like object

import pandas as pd
import re

df = pd.read_excel('D:\pivotdata2.xlsx',sheetname='merge')
#df['Group']=df['Group'].fillna(method='ffill')

df.to_excel('D:\writepivotdata.xlsx',index=False) 

result=[]
for index, row in df.iterrows():
    result_tmp=re.search('_ (.*?)_',row)
    result.append(result_tmp)

RE: How to extract data between two strings - paul18fr - Aug-08-2019

you can use Regular Expressions (regex), but I'm not as skilled as I would like to be.

Alternatively you can use the following

line = "Group1   ABC_YJK_02_S_2019-08-01    2"
beginning, end = line.find('_'), line.find('_S_')
result = line[beginning+1 : end]
print(result)

table = [
"Group1   ABC_YJK_02_S_2019-08-01    2",
"         ABC_YMK_5_S_2019-08-01     5",
"         ABC_JKL_S_2019-08-04       10",
"Group2   BCA_POL_S_2019-08-01       4",
"         BCA_PAL_S_2019-08-01       5",
"         BCA_TYP_S_2019-08-01       50",
"         BCA_DIST_S_2019-08-01      23",
"         BCA_STA_S_2019-08-01       3"]

n = len(table)

resultsTable = []

for i in range(n):
    beginning, end = table[i].find('_'), table[i].find('_S_')
    result = table[i][beginning+1 : end]
    resultsTable.append(result)

Paul

RE: How to extract data between two strings - SriMekala - Aug-08-2019

I want to replace resultsTable with

Group    Name      Rank
Group1   YJK_02    2
         YMK_5     5
         JKL       10
Group2   POL       4
         PAL       5
         TYP       50
         DIST      23
         STA       3

I use below code:

table = [
"Group1   ABC_YJK_02_S_2019-08-01    2",
"         ABC_YMK_5_S_2019-08-01     5",
"         ABC_JKL_S_2019-08-04       10",
"Group2   BCA_POL_S_2019-08-01       4",
"         BCA_PAL_S_2019-08-01       5",
"         BCA_TYP_S_2019-08-01       50",
"         BCA_DIST_S_2019-08-01      23",
"         BCA_STA_S_2019-08-01       3"]
 
n = len(table)
 
resultsTable = []
 
for i in range(n):
    beginning, end = table[i].find('_'), table[i].find('_S_')
    result = table[i][beginning+1 : end]
    resultsTable.append(result)
import pandas as pd
resultsTable['final_name']=pd.DataFrame(resultsTable)
writer = pd.ExcelWriter('output.xlsx')
df.to_excel(writer,'Sheet1')
df.to_excel(writer,'Sheet2')
writer.save()

Getting below error:
TypeError: list indices must be integers or slices, not str