How to extract data between two strings

SriMekala · Aug-08-2019, 04:37 AM

Hi,
I have the following input data (input.xlsx):

Group    Name                       Rank
Group1   ABC_YJK_02_S_2019-08-01    2
         ABC_YMK_5_S_2019-08-01     5
         ABC_JKL_S_2019-08-04       10
Group2   BCA_POL_S_2019-08-01       4
         BCA_PAL_S_2019-08-01       5
         BCA_TYP_S_2019-08-01       50
         BCA_DIST_S_2019-08-01      23
         BCA_STA_S_2019-08-01       3

In the Name column, I want to delete everything before the first _ (underscore), including the underscore itself, and everything from _S onwards, including _S.

Then I want to write the output to out.xlsx.

I am using the code below, but it does not work. It gives this error:
TypeError: expected string or bytes-like object

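import pandas as pd
import re

# raw strings (r'...') keep the backslash in the Windows paths literal
df = pd.read_excel(r'D:\pivotdata2.xlsx', sheet_name='merge')
#df['Group']=df['Group'].fillna(method='ffill')

df.to_excel(r'D:\writepivotdata.xlsx', index=False)

result = []
for index, row in df.iterrows():
    # row is a pandas Series holding the whole row, not a string;
    # passing it to re.search is what raises the TypeError above
    result_tmp = re.search('_ (.*?)_', row)
    result.append(result_tmp)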
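The TypeError comes from re.search(): it needs a string, but iterrows() yields (index, Series) pairs, so row is a whole row. A vectorised alternative is to run the pattern over the Name column with Series.str.extract. This is a minimal sketch assuming the column is called Name and every value contains "_S_"; the pattern and the clean_names helper are illustrative, not taken from the original code.

```python
import pandas as pd

# Assumed rule (from the question): drop everything up to and including the
# first "_", and everything from the first "_S_" onwards.
PATTERN = r'^[^_]*_(.*?)_S_'


def clean_names(df):
    """Hypothetical helper: return a copy of df with Name trimmed by PATTERN."""
    out = df.copy()
    # .str.extract works on the whole column at once; with one capture group
    # and expand=False it returns a Series. Rows that don't match become NaN.
    out['Name'] = out['Name'].str.extract(PATTERN, expand=False)
    return out


# Small in-memory stand-in for the spreadsheet, using rows from the question
df = pd.DataFrame({
    'Group': ['Group1', None, None, 'Group2', None],
    'Name': ['ABC_YJK_02_S_2019-08-01', 'ABC_YMK_5_S_2019-08-01',
             'ABC_JKL_S_2019-08-04', 'BCA_POL_S_2019-08-01',
             'BCA_PAL_S_2019-08-01'],
    'Rank': [2, 5, 10, 4, 5],
})

result = clean_names(df)
print(result)
```

With the real file this could be chained as clean_names(pd.read_excel(r'D:\pivotdata2.xlsx', sheet_name='merge')).to_excel('out.xlsx', index=False); writing .xlsx needs an Excel engine such as openpyxl installed.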

paul18fr · (This post was last modified: Aug-08-2019, 08:38 AM by paul18fr.)

You could use regular expressions (regex), but I'm not as skilled with them as I would like to be.
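For reference, a regex that follows the rule in the question might look like the sketch below. The pattern and the short_name helper are illustrative assumptions: [^_]* skips the first prefix, and the lazy (.*?) stops at the first "_S_".

```python
import re

# Assumed pattern: skip everything up to and including the first "_",
# then lazily capture until the first "_S_".
NAME_RE = re.compile(r'^[^_]*_(.*?)_S_')


def short_name(name):
    """Hypothetical helper: text between the first '_' and '_S_', or None."""
    match = NAME_RE.search(name)
    return match.group(1) if match else None


for name in ['ABC_YJK_02_S_2019-08-01', 'ABC_JKL_S_2019-08-04',
             'BCA_DIST_S_2019-08-01']:
    print(name, '->', short_name(name))
```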

Alternatively, you can use str.find():

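line = "Group1   ABC_YJK_02_S_2019-08-01    2"
# index of the first "_" and of the "_S_" marker
beginning, end = line.find('_'), line.find('_S_')
# slice between them, skipping the "_" itself
result = line[beginning+1 : end]
print(result)  # YJK_02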

or, for the whole table:

table = [
    "Group1   ABC_YJK_02_S_2019-08-01    2",
    "         ABC_YMK_5_S_2019-08-01     5",
    "         ABC_JKL_S_2019-08-04       10",
    "Group2   BCA_POL_S_2019-08-01       4",
    "         BCA_PAL_S_2019-08-01       5",
    "         BCA_TYP_S_2019-08-01       50",
    "         BCA_DIST_S_2019-08-01      23",
    "         BCA_STA_S_2019-08-01       3"]

resultsTable = []

# apply the same find/slice to every line
for line in table:
    beginning, end = line.find('_'), line.find('_S_')
    resultsTable.append(line[beginning+1 : end])
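One caveat with find(): it returns -1 when the marker is missing, and slicing up to -1 silently drops the last character instead of failing. A guarded version could look like this; the between helper and its default markers are hypothetical, chosen to match the data above.

```python
def between(text, start='_', stop='_S_'):
    """Hypothetical helper: text between the first `start` and the first
    `stop` after it, or None if either marker is missing."""
    begin = text.find(start)
    if begin == -1:
        return None
    begin += len(start)
    # search for the stop marker only after the start marker
    end = text.find(stop, begin)
    if end == -1:
        return None
    return text[begin:end]


for line in ["Group1   ABC_YJK_02_S_2019-08-01    2", "ABC_XYZ_2019"]:
    print(repr(between(line)))
```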

Paul

SriMekala · Aug-08-2019, 01:54 PM

I want to put resultsTable back into the table in place of the Name column, so the output looks like this:

Group    Name      Rank
Group1   YJK_02    2
         YMK_5     5
         JKL       10
Group2   POL       4
         PAL       5
         TYP       50
         DIST      23
         STA       3

This is the code I am using:

import pandas as pd

table = [
    "Group1   ABC_YJK_02_S_2019-08-01    2",
    "         ABC_YMK_5_S_2019-08-01     5",
    "         ABC_JKL_S_2019-08-04       10",
    "Group2   BCA_POL_S_2019-08-01       4",
    "         BCA_PAL_S_2019-08-01       5",
    "         BCA_TYP_S_2019-08-01       50",
    "         BCA_DIST_S_2019-08-01      23",
    "         BCA_STA_S_2019-08-01       3"]

resultsTable = []

for line in table:
    beginning, end = line.find('_'), line.find('_S_')
    resultsTable.append(line[beginning+1 : end])

# resultsTable is a plain list, so indexing it with the string
# 'final_name' is what raises the TypeError below
resultsTable['final_name'] = pd.DataFrame(resultsTable)

# note: df is not defined anywhere in this snippet
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
    df.to_excel(writer, sheet_name='Sheet2')

I get this error:
TypeError: list indices must be integers or slices, not str
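That error is raised by resultsTable['final_name']: resultsTable is a list, and lists only accept integer indices. One way to get the desired Group/Name/Rank table is to build the DataFrame from the parsed rows. This sketch assumes each text line has three fields when it starts a group and two when the Group cell is blank; save_to_excel is a hypothetical helper.

```python
import pandas as pd

table = [
    "Group1   ABC_YJK_02_S_2019-08-01    2",
    "         ABC_YMK_5_S_2019-08-01     5",
    "         ABC_JKL_S_2019-08-04       10",
    "Group2   BCA_POL_S_2019-08-01       4",
    "         BCA_PAL_S_2019-08-01       5",
    "         BCA_TYP_S_2019-08-01       50",
    "         BCA_DIST_S_2019-08-01      23",
    "         BCA_STA_S_2019-08-01       3"]

rows = []
for line in table:
    parts = line.split()
    # Assumption: 3 fields (Group, Name, Rank) when a new group starts,
    # 2 fields (Name, Rank) when the Group cell is blank.
    group = parts[0] if len(parts) == 3 else ''
    name, rank = parts[-2], int(parts[-1])
    # same find/slice as in the thread: between the first "_" and "_S_"
    beginning, end = name.find('_'), name.find('_S_')
    rows.append({'Group': group, 'Name': name[beginning + 1:end], 'Rank': rank})

# Build the DataFrame from the list instead of indexing the list with a string
df = pd.DataFrame(rows, columns=['Group', 'Name', 'Rank'])
print(df.to_string(index=False))


def save_to_excel(frame, path='output.xlsx'):
    """Hypothetical helper: write frame to one sheet (needs e.g. openpyxl)."""
    with pd.ExcelWriter(path) as writer:
        frame.to_excel(writer, sheet_name='Sheet1', index=False)
```

If the data already sits in a DataFrame read from Excel (as in the first post), the list can instead be assigned directly with df['Name'] = resultsTable, provided it has exactly one entry per row.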
