Python Forum
How to extract data between two strings
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How to extract data between two strings
#1
Hi,
I have input data as below,
input.xlsx:

Group	 Name	                  Rank
Group1	 ABC_YJK_02_S_2019-08-01	2
	     ABC_YMK_5_S_2019-08-01	    5
	     ABC_JKL_S_2019-08-04	    10
Group2	 BCA_POL_S_2019-08-01	    4
	     BCA_PAL_S_2019-08-01	    5
	     BCA_TYP_S_2019-08-01	    50
	     BCA_DIST_S_2019-08-01	    23
	     BCA_STA_S_2019-08-01	    3
I name column, I only want to delete everything before the first _(underline) including underline, and remove everything after _S including _S

Then I want to write the output into out.xlsx

I use the below code, but it is not working. giving below error:
TypeError: expected string or bytes-like object



import pandas as pd
import re

df = pd.read_excel('D:\pivotdata2.xlsx',sheetname='merge')
#df['Group']=df['Group'].fillna(method='ffill')

df.to_excel('D:\writepivotdata.xlsx',index=False) 

result=[]
for index, row in df.iterrows():
    result_tmp=re.search('_ (.*?)_',row)
    result.append(result_tmp)
Reply
#2
you can use Regular Expressions (regex), but I'm not as skilled as I would like to be.

Alternatively you can use the following
line = "Group1   ABC_YJK_02_S_2019-08-01    2"
beginning, end = line.find('_'), line.find('_S_')
result = line[beginning+1 : end]
print(result)
or

table = [
"Group1   ABC_YJK_02_S_2019-08-01    2",
"         ABC_YMK_5_S_2019-08-01     5",
"         ABC_JKL_S_2019-08-04       10",
"Group2   BCA_POL_S_2019-08-01       4",
"         BCA_PAL_S_2019-08-01       5",
"         BCA_TYP_S_2019-08-01       50",
"         BCA_DIST_S_2019-08-01      23",
"         BCA_STA_S_2019-08-01       3"]

n = len(table)

resultsTable = []

for i in range(n):
    beginning, end = table[i].find('_'), table[i].find('_S_')
    result = table[i][beginning+1 : end]
    resultsTable.append(result)
Paul
Reply
#3
I want to replace resultsTable with

Group    Name      Rank
Group1   YJK_02    2
         YMK_5     5
         JKL       10
Group2   POL       4
         PAL       5
         TYP       50
         DIST      23
         STA       3
I use below code:
table = [
"Group1   ABC_YJK_02_S_2019-08-01    2",
"         ABC_YMK_5_S_2019-08-01     5",
"         ABC_JKL_S_2019-08-04       10",
"Group2   BCA_POL_S_2019-08-01       4",
"         BCA_PAL_S_2019-08-01       5",
"         BCA_TYP_S_2019-08-01       50",
"         BCA_DIST_S_2019-08-01      23",
"         BCA_STA_S_2019-08-01       3"]
 
n = len(table)
 
resultsTable = []
 
for i in range(n):
    beginning, end = table[i].find('_'), table[i].find('_S_')
    result = table[i][beginning+1 : end]
    resultsTable.append(result)
import pandas as pd
resultsTable['final_name']=pd.DataFrame(resultsTable)
writer = pd.ExcelWriter('output.xlsx')
df.to_excel(writer,'Sheet1')
df.to_excel(writer,'Sheet2')
writer.save()
Getting below error:
TypeError: list indices must be integers or slices, not str
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Training a model to identify specific SMS types and extract relevant data? lord_of_cinder 0 978 Oct-10-2022, 04:35 AM
Last Post: lord_of_cinder
  extract and plot data from a txt file usercat123 2 1,230 Apr-20-2022, 06:50 PM
Last Post: usercat123
  How to extract data from paragraph using Machine Learning with python? bccsthilina 2 3,043 Jul-27-2020, 07:02 AM
Last Post: hussainmujtaba
  Filter rows by multiple text conditions in another data frame i.e contains strings an Pan 0 2,159 Jun-09-2020, 06:05 AM
Last Post: Pan
  how to extract financial data from photocopy of document angela1 6 3,680 Feb-15-2020, 05:50 PM
Last Post: jim2007
  How to extract different data groups from multiple CSV files using python Rafiz 3 3,242 Jun-04-2019, 05:20 PM
Last Post: jefsummers
  Extract data between two dates from a .csv file using Python 2.7 sujai_banerji 1 10,368 Nov-15-2017, 09:48 PM
Last Post: snippsat
  I'm working onn below code to extract data from excel using python kiran 1 3,279 Oct-24-2017, 01:42 PM
Last Post: kiran

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020