Bottom Page

Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
 How to extract data between two strings
#1
Hi,
I have input data as below,
input.xlsx:

Group	 Name	                  Rank
Group1	 ABC_YJK_02_S_2019-08-01	2
	     ABC_YMK_5_S_2019-08-01	    5
	     ABC_JKL_S_2019-08-04	    10
Group2	 BCA_POL_S_2019-08-01	    4
	     BCA_PAL_S_2019-08-01	    5
	     BCA_TYP_S_2019-08-01	    50
	     BCA_DIST_S_2019-08-01	    23
	     BCA_STA_S_2019-08-01	    3

I name column, I only want to delete everything before the first _(underline) including underline, and remove everything after _S including _S

Then I want to write the output into out.xlsx

I use the below code, but it is not working. giving below error:
TypeError: expected string or bytes-like object



import pandas as pd
import re

df = pd.read_excel('D:\pivotdata2.xlsx',sheetname='merge')
#df['Group']=df['Group'].fillna(method='ffill')

df.to_excel('D:\writepivotdata.xlsx',index=False) 

result=[]
for index, row in df.iterrows():
    result_tmp=re.search('_ (.*?)_',row)
    result.append(result_tmp)
Quote
#2
you can use Regular Expressions (regex), but I'm not as skilled as I would like to be.

Alternatively you can use the following
line = "Group1   ABC_YJK_02_S_2019-08-01    2"
beginning, end = line.find('_'), line.find('_S_')
result = line[beginning+1 : end]
print(result)
or

table = [
"Group1   ABC_YJK_02_S_2019-08-01    2",
"         ABC_YMK_5_S_2019-08-01     5",
"         ABC_JKL_S_2019-08-04       10",
"Group2   BCA_POL_S_2019-08-01       4",
"         BCA_PAL_S_2019-08-01       5",
"         BCA_TYP_S_2019-08-01       50",
"         BCA_DIST_S_2019-08-01      23",
"         BCA_STA_S_2019-08-01       3"]

n = len(table)

resultsTable = []

for i in range(n):
    beginning, end = table[i].find('_'), table[i].find('_S_')
    result = table[i][beginning+1 : end]
    resultsTable.append(result)
Paul
Quote
#3
I want to replace resultsTable with

Group    Name      Rank
Group1   YJK_02    2
         YMK_5     5
         JKL       10
Group2   POL       4
         PAL       5
         TYP       50
         DIST      23
         STA       3
I use below code:
table = [
"Group1   ABC_YJK_02_S_2019-08-01    2",
"         ABC_YMK_5_S_2019-08-01     5",
"         ABC_JKL_S_2019-08-04       10",
"Group2   BCA_POL_S_2019-08-01       4",
"         BCA_PAL_S_2019-08-01       5",
"         BCA_TYP_S_2019-08-01       50",
"         BCA_DIST_S_2019-08-01      23",
"         BCA_STA_S_2019-08-01       3"]
 
n = len(table)
 
resultsTable = []
 
for i in range(n):
    beginning, end = table[i].find('_'), table[i].find('_S_')
    result = table[i][beginning+1 : end]
    resultsTable.append(result)
import pandas as pd
resultsTable['final_name']=pd.DataFrame(resultsTable)
writer = pd.ExcelWriter('output.xlsx')
df.to_excel(writer,'Sheet1')
df.to_excel(writer,'Sheet2')
writer.save()
Getting below error:
TypeError: list indices must be integers or slices, not str
Quote

Top Page

Possibly Related Threads...
Thread Author Replies Views Last Post
  How to extract different data groups from multiple CSV files using python Rafiz 3 412 Jun-04-2019, 05:20 PM
Last Post: jefsummers
  Extract data between two dates from a .csv file using Python 2.7 sujai_banerji 1 4,629 Nov-15-2017, 09:48 PM
Last Post: snippsat
  I'm working onn below code to extract data from excel using python kiran 1 1,134 Oct-24-2017, 01:42 PM
Last Post: kiran

Forum Jump:


Users browsing this thread: 1 Guest(s)