Jun-09-2018, 09:36 AM
Hi everyone.
i am super super new to coding in python (total of 6 hours training), and i am running into a problem with my code. More precisely, i have no clue how to proceed.
Here is the situation:
I have a single folder with about 100+ excel files. Each one contains a list of measurements. All the excel sheets are in identical format (in terms of columns), but vary in number of measurements (indexes). I have created a very rough script to perform the processing I want. But right now i have to manually input the excel sheet name and change the output file name. Doing this for 100+ is rather stupid. So I want to find a way to batch process all excel files in one folder, and save the output file into a different folder.
simplified: how can I make my code "universal" and process every file in the folder, and save the output into a new folder that is created within parent folder.
The link below provides a couple sample data sets
Datasets
Here is my poorly made script that is full of useless codes and not efficient, but it works
p.s. please go easy on me.. i know my coding sucks.. lol
i am super super new to coding in python (total of 6 hours training), and i am running into a problem with my code. More precisely, i have no clue how to proceed.
Here is the situation:
I have a single folder with about 100+ excel files. Each one contains a list of measurements. All the excel sheets are in identical format (in terms of columns), but vary in number of measurements (indexes). I have created a very rough script to perform the processing I want. But right now i have to manually input the excel sheet name and change the output file name. Doing this for 100+ is rather stupid. So I want to find a way to batch process all excel files in one folder, and save the output file into a different folder.
simplified: how can I make my code "universal" and process every file in the folder, and save the output into a new folder that is created within parent folder.
The link below provides a couple sample data sets
Datasets
Here is my poorly made script that is full of useless codes and not efficient, but it works
#!/usr/bin/env python3 # -*- coding: utf-8 -*- """ Created on Fri Jun 8 12:01:07 2018 @author: anthonyeritano """ import pandas as pd import numpy as np #choose excel file File = 'test1.xlsx' xl_file = pd.ExcelFile(File) #Select spreadsheet data Data= xl_file.parse('Sheet1') #"parse" reads a sheet from the Excel file #Select values from Columns Skel = Data['Skeleton ID'] Brnch = Data['Branch length'] Euc = Data['Euclidean distance'] tort= Data['Branch length']/Data['Euclidean distance'] # make DataFrame (new spreadsheet) for the values DF_3 = pd.DataFrame({'Branch Number': Skel, 'Brnchln': Brnch, 'Eucl': Euc, 'Tort': tort}) DF_4 = DF_3[['Branch Number','Brnchln','Eucl','Tort']] DF_4.head() totalNumber = len(DF_4.dropna()) dic = DF_4.to_dict() dic2 = {} for key in dic.keys(): for key2 in dic[key]: dic2[key + ' ' + str(key2)] = dic[key][key2] key_names = ['Brnchln','Eucl','Tort'] numbers = range(totalNumber) titlearray=[File] ordered_names = [] ordered_values = [] for i in range(totalNumber): for name in key_names: temp_name = name + ' ' + str(i) ordered_names.append(temp_name) ordered_values.append(dic2[temp_name]) #print(temp_name, dic2[temp_name]) array = np.array((ordered_names, ordered_values)) array.shape #%% clmntitle=array[0,:] mesur=array[1,:] newclmntitle=np.insert(clmntitle, 0, 'Filename') newmesur=np.insert(mesur,0,titlearray) array3=np.array((newclmntitle,newmesur)) array3 array3.shape #%% import xlsxwriter workbook = xlsxwriter.Workbook('test1 processed.xlsx') worksheet = workbook.add_worksheet() row=0 col=0 worksheet.write_row(row, col, newclmntitle) row=1 col=0 worksheet.write_row(row, col, newmesur) workbook.close()Any help is appreciated!!!
p.s. please go easy on me.. i know my coding sucks.. lol