![]() |
large csv to many xlsx containing multiple tabs - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: large csv to many xlsx containing multiple tabs (/thread-14225.html) |
large csv to many xlsx containing multiple tabs - thatIsTheCase - Nov-20-2018 I need to automate the creation of around 1000 excel workbooks. The workbook will contain three tabs. TAB 1 = Overview and instructions for fund holders. Will be the same for all xlsx. TAB 2 = Grant Transactions TAB 3 = Gift Transactions I have two large csv files, one for each type of transaction (Gift/Grant). Gist: Open (as pandas.dataFrame) a static csv file containing transaction history for around 1000 fund accounts; Save headers as variables to be used on all workbooks; Use groupby on csv['fund_id'] and apply the unique values to a list. Iterate through df using list values. for each: initiate/open workbook object as ('Fund_nm'_'Fund_ID'.csv), initiate/open worksheet object with name 'Grants', write headers, write row values, close (save) worksheet. I would like to write one excel workbook (.xlsx)for each fund account. This means I will either have to write all three worksheets at once (intro tab, grant tab, gifts tab) or I will have to update xlsx when creating sheets two and three. I have used xlsxWriter and pyexcel to write single workbooks. I am needing some help thinking through how to accomplish this task. Objective: create ~1000 excel workbooks containing intro tab (constant across workbooks), grants tab, and gifts tab. I currently have the data in two csv files. Any help thinking through the logical steps and/or recommending approaches to the task would be greatly appreciated. RE: large csv to many xlsx containing multiple tabs - thatIsTheCase - Nov-23-2018 df1 = pd.read_csv('Grant_Transactions.csv') fund_ids = list(df1['Account ID'].values) for fund_id in fund_ids: df2 = df1[df1['Account ID'] == fund_id] writer = pd.ExcelWriter(str(fund) + '.xlsx', engine='xlsxwriter') df2.to_excel(writer, 'Sheet1') workbook = writer.book worksheet = writer.sheets['Sheet1']I have managed to figure out how to sucessfully loop through to create multiple xlsx files. This code generates an excel file per unique ID. ISSUES: 1. This code takes a long time to generate the 1700+ xlsx files. Is there are better way? 2. Given 1, I am concerned about adding the second tab (gift contribution transactions). I need to figure out best way to do: for fund_id in df2, where fund_id is equal to fund_id in df4(df3 is the pd.read_csv of 2nd csv), write df2.to_excel in sheet 1 and write df4 to sheet 2. RE: large csv to many xlsx containing multiple tabs - thatIsTheCase - Nov-23-2018 df1 = pd.read_csv('Grant_Transactions.csv') fund_ids = list(df1['Account ID'].values) for fund_id in fund_ids: df2 = df1[df1['Account ID'] == fund_id] writer = pd.ExcelWriter(str(fund) + '.xlsx', engine='xlsxwriter') df2.to_excel(writer, 'Sheet1') workbook = writer.book worksheet = writer.sheets['Sheet1'] workbook.close()Added workbook.close() to end but still is very slow to write 1700 xlsx. RE: large csv to many xlsx containing multiple tabs - thatIsTheCase - Nov-27-2018 import pandas as pd df0 = pd.read_csv('test_sample.csv', usecols= ('typeCategory','tran_id', 'issue_d','disbursement_d','fund_id','fund_nm', 'amount','grantee_legal_nm')) df00 = pd.read_csv('test_sample.csv', usecols= ('typeCategory','tran_id', 'issue_d','fund_id','fund_nm', 'amount')) df1 = df0[df0['typeCategory'] == 'grant'] df2 = df00[df00['typeCategory'] == 'gift'] grantingFundID = list(df0['fund_id'].values) #cell_format = workbook.add_format() #df1.columns image_file = open('logo1.png', 'rb') image_data = io.BytesIO(image_file.read()) for fund_id in grantingFundID: df3= df1[df1['fund_id'] == fund_id] df4= df2[df2['fund_id'] == fund_id] writer = pd.ExcelWriter(str(fund_id)+'.xlsx', engine='xlsxwriter') df3.to_excel(writer, 'grants',index=False) df4.to_excel(writer, 'gifts',index=False) workbook = writer.book worksheet = writer.sheets['grants'] worksheet.set_margins(top=1.3) worksheet.set_header('&L&G', { 'image_left': 'logo.png', 'image_data_left': image_data }) worksheet.set_column(0,30,20) worksheet = writer.sheets['gifts'] workbook.close()Here is updated code. I am now adding header to xlsx (logo1.png). I have not added the overview tab. Posting here to save someone else the time in the future. |