Python Forum
large csv to many xlsx containing multiple tabs
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
large csv to many xlsx containing multiple tabs
#1
I need to automate the creation of around 1000 excel workbooks. The workbook will contain three tabs.

TAB 1 = Overview and instructions for fund holders. Will be the same for all xlsx.
TAB 2 = Grant Transactions
TAB 3 = Gift Transactions

I have two large csv files, one for each type of transaction (Gift/Grant).

Gist:

Open (as pandas.dataFrame) a static csv file containing transaction history for around 1000 fund accounts;
Save headers as variables to be used on all workbooks;
Use groupby on csv['fund_id'] and apply the unique values to a list.
Iterate through df using list values.
for each: initiate/open workbook object as ('Fund_nm'_'Fund_ID'.csv), initiate/open worksheet object with name 'Grants', write headers, write row values, close (save) worksheet.

I would like to write one excel workbook (.xlsx)for each fund account.
This means I will either have to write all three worksheets at once (intro tab, grant tab, gifts tab) or I will have to update xlsx when creating sheets two and three.

I have used xlsxWriter and pyexcel to write single workbooks. I am needing some help thinking through how to accomplish this task.

Objective: create ~1000 excel workbooks containing intro tab (constant across workbooks), grants tab, and gifts tab. I currently have the data in two csv files.

Any help thinking through the logical steps and/or recommending approaches to the task would be greatly appreciated.
Reply
#2
df1 = pd.read_csv('Grant_Transactions.csv')
fund_ids = list(df1['Account ID'].values)

for fund_id in fund_ids:
    df2  = df1[df1['Account ID'] == fund_id]

    writer = pd.ExcelWriter(str(fund) + '.xlsx', engine='xlsxwriter')
    df2.to_excel(writer, 'Sheet1')
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
I have managed to figure out how to sucessfully loop through to create multiple xlsx files. This code generates an excel file per unique ID.

ISSUES:
1. This code takes a long time to generate the 1700+ xlsx files. Is there are better way?
2. Given 1, I am concerned about adding the second tab (gift contribution transactions).

I need to figure out best way to do:
for fund_id in df2, where fund_id is equal to fund_id in df4(df3 is the pd.read_csv of 2nd csv), write df2.to_excel in sheet 1 and write df4 to sheet 2.
Reply
#3
df1 = pd.read_csv('Grant_Transactions.csv')
fund_ids = list(df1['Account ID'].values)
 
for fund_id in fund_ids:
    df2  = df1[df1['Account ID'] == fund_id]
 
    writer = pd.ExcelWriter(str(fund) + '.xlsx', engine='xlsxwriter')
    df2.to_excel(writer, 'Sheet1')
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']

    workbook.close()
Added workbook.close() to end but still is very slow to write 1700 xlsx.
Reply
#4
import pandas as pd

df0 = pd.read_csv('test_sample.csv', usecols=
                    ('typeCategory','tran_id',
                    'issue_d','disbursement_d','fund_id','fund_nm', 'amount','grantee_legal_nm'))
df00 = pd.read_csv('test_sample.csv', usecols=
                    ('typeCategory','tran_id',
                    'issue_d','fund_id','fund_nm', 'amount'))

df1 = df0[df0['typeCategory'] == 'grant']
df2 = df00[df00['typeCategory'] == 'gift']


grantingFundID = list(df0['fund_id'].values)
#cell_format = workbook.add_format()

#df1.columns

image_file = open('logo1.png', 'rb')
image_data = io.BytesIO(image_file.read())

for fund_id in grantingFundID:
    df3= df1[df1['fund_id'] == fund_id]
    df4= df2[df2['fund_id'] == fund_id]

    writer =  pd.ExcelWriter(str(fund_id)+'.xlsx', engine='xlsxwriter')
    df3.to_excel(writer, 'grants',index=False)
    df4.to_excel(writer, 'gifts',index=False)

    workbook = writer.book
    worksheet = writer.sheets['grants']
    worksheet.set_margins(top=1.3)
    worksheet.set_header('&L&G', {
                                  'image_left': 'logo.png',
                                  'image_data_left': image_data
                                  })
    worksheet.set_column(0,30,20)
    worksheet = writer.sheets['gifts']
    workbook.close()
Here is updated code. I am now adding header to xlsx (logo1.png). I have not added the overview tab. Posting here to save someone else the time in the future.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  extracting sublist from a large multiple molecular file juliocollm 2 2,293 May-25-2020, 12:49 PM
Last Post: juliocollm
  XLSX file with multiple sheets to josn file ovidius 2 2,230 Apr-05-2020, 09:22 AM
Last Post: ovidius
  read multiple .xlsx files and text files in a directory BNB 11 25,683 Jun-07-2017, 07:42 AM
Last Post: BNB
  Write data into existing Excel (xlsx) file with multiple sheets BNB 1 15,341 Jun-01-2017, 04:22 PM
Last Post: Larz60+

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020