Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Appending data
#1
How can I insert a header in a new MS Excel file and then append data, into the same MS Excel file after the last populated row, from another MS Excel file? The number of columns will be the same.
Reply
#2
Managed to get this code to work (although I get the error D:\Python\Test.py:8: FutureWarning: save is not part of the public API, usage can give in unexpected results and will be removed in a future version file_name.save()) which inserts the header to a new Excel file. However, I don't know how to append data from a file called Book1.xlsx and append it to Company Financials.xlsx after the header.

import pandas as pd
from pandas import ExcelWriter

header = ['Country', 'Product', 'Units Sold', 'Revenue', 'Cost', 'Revenue', 'Cost', 'Profit', 'Date', 'Month Number']
df = pd.DataFrame(columns = header)
file_name = ExcelWriter('D:\Python/Company Financials.xlsx')
df.to_excel(file_name,'Sheet1',index=0)     
file_name.save()
Reply
#3
Just using openpyxl, you can do that like this:

targetFile is a new Excel. Each time you want to write a new row, get targetFile[tsheet].max_row, that is the last row with data.

Add 1 and that is the row you want to write to: tmaxRow = targetFile[tsheet].max_row + 1

This little function gets a whole row from XLfile and writes that row to targetFile

# get all columns of 1 row which meets the condition
def getRow(XLfile, targetFile, sheet, tsheet, rowNum, maxCol):
    tmaxRow = targetFile[tsheet].max_row + 1
    print('target file max row is', tmaxRow)
    for colNum in range(1, maxCol + 1):
        sv = XLfile[sheet].cell(row=rowNum, column=colNum).value               
        targetFile[tsheet].cell(row=tmaxRow, column=colNum).value=sv
Reply
#4
Tried the following code: -

import openpyxl
from openpyxl import Workbook

# get all columns of 1 row which meets the condition
def getRow(XLfile, targetFile, sheet, tsheet, rowNum, maxCol):
    tmaxRow = targetFile[tsheet].max_row + 1
    print('target file max row is', tmaxRow)
    for colNum in range(1, maxCol + 1):
        sv = XLfile[sheet].cell(row=rowNum, column=colNum).value
        targetFile[tsheet].cell(row=tmaxRow, column=colNum).value=sv

XLfile="delete_empty_rows.xlsx"
targetFile="test.xlsx"
sheet="daily sales"
tsheet="person"
rowNum=1
maxCol=9

getRow(XLfile, targetFile, sheet, tsheet, rowNum, maxCol)
and got this error: -

"D:\Power BI & Python\venv\Scripts\python.exe" "D:/Power BI & Python/Test.py" 
Traceback (most recent call last):
  File "D:\Power BI & Python\Test.py", line 16, in <module>
    getRow(XLfile, targetFile, sheet, tsheet, rowNum, maxCol)
  File "D:\Power BI & Python\Test.py", line 3, in getRow
    tmaxRow = targetFile[tsheet].max_row + 1
TypeError: string indices must be integers

Process finished with exit code 1
Reply
#5
Can't see the problem.

Copy and paste each step one at a time, then you'll find it.

This works for me in bash. It copies the header row from my source file to row 2 of the target file.

Hope it works this time!

#! /usr/bin/python3
import openpyxl
from pathlib import Path

# show the XL files
def showFiles(path2XL):
    mydir = Path(path2XL)
    filelist = [filename for filename in mydir.iterdir() if filename.is_file()]
    for filename in filelist:
        print(f"\nfilename: {filename.name}") 
        
# open the source file you want
def openXL(path2XL):
    print('copy and paste the name of the file you want to work on here ... ')
    file = input('copy and paste the path to the Excel you want to work on here ... ')
    # data_only = True flag will get the values of formulas, not the formula
    # ws['B2'].data_type tells you if the cell contains a formula. 'f' indicates a formula
    sourceFile = openpyxl.load_workbook(path2XL + file, data_only = True)
    sheets = sourceFile.sheetnames
    for sheet in sheets:
        print('\nThe sheet names are:', sheet)
    return (sourceFile, file)

# get the sheet you want to extract data from
def getSheet():
    print('Which sheets do you want to get data from?')
    mysheet = input('Copy and paste the sheet name you want from above ... ')
    return mysheet

# make a target file
def makeXL(targetname, tsheet, path2XL):
    targetFile = openpyxl.Workbook()
    tsheet = targetFile.active
    tsheet.title = 'person'
    tsheets = targetFile.sheetnames
    tsheet = tsheets[0]    
    targetFile.save(path2XL + savename)
    print(f'target file made and  saved to', path2XL + targetname)
    return targetFile

# get all columns of 1 row which meets the condition
def getRow(XLfile, targetFile, sheet, tsheet, rowNum, maxCol, path2XL, savename):
    tmaxRow = targetFile[tsheet].max_row + 1
    print('target file max row + 1 is', tmaxRow)
    for colNum in range(1, maxCol + 1):
        sv = XLfile[sheet].cell(row=rowNum, column=colNum).value
        targetFile[tsheet].cell(row=tmaxRow, column=colNum).value=sv
    targetFile.save(path2XL + savename)
    print(f'target file saved to', path2XL + savename)
          
if __name__ == "__main__":
    print('Where are the XL files? For example: /home/pedro/myPython/openpyxl/xlsx_files/ ?')
    # path2XL = '/home/pedro/myPython/openpyxl/xlsx_files/'
    path2files = input('Where are the XL files? Enter the path to the XL files ... ')
    showFiles(path2files)
    data = openXL(path2files)
    sF = data[0]
    myfile = data[1]
    sheet = getSheet()
    savename = "test_me1.xlsx"    
    tsheet="person"
    tF = makeXL(savename, tsheet, path2files)
    rowNum=1 # the headers
    maxCol=9 
    getRow(sF, tF, sheet, tsheet, rowNum, maxCol, path2files, savename)
    print('All done!')
Reply
#6
Bash?
I am using PyCharm. Will try the above code and come back if I have any issues.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Appending a row of data in an MS Excel file azizrasul 3 1,188 Nov-06-2022, 05:17 PM
Last Post: azizrasul
  Appending data into a file in tab delimited format metro17 1 4,148 Aug-06-2019, 07:34 AM
Last Post: fishhook

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020