Python Forum
Export dataframe to xlsx - Error "zipfile.BadZipFile: File is not a zip file"
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Export dataframe to xlsx - Error "zipfile.BadZipFile: File is not a zip file"
#1
Hi
current configuration: python: 3.9 |openpyxl: 3.0.6 | pandas: 1.2.3 |xlrd 2.0.1

From a couple of years I use a really useful function to export pandas dataframe to xlsx with several customization, but now that I've updated python and packages as shown above, I receive the error "zipfile.BadZipFile: File is not a zip file".

Such error did not raise with the older configuration: python: 3.7 |openpyxl: 3.0.5 | pandas: 1.1.2 |xlrd 1.2.0

Could you help to make it working with the updated python and packages? (python: 3.9 |openpyxl: 3.0.6 | pandas: 1.2.3 |xlrd 2.0.1)

The function below can be run as it is, just replace "pathExport" to your export directory if you'd like to test it.
Thank you in adavance for the help

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                           truncate_sheet=False, 
                           **to_excel_kwargs):
        """
        Append a DataFrame [df] to existing Excel file [filename]
        into [sheet_name] Sheet.
        If [filename] doesn't exist, then this function will create it.
    
        Parameters:
          filename : File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
          df : dataframe to save to workbook
          sheet_name : Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
          startrow : upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
          truncate_sheet : truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
          to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                            [can be dictionary]
    
        Returns: None
    
        (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
        """
        from openpyxl import load_workbook
    
        # ignore [engine] parameter if it was passed
        if 'engine' in to_excel_kwargs:
            to_excel_kwargs.pop('engine')
    
        writer = pd.ExcelWriter(filename, engine='openpyxl')
    
        # Python 2.x: define [FileNotFoundError] exception if it doesn't exist 
        try:
            FileNotFoundError
        except NameError:
            FileNotFoundError = IOError
    
    
        try:
            # try to open an existing workbook
            writer.book = load_workbook(filename)
            
            # get the last row in the existing Excel sheet
            # if it was not specified explicitly
            if startrow is None and sheet_name in writer.book.sheetnames:
                startrow = writer.book[sheet_name].max_row
    
            # truncate sheet
            if truncate_sheet and sheet_name in writer.book.sheetnames:
                # index of [sheet_name] sheet
                idx = writer.book.sheetnames.index(sheet_name)
                # remove [sheet_name]
                writer.book.remove(writer.book.worksheets[idx])
                # create an empty sheet [sheet_name] using old index
                writer.book.create_sheet(sheet_name, idx)
            
            # copy existing sheets
            writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
        except FileNotFoundError:
            # file does not exist yet, we will create it
            pass
    
        if startrow is None:
            startrow = 0
    
        # write out the new sheet
        df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
    
        # save the workbook
        writer.save()

pathExport = r"F:\PYTHON\NB-Suite_python39\MNE\outputData\df.xlsx"
df1 = pd.DataFrame({'numbers': [1, 2, 3],
                    'colors': ['red', 'white', 'blue'],
                    'colorsTwo': ['yellow', 'white', 'blue']
                    })

append_df_to_excel(pathExport, df1, sheet_name="DF1", index=False, startcol=0, startrow=0)
Reply
#2
could you please, post the full traceback
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#3
Sure

Traceback (most recent call last):
  File "<input>", line 83, in <module>
  File "<input>", line 46, in append_df_to_excel
  File "C:\Users\20180043\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\reader\excel.py", line 315, in load_workbook
    reader = ExcelReader(filename, read_only, keep_vba,
  File "C:\Users\20180043\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\reader\excel.py", line 124, in __init__
    self.archive = _validate_archive(fn)
  File "C:\Users\20180043\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\reader\excel.py", line 96, in _validate_archive
    archive = ZipFile(filename, 'r')
  File "C:\Users\20180043\AppData\Local\Programs\Python\Python39\lib\zipfile.py", line 1257, in __init__
    self._RealGetContents()
  File "C:\Users\20180043\AppData\Local\Programs\Python\Python39\lib\zipfile.py", line 1324, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
Reply
#4
Looking at python source - zipfile.py and openpyxl reader/excel.py - the lines referred in the traceback, they were not changed for a long time. So I guess there is something else with the file you work with.
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#5
that code is working perfectly with: python: 3.7 |openpyxl: 3.0.5 | pandas: 1.1.2 |xlrd 1.2.0
The file I'm working with was created directly in line (I did not load previously a dataframe from excel).
Some idea?
Reply
#6
(Mar-09-2021, 03:26 PM)Baggio Wrote: Sure

Traceback (most recent call last):
  File "<input>", line 83, in <module>
  File "<input>", line 46, in append_df_to_excel
  File "C:\Users\20180043\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\reader\excel.py", line 315, in load_workbook
    reader = ExcelReader(filename, read_only, keep_vba,
  File "C:\Users\20180043\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\reader\excel.py", line 124, in __init__
    self.archive = _validate_archive(fn)
  File "C:\Users\20180043\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\reader\excel.py", line 96, in _validate_archive
    archive = ZipFile(filename, 'r')
  File "C:\Users\20180043\AppData\Local\Programs\Python\Python39\lib\zipfile.py", line 1257, in __init__
    self._RealGetContents()
  File "C:\Users\20180043\AppData\Local\Programs\Python\Python39\lib\zipfile.py", line 1324, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

On stackoverflow A user said:

"I think it's because Pandas overwriting the existing workbook with an empty file. Just use a different name for the file after processing. Instead of just filename use source_filename with openpyxl and target_filename with Pandas. Note, the two must really be different."

Could you help to update the function as this user suggested? Thank you in advance
Reply
#7
I don't see how this could be related to writing part, because the error is raised at the time when you try to load the file you want to append to, before anything else.
Even in the unlikely case the user on SO is right that [in general] pandas overwriting existing file with a empty one may produce an error somehow, this is not your case - you load a xlsx workbook, append a sheet with data and try overwrite the file you read with the updated file. You don't overwrite with empty file.

Did you try with different file? Is there something different (apart from updated python/packages) compared to when it was working?
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#8
OK, I was able to replicate the problem. It is pandas related. Everything works just fine up to pandas 1.1.5
In pandas 1.2.0 they did some changes https://pandas.pydata.org/docs/whatsnew/v1.2.0.html
at the time when you instantiate pd.ExcelWriter with

writer = pd.ExcelWriter(filename, engine='openpyxl')`
it creates empty file 0 bytes and overwrites the existing file and then you get error when try to load it. It is not openpyxl related, because with latest version of openpyxl it works file with pandas 1.1.5

A workaround solution is to downgrade pandas to 1.1.5.
Consider raising an issue with pandas dev team https://github.com/pandas-dev/pandas/
Out of curiosity I will look further into what changes they did in 1.2.0
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#9
OK. Here is the solution - specify mode to be 'a', append
writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
Alternatively, look at this one or this one where it loads the file before instantiating the pd.ExcelWriter.

I also post an answer on SO. I will appreciate if you upvote/accept the answer there
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#10
(Mar-11-2021, 04:00 PM)buran Wrote: OK. Here is the solution - specify mode to be 'a', append
writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
Alternatively, look at this one or this one where it loads the file before instantiating the pd.ExcelWriter.

I also post an answer on SO. I will appreciate if you upvote/accept the answer there

thanks buran I answered you on my post on stack overflow. If you check in the comments of the new function I added also your name as contributor.

Contributions to the current helper function: https://stackoverflow.com/users/4046632/...ab=profile


I upvoted you, but since my reputation is less than 15 it doesn't display. If you can vote my as well maybe I'll reach 15 and my upvote on you will turn visible.


thanks for the support
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Errors if an xlsx file has blank rows in the beginning…. tester_V 1 825 Aug-28-2023, 06:22 PM
Last Post: deanhystad
  Creating a Dataframe from Zenodo zip file with multiple CSVs about Spotify man0s 0 1,352 Apr-26-2022, 01:45 PM
Last Post: man0s
  export dataframe to file.txt dramauh 5 1,925 Apr-21-2022, 01:23 AM
Last Post: sarahroxon7
  How to form a dataframe reading separate dictionaries from .txt file? Doug 1 4,238 Nov-09-2020, 09:24 AM
Last Post: PsyPy
  strange error from pandas dataframe djf123 1 4,033 Jul-27-2020, 05:25 AM
Last Post: scidam
  Filter data based on a value from another dataframe column and create a file using lo pawanmtm 1 4,279 Jul-15-2020, 06:20 PM
Last Post: pawanmtm
  Fatal Python error: initfsencoding: unable to load the file system codec gauravbhardwajee 12 28,178 Apr-30-2020, 07:45 PM
Last Post: barrpath
  error bars with dataframe and pandas Hucky 4 4,219 Apr-27-2020, 02:02 AM
Last Post: Hucky
  error : value of a DataFrame is ambiguous Sonata 1 2,243 Apr-24-2020, 05:40 PM
Last Post: anbu23
  XLSX file with multiple sheets to josn file ovidius 2 2,225 Apr-05-2020, 09:22 AM
Last Post: ovidius

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020