Oct-11-2019, 08:47 AM
Good morning my dear Python Experts,
I'm new here, I'm not an expert with coding or python and i have an issue with openpyxl.
I have some process that produce xls files with data in sheets within a particular format.
I need to use python to transform these data into another format for future uses
Here a part of my code :
The source file can contain between 1k to 70-80k lines, the code works fine when it process 1k-5k lines (process time to "debug point 1" around 5sec) but gone wild with larger files.
The process time to "debug point 1" is around 1min for 50k line(it's fine, no problem) but takes around 1 hours to "debug point 2" when it goes.
Most of the time i got the error :
- Optimised Modes for openpyxl : need to use 2 files for the processing instead of simply add a new sheet in the origin file and it's slower than without optimised file
- Use of generator to process the original data
- I try the scirpt on differents machines, same problem
As the time to "append" the corrected data is not an issue for me(1 min of processing for big files is good), I think that the issue is with the "saving" process. I didn't find relevant solution by myself, so that why I'm here.
Do you have an idea of the direction I should search ?
Have a nice day :)
I'm new here, I'm not an expert with coding or python and i have an issue with openpyxl.
I have some process that produce xls files with data in sheets within a particular format.
I need to use python to transform these data into another format for future uses
Here a part of my code :
import openpyxl as opxl from openpyxl import Workbook WorkbookSource = opxl.load_workbook(OriginFile, read_only=True) SheetSource = WorkbookSource['data'] row_count = SheetSource.max_row WorkbookDestination = opxl.Workbook(write_only = True) destination_sheet=WorkbookDestination.create_sheet(title="data_fancy") for x in range(1,row_count): DataXls = SheetSource.cell(row=x, column=1).value # #Things to make the world an happy place # destination_sheet.append(Data_fancy) print("debug point 1") WorkbookSource.close() WorkbookDestination.save(filename = DestisnationFile) print("debug point 2")And now my problem :
The source file can contain between 1k to 70-80k lines, the code works fine when it process 1k-5k lines (process time to "debug point 1" around 5sec) but gone wild with larger files.
The process time to "debug point 1" is around 1min for 50k line(it's fine, no problem) but takes around 1 hours to "debug point 2" when it goes.
Most of the time i got the error :
Error:Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "C:\Python\lib\site-packages\openpyxl\worksheet\_writer.py", line 33, in _openpyxl_shutdown
os.remove(path)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\.....\\AppData\\Local\\Temp\\openpyxl.49ujk2e1'
I tried different thing that i find on the internet like :- Optimised Modes for openpyxl : need to use 2 files for the processing instead of simply add a new sheet in the origin file and it's slower than without optimised file
- Use of generator to process the original data
- I try the scirpt on differents machines, same problem
As the time to "append" the corrected data is not an issue for me(1 min of processing for big files is good), I think that the issue is with the "saving" process. I didn't find relevant solution by myself, so that why I'm here.
Do you have an idea of the direction I should search ?
Have a nice day :)