Works In Pycharm, Tons of problems after EXE

chob_thomas · Dec-29-2022, 04:12 AM

Im using two .py files to accomplish the task of converting pdfs to tables then saving them into an excel sheet.
Everything works as it should when running on pycharm, but when I use pyinstaller to create the .exe, I get get error after error, some I could fix by installing the missing module in the CMD terminal (1st error example) I basically had to do it for every module, so I assume this is some rooky mistake.

Error:C:\Users\Justi\PycharmProjects\Camelot>C:\Users\Justi\PycharmProjects\Camelot\dist\TheWorks\TheWorks.exe
Folder Path: C:/Project/Part Lists
Output File: C:/Project/Part Lists/buffbilly.xlsx
Starting Extraction and Clean Up
Traceback (most recent call last):
  File "TheWorks.py", line 1, in <module>
    from TidyUp import extract_and_format_tables
  File "PyInstaller\loader\pyimod02_importers.py", line 499, in exec_module
  File "TidyUp.py", line 108, in <module>
    extract_and_format_tables(folder_path, output_file)
  File "TidyUp.py", line 49, in extract_and_format_tables
    excel_writer = pd.ExcelWriter(output_file, engine='xlsxwriter')
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas\io\excel\_xlsxwriter.py", line 198, in __init__
ModuleNotFoundError: No module named 'xlsxwriter'
[24524] Failed to execute script 'TheWorks' due to unhandled exception!

this one I can't seem to fix; the module is installed everywhere and the suggestion is I use a different module - I don't really want to blow up my code and change to a different module though....

Error:C:\Users\Justi\PycharmProjects\Camelot>C:\Users\Justi\PycharmProjects\Camelot\dist\TheWorks\TheWorks.exe
Folder Path: C:/Project/Part Lists
Output File: C:/Project/Part Lists/combined_tables.xlsx
Starting Extraction and Clean Up
  0%|                                                                                            | 0/9 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "TheWorks.py", line 1, in <module>
    from TidyUp import extract_and_format_tables
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "PyInstaller\loader\pyimod02_importers.py", line 499, in exec_module
  File "TidyUp.py", line 108, in <module>
    extract_and_format_tables(folder_path, output_file)
  File "TidyUp.py", line 56, in extract_and_format_tables
    tables = camelot.read_pdf(os.path.join(folder_path, pdf_file), pages='all', flavor='stream')
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "camelot\io.py", line 111, in read_pdf
  File "camelot\handlers.py", line 50, in __init__
  File "camelot\handlers.py", line 75, in _get_pages
  File "PyPDF2\_reader.py", line 1974, in __init__
  File "PyPDF2\_utils.py", line 369, in deprecation_with_replacement
  File "PyPDF2\_utils.py", line 351, in deprecation
PyPDF2.errors.DeprecationError: PdfFileReader is deprecated and was removed in PyPDF2 3.0.0. Use PdfReader instead.
[15280] Failed to execute script 'TheWorks' due to unhandled exception!

The code for TidyUp.py

import camelot
import pandas as pd
import os
from tqdm import tqdm
from PyPDF2 import PdfReader
import warnings

import PySimpleGUI as sg

# Create a layout with a folder selection button, a text input field, and a "Save As" button
layout = [
    [sg.Text("Folder Path"), sg.InputText(), sg.FolderBrowse()],
    [sg.Text("Output File"), sg.InputText(), sg.SaveAs(file_types=(("Excel Files", "*.xlsx"),))],
    [sg.Submit()]
]

# Create the window
window = sg.Window("Input Folder and Output File", layout)

# Loop to get the user input
while True:
    event, values = window.read()
    if event == sg.WIN_CLOSED or event == "Submit":
        break

# Get the user input
folder_path = values[0]
output_file = values[1]

# Do something with the input
print(f"Folder Path: {folder_path}")
print(f"Output File: {output_file}")

# Close the window
window.close()




# suppress the FutureWarning
warnings.simplefilter("ignore", category=FutureWarning)

# suppress the UserWarning
warnings.simplefilter("ignore", category=UserWarning)

def extract_and_format_tables(folder_path, output_file):
    print("Starting Extraction and Clean Up")
    # Create an ExcelWriter object to write the data to
    excel_writer = pd.ExcelWriter(output_file, engine='xlsxwriter')

    # Iterate over the PDF files in the folder
    for pdf_file in tqdm(os.listdir(folder_path)):
        # Check if the file is a PDF
        if pdf_file.endswith('.pdf'):
            # Use Camelot to extract the tables from the PDF
            tables = camelot.read_pdf(os.path.join(folder_path, pdf_file), pages='all', flavor='stream')

            # Convert the tables to a list of pandas dataframes
            dfs = [table.df for table in tables]

            # Combine all of the dataframes into a single dataframe
            df_all = pd.concat(dfs)

            # Convert the values in column 4 to numeric values, and fill any NaN values with an empty string
            df_all[4] = pd.to_numeric(df_all[4], errors='coerce')

            # Define a function to replace cells containing "JOB" with an empty string
            def replace_qty_job(cell):
                if "Job" in str(cell) or "Description" in str(cell) or "Consolidation" in str(cell) or "Width" in str(cell):
                    return None
                else:
                    return cell

            # Apply the function to the dataframe using applymap
            df_all = df_all.applymap(lambda x: replace_qty_job(x))

            def replace_empty_strings(cell):
                if cell == '':
                    return None
                else:
                    return cell

            # Apply the function to the dataframe using applymap
            df_all = df_all.applymap(lambda x: replace_empty_strings(x))

            # Drop rows that have all null values in columns 0, 1, 2, 3, 4
            df_all.dropna(how='all', subset=[0, 1, 2, 3, 4], inplace=True)

            # Set the width of the first four columns to 25 characters
            df_all.style.set_properties(**{'text-align': 'left'}).set_table_styles(
                [{'selector': 'th', 'props': [('text-align', 'left'), ('max-width', '50ch')]},
                 {'selector': 'td', 'props': [('text-align', 'left'), ('max-width', '50ch')]}])

            df_all.to_excel(excel_writer, sheet_name=pdf_file, index=False, header=True)

    # Save the Excel file and close the ExcelWriter object
    excel_writer.save()
    excel_writer.close()
    print('Finished Extraction and Clean UP, Moving on to AutoFormatting')

# Set the file path to the folder containing the PDF files
#folder_path = 'C:/Project/Part Lists'

# Set the file path for the output Excel file
#output_file = 'C:/Project/combined_tables.xlsx'

# Extract and format the tables from the PDF files in the specified folder, and write the results to the output Excel file
extract_and_format_tables(folder_path, output_file)

the code for TheWorks.py wich runs it all

from TidyUp import extract_and_format_tables
import openpyxl
import warnings
from TidyUp import output_file
import openpyxl

################################################################
# suppress the FutureWarning
warnings.simplefilter("ignore", category=FutureWarning)
# suppress the UserWarning
warnings.simplefilter("ignore", category=UserWarning)
#################################################################

#name the folder to find the PDF files in,
#name the Excel sheet to output the converted PDFs to
#folder_path = "c:/Project/Part Lists"
#output_file = 'C:/Project/combined_tables.xlsx'

# Open the workbook
wb = openpyxl.load_workbook(output_file)

# Iterate over the sheets in the workbook
for sheet in wb:
    # Iterate over the rows in the sheet
    for row in sheet.iter_rows():
        # Get the value in the first cell of the row
        cell_value = row[0].value
        # Check if the cell value contains "PSI" or "Total Parts Per"
        if "PSI" in str(cell_value) or "Total Parts Per" in str(cell_value):
            # If the cell value contains "PSI" or "Total Parts Per", apply formatting to the row or cell
            if "PSI" in str(cell_value):
                # If the cell value contains "PSI", apply formatting to the entire row
                for cell in row:
                    cell.fill = openpyxl.styles.PatternFill(start_color='FFFF00', end_color='FFFF00', fill_type='solid')
            else:
                # If the cell value contains "Total Parts Per", apply formatting to the cell
                row[0].font = openpyxl.styles.Font(bold=True, size=16)

    # Adjust the width of column 0 to fit the data
    max_length = 0
    for cell in sheet['A']:
        if cell.value:
            if len(str(cell.value)) > max_length:
                max_length = len(str(cell.value))
    adjusted_width = (max_length + 2) * 1
    sheet.column_dimensions['A'].width = adjusted_width

# Save the workbook
wb.save(output_file)
print('All done, have a nice rest of your day! :)')

I'm almost there, I just cant figure this last problem out.
Runs fine in Pycharm but as soon as I run Pyinstaller on it and run the EXE it breaks.
Advice please!
I hope its the last problem anyway....... Huh

chob_thomas · Dec-29-2022, 06:27 AM

So i went ahead and created new empty files in a new project and copied just the code over into the empty files.
then I one by one re-installed all the libraries that were required until the program ran properly again (in python).
then I used
pyinstaller --onefile Werks.py TidyUp.py
in the pycharm terminal and it looked like it ran properly and didn't throw any apparent errors.
but when I run the resulting EXE I get this error code now:

Error:C:\Users\Justi>C:\Users\Justi\PycharmProjects\pythonProject4\dist\Werks.exe
Folder Path: C:/Project/Part Lists
Output File: C:/Project/Part Lists/combined_tables.xlsx
Starting Extraction and Clean Up
  0%|                                                                                            | 0/9 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "Werks.py", line 2, in <module>
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "PyInstaller\loader\pyimod02_importers.py", line 499, in exec_module
  File "TidyUp.py", line 108, in <module>
  File "TidyUp.py", line 56, in extract_and_format_tables
  File "camelot\io.py", line 111, in read_pdf
  File "camelot\handlers.py", line 50, in __init__
  File "camelot\handlers.py", line 75, in _get_pages
  File "PyPDF2\_reader.py", line 1974, in __init__
  File "PyPDF2\_utils.py", line 369, in deprecation_with_replacement
  File "PyPDF2\_utils.py", line 351, in deprecation
PyPDF2.errors.DeprecationError: PdfFileReader is deprecated and was removed in PyPDF2 3.0.0. Use PdfReader instead.
[8404] Failed to execute script 'Werks' due to unhandled exception!

chob_thomas · Dec-29-2022, 04:37 PM

solved it here:

https://github.com/camelot-dev/camelot/issues/339

***snippsat*** · (This post was last modified: Dec-29-2022, 04:51 PM by snippsat.)

The problem is in camelot downgrade as descriped in issus.
That it work in Pycharm and give the error Pyinstaller can be version of camelot when run in Pycharm it can use version 2.0.
I use always virtual enviroment when troubleshoot Pyinstaller problems.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Pandas - error when running Pycharm, but works on cmd line	zxcv101	1	1,375	Jun-18-2022, 01:09 PM Last Post: snippsat
	Cant run command python in PyCharm, but it works in cmd	LevG0r	1	2,823	Sep-09-2018, 06:21 PM Last Post: Axel_Erfurt

Works In Pycharm, Tons of problems after EXE

User Panel Messages

Announcements