Python Forum
How to loop through all excel files and sheets in folder
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How to loop through all excel files and sheets in folder
#1
Using python 3. I need to loop through a folder that contains excel files and each file has multiple sheets. How do I loop through all the files and all the sheets and extract to a dataframe?

I need to include however the sheet name and filename.. Some excel files the tabs names are in the order 11.22 10.22 09.22 etc while other sheets are 09.22 10.22 11.22. Not all sheets have all the columns so i need only sheets that have columns projectid and status. Also the dates in paymentdate columns aren't always in the same format

What I was able to accomplish only returns one excel file and all the worksheets for that file but I need for all files. Please help.

This is what I have so far:


from xlsxwriter import Workbook
import pandas as pd
import openpyxl
import glob
import os

path = 'filestoimport/*.xlsx'

for filepath in glob.glob(path):
    xl = pd.ExcelFile(filepath)


    # Define an empty list to store individual DataFrames
    list_of_dfs = []
    list_of_dferror= []
    for sheet_name in xl.sheet_names:
        df = xl.parse(sheet_name, usecols='A,D,N,B,C,E,F,G,H,I,J,K,L,M', header=0)
        df.columns = df.columns.str.replace(' ', '')

        df['sheetname'] = sheet_name  # this adds `sheet_name` into the column
        
        # using basename function from os
        # module to print file name
        file_name = os.path.basename(filepath)
        df['sourcefilename'] = file_name
  
        # only add sheets containing columns ['Status', 'ProjectID']
        column_names = ['Status', 'ProjectID']
        if set(column_names).issubset(df.columns):
         df['Status'].fillna('', inplace=True)
         df['Addedby'].fillna('', inplace=True)
      # And append it to the list
         list_of_dfs.append(df)
      
    # Combine all DataFrames into one
    data = pd.concat(list_of_dfs, ignore_index=True)
Reply
#2
Are you sure this is where you want to define list_of_dfs? This resets the list each time you open a spreadsheet. When you are done reading all the files list_of_dfs contains the df's from the last file.
for filepath in glob.glob(path):
    xl = pd.ExcelFile(filepath)
 
    # Define an empty list to store individual DataFrames
    list_of_dfs = []
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  File loop curiously skipping files - FIXED mbk34 10 775 Feb-10-2024, 07:08 AM
Last Post: buran
  Copy Paste excel files based on the first letters of the file name Viento 2 406 Feb-07-2024, 12:24 PM
Last Post: Viento
  Compare folder A and subfolder B and display files that are in folder A but not in su Melcu54 3 522 Jan-05-2024, 05:16 PM
Last Post: Pedroski55
  Rename files in a folder named using windows explorer hitoxman 3 729 Aug-02-2023, 04:08 PM
Last Post: deanhystad
  Rename all files in a folder hitoxman 9 1,472 Jun-30-2023, 12:19 AM
Last Post: Pedroski55
Lightbulb Help using Google Sheets matheuspimenta 0 714 Dec-15-2022, 05:36 PM
Last Post: matheuspimenta
  Creating csv files from Excel file azizrasul 40 5,516 Nov-03-2022, 08:33 PM
Last Post: azizrasul
  python gzip all files from a folder mg24 3 3,953 Oct-28-2022, 03:59 PM
Last Post: mg24
  delete all files and subdirectory from a main folder mg24 7 1,574 Oct-28-2022, 07:55 AM
Last Post: ibreeden
  Merge all json files in folder after filtering deneme2 10 2,324 Sep-18-2022, 10:32 AM
Last Post: deneme2

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020