Python Forum

Full Version: Merging spreadsheets with the same columns and extracting rows with matching entries
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hi All,

I have just begun learning Python for the last couple of months and I have got to grips with the very basics I think having no background in coding.

I want to build a script that:
1) Identifies the latest created folder in one of my document folders
2) Takes all the spreadsheets in the latest created folder and merges them into one single spreadsheet (they all have the same column headings) (previously I was using the excel "get data" function myself).
3) Extracts relevant rows from the merged spreadsheet by filtering for a list of URLs that I have compiled and outputting all these. i.e. output any rows that contain specific URLs

I do all this manually at the moment and I'm thinking the 30 mins I save doing this everyday could be put to a lot better use!

If anyone has any pointers/tips for getting started/ what I should be looking at to do this I would really appreciate it, thank you!

John B
look for packages that you can use in your project, search here: https://pypi.org/
example openpyxl for reading and writing excel files:
search all openpyxl:
https://pypi.org/search/?q=openpyxl&o=
specific: https://pypi.org/project/openpyxl/
and github code docs: https://openpyxl.readthedocs.io/en/stable/
(Aug-17-2021, 10:14 PM)Larz60+ Wrote: [ -> ]look for packages that you can use in your project, search here: https://pypi.org/
example openpyxl for reading and writing excel files:
search all openpyxl:
https://pypi.org/search/?q=openpyxl&o=
specific: https://pypi.org/project/openpyxl/
and github code docs: https://openpyxl.readthedocs.io/en/stable/

Thanks for the heads up that's really helpful! Appreciate it so much!

John
import os
import pandas as pd 
cwd = os.path.abspath('K:\\ExternalLinkValidationReports\\UK\\2020-01-06')
files = os.listdir(cwd)
df = pd.DataFrame()
for file in files:
     if file.endswith('.xlsx'):
         df = df.append(pd.read_excel(file), ignore_index=True) 
df.head()
df.to_excel('Appended Reports.xlsx')
So far I've managed to create an appended report that collects and appends all the Excel files in K:\\ExternalLinkValidationReports\\UK\\2020-01-06

However, I want to modify this so that it looks for the latest modified folder in K:\\ExternalLinkValidationReports\\UK\\
rather than having to specify the exact folder each time.

I wonder if anyone knows a quick way to modify my code so that it can do this?

Many thanks!

John