Python Forum
Merging spreadsheets with the same columns and extracting rows with matching entries
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Merging spreadsheets with the same columns and extracting rows with matching entries
#1
Hi All,

I have just begun learning Python for the last couple of months and I have got to grips with the very basics I think having no background in coding.

I want to build a script that:
1) Identifies the latest created folder in one of my document folders
2) Takes all the spreadsheets in the latest created folder and merges them into one single spreadsheet (they all have the same column headings) (previously I was using the excel "get data" function myself).
3) Extracts relevant rows from the merged spreadsheet by filtering for a list of URLs that I have compiled and outputting all these. i.e. output any rows that contain specific URLs

I do all this manually at the moment and I'm thinking the 30 mins I save doing this everyday could be put to a lot better use!

If anyone has any pointers/tips for getting started/ what I should be looking at to do this I would really appreciate it, thank you!

John B
Reply
#2
look for packages that you can use in your project, search here: https://pypi.org/
example openpyxl for reading and writing excel files:
search all openpyxl:
https://pypi.org/search/?q=openpyxl&o=
specific: https://pypi.org/project/openpyxl/
and github code docs: https://openpyxl.readthedocs.io/en/stable/
Reply
#3
(Aug-17-2021, 10:14 PM)Larz60+ Wrote: look for packages that you can use in your project, search here: https://pypi.org/
example openpyxl for reading and writing excel files:
search all openpyxl:
https://pypi.org/search/?q=openpyxl&o=
specific: https://pypi.org/project/openpyxl/
and github code docs: https://openpyxl.readthedocs.io/en/stable/

Thanks for the heads up that's really helpful! Appreciate it so much!

John
Reply
#4
import os
import pandas as pd 
cwd = os.path.abspath('K:\\ExternalLinkValidationReports\\UK\\2020-01-06')
files = os.listdir(cwd)
df = pd.DataFrame()
for file in files:
     if file.endswith('.xlsx'):
         df = df.append(pd.read_excel(file), ignore_index=True) 
df.head()
df.to_excel('Appended Reports.xlsx')
So far I've managed to create an appended report that collects and appends all the Excel files in K:\\ExternalLinkValidationReports\\UK\\2020-01-06

However, I want to modify this so that it looks for the latest modified folder in K:\\ExternalLinkValidationReports\\UK\\
rather than having to specify the exact folder each time.

I wonder if anyone knows a quick way to modify my code so that it can do this?

Many thanks!

John
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Combining Spreadsheets Py3_14 4 591 Feb-07-2024, 04:58 PM
Last Post: deanhystad
  Converting a json file to a dataframe with rows and columns eyavuz21 13 4,376 Jan-29-2023, 03:59 PM
Last Post: eyavuz21
  merging rows arvin 2 822 Dec-24-2022, 04:56 AM
Last Post: arvin
  Extracting Data into Columns using pdfplumber arvin 17 5,485 Dec-17-2022, 11:59 AM
Last Post: arvin
  How to properly format rows and columns in excel data from parsed .txt blocks jh67 7 1,867 Dec-12-2022, 08:22 PM
Last Post: jh67
  Check DataFrames with different sorting in columns and rows Foxyskippy 0 771 Nov-19-2022, 07:49 AM
Last Post: Foxyskippy
  The code I have written removes the desired number of rows, but wrong rows Jdesi1983 0 1,624 Dec-08-2021, 04:42 AM
Last Post: Jdesi1983
  making variables in my columns and rows in python kronhamilton 2 1,608 Oct-31-2021, 10:38 AM
Last Post: snippsat
  rows from sql query need to write to a file as columns sjcsvatt 6 2,381 Oct-09-2021, 12:45 AM
Last Post: snippsat
  Summing up rows and columns plumberpy 3 2,255 Aug-18-2021, 05:46 AM
Last Post: naughtyCat

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020