Aug-20-2021, 06:43 PM
Hi,
So far I have created a script that identifies the last modified folder in a list, appends all the spreadsheets in it and then outputs a new appended Spreadsheet.
Now I want to search the new spreadsheet I have created for a set list of URLs that I have and where a match is found, print the row that contains the URL.
Here ismy script so far:
Has anyone done something similar and might be able to help? I've looked through openpyxl guidance but as I'm a newbie I'm struggling to find the function. It's the Appended_Reports.xlsx spreadsheet that I create at the end of the process that I want to search.
Any help greatly appreciated!
Thanks,
John
So far I have created a script that identifies the last modified folder in a list, appends all the spreadsheets in it and then outputs a new appended Spreadsheet.
Now I want to search the new spreadsheet I have created for a set list of URLs that I have and where a match is found, print the row that contains the URL.
Here ismy script so far:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
import os import pandas as pd import xlsxwriter exportroot = r '\\atlas\\Knowhow\\ExternalLinkValidationReports\\UK' # change this to your folder path folderlist = [] # blank list of folders - we'll put the paths to the subfolders in this # check that the folder exists and is actually a folder if os.path.exists(exportroot) and os.path.isdir(exportroot): exportfolders = next (os.walk(exportroot))[ 1 ] # get list of children - this will include folders *and* files for exportfolder in exportfolders: # for each of those children……… exppath = os.path.join(exportroot, exportfolder) # define the full path……… if os.path.isdir(exppath): # and if that path defines a directory……… folderlist.append(exppath) # add it to our list sortedlist = sorted (folderlist, key = lambda t: - os.stat(t).st_mtime) # sort the list, most recent first mostrecent = sortedlist[ 0 ] # get the full path to the most recently modified subfolder from the sorted list print (mostrecent) os.chdir(mostrecent) # Changing current working directory to "mostrecent" folder path = os.getcwd() # Defining path as new cwd print (path) cwd = path # Defining cwd as path files = os.listdir(cwd) df = pd.DataFrame() #Importing data frame for file in files: if file .endswith( '.xlsx' ): df = df.append(pd.read_excel( file ), ignore_index = True ) #Identifying Excel files in "mostrecent" folder df.head() #Printing first 5 rows to test data writer = pd.ExcelWriter(r 'Appended_Reports.xlsx' , engine = 'xlsxwriter' ,options = { 'strings_to_urls' : False }) df.to_excel(writer) writer.close() #Creates report - changes options to avoid 255 character URL limit in excel |
Any help greatly appreciated!
Thanks,
John