Aug-20-2021, 06:43 PM
Hi,
So far I have created a script that identifies the last modified folder in a list, appends all the spreadsheets in it and then outputs a new appended Spreadsheet.
Now I want to search the new spreadsheet I have created for a set list of URLs that I have and where a match is found, print the row that contains the URL.
Here ismy script so far:
Any help greatly appreciated!
Thanks,
John
So far I have created a script that identifies the last modified folder in a list, appends all the spreadsheets in it and then outputs a new appended Spreadsheet.
Now I want to search the new spreadsheet I have created for a set list of URLs that I have and where a match is found, print the row that contains the URL.
Here ismy script so far:
import os import pandas as pd import xlsxwriter exportroot = r'\\atlas\\Knowhow\\ExternalLinkValidationReports\\UK' # change this to your folder path folderlist = [] # blank list of folders - we'll put the paths to the subfolders in this # check that the folder exists and is actually a folder if os.path.exists(exportroot) and os.path.isdir(exportroot): exportfolders = next(os.walk(exportroot))[1] # get list of children - this will include folders *and* files for exportfolder in exportfolders: # for each of those children……… exppath = os.path.join(exportroot, exportfolder) # define the full path……… if os.path.isdir(exppath): # and if that path defines a directory……… folderlist.append(exppath) # add it to our list sortedlist = sorted(folderlist, key=lambda t: -os.stat(t).st_mtime) # sort the list, most recent first mostrecent = sortedlist[0] # get the full path to the most recently modified subfolder from the sorted list print(mostrecent) os.chdir(mostrecent) # Changing current working directory to "mostrecent" folder path = os.getcwd() # Defining path as new cwd print(path) cwd = path # Defining cwd as path files = os.listdir(cwd) df = pd.DataFrame() #Importing data frame for file in files: if file.endswith('.xlsx'): df = df.append(pd.read_excel(file), ignore_index=True) #Identifying Excel files in "mostrecent" folder df.head() #Printing first 5 rows to test data writer = pd.ExcelWriter(r'Appended_Reports.xlsx', engine='xlsxwriter',options={'strings_to_urls': False}) df.to_excel(writer) writer.close() #Creates report - changes options to avoid 255 character URL limit in excelHas anyone done something similar and might be able to help? I've looked through openpyxl guidance but as I'm a newbie I'm struggling to find the function. It's the Appended_Reports.xlsx spreadsheet that I create at the end of the process that I want to search.
Any help greatly appreciated!
Thanks,
John