Python Forum

Full Version: Searching for URLs and printing the corresponding row from an Excel spreadsheet
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hi,

So far I have created a script that identifies the last modified folder in a list, appends all the spreadsheets in it and then outputs a new appended Spreadsheet.

Now I want to search the new spreadsheet I have created for a set list of URLs that I have and where a match is found, print the row that contains the URL.

Here ismy script so far:
import os
import pandas as pd
import xlsxwriter
exportroot = r'\\atlas\\Knowhow\\ExternalLinkValidationReports\\UK'  # change this to your folder path
folderlist = [] # blank list of folders - we'll put the paths to the subfolders in this
# check that the folder exists and is actually a folder
if os.path.exists(exportroot) and os.path.isdir(exportroot):
    exportfolders = next(os.walk(exportroot))[1]  # get list of children - this will include folders *and* files
    for exportfolder in exportfolders:  # for each of those children………
        exppath = os.path.join(exportroot, exportfolder)  # define the full path………
        if os.path.isdir(exppath):  # and if that path defines a directory………
            folderlist.append(exppath)    # add it to our list
sortedlist = sorted(folderlist, key=lambda t: -os.stat(t).st_mtime)  # sort the list, most recent first
mostrecent = sortedlist[0]  # get the full path to the most recently modified subfolder from the sorted list
print(mostrecent)
os.chdir(mostrecent)  # Changing current working directory to "mostrecent" folder
path = os.getcwd() # Defining path as new cwd
print(path)
cwd = path # Defining cwd as path
files = os.listdir(cwd)
df = pd.DataFrame() #Importing data frame
for file in files:
        if file.endswith('.xlsx'):
            df = df.append(pd.read_excel(file), ignore_index=True) #Identifying Excel files in "mostrecent" folder
df.head() #Printing first 5 rows to test data
    
writer = pd.ExcelWriter(r'Appended_Reports.xlsx', engine='xlsxwriter',options={'strings_to_urls': False})
df.to_excel(writer)
writer.close() #Creates report - changes options to avoid 255 character URL limit in excel 
Has anyone done something similar and might be able to help? I've looked through openpyxl guidance but as I'm a newbie I'm struggling to find the function. It's the Appended_Reports.xlsx spreadsheet that I create at the end of the process that I want to search.

Any help greatly appreciated!

Thanks,

John