Function not executing each file in folder
+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Function not executing each file in folder (/thread-37957.html)

Function not executing each file in folder - mathew_31 - Aug-14-2022

Hi forum,

I'm new at coding. I have written a program that reads in a .docx file. In that code I have a function that executes a search, assigns the result to a variable and appends it to a list, which is then sent to an Excel file. My problem is that the function is called once per .docx file in the folder, but it processes the same word.docx every time instead of running once for each Word file. So my output is an Excel file with the same info twice (I have 2 .docx files in the folder). How do I fix this? I have tried multiple versions without success.

[attachment=1897]

RE: Function not executing each file in folder - deanhystad - Aug-14-2022

Your problem is that first you do this:

    for f in os.listdir(path):
        if f.endswith('.docx'):
            files.append(f)

    for i in range(len(files)):
        text = docx2txt.process(files[i])
        text2 = text.replace(":", " ")
        text3 = text2.replace(",", " ")
        text4 = text3.replace("_", " ")
        data = text4.split()

Then later on you do this:

    #Sends vet list to string
    for j in data:
        vet += j + ", "

Was it your plan for vet to concatenate the results for all the files? That is not what happens. Your program only uses data from the last docx file. You should combine finding, processing and appending into one loop. Like this:

    vet = ""
    for f in os.listdir(path):
        if f.endswith('.docx'):
            text = docx2txt.process(f)
            text = text.replace(":", " ")
            text = text.replace(",", " ")
            text = text.replace("_", " ")
            data = text.split()
            vet += ", ".join(data)

RE: Function not executing each file in folder - mathew_31 - Aug-14-2022

Okay thanks, no, it wasn't my plan. I am still learning. I have replaced my code with what you came up with :) But now, for some reason, my search function isn't working for the second word.docx. It is giving me this (see picture attachment) instead of only the words "Neveu Transport" in the Excel file. I am guessing there is something wrong with the vet += ", ".join(data) line? Thank you for your help.

    vet = ""
    for f in os.listdir(path):
        if f.endswith('.docx'):
            text = docx2txt.process(f)
            text = text.replace(":", " ")
            text = text.replace(",", " ")
            text = text.replace("_", " ")
            data = text.split()
            vet += ", ".join(data)

RE: Function not executing each file in folder - deanhystad - Aug-15-2022

That fixed the first error, only processing one file. There are more. The next error is that all the docx results are appended to vet. Maybe you want to process each file independently? That would look like this:

    def ma(vet):
        ...

    for f in os.listdir(path):
        if f.endswith('.docx'):
            text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
            ma(", ".join(text.split()))

Please try to follow forum rules and post code by pasting into your post surrounded by Python tags.

RE: Function not executing each file in folder - mathew_31 - Aug-15-2022

Great! Everything is working now :) Thank you so much.

RE: Function not executing each file in folder - deanhystad - Aug-15-2022

Your data should be organized as a list of lists, not 12 independent lists. I would have ma() return a row (list) and data1 would be a list of rows. Something like this:

    def my_match(pattern, string):
        """Find pattern in string. Return first "group" stripped of commas."""
        match = re.search(pattern, string)
        if match:
            return match.group(1).replace(",", "")
        return ""

    def ma(data):
        vet = ", ".join(data)
        return [
            my_match('Transport, (.*)Contact', vet),
            "",
            data[0],
            my_match('Date, (.*)Numéro', vet),
            my_match('Prix, (.*)Prix', vet),
            my_match('Compte, (.*)I.D', vet),
            my_match('DBS, (.*)Transport', vet),
            "",
            "",
            my_match('par, (.*)De', vet),
            my_match('De, (.*)À', vet),
            my_match('À, (.*)Attention', vet)
        ]

    data1 = []
    for f in os.listdir(path):
        if f.endswith('.docx'):
            text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
            data1.append(ma(text.split()))

    columns = [
        "Transporteur",
        "#Fournisseur",
        "FT#",
        "Date ceuillette",
        "Prix",
        "GL",
        "PO#",
        "IMACS/CC/W/O",
        "Notes si requis",
        "Transport demandé par",
        "Origine",
        "Destination",
    ]
    df1 = pd.DataFrame(data1, columns=columns)

When you see yourself typing the same thing over and over:

    result = re.search('Transport, (.*)Contact', vet)
    result_1 = (result.group(1)).replace(",", "")
    Tra = result_1

Write a function.

    def my_match(pattern, string):
        """Find pattern in string. Return first "group" stripped of commas."""
        match = re.search(pattern, string)
        if match:
            return match.group(1).replace(",", "")
        return ""

The function reduces typing and the chance of typing errors. The function body makes it easy to document the important processing that you are repeating over and over. The function also makes it easy to add functionality. Here I check whether a match is found and return an empty string if it isn't.

RE: Function not executing each file in folder - mathew_31 - Aug-22-2022

So I'm running into a problem: I want to make this code run every 10 seconds, for example. For some reason Python doesn't recognise the name "text" when it is used inside a function.

    import os
    import docx2txt
    import re
    import pandas as pd
    import numpy as np
    import openpyxl
    import time
    import schedule

    # variables
    path = r"C:\Users\eschbachm\OneDrive - EXP\Desktop\test"
    os.chdir(path)

    # All columns
    col1 = []
    col2 = []
    col3 = []
    col4 = []
    col5 = []
    col6 = []
    col7 = []
    col8 = []
    col9 = []
    col10 = []
    col11 = []
    col12 = []

    # lists
    vet = ""

    # Sends vet list to string
    for j in vet:
        vet += j + ", "

    def ma(vet):
        # Column 1
        result = re.search('Transport, (.*)Contact', vet)
        result_1 = (result.group(1)).replace(",", "")
        Tra = result_1
        col1.append(Tra)

        # Column 2
        VQ = ''
        col2.append(VQ)

        # Column 3
        result = re.search('(.*)LOCATION', vet)
        result_3 = (result.group(1)).replace(",", "")
        FT = result_3
        col3.append(FT)

        # Column 4
        result = re.search('Date, (.*)Numéro', vet)
        result_4 = (result.group(1)).replace(",", "")
        Date = result_4
        col4.append(Date)

        # Column 5
        result = re.search('Prix, (.*)Modèle', vet)
        result_5 = (result.group(1)).replace(",", "")
        Prix = result_5
        col5.append(Prix)

        # Column 6
        result = re.search('Compte, (.*)Accessoires', vet)  # look up the GL value
        result_6 = (result.group(1)).replace(",", "")
        GL = result_6

        # Column 7
        result = re.search('DBS, (.*)Transport', vet)  # look up the PO value
        result_7 = (result.group(1)).replace(",", "")
        PO = result_7

        a = 0
        b = 0
        c = 0
        for line in text:
            # checking string is present in line or not
            if GL != "" and PO != "":  # if GL and PO are both present
                a = 1
            elif GL != "":  # if GL is present but not PO
                b = 2
            elif PO != "":  # if PO is present but not GL
                c = 3
            break

        if a == 0:
            pass
        else:  # if GL and PO are both present
            col6.append(GL)
            col7.append(PO)
        if b == 0:
            pass
        else:  # if GL is present but not PO
            col6.append(GL)
            col7.append('')
        if c == 0:
            pass
        else:  # if PO is present but not GL
            col6.append('')
            col7.append(PO)
        if a == 0 and b == 0 and c == 0:
            col6.append('')
            col7.append('')

        # Column 8
        IMACS = ''
        col8.append(IMACS)

        # Column 9
        Notes = ''
        col9.append(Notes)

        # Column 10
        result = re.search('par, (.*)De', vet)  # look up the DP value
        result_10 = (result.group(1)).replace(",", "")
        DP = result_10
        col10.append(DP)

        # Column 11
        result = re.search('De, (.*)À', vet)  # look up the origin value
        result_11 = (result.group(1)).replace(",", "")
        ORI = result_11
        col11.append(ORI)

        # Column 12
        result = re.search('À, (.*)Prix', vet)  # look up the destination value
        result_12 = (result.group(1)).replace(",", "")
        DEST = result_12
        col12.append(DEST)

    def run():
        for f in os.listdir(path):
            if f.endswith('.docx'):
                text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
                ma(", ".join(text.split()))
        print('Transfert de donnée Réussi!')

    # Creating the first Dataframe using dictionary
    data1 = {
        "Transporteur": col1,
        "#Fournisseur": col2,
        "FT#": col3,
        "Date ceuillette": col4,
        "Prix": col5,
        "GL": col6,
        "PO#": col7,
        "IMACS/CC/W/O": col8,
        "Notes si requis": col9,
        "Transport demandé par": col10,
        "Origine": col11,
        "Destination": col12}
    df1 = pd.DataFrame(data=data1)
    df1 = df1.sort_values('Date ceuillette', ascending=True)

    # load df to existing excel
    with pd.ExcelWriter('output.xlsx', mode='a', if_sheet_exists="replace") as writer:
        df1.to_excel(writer, sheet_name='Sheet_name1')

    schedule.every(10).seconds.do(run)
    while 1:
        schedule.run_pending()
        time.sleep(2)

Python gives me this error:

    Traceback (most recent call last):
      File "C:\Users\eschbachm\OneDrive - EXP\Desktop\code\CAT - Version Finale.py", line 149, in <module>
        run()
      File "C:\Users\eschbachm\OneDrive - EXP\Desktop\code\CAT - Version Finale.py", line 123, in run
        ma(", ".join(text.split()))
      File "C:\Users\eschbachm\OneDrive - EXP\Desktop\code\CAT - Version Finale.py", line 69, in ma
        for line in text:
    NameError: name 'text' is not defined. Did you mean: 'next'?

RE: Function not executing each file in folder - deanhystad - Aug-22-2022

Complaining that text is not defined is a valid complaint. There is no variable named "text" defined in ma() or in the global scope.
There is a variable named "text" defined in run(), but that variable, like all local variables in a function, is not visible outside run().

Looking at your initial code, ma() used "data" and "vet". You got data by doing this:

    text = docx2txt.process(files[i])
    text2 = text.replace(":", " ")
    text3 = text2.replace(",", " ")
    text4 = text3.replace("_", " ")
    data = text4.split()

which is the same as:

    text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
    data = text.split()

And you got vet by doing this:

    vet = ""
    for j in data:
        vet += j + ", "

which is the same as this, minus the trailing separator:

    ", ".join(text.split())

Since ma() needs both data and vet, and vet is easily created from data, I think it makes more sense to pass data to ma() and have ma() create vet.

    def ma(data):
        vet = ", ".join(data)
        ...

    def run():
        for f in os.listdir(path):
            if f.endswith('.docx'):
                text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
                ma(text.split())  # This creates data from text and passes it to ma
        print('Transfert de donnée Réussi!')

RE: Function not executing each file in folder - mathew_31 - Aug-22-2022

Hi, thanks a lot for helping me in your free time. I am still not understanding this:

    def ma(data):
        vet = ", ".join(data)
        ...
        for line in text:
            # checking string is present in line or not
            ...

    def run():
        for f in os.listdir(path):
            if f.endswith('.docx'):
                text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
                ma(text.split())
        print('Transfert de donnée Réussi!')

Python is still throwing an error because "text" in ma() is not defined. How can I link the two "text" variables in both functions? I have made the changes you suggested but am still not successful :(

RE: Function not executing each file in folder - deanhystad - Aug-22-2022

That is because there is no "text" in ma(). Look at the link to your code in your first post. In that code ma() does not use "text" anywhere; it uses "data".
The only difference is that now, instead of using global variables, you are passing "data" as an argument to ma(data), and inside ma(data) you create "vet".
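
To make the scoping point concrete, here is a minimal, self-contained sketch of passing the word list into ma() instead of reaching for run()'s local variable. It is only an illustration under assumptions: plain strings stand in for the output of docx2txt.process(f), and the sample texts, the run(documents) signature and the two-element row layout are hypothetical, not taken from the original program.

```python
import re


def ma(data):
    """Build one row from the word list of a single document.

    `data` arrives as an argument, so ma() never needs to see the
    local variable `text` that only exists inside run().
    """
    vet = ", ".join(data)
    match = re.search(r"Transport, (.*)Contact", vet)
    carrier = match.group(1).replace(",", "").strip() if match else ""
    # Anything that used to loop over `text` should use `data` instead.
    word_count = len(data)
    return [carrier, word_count]


def run(documents):
    """Process each document text independently and collect the rows."""
    rows = []
    for text in documents:  # `text` is local to run()
        text = text.replace(":", " ").replace(",", " ").replace("_", " ")
        rows.append(ma(text.split()))  # hand the words to ma() explicitly
    return rows


# Hypothetical sample texts standing in for docx2txt.process(f) output.
samples = [
    "Transport: Neveu Transport Contact: Jean",
    "Transport: Autre_Camion Contact: Marie",
]
rows = run(samples)
print(rows)
```

Because ma() returns a row rather than appending to module-level lists, each scheduled call to run() starts from a fresh list, which may also avoid rows piling up across repeated runs.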