Aug-15-2022, 03:46 PM
Your data should be organized as a list of lists, not 12 independent lists. I would have ma() return a row (list) and data1 would be a list of rows.
Something like this:
    import os
    import re

    import docx2txt
    import pandas as pd


    def my_match(string, pattern):
        """Find pattern in string. Return first "group" stripped of commas"""
        match = re.search(pattern, string)
        if match:
            return match.group(1).replace(",", "")
        return ""


    def ma(data):
        # The string to search comes first, then the pattern.
        vet = ", ".join(data)
        return [
            my_match(vet, 'Transport, (.*)Contact'),
            "",
            data[0],
            my_match(vet, 'Date, (.*)Numéro'),
            my_match(vet, 'Prix, (.*)Prix'),
            my_match(vet, 'Compte, (.*)I.D'),
            my_match(vet, 'DBS, (.*)Transport'),
            "",
            "",
            my_match(vet, 'par, (.*)De'),
            my_match(vet, 'De, (.*)À'),
            my_match(vet, 'À, (.*)Attention'),
        ]


    # path is the folder holding the .docx files, as defined in your script.
    data1 = []
    for f in os.listdir(path):
        if f.endswith('.docx'):
            # os.listdir() returns bare file names, so join them with path.
            text = docx2txt.process(os.path.join(path, f))
            text = text.replace(":", " ").replace(",", " ").replace("_", " ")
            data1.append(ma(text.split()))

    columns = [
        "Transporteur",
        "#Fournisseur",
        "FT#",
        "Date ceuillette",
        "Prix",
        "GL",
        "PO#",
        "IMACS/CC/W/O",
        "Notes si requis",
        "Transport demandé par",
        "Origine",
        "Destination",
    ]
    # columns must be passed by keyword; the second positional argument is the index.
    df1 = pd.DataFrame(data1, columns=columns)

When you see yourself typing the same thing over and over:
    result = re.search('Transport, (.*)Contact', vet)
    result_1 = (result.group(1)).replace(",", "")
    Tra = result_1

Write a function.
    def my_match(string, pattern):
        """Find pattern in string. Return first "group" stripped of commas"""
        match = re.search(pattern, string)
        if match:
            return match.group(1).replace(",", "")
        return ""

The function reduces typing and chances for typing errors. The function body makes it easy to document the important processing that you are repeating over and over. The function makes it easy to add functionality. Here I check if a match is found and return an empty string if it isn't.
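To illustrate the "easy to add functionality" point, here is a rough sketch of how the function could grow without touching any of its callers. The default parameter, the strip() call and the sample text are my own additions for illustration; they are not in your code and your real documents will look different.

```python
import re


def my_match(string, pattern, default=""):
    """Find pattern in string. Return first "group" stripped of commas.

    default and strip() are hypothetical extras added for this example.
    """
    match = re.search(pattern, string)
    if match:
        # Remove commas, then trim the whitespace left at the ends.
        return match.group(1).replace(",", "").strip()
    return default


# Made-up text in the comma-joined form that ma() builds.
vet = "Transport, ACME, Express, Contact, Bob, Date, 2022-08-15, Numéro, 42"

carrier = my_match(vet, "Transport, (.*)Contact")
date = my_match(vet, "Date, (.*)Numéro")
# "Prix" does not occur in the sample, so the default is returned.
missing = my_match(vet, "Prix, (.*)Prix", default="n/a")
print(carrier, date, missing, sep=" | ")
```

Both changes were made in one place and every call picks them up. With twelve copies of the re.search() code you would have to edit each copy.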