Function not executing each file in folder

mathew_31 · (This post was last modified: Aug-14-2022, 07:24 PM by Gribouillis.)

Hi forum, I'm new to coding. I have written a script that reads a .docx file. In it, a function runs a search, assigns the result to a variable, and appends it to a list that is then written to an Excel file. My problem is that the function is called once per .docx file in the folder, but every call processes the same Word file instead of a different one. So my output is an Excel file with the same info twice (I have 2 .docx files in the folder).

How do I fix this? I have tried several approaches without success. [attachment=1897]

Gribouillis write Aug-14-2022, 07:24 PM:
Please post all code, output and errors (in its entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.

**deanhystad** · Aug-14-2022, 08:57 PM

Your problem is that first you do this:

for f in os.listdir(path):
    if f.endswith('.docx'):
        files.append(f)

for i in range(len(files)):
    text = docx2txt.process(files[i])
    text2 = text.replace(":", " ")
    text3 = text2.replace(",", " ")
    text4 = text3.replace("_", " ")
    data = text4.split()

Then later on you do this:

#Sends vet list to string
for j in data:
	vet += j + ", "

Was it your plan for vet to concatenate the results for all the files? That is not what happens. Your program only uses data from the last docx file. You should combine finding, processing and appending into one loop. Like this:

vet = ""
for f in os.listdir(path):
    if f.endswith('.docx'):
        text = docx2txt.process(f)
        text = text.replace(":", " ")
        text = text.replace(",", " ")
        text = text.replace("_", " ")
        data = text.split()
        vet += ", ".join(data)
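
To see why only the last file survives, here is a minimal sketch with made-up word lists standing in for what docx2txt would return (the file names and words are invented for illustration):

```python
# Hypothetical stand-ins for the words extracted from each .docx file.
extracted = {"a.docx": ["alpha", "beta"], "b.docx": ["gamma", "delta"]}

# Pattern from the original script: data is overwritten on every pass,
# so only the last file's words are left when the loop ends.
for name in extracted:
    data = extracted[name]
last_only = ", ".join(data)

# Doing the work inside the loop handles every file.
per_file = []
for name in extracted:
    data = extracted[name]
    per_file.append(", ".join(data))

print(last_only)  # words from b.docx only
print(per_file)   # one string per file
```

The first loop finishes before anything uses data, which is the same thing that happened with your two separate loops.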

mathew_31 · (This post was last modified: Aug-14-2022, 11:41 PM by Larz60+.)

Okay thanks, no, it wasn't my plan. I am still learning. I have replaced my code with what you came up with :) But now, for some reason, my search function isn't working for the second .docx file. It is giving me this (see picture attachment) instead of only the words "Neveu Transport" in the Excel file.

I am guessing there is something wrong with "vet += ", ".join(data)"?

Thank you for your help.

vet = ""
for f in os.listdir(path):
    if f.endswith('.docx'):
        text = docx2txt.process(f)
        text = text.replace(":", " ")
        text = text.replace(",", " ")
        text = text.replace("_", " ")
        data = text.split()
        vet += ", ".join(data)


Larz60+ write Aug-14-2022, 11:41 PM:
Please post all code, output and errors (in its entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.

**deanhystad** · (This post was last modified: Aug-15-2022, 02:06 AM by deanhystad.)

That fixed the first error, only processing one file. There are more.

The next error is that all the docx results are appended to vet. Maybe you want to process each file independently? That would look like this:

for f in os.listdir(path):
    if f.endswith('.docx'):
        text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
        ma(", ".join(text.split()))

def ma(vet):
   ...
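
A rough illustration of why one combined string confuses the search, assuming the regex is greedy like the 'Transport, (.*)Contact' pattern. The document texts and the helper name tokens_to_vet are made up for the example:

```python
import re

# Hypothetical stand-ins for the text extracted from two .docx files.
docs = {
    "a.docx": "Transport: Neveu Transport Contact: Paul",
    "b.docx": "Transport: Acme Cargo Contact: Marie",
}

def tokens_to_vet(text):
    """Replace separators with spaces and join the words with ', '."""
    cleaned = text.replace(":", " ").replace(",", " ").replace("_", " ")
    return ", ".join(cleaned.split())

pattern = "Transport, (.*)Contact"

# One string for all files: the greedy (.*) runs to the LAST "Contact",
# so the match swallows text from both documents.
combined = ""
for text in docs.values():
    combined += tokens_to_vet(text)
combined_hit = re.search(pattern, combined).group(1).replace(",", "").strip()

# One string per file: each search only sees its own document.
per_file_hits = [
    re.search(pattern, tokens_to_vet(text)).group(1).replace(",", "").strip()
    for text in docs.values()
]

print(combined_hit)
print(per_file_hits)
```

With the combined string the single match spans both documents, while the per-file version gives one clean value for each file.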

Please try to follow forum rules and post code by pasting into your post surrounded by Python tags.

mathew_31 · Aug-15-2022, 02:16 PM

Great! Everything is working now:)

thank you so much

**deanhystad** · Aug-15-2022, 03:46 PM

Your data should be organized as a list of lists, not 12 independent lists. I would have ma() return a row (list) and data1 would be a list of rows.
Something like this:

def my_match(string, pattern):
    """Find pattern in string.  Return first "group" stripped of commas"""
    match = re.search(pattern, string)
    if match:
        return match.group(1).replace(",", "")
    return ""

def ma(data):
    vet = ", ".join(data)
    return [
        my_match(vet, 'Transport, (.*)Contact'),
        "",
        data[0],
        my_match(vet, 'Date, (.*)Numéro'),
        my_match(vet, 'Prix, (.*)Prix'),
        my_match(vet, 'Compte, (.*)I.D'),
        my_match(vet, 'DBS, (.*)Transport'),
        "",
        "",
        my_match(vet, 'par, (.*)De'),
        my_match(vet, 'De, (.*)À'),
        my_match(vet, 'À, (.*)Attention')
    ]

data1 = []
for f in os.listdir(path):
    if f.endswith('.docx'):
        text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
        data1.append(ma(text.split()))

columns = [
    "Transporteur",
    "#Fournisseur",
    "FT#",
    "Date ceuillette",
    "Prix",
    "GL",
    "PO#",
    "IMACS/CC/W/O",
    "Notes si requis",
    "Transport demandé par",
    "Origine",
    "Destination",
]

df1 = pd.DataFrame(data1, columns=columns)

When you see yourself typing the same thing over and over:

    result = re.search('Transport, (.*)Contact', vet)
    result_1 = (result.group(1)).replace(",", "")
    Tra = result_1

Write a function.

def my_match(string, pattern):
    """Find pattern in string.  Return first "group" stripped of commas"""
    match = re.search(pattern, string)
    if match:
        return match.group(1).replace(",", "")
    return ""

The function reduces typing and chances for typing errors. The function body makes it easy to document the important processing that you are repeating over and over. The function makes it easy to add functionality. Here I check if a match is found and return an empty string if it isn't.
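
A quick way to try the function on its own. The sample string is made up, and the second pattern is chosen because it does not occur in the sample:

```python
import re

def my_match(string, pattern):
    """Find pattern in string.  Return first "group" stripped of commas"""
    match = re.search(pattern, string)
    if match:
        return match.group(1).replace(",", "")
    return ""

# Made-up sample in the same "word, word, word" shape as vet.
vet = "Transport, Neveu, Transport, Contact, Paul"

found = my_match(vet, "Transport, (.*)Contact")
missing = my_match(vet, "Prix, (.*)Modèle")

# Without the "if match" check, a failed search returns None and
# calling .group(1) on it raises AttributeError.
try:
    re.search("Prix, (.*)Modèle", vet).group(1)
    unguarded_failed = False
except AttributeError:
    unguarded_failed = True

print(repr(found), repr(missing), unguarded_failed)
```

Note that the found value keeps a trailing space, because the space after the last comma is still there once the commas are removed. Calling .strip() on the result would tidy that up if it matters for the spreadsheet.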

mathew_31 · (This post was last modified: Aug-22-2022, 06:05 PM by mathew_31.)

So I'm running into a problem: I want to make this code run every 10 seconds, for example. For some reason Python doesn't recognise the variable "text" when it is used inside a function.

 
import os
import docx2txt
import re
import pandas as pd
import numpy as np
import openpyxl
import time
import schedule

#variables
path = r"C:\Users\eschbachm\OneDrive - EXP\Desktop\test"
os.chdir(path)

#Colonne total
col1 = []
col2 = []
col3 = []
col4 = []
col5 = []
col6 = []
col7 = []
col8 = []
col9 = []
col10 = []
col11 = []
col12 = []
#lists
vet = ""
        
#Sends vet list to string
for j in vet:
	vet += j + ", "     

def ma(vet):
    #Colonne 1
    result = re.search('Transport, (.*)Contact', vet)
    result_1 = (result.group(1)).replace(",", "")
    Tra = result_1
    col1.append(Tra)
    #Colonne 2
    VQ = ''
    col2.append(VQ)
    #Colonne 3
    result = re.search('(.*)LOCATION', vet)
    result_3 = (result.group(1)).replace(",", "")
    FT = result_3
    col3.append(FT)
    #Colonne 4
    result = re.search('Date, (.*)Numéro', vet)
    result_4 = (result.group(1)).replace(",", "")
    Date = result_4
    col4.append(Date)
    #Colonne 5
    result = re.search('Prix, (.*)Modèle', vet)
    result_5 = (result.group(1)).replace(",", "")
    Prix = result_5
    col5.append(Prix)
    #Colonne 6
    result = re.search('Compte, (.*)Accessoires', vet) #recherche valeur de GL
    result_6 = (result.group(1)).replace(",", "")
    GL = result_6
    #Colonne 7
    result = re.search('DBS, (.*)Transport', vet) #recherche valeur de PO
    result_7 = (result.group(1)).replace(",", "")
    PO = result_7

    a = 0
    b = 0
    c = 0
    for line in text:
        # checking string is present in line or not
        if GL != "" and PO != "": #si Gl et PO sont present en meme temps
            a = 1
        elif GL != "": #si Gl est present et non PO
            b = 2
        elif PO != "": #si PO est present et non GL
            c = 3
            break
    if a == 0:
        pass
    else: #si Gl et PO sont present en meme temps
        col6.append(GL)
        col7.append(PO)
    if b == 0:
        pass
    else: #si Gl est present et non PO
        col6.append(GL)
        col7.append('')
    if c == 0:
        pass
    else: #si PO est present et non GL
        col6.append('')
        col7.append(PO)
    if a == 0 and b == 0 and c == 0:
        col6.append('')
        col7.append('')

    #Colonne 8
    IMACS = ''
    col8.append(IMACS)
    #Colonne 9
    Notes = ''
    col9.append(Notes)
    #Colonne 10
    result = re.search('par, (.*)De', vet) #recherche valeur de DP
    result_10 = (result.group(1)).replace(",", "")
    DP = result_10
    col10.append(DP)
    #Colonne 11
    result = re.search('De, (.*)À', vet) #recherche valeur de origine
    result_11 = (result.group(1)).replace(",", "")
    ORI = result_11
    col11.append(ORI)
    #Colonne 12
    result = re.search('À, (.*)Prix', vet) #recherche valeur de destination
    result_12 = (result.group(1)).replace(",", "")
    DEST = result_12
    col12.append(DEST)

def run():
        for f in os.listdir(path):
                if f.endswith('.docx'):
                        text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
                        ma(", ".join(text.split()))
                        print('Transfert de donnée Réussi!')

# Creating the first Dataframe using dictionary
data1 = {
    "Transporteur": col1,
    "#Fournisseur": col2,
    "FT#": col3,
    "Date ceuillette": col4,
    "Prix": col5,
    "GL": col6,
    "PO#": col7,
    "IMACS/CC/W/O": col8,
    "Notes si requis": col9,
    "Transport demandé par": col10,
    "Origine": col11,
    "Destination": col12}

df1 = pd.DataFrame(data=data1)
df1 = df1.sort_values('Date ceuillette', ascending=True)

# load df to existing excel
with pd.ExcelWriter('output.xlsx', mode='a', if_sheet_exists="replace") as writer:
    df1.to_excel(writer, sheet_name='Sheet_name1')

schedule.every(10).seconds.do(run)

while 1:
        schedule.run_pending()
        time.sleep(2)

Python gives me this error:
Traceback (most recent call last):
  File "C:\Users\eschbachm\OneDrive - EXP\Desktop\code\CAT - Version Finale.py", line 149, in <module>
    run()
  File "C:\Users\eschbachm\OneDrive - EXP\Desktop\code\CAT - Version Finale.py", line 123, in run
    ma(", ".join(text.split()))
  File "C:\Users\eschbachm\OneDrive - EXP\Desktop\code\CAT - Version Finale.py", line 69, in ma
    for line in text:
NameError: name 'text' is not defined. Did you mean: 'next'?

**deanhystad** · Aug-22-2022, 06:55 PM

Complaining that text is not defined is a valid complaint. There is no variable named "text" defined in ma() or in the global scope. There is a variable named "text" defined in run(), but that variable, like all local variables in a function, is not visible outside run().
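
A stripped-down sketch of the same situation. The names ma_broken and ma_fixed are invented for the example, and it assumes there is no module-level variable called text:

```python
def ma_broken():
    # 'text' is not defined in this function or at module level,
    # so looking it up raises NameError.
    return len(text)

def ma_fixed(data):
    # Everything the function needs arrives as an argument.
    return len(data)

def run():
    text = "one two three"  # local to run(), invisible to other functions
    try:
        ma_broken()
        error = ""
    except NameError as exc:
        error = str(exc)
    return error, ma_fixed(text.split())

error, word_count = run()
print(error)
print(word_count)
```

Passing the value in as an argument is what makes it visible inside the second function.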

Looking at your initial code, ma() used "data" and "vet". You got data by doing this:

    text = docx2txt.process(files[i])
    text2 = text.replace(":", " ")
    text3 = text2.replace(",", " ")
    text4 = text3.replace("_", " ")
    data = text4.split()

which is the same as:

text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
data = text.split()

And you got vet by doing this:

vet = ""
for j in data:
	vet += j + ", "

which is the same as this, except that join() does not leave a trailing ", " at the end:

", ".join(text.split())

Since ma() needs both data and vet, and vet is easily created from data, I think it makes more sense to pass data to ma() and have ma() create vet.

def ma(data):
    vet = ", ".join(data)
    ...

def run():
        for f in os.listdir(path):
                if f.endswith('.docx'):
                        text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
                        ma(text.split())  # This creates data from text and passes it to ma
                        print('Transfert de donnée Réussi!')
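
A small self-contained check of the loop-versus-join comparison, using a made-up word list in place of real document text. The ma() here is a cut-down illustration that only builds vet, not the full function:

```python
# Made-up words standing in for text.split() on a real document.
data = ["Transport", "Neveu", "Transport", "Contact"]

# Original approach: build the string in a loop (leaves a trailing ", ").
vet_loop = ""
for j in data:
    vet_loop += j + ", "

# One-liner: same words and separators, but no trailing ", ".
vet_join = ", ".join(data)

def ma(data):
    """Cut-down ma(): receives the word list and builds vet itself."""
    vet = ", ".join(data)
    return data[0], vet

first_word, vet = ma(data)
print(repr(vet_loop))
print(repr(vet_join))
print(first_word)
```

Because ma() receives data and derives vet from it, nothing inside the function depends on a variable that only exists in run().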

mathew_31 · (This post was last modified: Aug-22-2022, 08:33 PM by mathew_31.)

Hi, thanks a lot for helping me in your free time,

I am still not understanding this:

 
def ma(data):
    vet = ", ".join(data)
    ...
    for line in text:
        # checking string is present in line or not
    ...
def run():
        for f in os.listdir(path):
                if f.endswith('.docx'):
                        text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
                        ma(text.split())
                        print('Transfert de donnée Réussi!')

Python is still throwing an error because "text" in ma() is not defined.
How can I link the two "text" variables in both functions?
I have made the changes you suggested but still without success :(

**deanhystad** · Aug-22-2022, 08:40 PM

That is because there is no "text" in ma(). Look at the link to your code in your first post. In that code ma() does not use "text" anywhere, it uses "data". The only difference is that now instead of using global variables you are passing "data" as an argument to ma(data) and inside ma(data) you create "vet".
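
A side note on that loop: in the posted ma(), none of the branches inside the for loop look at the loop variable, so as far as I can tell the loop and the a/b/c flags end up appending GL and PO unchanged. Here is a sketch comparing the two, with the flag logic rewritten as a function that returns the (GL, PO) pair. The function names are mine, and it assumes the word list is not empty:

```python
def columns_with_flags(gl, po, lines):
    """Flag-based logic from the posted ma(), returning (col6, col7) values."""
    a = b = c = 0
    for _line in lines:
        if gl != "" and po != "":  # both present
            a = 1
        elif gl != "":             # only GL present
            b = 2
        elif po != "":             # only PO present
            c = 3
            break
    if a:
        return gl, po
    if b:
        return gl, ""
    if c:
        return "", po
    return "", ""

def columns_direct(gl, po):
    """Same result without the loop: an absent value is already ''."""
    return gl, po

lines = ["any", "words"]  # stand-in for data; must be non-empty
cases = [("123", "456"), ("123", ""), ("", "456"), ("", "")]
results = [columns_with_flags(gl, po, lines) == columns_direct(gl, po)
           for gl, po in cases]
print(results)
```

If that holds for your documents, the loop can be replaced by col6.append(GL) and col7.append(PO), and the function no longer needs "text" at all.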
