Python Forum
Function not executing each file in folder
#1
Hi forum, I'm new to coding. I have written code that reads in a .docx file. In that code I have a function that runs a search, assigns the result to a variable and appends it to a list, which is then sent to an Excel file. My problem is that the function is called once per .docx file in the folder, but it processes the same Word file every time instead of each file in turn. So my output is an Excel sheet with the same information twice (I have two .docx files in the folder).

How do I fix this? I have tried several versions of the code without success.
Gribouillis write Aug-14-2022, 07:24 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.

Attached Files: CAT.py (3.8 KB)
#2
Your problem is that first you do this:
for f in os.listdir(path):
    if f.endswith('.docx'):
        files.append(f)

for i in range(len(files)):
    text = docx2txt.process(files[i])
    text2 = text.replace(":", " ")
    text3 = text2.replace(",", " ")
    text4 = text3.replace("_", " ")
    data = text4.split()
Then later on you do this:
#Sends vet list to string
for j in data:
    vet += j + ", "
Was it your plan for vet to concatenate the results for all the files? That is not what happens. The first loop overwrites data on every pass, so by the time the second loop runs, data only holds the words from the last docx file. You should combine finding, processing and appending into one loop, like this:
vet = ""
for f in os.listdir(path):
    if f.endswith('.docx'):
        text = docx2txt.process(f)
        text = text.replace(":", " ")
        text = text.replace(",", " ")
        text = text.replace("_", " ")
        data = text.split()
        vet += ", ".join(data)
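To see why only the last file survived, here is a minimal sketch that swaps docx2txt.process() for a hypothetical dict of made-up strings, so it runs without any .docx files. two_loops() mirrors the original two-loop structure and one_loop() mirrors the combined loop above; clean(), FAKE_DOCS and both function names are illustrative, not from the thread.

```python
# Hypothetical stand-in for the text docx2txt.process() would return per file
FAKE_DOCS = {
    "first.docx": "Transport: Alpha, Contact",
    "second.docx": "Transport: Beta, Contact",
}


def clean(text):
    """Same cleanup as the thread: turn ':', ',' and '_' into spaces."""
    return text.replace(":", " ").replace(",", " ").replace("_", " ")


def two_loops():
    """Original structure: data is overwritten on every pass of the first loop."""
    for name in FAKE_DOCS:
        data = clean(FAKE_DOCS[name]).split()
    vet = ""
    for j in data:  # runs once, after the first loop, on the last file only
        vet += j + ", "
    return vet


def one_loop():
    """Combined structure: each file is cleaned and appended inside one loop."""
    vet = ""
    for name in FAKE_DOCS:
        data = clean(FAKE_DOCS[name]).split()
        vet += ", ".join(data)
    return vet


print(repr(two_loops()))  # 'Transport, Beta, Contact, '
print(repr(one_loop()))   # 'Transport, Alpha, ContactTransport, Beta, Contact'
```

Both files now contribute to vet, but they are glued together with no separator, which matters for any regex search that crosses the boundary between files.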
#3
Okay, thanks. No, it wasn't my plan; I am still learning. I have replaced my code with what you came up with :) But now, for some reason, my search function isn't working for the second .docx file. It gives me the result shown in the attached picture instead of only the words "Neveu Transport" in the Excel file.

I am guessing there is something wrong with "vet += ", ".join(data)"?

Thank you for your help.

Larz60+ write Aug-14-2022, 11:41 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.

Attached Files: a screenshot and CAT - Copy 2.py (3.65 KB)
#4
That fixed the first error (only one file being processed). There are more.

The next error is that all the docx results are appended to vet. Maybe you want to process each file independently? That would look like this:
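def ma(vet):  # defined before the loop below, which calls it
    ...

for f in os.listdir(path):
    if f.endswith('.docx'):
        text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
        ma(", ".join(text.split()))  # one call per file, so each file is processed independently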
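A small sketch of why appending every file to one vet string trips up the searches: the greedy (.*) in a pattern such as 'Transport, (.*)Contact' runs to the last "Contact" it can find, which may belong to a different file. The two strings below are made-up stand-ins for cleaned document text, and transporter() is a hypothetical helper that uses the thread's pattern.

```python
import re

# Made-up cleaned text for two files, in the ", ".join(data) format
FILE_TEXTS = ["Transport, Alpha, Contact", "Transport, Beta, Contact"]


def transporter(vet):
    """Search with the thread's pattern; return "" if nothing matches."""
    match = re.search('Transport, (.*)Contact', vet)
    return match.group(1).replace(",", "") if match else ""


# All files appended into a single vet: the greedy group spans both files
combined = ""
for vet in FILE_TEXTS:
    combined += vet
print(repr(transporter(combined)))  # 'Alpha ContactTransport Beta '

# One search per file: each file gets its own, separate result
print([transporter(vet) for vet in FILE_TEXTS])  # ['Alpha ', 'Beta ']
```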
Please try to follow forum rules and post code by pasting into your post surrounded by Python tags.
#5
Great! Everything is working now :)

Thank you so much!
#6
Your data should be organized as a list of lists, not 12 independent lists. I would have ma() return a row (list) and data1 would be a list of rows.
Something like this:
import os
import re

import docx2txt
import pandas as pd

def my_match(string, pattern):
    """Find pattern in string.  Return first "group" stripped of commas"""
    match = re.search(pattern, string)
    if match:
        return match.group(1).replace(",", "")
    return ""

def ma(data):
    vet = ", ".join(data)
    return [
        my_match('Transport, (.*)Contact', vet),
        "",
        data[0],
        my_match('Date, (.*)Numéro', vet),
        my_match('Prix, (.*)Prix', vet),
        my_match('Compte, (.*)I.D', vet),
        my_match('DBS, (.*)Transport', vet),
        "",
        "",
        my_match('par, (.*)De', vet),
        my_match('De, (.*)À', vet),
        my_match('À, (.*)Attention', vet)
    ]

data1 = []
for f in os.listdir(path):
    if f.endswith('.docx'):
        text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
        data1.append(ma(text.split()))

columns = [
    "Transporteur",
    "#Fournisseur",
    "FT#",
    "Date ceuillette",
    "Prix",
    "GL",
    "PO#",
    "IMACS/CC/W/O",
    "Notes si requis",
    "Transport demandé par",
    "Origine",
    "Destination",
]

df1 = pd.DataFrame(data1, columns=columns)  # pass columns by keyword; the second positional argument is index
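The relationship between a list of rows and the twelve separate column lists can be checked in plain Python: zip(*rows) transposes rows into columns, which gives back the dict-of-columns shape the original data1 used. The rows below are made up and use only three of the twelve headers. With pandas, pd.DataFrame(rows, columns=columns) and pd.DataFrame(by_column) should describe the same table.

```python
# Made-up rows, one per .docx file, using 3 of the 12 headers
columns = ["Transporteur", "FT#", "Prix"]
rows = [
    ["Alpha", "FT-1", "100"],
    ["Beta", "FT-2", "250"],
]

# zip(*rows) yields one tuple per column; pair each tuple with its header
by_column = {name: list(values) for name, values in zip(columns, zip(*rows))}
print(by_column)
# {'Transporteur': ['Alpha', 'Beta'], 'FT#': ['FT-1', 'FT-2'], 'Prix': ['100', '250']}

# Every row has exactly one value per header
assert all(len(row) == len(columns) for row in rows)
```

Building one row per file keeps each file's values together; with twelve separate lists, a single skipped append leaves the lists at different lengths.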
When you find yourself typing the same thing over and over:
    result = re.search('Transport, (.*)Contact', vet)
    result_1 = (result.group(1)).replace(",", "")
    Tra = result_1
Write a function.
def my_match(string, pattern):
    """Find pattern in string.  Return first "group" stripped of commas"""
    match = re.search(pattern, string)
    if match:
        return match.group(1).replace(",", "")
    return ""
The function reduces typing and the chance of typos. Its body documents, in one place, the processing you were repeating over and over, and it makes new functionality easy to add. Here I check whether a match was found and return an empty string if it wasn't.
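A quick usage check of my_match() on a made-up vet string in the ", ".join(data) format (the values are invented, not from the original documents). It shows the comma stripping, the trailing space left behind by the ", " separator, and the empty-string fallback when a field is missing.

```python
import re


def my_match(string, pattern):
    """Find pattern in string.  Return first "group" stripped of commas"""
    match = re.search(pattern, string)
    if match:
        return match.group(1).replace(",", "")
    return ""


# Made-up vet string in the ", ".join(data) format
vet = "Transport, Neveu, Transport, Contact, Date, 2022-08-14, Numéro"

print(repr(my_match(vet, 'Transport, (.*)Contact')))  # 'Neveu Transport '
print(repr(my_match(vet, 'Date, (.*)Numéro')))        # '2022-08-14 '
print(repr(my_match(vet, 'DBS, (.*)Transport')))      # '' (no DBS field)
```

If the trailing space matters in the spreadsheet, appending .strip() after .replace(",", "") would remove it.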
#7
So I'm running into a problem. I want to make this code run every 10 seconds, for example. For some reason Python doesn't recognise the variable "text" when it is used inside a function.
 
import os
import docx2txt
import re
import pandas as pd
import numpy as np
import openpyxl
import time
import schedule

#variables
path = r"C:\Users\eschbachm\OneDrive - EXP\Desktop\test"
os.chdir(path)

#All columns
col1 = []
col2 = []
col3 = []
col4 = []
col5 = []
col6 = []
col7 = []
col8 = []
col9 = []
col10 = []
col11 = []
col12 = []
#lists
vet = ""
        
#Sends vet list to string
for j in vet:
    vet += j + ", "

def ma(vet):
    #Column 1
    result = re.search('Transport, (.*)Contact', vet)
    result_1 = (result.group(1)).replace(",", "")
    Tra = result_1
    col1.append(Tra)
    #Column 2
    VQ = ''
    col2.append(VQ)
    #Column 3
    result = re.search('(.*)LOCATION', vet)
    result_3 = (result.group(1)).replace(",", "")
    FT = result_3
    col3.append(FT)
    #Column 4
    result = re.search('Date, (.*)Numéro', vet)
    result_4 = (result.group(1)).replace(",", "")
    Date = result_4
    col4.append(Date)
    #Column 5
    result = re.search('Prix, (.*)Modèle', vet)
    result_5 = (result.group(1)).replace(",", "")
    Prix = result_5
    col5.append(Prix)
    #Column 6
    result = re.search('Compte, (.*)Accessoires', vet) #search for the GL value
    result_6 = (result.group(1)).replace(",", "")
    GL = result_6
    #Column 7
    result = re.search('DBS, (.*)Transport', vet) #search for the PO value
    result_7 = (result.group(1)).replace(",", "")
    PO = result_7

    a = 0
    b = 0
    c = 0
    for line in text:
        # checking string is present in line or not
        if GL != "" and PO != "": #if GL and PO are both present
            a = 1
        elif GL != "": #if GL is present but not PO
            b = 2
        elif PO != "": #if PO is present but not GL
            c = 3
            break
    if a == 0:
        pass
    else: #if GL and PO are both present
        col6.append(GL)
        col7.append(PO)
    if b == 0:
        pass
    else: #if GL is present but not PO
        col6.append(GL)
        col7.append('')
    if c == 0:
        pass
    else: #if PO is present but not GL
        col6.append('')
        col7.append(PO)
    if a == 0 and b == 0 and c == 0:
        col6.append('')
        col7.append('')

    #Column 8
    IMACS = ''
    col8.append(IMACS)
    #Column 9
    Notes = ''
    col9.append(Notes)
    #Column 10
    result = re.search('par, (.*)De', vet) #search for the DP value
    result_10 = (result.group(1)).replace(",", "")
    DP = result_10
    col10.append(DP)
    #Column 11
    result = re.search('De, (.*)À', vet) #search for the origin value
    result_11 = (result.group(1)).replace(",", "")
    ORI = result_11
    col11.append(ORI)
    #Column 12
    result = re.search('À, (.*)Prix', vet) #search for the destination value
    result_12 = (result.group(1)).replace(",", "")
    DEST = result_12
    col12.append(DEST)

def run():
    for f in os.listdir(path):
        if f.endswith('.docx'):
            text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
            ma(", ".join(text.split()))
            print('Transfert de donnée Réussi!')

# Creating the first Dataframe using dictionary
data1 = {
    "Transporteur": col1,
    "#Fournisseur": col2,
    "FT#": col3,
    "Date ceuillette": col4,
    "Prix": col5,
    "GL": col6,
    "PO#": col7,
    "IMACS/CC/W/O": col8,
    "Notes si requis": col9,
    "Transport demandé par": col10,
    "Origine": col11,
    "Destination": col12}

df1 = pd.DataFrame(data=data1)
df1 = df1.sort_values('Date ceuillette', ascending=True)

# load df to existing excel
with pd.ExcelWriter('output.xlsx', mode='a', if_sheet_exists="replace") as writer:
    df1.to_excel(writer, sheet_name='Sheet_name1')

schedule.every(10).seconds.do(run)

while True:
    schedule.run_pending()
    time.sleep(2)
Python gives me this error:
Traceback (most recent call last):
  File "C:\Users\eschbachm\OneDrive - EXP\Desktop\code\CAT - Version Finale.py", line 149, in <module>
    run()
  File "C:\Users\eschbachm\OneDrive - EXP\Desktop\code\CAT - Version Finale.py", line 123, in run
    ma(", ".join(text.split()))
  File "C:\Users\eschbachm\OneDrive - EXP\Desktop\code\CAT - Version Finale.py", line 69, in ma
    for line in text:
NameError: name 'text' is not defined. Did you mean: 'next'?
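Separately from the NameError, the script as posted builds the DataFrame and writes output.xlsx only once, at module level, before the scheduling loop starts. The scheduled run() calls then fill col1 to col12, but nothing writes them out again, and the module-level lists keep growing on every run. A minimal sketch of the alternative, keeping all the work inside the scheduled job: it uses a bounded time.sleep loop instead of the schedule package so it runs with the standard library alone, and collect_rows(), job(), repeat(), fake_texts() and the one-column rows are hypothetical. In the real script, get_texts would read the .docx files and write_rows would build the DataFrame and call to_excel(); schedule.every(10).seconds.do(...) could call the same job.

```python
import time


def collect_rows(texts):
    """Build a fresh list of rows on each call (no module-level col1..col12)."""
    rows = []
    for text in texts:
        data = text.replace(":", " ").replace(",", " ").replace("_", " ").split()
        rows.append([", ".join(data)])  # hypothetical 1-column row; ma(data) would build the real one
    return rows


def job(get_texts, write_rows):
    """One scheduled run: read the documents, build rows, write them out."""
    write_rows(collect_rows(get_texts()))


def repeat(task, interval_seconds, times):
    """Bounded stand-in for the while loop around schedule.run_pending()."""
    for _ in range(times):
        task()
        time.sleep(interval_seconds)


def fake_texts():
    return ["Transport: Alpha", "Transport: Beta"]  # made-up document text


written = []  # stands in for output.xlsx
repeat(lambda: job(fake_texts, written.append), interval_seconds=0, times=2)
print(written)
# [[['Transport, Alpha'], ['Transport, Beta']], [['Transport, Alpha'], ['Transport, Beta']]]
```

Each run writes the same two rows instead of an ever-growing set, because nothing survives between runs.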
#8
Complaining that text is not defined is a valid complaint. There is no variable named "text" defined in ma() or in the global scope. There is a variable named "text" defined in run(), but that variable, like all local variables in a function, is not visible outside run().
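A minimal reproduction of that scoping rule, with hypothetical function names: text is created inside run_demo(), but ma_broken() looks text up in its own scope and then in the global scope, never in its caller's, so it raises NameError. ma_fixed() receives the value as a parameter instead, which is the same idea as passing data to ma().

```python
def ma_broken():
    # 'text' is searched for here and in the global scope, not in the caller
    return text.split()


def ma_fixed(text):
    # the caller hands the value over explicitly as a parameter
    return text.split()


def run_demo():
    text = "Transport: Alpha"  # local to run_demo(), invisible to other functions
    try:
        ma_broken()
    except NameError as err:
        print("ma_broken:", err)  # reports that name 'text' is not defined
        caught = True
    else:
        caught = False
    return caught, ma_fixed(text)


print(run_demo())  # (True, ['Transport:', 'Alpha'])
```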

Looking at your initial code, ma() used "data" and "vet". You got data by doing this:
    text = docx2txt.process(files[i])
    text2 = text.replace(":", " ")
    text3 = text2.replace(",", " ")
    text4 = text3.replace("_", " ")
    data = text4.split()
which is the same as:
text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
data = text.split()
And you got vet by doing this:
vet = ""
for j in data:
    vet += j + ", "
which produces almost the same string as this (the loop also leaves a trailing ", " at the end):
", ".join(text.split())
Since ma() needs both data and vet, and vet is easily created from data, I think it makes more sense to pass data to ma() and have ma() create vet.
def ma(data):
    vet = ", ".join(data)
    ...

def run():
        for f in os.listdir(path):
                if f.endswith('.docx'):
                        text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
                        ma(text.split())  # This creates data from text and passes it to ma
                        print('Transfert de donnée Réussi!')
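The two equivalences in this reply can be checked with a made-up string (raw below is invented, not taken from the original documents): the chained replace() calls give exactly the step-by-step result, and ", ".join() matches the loop except for the trailing ", " the loop leaves at the end.

```python
raw = "Transport: Neveu_Transport, Contact"  # made-up document text

# Step-by-step cleanup, as in the first post
text2 = raw.replace(":", " ")
text3 = text2.replace(",", " ")
text4 = text3.replace("_", " ")

# Chained cleanup: same calls in the same order
chained = raw.replace(":", " ").replace(",", " ").replace("_", " ")
assert chained == text4

data = chained.split()

# Loop version: appends ", " after every word, including the last one
vet = ""
for j in data:
    vet += j + ", "

joined = ", ".join(data)  # separator only between words
print(repr(vet))     # 'Transport, Neveu, Transport, Contact, '
print(repr(joined))  # 'Transport, Neveu, Transport, Contact'
assert vet == joined + ", "
```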
#9
Hi, thanks a lot for helping me in your free time.

I am still not understanding this:
 
def ma(data):
    vet = ", ".join(data)
    ...
    for line in text:
        # checking string is present in line or not
    ...
def run():
        for f in os.listdir(path):
                if f.endswith('.docx'):
                        text = docx2txt.process(f).replace(":", " ").replace(",", " ").replace("_", " ")
                        ma(text.split())
                        print('Transfert de donnée Réussi!')
Python is still throwing an error because "text" in ma() is not defined.
How can I link the two "text" variables in both functions?
I have made the changes you suggested, but it still doesn't work :(
#10
That is because there is no "text" in ma(). Look at the link to your code in your first post. In that code ma() does not use "text" anywhere, it uses "data". The only difference is that now instead of using global variables you are passing "data" as an argument to ma(data) and inside ma(data) you create "vet".
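Following on from that: as far as I can tell, the only place ma() in post #7 uses text is the for line in text: loop, and that loop never reads line, so its outcome depends only on whether GL and PO are empty. The sketch below copies that a/b/c logic into a hypothetical helper, gl_po_original(), and checks it against simply taking GL and PO as they are, for all four empty/non-empty combinations. The GL and PO values are made up, and the check assumes the text being looped over is non-empty (with empty text the original always appends two empty strings).

```python
def gl_po_original(GL, PO, text):
    """The a/b/c logic from ma() in post #7, copied into a hypothetical helper."""
    col6, col7 = [], []
    a = b = c = 0
    for line in text:  # 'line' is never used inside the loop
        if GL != "" and PO != "":
            a = 1
        elif GL != "":
            b = 2
        elif PO != "":
            c = 3
            break
    if a != 0:
        col6.append(GL)
        col7.append(PO)
    if b != 0:
        col6.append(GL)
        col7.append("")
    if c != 0:
        col6.append("")
        col7.append(PO)
    if a == 0 and b == 0 and c == 0:
        col6.append("")
        col7.append("")
    return col6, col7


def gl_po_simple(GL, PO):
    """Take GL and PO as they are; no loop over text needed."""
    return [GL], [PO]


# Made-up GL/PO values covering all four empty/non-empty combinations
for GL, PO in [("4000", "PO1"), ("4000", ""), ("", "PO1"), ("", "")]:
    assert gl_po_original(GL, PO, "some non-empty text") == gl_po_simple(GL, PO)
print("Same GL/PO columns in all four cases")
```

If that holds for the real data, ma(data) can append (or return) GL and PO directly and drop the loop, so there is no text left to link between the two functions.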