Python Forum
Read a folder with multiple files
#1
Hi,

I have a list
Names = ['John', 'Tom']
If I write
def ReadNames():
    for name in Names:
        print(name)
I get both names:
John
Tom

However, when I write
def ReadNames():
    for name in Names:
        return name
I only get one name back. How do I write a function that iterates through a list and gives all the values back?
Reply
#2
(May-03-2019, 11:17 AM)NewBeie Wrote: I only get one name back, how do I write a function that iterate through a list and give all values back?

I'm not sure I understood you exactly, but you are probably asking about generators?


Names = ['John', 'Tom']  # the list from the question

def read_names():
    for name in Names:
        yield name

names = read_names()
print(next(names))  # prints John
print(next(names))  # prints Tom
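Calling next() by hand is only for illustration. In practice a generator is usually consumed with a for loop, which keeps requesting values until the generator is exhausted, however many items there are. A minimal sketch, reusing the Names list from the question:

```python
Names = ['John', 'Tom']


def read_names():
    # Each pass through the loop hands one name to the caller,
    # then resumes here when the next value is requested.
    for name in Names:
        yield name


# A for loop calls next() for you and stops when the generator is done.
for name in read_names():
    print(name)

# list() also consumes the whole generator in one go.
all_names = list(read_names())
```

The same loop works unchanged for 2 items or 40+.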
Reply
#3
(May-03-2019, 11:17 AM)NewBeie Wrote: How ever when I say
def ReadNames():
     for name in Names:
         return name
I only get one name back, how do I write a function that iterate through a list and give all values back?

After a return statement executes, the function finishes and hands control back to the caller. So the first element is returned during the first iteration, and the loop never reaches the second one.

Scidam provided code to overcome this problem. However, it's unclear to me why you want a function to iterate over the elements. Isn't it easier to iterate over the list directly? Especially when the function takes no parameters.
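To make the difference concrete, here is a small sketch. The function names first_name_only and all_names are invented for this illustration; the Names list is the one from the question.

```python
Names = ['John', 'Tom']


def first_name_only():
    for name in Names:
        return name  # leaves the function during the first iteration


def all_names():
    result = []
    for name in Names:
        result.append(name)
    return result  # runs once, after the loop has finished


first = first_name_only()
everything = all_names()
print(first)
print(everything)
```

Moving the return out of the loop body is the whole fix: the loop gets to visit every element before the function exits.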
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#4
I have a folder with two XML files (there could be more than two). I want to go through that folder and read each file, so I have this code:
path = os.getcwd() + '/Emails/'
files = os.listdir(path)
Now "files" is a list. I want to loop over the files and read their contents, so I tried this:
def Readfiles():
    for file in files:
        with open(file, 'r') as f:
            message = f.read()
            return message
But this is not giving me what I want. I want to clean the contents of each file, and I have a step for that:
soup = BeautifulSoup(message, 'lxml')
So what I want is code that goes through the folder, reads each file, passes the contents to that cleaning step, and then gives me the output. I hope I made this clear.

Reply
#5
I have a folder with two XML files (there could be more than two). I want to go through that folder and read each file, so I have this code:
path = os.getcwd() + '/XmlFiles/'
files = os.listdir(path)
Now "files" is a list:
print(files)
I want to loop over the files and read their contents, so I tried this:
def Readfiles():
    for file in files:
        with open(file, 'r') as f:
            message = f.read()
            return message
But this is not giving me what I want. I want to clean the contents of each file, and I have a step for that:
soup = BeautifulSoup(message, 'lxml')
So what I want is code that goes through the folder, reads each file, passes the contents to BeautifulSoup for cleaning, and then gives me the output, with one result per file.
Reply
#6
Please read the previous answers once again. You already have the answer to why you get only one name back and how to deal with it.

From the lxml FAQ:

Quote:Take a look at the XML specification, it's all about byte sequences and how to map them to text and structure. That leads to rule number one: do not decode your XML data yourself. That's a part of the work of an XML parser, and it does it very well. Just pass it your data as a plain byte stream, it will always do the right thing, by specification.

This also includes not opening XML files in text mode. Make sure you always use binary mode, or, even better, pass the file path into lxml's parse() function to let it do the file opening, reading and closing itself. This is the most simple and most efficient way to do it.
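A minimal sketch of the binary-mode advice. It uses the standard library's xml.etree.ElementTree rather than lxml or BeautifulSoup so that it runs without third-party packages; that substitution, the sample document, and the file name are all assumptions made for this example.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical sample file, stored as bytes in the encoding its header declares.
sample = '<?xml version="1.0" encoding="ISO-8859-1"?><msg>caf\xe9</msg>'.encode('iso-8859-1')

with tempfile.TemporaryDirectory() as tmp:
    xml_path = Path(tmp) / 'sample.xml'
    xml_path.write_bytes(sample)

    # Open in binary mode ('rb') and let the parser decode the bytes
    # according to the encoding declared in the XML header.
    with open(xml_path, 'rb') as f:
        root = ET.parse(f).getroot()

text = root.text  # already a properly decoded str
```

Opening the same file in text mode would make Python guess the encoding from the platform default, which can differ from the one declared in the file.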
Reply
#7
The return statement is in the wrong place.
First, the high-level solution:

from pathlib import Path


def read_files():
    root = Path.cwd() / 'XmlFiles'
    for file in root.glob('*.xml'):
        yield file.read_text()
        # yield file.read_bytes() # to get bytes
This function is a generator. It only does its work when you iterate over it, or when you pass it to a function or type that iterates over it implicitly.
To get the data of all *.xml files:

file_data_as_list = list(read_files())
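The laziness is easy to see in a small experiment. The sketch below uses the same generator, except that the folder is passed in as a parameter (an assumption made here so the example can run against a temporary directory with two made-up files).

```python
import tempfile
from pathlib import Path


def read_files(root):
    # sorted() only makes the order predictable for this demonstration.
    for file in sorted(root.glob('*.xml')):
        yield file.read_text()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'a.xml').write_text('<a>first</a>')
    (root / 'b.xml').write_text('<b>second</b>')

    gen = read_files(root)
    print(gen)  # a generator object; no file has been read yet

    # Iterating is what actually runs the function body, one file at a time.
    contents = []
    for text in gen:
        print(text)
        contents.append(text)
```

Printing the generator itself only shows the object; the file contents appear once something loops over it.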
If you change the function a little, you can store each path as a key in a dict, with the file's text as its value.

def read_files():
    root = Path.cwd() / 'XmlFiles'
    for file in root.glob('*.xml'):
        # yield (key, value)
        yield (file, file.read_text())

xml_content = dict(read_files())
Path.cwd() returns an absolute path, and the objects produced during iteration are also pathlib Path objects.
A Path object is immutable, much like a string: changing a path results in a new path object.
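A short sketch of that immutability. The path below is made up, and PurePosixPath is used instead of Path only so the printed results look the same on every operating system.

```python
from pathlib import PurePosixPath

original = PurePosixPath('/data/XmlFiles/mail.xml')

# Each "modification" returns a new object and leaves the original untouched.
renamed = original.with_suffix('.txt')
sibling = original.parent / 'other.xml'

print(original)
print(renamed)
print(sibling)

# Because paths are immutable and hashable, they can be used as dict keys.
lookup = {original: 'contents of mail.xml'}
```

This is why the dict(read_files()) pattern works with path objects as keys.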

Your old version, corrected:

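def read_files():
    # path and files are the globals from the question;
    # os.listdir() returns bare names, so join them with the folder.
    result = []
    for file in files:
        with open(os.path.join(path, file), 'r') as f:
            message = f.read()
            result.append(message)
    return result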
To get rid of the list inside the function, you can convert it to a generator:


def read_files():
    for file in files:
        # os.listdir() gives bare names, so prepend the folder
        with open(os.path.join(path, file), 'r') as f:
            yield f.read()
The files object should not be accessed from the global scope.
Use arguments for your functions. In this case the input (the root directory or, as here, the list of file paths) should be an argument of your function:


def read_files(files):
    # files: an iterable of full paths to the files to read
    for file in files:
        with open(file, 'r') as f:
            yield f.read()
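The root-directory variant is mentioned but not shown. One possible sketch follows. The names read_files(root) and extract_text are chosen for this example, the two sample files are made up, and the standard library's xml.etree.ElementTree stands in for the BeautifulSoup cleaning step from the question so that the code runs without extra packages.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def read_files(root):
    """Yield (file name, raw bytes) for every .xml file directly in root."""
    for file in sorted(Path(root).glob('*.xml')):
        yield file.name, file.read_bytes()


def extract_text(data):
    """Stand-in cleaning step: join all text found in the XML document."""
    return ''.join(ET.fromstring(data).itertext()).strip()


with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical sample files.
    Path(tmp, 'one.xml').write_text('<m><body>Hello</body></m>')
    Path(tmp, 'two.xml').write_text('<m><body>World</body></m>')

    # One result per file, however many files the folder holds.
    results = {name: extract_text(data) for name, data in read_files(tmp)}

for name, text in results.items():
    print(name, '->', text)
```

Because the folder is an argument, the same function can be pointed at any directory, and the cleaning step can be swapped for BeautifulSoup without touching the reading code.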
I often use generators to explain things. They usually need less code, and the code reads like what it does.
As soon as a return statement executes anywhere in your function, you leave the function.
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply
#8
The answer above doesn't really help in my situation. Inside the for loop, I want to read each element. I could get 2 files or more, so
names = read_names()
print(next(names))  # returns John
print(next(names))  # returns Tom
won't really help, as there might be 40+ files. I want to iterate over a list, however many elements it has, and for each element (a file in a folder) read its contents.

As for XML, I use
soup = BeautifulSoup(message, 'lxml')
to clean out all the garbage, so that is not an issue here.

This is what I've done so far:
path = os.getcwd() + '/XmlFiles/'
files = os.listdir(path)
def Readfiles():
    for file in files:
        # print(file)
        with open(path+file, 'r') as f:
            message = f.read()
        return (message)

message = Readfiles()
soup = BeautifulSoup(message, 'lxml')
print(soup.text.strip())
What this does is go to my folder, get a file, read it, and print it. However, when I put a second file in the folder, I still only get the results of the first file. I would like to get results for each file in the folder.

Thank you for the reply, it is helping a lot. I'm now close to getting what I want. I used the code below:
path = os.getcwd() + '/XmlFiles/'
files = os.listdir(path)
# print(files)

def read_files():
    result = []
    for file in files:
        with open(path+file, 'r') as f:
            message = f.read()
            result.append(message)
    return(result)

message = (read_files())
print(message)
With this I do get my two files returned in a list:
Output:
['ZZ~<ResponseCode>NoError</ResponseCode><Items><Message xmlns="http://schemas.micros~<?xml version="1.0" encoding="UTF-8"?><GetItemResponse xmlns="
I only copied part of the output, but I do get both files in full, in list format.

However, when I try to apply
Quote: To get rid of the list inside the function, you can convert it to a generator:

def read_files():
    for file in files:
        with open(file, 'r') as f:
            yield f.read()

message = read_files()
print(message)

I'm getting this output
Output:
<generator object read_files at 0x00000182510E0A98>
Process finished with exit code 0
I would like to get rid of the list inside the function and get the contents of my 2 files separately, not in a list.

For instance, if I do this
def Readfiles():
    for file in files:
        print(file)
I'm getting
Output:
bi_sentiment_emails1.xml.xml
bi_sentiment_emails2.xml.xml
Process finished with exit code 0
This output is not a list, so I would like to get the contents of the files in the same way.

Reply

