Python Forum - Read a folder with multiple files

Hi,

I have a list

Quote:Names = ['John', 'Tom']

If I say

def ReadNames():
    for name in Names:
        print(name)

I get both names

Quote:John
Tom

However, when I say

def ReadNames():
    for name in Names:
        return name

I only get one name back. How do I write a function that iterates through a list and gives all the values back?

(May-03-2019, 11:17 AM)NewBeie Wrote: I only get one name back. How do I write a function that iterates through a list and gives all the values back?

I'm not sure I understood you exactly, but you are probably talking about generators?

def read_names():
    for name in Names:  # Names should be defined somewhere above
        yield name

names = read_names()
print(next(names))  # prints John
print(next(names))  # prints Tom
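
If the list can hold any number of entries, calling next() by hand does not scale. A for loop requests values for you until the generator is exhausted. A minimal sketch, assuming the same Names list as in the question (collected is a name made up for this example):

```python
Names = ['John', 'Tom']  # same list as in the question


def read_names():
    for name in Names:
        yield name  # hand back one name, then pause until the next one is requested


# A for loop keeps requesting values until the generator runs out,
# so it works for 2 names as well as for 40.
collected = []
for name in read_names():
    print(name)
    collected.append(name)
```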

(May-03-2019, 11:17 AM)NewBeie Wrote: However, when I say
def ReadNames():
    for name in Names:
        return name
I only get one name back. How do I write a function that iterates through a list and gives all the values back?

After a return statement the function finishes and hands control back to the caller. So the first element is returned, and that is all that happens.

Scidam provided code to overcome this problem. However, it's unclear to me why you want a function to iterate over the elements. Isn't it easier to iterate over the list directly? Especially when the function takes no parameters.
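
To make the difference concrete, here is a small sketch. The function names first_name and all_names are made up for this example, and Names is the list from the question:

```python
Names = ['John', 'Tom']


def first_name():
    for name in Names:
        return name  # leaves the function during the first pass of the loop


def all_names():
    result = []
    for name in Names:
        result.append(name)  # keep collecting instead of returning
    return result  # return once, after the loop has finished


print(first_name())  # John
print(all_names())   # ['John', 'Tom']
```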

I have a folder with two XML files (there could be more than two). I want to step into that folder and read each file, so I have this code:

import os

path = os.getcwd() + '/Emails/'
files = os.listdir(path)

So now "files" holds a list. I want to loop over the files and read their content, so I tried this:

def Readfiles():
    for file in files:
        with open(file, 'r') as f:
            message = f.read()
            return message

But this is not giving me what I want. I want to clean the content of each file, and I have a step for that:

soup = BeautifulSoup(message, 'lxml')

So what I want is code that goes through the folder, reads each file, passes the content to that cleaning step, and then gives me the output. I hope I made this clear.

(May-03-2019, 11:34 AM)scidam Wrote:
(May-03-2019, 11:17 AM)NewBeie Wrote: I only get one name back. How do I write a function that iterates through a list and gives all the values back?

I'm not sure I understood you exactly, but you are probably talking about generators?

I have a folder with two XML files (there could be more than two). I want to step into that folder and read each file, so I have this code:

path = os.getcwd() + '/XmlFiles/'
files = os.listdir(path)

So now "files" holds a list:

print(files)

I want to loop over the files and read their content, so I tried this:

def Readfiles():
    for file in files:
        with open(file, 'r') as f:
            message = f.read()
            return message

But this is not giving me what I want. I want to clean the content of each file, and I have a step for that:

soup = BeautifulSoup(message, 'lxml')

So what I want is code that goes through the folder, reads each file, passes the content to the
BeautifulSoup function for cleaning, and then gives me the output, with results for each file.

Please read the previous answers once again. You already have the answer to why you get only one name back and how to deal with it.

From lxml FAQ:

Quote:Take a look at the XML specification, it's all about byte sequences and how to map them to text and structure. That leads to rule number one: do not decode your XML data yourself. That's a part of the work of an XML parser, and it does it very well. Just pass it your data as a plain byte stream, it will always do the right thing, by specification.

This also includes not opening XML files in text mode. Make sure you always use binary mode, or, even better, pass the file path into lxml's parse() function to let it do the file opening, reading and closing itself. This is the most simple and most efficient way to do it.
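
The advice from the FAQ can be sketched as follows. To keep the example runnable without extra packages it uses the standard library's xml.etree.ElementTree as a stand-in; lxml.etree also provides parse() and fromstring(). The file name and its content are made up for the example.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical sample file, stored as bytes in a non-UTF-8 encoding.
xml_text = '<?xml version="1.0" encoding="ISO-8859-1"?><msg>caf\u00e9</msg>'
xml_bytes = xml_text.encode('iso-8859-1')

with tempfile.TemporaryDirectory() as tmp:
    xml_path = Path(tmp) / 'sample.xml'
    xml_path.write_bytes(xml_bytes)

    # Option 1: let the parser open, decode and close the file itself.
    root_from_path = ET.parse(str(xml_path)).getroot()

    # Option 2: read in binary mode and pass the undecoded bytes to the parser.
    with open(xml_path, 'rb') as f:
        root_from_bytes = ET.fromstring(f.read())

print(root_from_path.text, root_from_bytes.text)
```

In both cases the parser reads the encoding from the XML declaration, so no encoding has to be guessed when opening the file.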

The return statement is in the wrong place: it sits inside the loop, so the function exits after the first file.
First, the high-level solution:

from pathlib import Path


def read_files():
    root = Path.cwd() / 'XmlFiles'
    for file in root.glob('*.xml'):
        yield file.read_text()
        # yield file.read_bytes() # to get bytes

This function is a generator. It only does its work if you iterate over it, or if you pass it to a function/type that iterates over the generator implicitly.
To get the data of all *.xml files:

file_data_as_list = list(read_files())

If you change the function a little, you can store the path as the key in a dict, with the text as the value.

def read_files():
    root = Path.cwd() / 'XmlFiles'
    for file in root.glob('*.xml'):
        # yield (key, value)
        yield (file, file.read_text())

xml_content = dict(read_files())

Path.cwd() returns an absolute path, and the objects produced during iteration are also pathlib objects.
A pathlib object itself is immutable. You can compare it to strings: changing a path results in a new path.
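
A short sketch of that behaviour ('mail.xml' is a made-up file name; nothing is read from or written to disk):

```python
from pathlib import Path

original = Path('XmlFiles') / 'mail.xml'  # '/' builds a new Path from the parts
renamed = original.with_suffix('.txt')    # returns a new Path, 'original' is untouched

print(original.name)  # mail.xml
print(renamed.name)   # mail.txt
```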

Your old version, corrected:

def read_files():
    result = []
    for file in files:
        with open(file, 'r') as f:
            message = f.read()
            result.append(message)
    return result

To get rid of the list inside the function, you can convert it to a generator:

def read_files():
    for file in files:
        with open(file, 'r') as f:
            yield f.read()

The files object should not be accessed from the global scope.
Use arguments for your functions. In this case the list of files (or the root directory it was built from) should be an argument of your function:

def read_files(files):
    for file in files:
        with open(file, 'r') as f:
            yield f.read()

I often use generators to explain things. Often less code is needed, and the code reads like what it does.
If you use a return statement somewhere in your function, you leave the function at that point.
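
Putting these points together, here is one possible sketch that takes the root directory as its argument and yields one (file name, text) pair per file. The throwaway folder, the file names and the clean() placeholder are assumptions made for illustration; in this thread the real cleaning step is BeautifulSoup(message, 'lxml').

```python
import tempfile
from pathlib import Path


def read_files(root):
    """Yield (file name, text) for every *.xml file directly inside root."""
    for file in sorted(Path(root).glob('*.xml')):
        yield file.name, file.read_text()


def clean(text):
    # Placeholder for the real cleaning step, e.g. BeautifulSoup(text, 'lxml').text
    return text.strip()


# Demo with two throwaway files so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'a.xml').write_text('<m>first</m>\n')
    (Path(tmp) / 'b.xml').write_text('<m>second</m>\n')
    # The loop runs once per file, however many files the folder holds.
    cleaned = {name: clean(text) for name, text in read_files(tmp)}

print(cleaned)
```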

The answer above doesn't really help in my situation. In the for loop, I want to read each element. I could get 2 files or more, so

names = read_names()
print(next(names))  # returns John
print(next(names))  # returns Tom

won't really help, as there might be 40+ files. I want to iterate over a list, however many elements it has, and for each element read its content (files in a folder).

As for XML, I use

soup = BeautifulSoup(message, 'lxml')

to clean out all the garbage, so that's not an issue here.

(May-06-2019, 07:42 AM)perfringo Wrote: Please read the previous answers once again. You already have the answer to why you get only one name back and how to deal with it.


This is what I've done so far:

path = os.getcwd() + '/XmlFiles/'
files = os.listdir(path)
def Readfiles():
    for file in files:
        # print(file)
        with open(path+file, 'r') as f:
            message = f.read()
        return (message)

message = Readfiles()
soup = BeautifulSoup(message, 'lxml')
print(soup.text.strip())

What this does is go to my folder, get a file, read it and print it. However, when I put a second file in my folder, I only get the results of the first file. I would like to get results for each file in the folder.

Thank you for the reply, it is helping a lot. I'm now close to getting what I want. I used the code below:

path = os.getcwd() + '/XmlFiles/'
files = os.listdir(path)
# print(files)

def read_files():
    result = []
    for file in files:
        with open(path+file, 'r') as f:
            message = f.read()
            result.append(message)
    return result

message = read_files()
print(message)

With this I do get my two files returned in a list:

Output:
['ZZ~<ResponseCode>NoError</ResponseCode><Items><Message xmlns="http://schemas.micros~<?xml version="1.0" encoding="UTF-8"?><GetItemResponse xmlns="

I only copied part of the output, but I do get both files in full, in list format.

However, when I try to apply

Quote:To get rid of the list inside the function, you can convert it to a generator:
def read_files():
    for file in files:
        with open(file, 'r') as f:
            yield f.read()

message = read_files()
print(message)

I'm getting this output:

Output:<generator object read_files at 0x00000182510E0A98>

Process finished with exit code 0

I would like to get rid of the list inside the function and get the results of my 2 files separately, not in a list.

For instance, if I do this

def Readfiles():
    for file in files:
        print(file)

I'm getting

Output:bi_sentiment_emails1.xml.xml
bi_sentiment_emails2.xml.xml

Process finished with exit code 0

This output is not a list, so I would like to get the content of the files printed the same way.
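
Printing a generator shows the generator object itself, because nothing has asked it for values yet. Looping over it hands out the items one at a time, with no list around them. A sketch with in-memory strings standing in for the file contents (fake_files and read_all are made-up names for this example):

```python
fake_files = {'one.xml': '<a>1</a>', 'two.xml': '<b>2</b>'}  # stand-in for files on disk


def read_all():
    for name in fake_files:
        yield fake_files[name]  # in the real code: open the file and yield f.read()


gen = read_all()
print(gen)  # shows a generator object, not the contents

seen = []
for message in gen:  # iterating is what actually runs the function body
    print(message)   # one file's content per pass
    seen.append(message)
```

Inside that loop each message can be passed on to the cleaning step before moving to the next file.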

(May-06-2019, 07:53 AM)DeaD_EyE Wrote: The return statement is wrong.