The answer above doesn't really help in my situation. In the for loop, I want to read each element in turn. I could get 2 files or more, so
names = read_names()
print(next(names)) # returns John
print(next(names)) # returns Tom
won't really help, as there might be 40+ files. I want to iterate over a list and, for each element in the list, read its content (the elements are files in a folder).
As for XML, I use
soup = BeautifulSoup(message, 'lxml')
to clean all the garbage, so there's not an issue here.
(May-06-2019, 07:42 AM)perfringo Wrote: Please read the previous answers once again. You already have the answer to why you get only one name back and how to deal with it.
From lxml FAQ:
Quote:Take a look at the XML specification, it's all about byte sequences and how to map them to text and structure. That leads to rule number one: do not decode your XML data yourself. That's a part of the work of an XML parser, and it does it very well. Just pass it your data as a plain byte stream, it will always do the right thing, by specification.
This also includes not opening XML files in text mode. Make sure you always use binary mode, or, even better, pass the file path into lxml's parse() function to let it do the file opening, reading and closing itself. This is the most simple and most efficient way to do it.
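The FAQ advice above can be sketched as follows. This is only an illustration and it uses the standard library's xml.etree.ElementTree as a stand-in for lxml (the quoted FAQ says lxml's parse() also accepts a file path). The helper names parse_xml_file and parse_xml_bytes and the sample file are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_xml_file(path):
    """Let the parser open the file itself; it reads the bytes directly."""
    return ET.parse(path).getroot()


def parse_xml_bytes(path):
    """Open in binary mode and hand the undecoded bytes to the parser."""
    with open(path, 'rb') as f:
        return ET.fromstring(f.read())


# Hypothetical sample: the declaration names the encoding, so the parser
# (not our code) decides how to decode the bytes.
raw = '<?xml version="1.0" encoding="ISO-8859-1"?><name>Jos\u00e9</name>'.encode('iso-8859-1')

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'sample.xml'
    sample.write_bytes(raw)
    from_path = parse_xml_file(sample).text
    from_bytes = parse_xml_bytes(sample).text

print(from_path == from_bytes)
```

Both helpers avoid open(path, 'r'), so no text-mode decoding happens before the parser sees the data.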
This is what I've done so far:
path = os.getcwd() + '/XmlFiles/'
files = os.listdir(path)
def Readfiles():
    for file in files:
        # print(file)
        with open(path+file, 'r') as f:
            message = f.read()
        return (message)
message = Readfiles()
soup = BeautifulSoup(message, 'lxml')
print(soup.text.strip())
What this does is go to my folder, get a file, read it and print it. However, when I put a second file in the folder, I still only get the results of the first file. I would like to get results for each file in the folder.
Thank you for the reply, it is helping a lot. I'm now close to getting what I want. I used the code below:
path = os.getcwd() + '/XmlFiles/'
files = os.listdir(path)
# print(files)
def read_files():
    result = []
    for file in files:
        with open(path+file, 'r') as f:
            message = f.read()
        result.append(message)
    return(result)
message = (read_files())
print(message)
With this I do get my two files returned in a list:
Output:
['ZZ~<ResponseCode>NoError</ResponseCode><Items><Message xmlns="http://schemas.micros~<?xml version="1.0" encoding="UTF-8"?><GetItemResponse xmlns="
I only copied part of the output, but I do get the whole 2 files, in the list format.
However, when I try to apply
Quote:To get rid of the list inside the function, you can convert it to a generator:
def read_files():
    for file in files:
        with open(file, 'r') as f:
            yield f.read()
message = read_files()
print(message)
I'm getting this output
Output:
<generator object read_files at 0x00000182510E0A98>
Process finished with exit code 0
I would like to get rid of the list inside the function, to get the results of my 2 files, separately, but not in a list.
For instance if I do this
def Readfiles():
    for file in files:
        print(file)
I'm getting
Output:
bi_sentiment_emails1.xml.xml
bi_sentiment_emails2.xml.xml
Process finished with exit code 0
This output is not a list, so I would like to get the same kind of output as above, but with the contents of the files that were read.
(May-06-2019, 07:53 AM)DeaD_EyE Wrote: The return statement is wrong.
First, the high-level solution:
from pathlib import Path
def read_files():
    root = Path.cwd() / 'XmlFiles'
    for file in root.glob('*.xml'):
        yield file.read_text()
        # yield file.read_bytes() # to get bytes
This function is a generator, and it only does its work when you iterate over it or pass it to a function or type that iterates over it implicitly.
To get the data of all *.xml files:
file_data_as_list = list(read_files())
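Printing the generator object itself only shows something like <generator object ...>; the values appear once something iterates over it. A minimal sketch, using a hypothetical read_items() stand-in that yields fixed strings instead of reading files:

```python
def read_items():
    """Stand-in for read_files(): yields strings instead of file contents."""
    for text in ('first file', 'second file'):
        yield text


gen = read_items()
print(gen)  # shows the generator object, not the data

# A for loop requests one item at a time, so each result is handled separately.
for text in read_items():
    print(text)

# list() iterates implicitly and collects every item.
collected = list(read_items())
```

The for loop is the form to use when each file's text should be processed on its own rather than gathered into a list.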
If you change the function a little, you can store each path as a key in a dict, with the file's text as the value.
def read_files():
    root = Path.cwd() / 'XmlFiles'
    for file in root.glob('*.xml'):
        # yield (key, value)
        yield (file, file.read_text())
xml_content = dict(read_files())
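One way the resulting dict might be used is sketched below. The root parameter, the sorted() call and the two sample files are assumptions added for the illustration; they are not part of the answer above.

```python
import tempfile
from pathlib import Path


def read_files(root):
    """Yield (path, text) pairs for every *.xml file directly under root."""
    for file in sorted(Path(root).glob('*.xml')):
        yield (file, file.read_text())


# Demo on a throwaway directory holding two hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'a.xml').write_text('<a>one</a>')
    (Path(tmp) / 'b.xml').write_text('<b>two</b>')
    xml_content = dict(read_files(tmp))

# Each key is a Path object, each value is that file's text.
names = [path.name for path in xml_content]
for path, text in xml_content.items():
    print(path.name, '->', text)
```

Keeping the path as the key makes it easy to tell which file a given piece of text came from.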
Path.cwd() returns the absolute path of the current working directory, and the objects produced during iteration are also pathlib objects.
The pathlib object itself is not mutable. You can compare it to strings: changing a path results in a new path.
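A small sketch of that immutability. The directory and file names are hypothetical, and PurePosixPath is used only so that the printed form is the same on every platform:

```python
from pathlib import PurePosixPath

base = PurePosixPath('/data/XmlFiles')
child = base / 'emails1.xml'          # joining builds a new path object
renamed = child.with_suffix('.txt')   # so does changing the suffix

print(base)     # /data/XmlFiles
print(child)    # /data/XmlFiles/emails1.xml
print(renamed)  # /data/XmlFiles/emails1.txt
```

Neither operation modifies base or child, just as str methods return a new string and leave the original alone.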
Your old version, corrected:
def read_files():
    result = []
    for file in files:
        with open(file, 'r') as f:
            message = f.read()
        result.append(message)
    return result
To get rid of the list inside the function, you can convert it to a generator:
def read_files():
    for file in files:
        with open(file, 'r') as f:
            yield f.read()
The object files should not be accessed from the global scope.
Use arguments for your functions. In this case the root-directory should be one argument of your function:
def read_files(files):
    for file in files:
        with open(file, 'r') as f:
            yield f.read()
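A variant that takes the root directory as its argument could look like the sketch below. It assumes the names come from os.listdir(), which returns bare file names, so each name is joined with the directory before opening. The directory parameter, the isfile() check and the sample files are assumptions for the illustration.

```python
import os
import tempfile


def read_files(directory):
    """Yield the text of each regular file in directory, in name order."""
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)  # listdir gives bare names
        if os.path.isfile(full_path):
            with open(full_path, 'r') as f:
                yield f.read()


# Demo on a throwaway directory holding two hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    for name, body in (('one.xml', '<a/>'), ('two.xml', '<b/>')):
        with open(os.path.join(tmp, name), 'w') as f:
            f.write(body)
    # Consume the generator while the directory still exists.
    contents = list(read_files(tmp))

print(contents)  # ['<a/>', '<b/>']
```

Because the function no longer depends on a global files or path, it can be pointed at any folder.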
I often use generators to explain things. Often less code is needed, and the code looks like what it does.
If you use a return statement somewhere in your function, you leave the function.
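The difference can be seen side by side in this small sketch. The function names first_only and each_item are hypothetical; the sample names are the ones used earlier in the thread.

```python
def first_only(items):
    for item in items:
        return item  # leaves the function during the first iteration


def each_item(items):
    for item in items:
        yield item  # hands back one item, then resumes on the next request


names = ['John', 'Tom']
print(first_only(names))       # John
print(list(each_item(names)))  # ['John', 'Tom']
```

This is why a return inside the for loop only ever produces the first file.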