Python Forum
read multiple .xlsx files and text files in a directory - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: read multiple .xlsx files and text files in a directory (/thread-3554.html)

Pages: 1 2


read multiple .xlsx files and text files in a directory - BNB - Jun-02-2017

I have a list of .xlsx files (names 1.xlsx, 2.xlsx etc), each with several worksheet, in a directory. I need to insert data from its corresponding text file (named 1.txt, 2.txt etc) on the second worksheet named 'Filtered' and save it along with its original contents. The code I am trying is below. I end up with a blank worksheet in 'Filtered'. I am not sure how to read multiple text files with corresponding excel files and paste data. Any suggestions/edits on code ? Thank you

#!/usr/bin/python

import os
from openpyxl.reader.excel import load_workbook
import csv

directoryPath = r'/REPORTS/'
os.chdir(directoryPath)
folder_list = os.listdir(directoryPath)
for folders, sub_folders, file in os.walk(directoryPath):  
   for name in file:
       if name.endswith(".xlsx"):
           filename = os.path.join(folders, name)
           wb = load_workbook(filename, data_only=True)
           ws = wb.get_sheet_by_name('Filtered')
           directory = os.path.join("/REPORTS/temp", "path")
           for root,dirs,files in os.walk(directory):
               for file in files:
                   if file.endswith(".txt"):
                       f = open(textfile, 'r')
                       reader = csv.reader(f, delimiter='\t')
                       for row in reader:
                           ws.append(row)
                           wb.save(filename)



RE: read multiple .xlsx files and text files in a directory - Larz60+ - Jun-02-2017

Question: Are you converting to text (outside of your python program) for the
sole reason of converting them into a format that can be imported into Python?
If so, there are multiple libraries available to read and/or write directly to Excel,
Pandas being one of the most dominant. see: http://pandas.pydata.org/


RE: read multiple .xlsx files and text files in a directory - BNB - Jun-05-2017

Hello,
the text files are a result of some calculation done using awk and I need to paste the results back to the excel files.


RE: read multiple .xlsx files and text files in a directory - Larz60+ - Jun-05-2017

OK, I'm not familiar with openpyxl, but it is capable of reading and writing (according to the docs)
I would still look into pandas.
Assuming (tell me if I am wrong):
several input spreadsheets to be joined together into one output sheet.

so:
  • open output sheet as a writable sheet
  • for each input sheet, read workseheet, write to new sheet
  • when all read, close output
  • should be as simple as that

I would still look at pandas. I am not saying that there is anything wrong with the package
you are using, but I had problems with a few when I needed to manipulate excel.

Pandas is rock solid, and offers many more features


RE: read multiple .xlsx files and text files in a directory - volcano63 - Jun-06-2017

(Jun-02-2017, 11:14 AM)BNB Wrote:
#!/usr/bin/python

import os
from openpyxl.reader.excel import load_workbook
import csv

I am not sure why are you using openpyxl.reader.excel, I just checked a code I've written several week ago
from openpyxl import Workbook, load_workbook
Works just fine.  

wb.get_sheet_by_name causes exception if the sheet with that name does not exist - but I guess that is not your problem.

PS I am not sure why, but it is recommended to open csv files with newline='' argument

Another thing - you are collecting the same data for each Excel file you find. It is sort of inefficient. What exactly are you trying to do?


RE: read multiple .xlsx files and text files in a directory - volcano63 - Jun-06-2017

(Jun-05-2017, 11:58 AM)Larz60+ Wrote: ......
I would still look at pandas. I am not saying that there is anything wrong with the package you are using, but I had problems with a few when I needed to manipulate excel.
At the risk of facing fire and brimstone  Tongue , I would like to point out that pandas  - great package, BTW Thumbs Up - may be intimidating for beginners.

And its functioning against Excel is not without flaws. When I saved a huge dataframe that contained a column of string digits, it (the column) was saved as numbers in Excel. Lucky for me, they were saved as integers - otherwise, my data would have been corrupted (like when it was converted to scientific form Cry ).

pandas will not provide you with formatting options either.

(Jun-05-2017, 11:58 AM)Larz60+ Wrote: Pandas is rock solid, and offers many more features
About "rock solid" - they have been making some changes that are not backward-compatible lately, I believe?! And occasionally you don't need "more features" Wink

(Jun-05-2017, 11:58 AM)Larz60+ Wrote: ......
I would still look at pandas. I am not saying that there is anything wrong with the package you are using, but I had problems with a few when I needed to manipulate excel.
At the risk of facing fire and brimstone  Tongue , I would like to point out that pandas  - great package, BTW Thumbs Up - may be intimidating for beginners.

And its functioning against Excel is not without flaws. When I saved a huge dataframe that contained a column of string digits, it (the column) was saved as numbers in Excel. Lucky for me, they were saved as integers - otherwise, my data would have been corrupted (like when it was converted to scientific form Cry ).

pandas will not provide you with formatting options either.

(Jun-05-2017, 11:58 AM)Larz60+ Wrote: Pandas is rock solid, and offers many more features
About "rock solid" - they have been making some changes that are not backward-compatible lately, I believe?! And occasionally you don't need "more features" Wink


RE: read multiple .xlsx files and text files in a directory - Larz60+ - Jun-06-2017

Oh well, it's your option


RE: read multiple .xlsx files and text files in a directory - volcano63 - Jun-06-2017

(Jun-06-2017, 08:22 PM)Larz60+ Wrote: Oh well, it's your option
When I have nothing to say - I prefer to pledge the fifth. Well, it's not the first time moderators try to shame me.

I think I am starting to understand Thaos Confused .

Let me repeat myself: pandas is not easy, not for beginners - and not always necessary...


RE: read multiple .xlsx files and text files in a directory - nilamo - Jun-06-2017

(Jun-06-2017, 08:28 PM)volcano63 Wrote: Well, it's not the first time moderators try to shame me.
I'm a moderator.  And I...
1) don't recommend pandas, but mostly because...
2) I'm not even really sure what it is Tongue

From what I've seen it's basically numpy, but more complicated and without any additional benefits.


RE: read multiple .xlsx files and text files in a directory - Larz60+ - Jun-06-2017

Not trying to shame, only stating that it's ok to do as you wish.