Python Forum
How to use the excel filename as a value to populate new column, using Pandas? - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: How to use the excel filename as a value to populate new column, using Pandas? (/thread-19374.html)



How to use the excel filename as a value to populate new column, using Pandas? - Dequanharrison - Jun-25-2019

I'm new to Python. I have about 50 excel workbooks that all have different names e.g. 012018 CBA, 022018 CBA, etc. For each, I would like to insert a new column called "Date" and use the first 6 digits "012018" of the excel filename as the value to populate the entire Date column. I would like to accomplish this using Pandas. Thank you!


RE: How to use the excel filename as a value to populate new column, using Pandas? - scidam - Jun-26-2019

If your data frame is defined and prefilled, you can create a new column and fill it as follows df['Date'] = 'desired value'. This automatically creates Date column in the data frame and fills it with desired value.

If you want to incorporate data from all excel files into one data frame, your algorithm might be the following:

  1. Define an empty list, e.g., named acc (it will be used later).
  2. Use os.path.walk (os.walk in Py3) or glob.glob to iterate over all files;
  3. Load data from each file using pandas, e.g. pandas.read_csv, pandas.read_excel;
  4. Once you iterating files, you know their names; Let filename is the current filename of a file being loaded into df; You can just do df['Dates'] = filename;
  5. Append each df to acc list;
  6. Use pd.concat to combine all dfs stored in acc into a new data frame.



RE: How to use the excel filename as a value to populate new column, using Pandas? - Dequanharrison - Jun-26-2019

Thanks, but maybe I didn't explain it very well. I have 50 Excel files with a naming sequence of 012018 CBA, 022018 CBA, 032018 CBA, 042018 CBA, etc. For each file, I want to use the first 6 digits of the filename to populate the new column within that particular file ... Then do the same for the all the others.

For example, first file called "012018 CBA", I want to insert a new column called "Date" and use the "012018" to populate that column. Next file called "022018 CBA", I want to insert a new column called "Date" and use the "022018" to populate that column. Next file called "032018 CBA", I want to insert a new column called "Date" and use the "032018" to populate that column, etc for all 50 files.

Make sense?


RE: How to use the excel filename as a value to populate new column, using Pandas? - scidam - Jun-26-2019

Your files have regular names, so you can extract desired dates using index slicing, e.g. date = filename[:6].

(Jun-26-2019, 10:32 AM)Dequanharrison Wrote: I want to insert a new column called "Date" and use the "032018" to populate that column, etc for all 50 files.

So, you will get a data frame with at least 50 columns that have the same name Date? In any case, you will likely use either os.walk or glob.glob to get all file names. You can read about these modules in official docs here and here.

This isn't to be a hard problem, so it would be better if you post some code to show your efforts on solving the problem.


RE: How to use the excel filename as a value to populate new column, using Pandas? - Dequanharrison - Jun-26-2019

Not looking to create a datadrame with 50 columns, but rather end up with 50 dataframes/Excel files woth the additional new "date" column filled with the filename of the original file. Saved as a CSV.

Make sense?


RE: How to use the excel filename as a value to populate new column, using Pandas? - Dequanharrison - Jun-26-2019

I'll try to get started with what I know and post some sample code. Thanks.