 Read CSV Files with multiple headers into Python DataFrame
It's the same. If you are going to loop over many files, you will of course supply the full path for each file in the loop. However, I don't think that's the problem: you don't get an error saying the file is missing, you say that the file does not load properly. Just to check, you may want to print FilePath right before the open statement.
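To make that check concrete, here is a minimal sketch that reports what a path actually points to before you try to open it. The helper name describe_path and the file name in the usage line are made up for illustration; only the standard library is used.

```python
from pathlib import Path


def describe_path(file_path):
    """Return a small report on file_path (hypothetical helper, not from the thread)."""
    p = Path(file_path)
    is_file = p.is_file()
    first_bytes = None
    if is_file:
        # Read only the first 64 raw bytes; a BOM or odd bytes here hint at the encoding.
        with open(p, "rb") as fh:
            first_bytes = fh.read(64)
    return {
        "path": str(p),
        "exists": p.exists(),
        "is_file": is_file,
        "size_bytes": p.stat().st_size if is_file else None,
        "first_bytes": first_bytes,
    }


# Example usage with a placeholder file name:
# print(describe_path("data.csv"))
```

If exists is True and the size looks right, the path is probably fine and the problem is more likely in the file contents.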

I did something very similar recently. If you are trawling through a folder of files, first check whether the structure of the data is the same in every file. If it is, use glob to go through the files. Here are all the libraries I needed to import to achieve this, and below them is the start of the for loop you need.

import glob
import math
import csv
import pandas as pd
import numpy as np

# Store the search pattern in a variable. This one is for a Windows machine.
# The backslashes are doubled because a single backslash starts an escape
# sequence in a Python string, so '\\' is needed to get one literal backslash.
# Linux and macOS use forward slashes, which need no escaping.
path = 'A:\\path_to_folder\\**\\*.csv'

# Loop over every matching file; '**' with recursive=True also searches subfolders.
for data_path in glob.glob(path, recursive=True):
    # Read each file into a pandas DataFrame. For my files:
    #   - the extension was .csv, but the fields were separated with tabs, not commas
    #   - I had to skip the first 115 lines and add my own headings, as the files
    #     were inconsistent as to where the real data started

    # Pass the path from the loop (data_path), not the csv module.
    # sep=r'\s+' splits on any whitespace; it is the equivalent of
    # delim_whitespace=True, which newer pandas versions deprecate.
    df = pd.read_table(data_path, sep=r'\s+', skiprows=115)
    # It's up to you now what you do with that data.

This will go through each file in the given directory, and through all subdirectories of that folder, looking for any file with a .csv extension. I would look up the glob documentation to find out how to get it to work right for you, though I think this is fairly standard out of the box. Ultimately, this was all that was required to get the data into a usable form; it's certainly a starting point.
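If you want everything in one DataFrame at the end, one possible way is to collect a frame per file and concatenate them. This is only a sketch under a few assumptions: every file has the same whitespace-separated layout, the function name load_csv_tree and the source_file column are made up for illustration, and skiprows has to be set to whatever your own files need.

```python
import glob
import os

import pandas as pd


def load_csv_tree(root, skiprows=0):
    """Read every .csv under root (recursively) into one DataFrame.

    Hypothetical helper: assumes all files share one whitespace-separated layout.
    """
    # os.path.join builds the pattern with the right separator for the platform.
    pattern = os.path.join(root, "**", "*.csv")
    frames = []
    for data_path in sorted(glob.glob(pattern, recursive=True)):
        df = pd.read_csv(data_path, sep=r"\s+", skiprows=skiprows)
        df["source_file"] = data_path  # remember which file each row came from
        frames.append(df)
    if not frames:
        # No files matched: return an empty frame instead of raising in concat.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Example usage with a placeholder folder:
# combined = load_csv_tree("path_to_folder", skiprows=115)
```

Keeping the source path in a column makes it easier to trace a bad row back to the file that produced it.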

Good luck.
@All,
Thanks for all the help so far. I tested the snippets that were provided and they all work on normal files, so the approach is correct. The issue seems to be with my file. I will have to figure out its encoding, which might be what is throwing things off.
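For the encoding hunt, one rough approach is to look for a byte order mark first and then try a short list of candidate encodings until one decodes cleanly. This is a sketch, not a reliable detector: the function name guess_encoding and the candidate list are assumptions, the whole file is read into memory, and a clean decode does not prove the guess is right.

```python
import codecs

# Byte order marks, with the UTF-32 ones first because the UTF-32 LE mark
# starts with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(file_path, candidates=("utf-8", "cp1252")):
    """Return a plausible encoding name for file_path, or None (hypothetical helper)."""
    with open(file_path, "rb") as fh:
        raw = fh.read()
    # A byte order mark at the start is the strongest hint available.
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    # Otherwise take the first candidate that decodes without an error.
    for name in candidates:
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None


# Example usage with a placeholder file name:
# enc = guess_encoding("data.csv")
# df = pd.read_csv("data.csv", encoding=enc)
```

The returned name can be passed to the encoding parameter of pd.read_csv or pd.read_table.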

