Read CSV Files with multiple headers into Python DataFrame - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: Read CSV Files with multiple headers into Python DataFrame (/thread-1734.html) Pages:
1
2
|
Read CSV Files with multiple headers into Python DataFrame - UGuntupalli - Jan-23-2017 I am trying to learn Python and started with this task of trying to import specific csv files in a given folder into a Python Data Type and then further processing the data. I am struggling with the part where the data needs to be imported into Python . I need this to be efficient. I tried a couple of things and attempted a couple of approaches based on suggestions provided in the forums and other web pages - all of them resulting in one problem or other . If any one can help solve this , the help would be greatly appreciated. Method 1: Read data line by line: with open(FilePath,"r") as f: for line in f: Data = f.readline() FileData = [x.strip() for x in Data]This yields an empty array Method 2: Read data using csv reader f = open(FilePath,'rt') try: reader = csv.reader(f) for row in reader: print(row) finally: f.close()This yields an error - "line contains NULL byte" Method 3: To fix the above error , I followed a suggestion provided in Stackoverflow leading me to the following output : f = open(FilePath,'rb') data = csv.reader((line.replace('\0','') for line in f), delimiter=",") print(data)Method 4: Reading data into data frame DF = pd.read_csv(FilePath, skiprows=3) This yields the following error - Error tokenizing data. C error: Expected 1 fields in line 13, saw 2 Upon searching further to fix this error - I ended up here . data = pd.read_csv('file1.csv', error_bad_lines=False) This reads the data correctly however it reads one character per row thus making it really hard to further use the data . RE: Read CSV Files with multiple headers into Python DataFrame - buran - Jan-23-2017 csv file
print 'plain vanila reading file\n' with open('myfile.csv', 'r') as f: file_data = [row.strip().split(',') for row in f][1:] #[1:] will skip the header line print file_data print '\n\nusing csv.reader\n' import csv with open('myfile.csv', 'r') as f: reader = csv.reader(f, delimiter=',') file_data = [row for row in reader][1:] # [1:] will skip the header line print file_data print '\n\nusing csv.DictReader\n' with open('myfile.csv', 'r') as f: reader = csv.DictReader(f, delimiter=',') file_data = [row for row in reader] print file_data print '\n\nusing pandas\n' import pandas as pd DF = pd.read_csv('myfile.csv') print DF Note the differences. Also note that depending on what you are going to do and the size of the file it may be better not to read the entire file in the memory.
RE: Read CSV Files with multiple headers into Python DataFrame - UGuntupalli - Jan-23-2017 @ Buran , I have already tried this method , both using the skip rows and without the skip rows flag without any useful result. Without skip rows flag : DF = pd.read_csv(FilePath) Using Skip Rows : DF = pd.read_csv(FilePath,, skiprows = 3) In both cases - I receive the following error : "Error Tokenizing data . C Error : Expected 1 fields in line 13 , saw 2" Upon further reading on this error , I have seen suggestions to use the following approach : DF = pd.read_csv(FilePath, error_bad_lines = False) This yields a row for each character - which is not what is desired Best U RE: Read CSV Files with multiple headers into Python DataFrame - buran - Jan-23-2017 unless you upload sample data file, I don't see how we can help more. I provided 4 working examples for reading csv files. obviously there is something special with your input file. RE: Read CSV Files with multiple headers into Python DataFrame - UGuntupalli - Jan-23-2017 I dont see a way to attach a file . Is there a way ? Apologies I am new to the forum RE: Read CSV Files with multiple headers into Python DataFrame - buran - Jan-23-2017 upload file in the cloud service and provide link RE: Read CSV Files with multiple headers into Python DataFrame - buran - Jan-23-2017 Also if you click New Reply (i.e. full edit mode) at the bottom of the screen there is interface to add attachments RE: Read CSV Files with multiple headers into Python DataFrame - UGuntupalli - Jan-24-2017 @ Buran : https://dl.dropboxusercontent.com/u/32778128/Test.csv Please let me know if you can access this . RE: Read CSV Files with multiple headers into Python DataFrame - buran - Jan-24-2017 my code from previous post works just fine with this sample csv file: print '\n\nusing pandas with row 3 as column names\n' import pandas as pd DF = pd.read_csv('myfile.csv', skiprows=2) print DF.head(n=10) # print first 10 rows also without column names:print '\n\nusing pandas without column names\n' import pandas as pd DF = pd.read_csv('myfile.csv', skiprows=3, header=None) print DF.head(n=10) # print first 10 rows
RE: Read CSV Files with multiple headers into Python DataFrame - UGuntupalli - Jan-24-2017 @ Buran, I got one question - that is the only thing I am doing differently . How is it that all sample snippets show the file input as : DF = pd.read_csv('myfile.csv' , skiprows = 2) I pass the path string of the file like so : DF = pd.read_csv(FilePath,skiprows = 2) If you were looping through a folder , how would you know the name of each file to pass as an input ? Is this a preferred way to do it over passing the path ? Also , I am using Python 3.5 Best U |