Jan-23-2017, 02:31 PM
I am learning Python and started with this task: import specific CSV files from a given folder into a Python data structure, then process the data further. I am stuck on the import step, and I need it to be efficient. I have tried several approaches based on suggestions from these forums and other web pages, but each runs into one problem or another. If anyone can help solve this, the help would be greatly appreciated.
Method 1: Read data line by line:

with open(FilePath, "r") as f:
    for line in f:
        Data = f.readline()
        FileData = [x.strip() for x in Data]

This yields an empty array.
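For comparison, a minimal working sketch of a line-by-line read: iterating over the file object directly gives one full line per iteration, whereas mixing that with readline() skips lines, and the comprehension above iterates over a single string character by character. (The in-memory file below stands in for open(FilePath, "r"); the contents are illustrative.)

```python
import io

def read_lines(f):
    # Each iteration over a file object yields one complete line;
    # strip() removes the trailing newline.
    return [line.strip() for line in f]

# Demo with an in-memory file in place of a real path:
sample = io.StringIO("a,b,c\n1,2,3\n")
print(read_lines(sample))  # ['a,b,c', '1,2,3']
```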
Method 2: Read data using csv reader

f = open(FilePath, 'rt')
try:
    reader = csv.reader(f)
    for row in reader:
        print(row)
finally:
    f.close()

This yields an error - "line contains NULL byte".
Method 3: To fix the above error, I followed a suggestion on Stack Overflow, leading me to the following:

f = open(FilePath, 'rb')
data = csv.reader((line.replace('\0', '') for line in f), delimiter=",")
print(data)

Method 4: Reading data into a data frame
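Two likely issues in the Method 3 snippet: in Python 3 the csv module expects text mode rather than 'rb', and csv.reader returns a lazy iterator, so print(data) only shows the reader object instead of any rows. A sketch that strips the NULL bytes and actually consumes the rows (the in-memory file stands in for the real one):

```python
import csv
import io

def read_rows_skipping_nulls(f):
    # Remove NULL bytes from each line before csv.reader parses it,
    # then materialize the lazy reader into a list of rows.
    reader = csv.reader((line.replace('\0', '') for line in f), delimiter=",")
    return [row for row in reader]

# Demo with a stray NULL byte embedded in the data:
sample = io.StringIO("a,b\x00,c\n1,2,3\n")
print(read_rows_skipping_nulls(sample))  # [['a', 'b', 'c'], ['1', '2', '3']]
```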
DF = pd.read_csv(FilePath, skiprows=3)
This yields the following error - Error tokenizing data. C error: Expected 1 fields in line 13, saw 2
Upon searching further for a fix, I ended up with:

data = pd.read_csv('file1.csv', error_bad_lines=False)

This runs without errors, but it reads one character per row, which makes the data really hard to use further.
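One possible diagnosis, not a certainty: a "line contains NULL byte" error combined with one-character-per-row output is a common symptom of a UTF-16 encoded file (e.g. an Excel "Unicode Text" export) being read as if it were ASCII, because every other byte is NULL. If that is the case here, passing the encoding to pandas may fix all of the above at once. A sketch with an in-memory UTF-16 file and placeholder column names:

```python
import io
import pandas as pd

# Simulate a UTF-16 encoded CSV file (contents are illustrative).
raw = "name,value\nalpha,1\nbeta,2\n".encode("utf-16")

# Reading with the correct encoding parses it as normal CSV;
# for a real file: pd.read_csv(FilePath, encoding="utf-16", skiprows=3)
df = pd.read_csv(io.BytesIO(raw), encoding="utf-16")
print(df.columns.tolist())  # ['name', 'value']
print(len(df))              # 2
```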