Jan-23-2017, 02:31 PM
I am learning Python and started with this task: import specific CSV files from a given folder into a Python data structure, then process the data further. I am stuck on the import step, and I need it to be efficient. I have tried several approaches based on suggestions from these forums and other web pages, but each runs into one problem or another. If anyone can help solve this, the help would be greatly appreciated.
Method 1: Read data line by line:

with open(FilePath, "r") as f:
    for line in f:
        Data = f.readline()
        FileData = [x.strip() for x in Data]

This yields an empty array.
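For comparison, a minimal working sketch of a line-by-line read: iterating over the file object directly gives one full line per iteration, whereas mixing that with readline() skips lines, and the comprehension above iterates over a single string character by character. (The in-memory file below stands in for open(FilePath, "r"); the contents are illustrative.)

```python
import io

def read_lines(f):
    # Each iteration over a file object yields one complete line;
    # strip() removes the trailing newline.
    return [line.strip() for line in f]

# Demo with an in-memory file in place of a real path:
sample = io.StringIO("a,b,c\n1,2,3\n")
print(read_lines(sample))  # ['a,b,c', '1,2,3']
```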
Method 2: Read data using csv reader

f = open(FilePath, 'rt')
try:
    reader = csv.reader(f)
    for row in reader:
        print(row)
finally:
    f.close()

This yields an error - "line contains NULL byte".
Method 3: To fix the above error, I followed a suggestion on Stack Overflow, leading me to the following:

f = open(FilePath, 'rb')
data = csv.reader((line.replace('\0', '') for line in f), delimiter=",")
print(data)

Method 4: Reading data into a data frame
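Two likely issues in the Method 3 snippet: in Python 3 the csv module expects text mode rather than 'rb', and csv.reader returns a lazy iterator, so print(data) only shows the reader object instead of any rows. A sketch that strips the NULL bytes and actually consumes the rows (the in-memory file stands in for the real one):

```python
import csv
import io

def read_rows_skipping_nulls(f):
    # Remove NULL bytes from each line before csv.reader parses it,
    # then materialize the lazy reader into a list of rows.
    reader = csv.reader((line.replace('\0', '') for line in f), delimiter=",")
    return [row for row in reader]

# Demo with a stray NULL byte embedded in the data:
sample = io.StringIO("a,b\x00,c\n1,2,3\n")
print(read_rows_skipping_nulls(sample))  # [['a', 'b', 'c'], ['1', '2', '3']]
```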
DF = pd.read_csv(FilePath, skiprows=3)
This yields the following error - Error tokenizing data. C error: Expected 1 fields in line 13, saw 2
Upon searching further for a fix, I ended up with:

data = pd.read_csv('file1.csv', error_bad_lines=False)

This runs without errors, but it reads one character per row, which makes the data really hard to use further.
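One possible diagnosis, not a certainty: a "line contains NULL byte" error combined with one-character-per-row output is a common symptom of a UTF-16 encoded file (e.g. an Excel "Unicode Text" export) being read as if it were ASCII, because every other byte is NULL. If that is the case here, passing the encoding to pandas may fix all of the above at once. A sketch with an in-memory UTF-16 file and placeholder column names:

```python
import io
import pandas as pd

# Simulate a UTF-16 encoded CSV file (contents are illustrative).
raw = "name,value\nalpha,1\nbeta,2\n".encode("utf-16")

# Reading with the correct encoding parses it as normal CSV;
# for a real file: pd.read_csv(FilePath, encoding="utf-16", skiprows=3)
df = pd.read_csv(io.BytesIO(raw), encoding="utf-16")
print(df.columns.tolist())  # ['name', 'value']
print(len(df))              # 2
```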