Aim: I would like to read data with read_csv and convert it to a dataframe.
What I've tried: 1. data = pd.read_csv(...) 2. pd.DataFrame(data)
Problem: The dataframe does not show the data in 2 columns as expected.
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
# read data, date_parser=[0]: first column to datetime,
data = pd.read_csv('minimal_data.csv', delimiter = ';', date_parser=[0], usecols=[0, 1], header = 0, names = ["MyColumn1","MyColumn2"]),
print(data)
df = pd.DataFrame(data)
print(df)
---
MacOS 10.15.7, Jupyter notebook
Hello,
I'm not an expert with pandas, however:
- data is already a dataframe.
- drop lines 10 - 13 and all will be fine.
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
# read data, date_parser=[0]: first column to datetime,
data = pd.read_csv('minimal_data.csv', delimiter = ';', date_parser=[0], usecols=[0, 1], header = 0, names = ["MyColumn1","MyColumn2"]),
print(data)
Output:
( MyColumn1 MyColumn2
0 09.06.2021 14:35:05 100
1 09.06.2021 14:36:16 100
2 09.06.2021 14:37:26 100
3 09.06.2021 14:38:37 100
4 09.06.2021 14:39:48 100
5 09.06.2021 14:40:59 100
6 09.06.2021 14:42:10 100
7 09.06.2021 14:43:21 100
8 09.06.2021 14:44:32 100,)
Thank you. I actually need to further evaluate the data. And if I try it directly with data, I get:
data.dtypes
AttributeError: 'tuple' object has no attribute 'dtypes'
or
data.loc[data['MyColumn2'] == 0]
AttributeError: 'tuple' object has no attribute 'loc'
(Jun-15-2021, 01:26 PM) ju21878436312 wrote: Thank you. I actually need to further evaluate the data. And if I try it directly with data, I get:
You need to get the DataFrame out of the tuple.
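Some background on where the tuple comes from, as a minimal sketch: in Python a trailing comma after an expression builds a one-element tuple, and the read_csv line in the question ends with one. The `load` function below is a hypothetical stand-in for `pd.read_csv(...)`; it returns a plain dict so the snippet runs without pandas or a CSV file.

```python
def load():
    """Hypothetical stand-in for pd.read_csv(...); returns a plain dict."""
    return {"MyColumn2": [100, 0]}


with_comma = load(),     # trailing comma: one-element tuple wrapping the result
without_comma = load()   # no trailing comma: the result itself

print(type(with_comma).__name__)       # tuple
print(type(without_comma).__name__)    # dict
print(with_comma[0] == without_comma)  # True
```

So deleting the trailing comma in the read_csv line should avoid the tuple altogether; indexing with [0] is the other way out.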
Here is an example with some advice.
import pandas as pd
import numpy as np
# pandas has its own datetime handling, so this import is not needed
#from datetime import datetime, timedelta
# read data, date_parser=[0]: first column to datetime,
data = pd.read_csv('minimal_data.csv', delimiter = ';', date_parser=[0], usecols=[0, 1], header=0, names=["MyColumn1","MyColumn2"]),
# Get the DataFrame out of the tuple
df = data[0]
# Convert to datetime64
df['MyColumn1'] = pd.to_datetime(df['MyColumn1'])
print(df.dtypes)
print(df)
print('-' * 30)
print(df.loc[df['MyColumn2'] == 0])
Output:
MyColumn1 datetime64[ns]
MyColumn2 int64
dtype: object
MyColumn1 MyColumn2
0 2021-09-06 14:35:05 178
1 2021-09-06 14:36:16 59
2 2021-09-06 14:37:26 0
3 2021-09-06 14:38:37 0
4 2021-09-06 14:39:48 0
5 2021-09-06 14:40:59 0
6 2021-09-06 14:42:10 0
7 2021-09-06 14:43:21 0
8 2021-09-06 14:44:32 0
------------------------------
MyColumn1 MyColumn2
2 2021-09-06 14:37:26 0
3 2021-09-06 14:38:37 0
4 2021-09-06 14:39:48 0
5 2021-09-06 14:40:59 0
6 2021-09-06 14:42:10 0
7 2021-09-06 14:43:21 0
8 2021-09-06 14:44:32 0
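As an alternative, assuming the trailing comma can simply be removed, the [0] step is not needed at all. The sketch below also passes an explicit format to pd.to_datetime: if the timestamps are day.month.year (an assumption, the thread does not say), the output above has read 09.06.2021 as 6 September rather than 9 June. The inline CSV text is made-up sample data standing in for minimal_data.csv, whose real contents are not shown, and date_parser is left out because it expects a function rather than a list.

```python
import io

import pandas as pd

# Made-up sample in the same shape as the thread's data (semicolon-separated,
# day.month.year timestamps); it stands in for the real minimal_data.csv.
csv_text = """time;value
09.06.2021 14:35:05;178
09.06.2021 14:36:16;59
09.06.2021 14:37:26;0
"""

# No trailing comma after the call, so df is a DataFrame, not a tuple.
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    usecols=[0, 1],
    header=0,
    names=["MyColumn1", "MyColumn2"],
)

# Explicit format with the day first, so 09.06.2021 is read as 9 June 2021.
df["MyColumn1"] = pd.to_datetime(df["MyColumn1"], format="%d.%m.%Y %H:%M:%S")

print(df.dtypes)
print(df.loc[df["MyColumn2"] == 0])
```

With the real file, io.StringIO(csv_text) would be replaced by 'minimal_data.csv'. If the dates are in fact month first, the format string would need %m and %d swapped.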
@snippsat: Thank you very much for the useful comments!