Python Forum

Full Version: How to store columns of a .csv in variables (when the .csv has no headers) ?
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hello! I have a .csv file and by using the following code:

my_dataset = pandas_package.read_csv("MyMeasurements.csv")
my_dataset["date"] = pandas_package.to_datetime(my_dataset["DATE"])

TimeStamp = my_dataset["date"]
sensor1_measurements = my_dataset["sensor1"]
sensor2_measurements = my_dataset["sensor2"]
And I get correctly each column of the .csv assigned to each variable. The problem is the following:

How can I get the same result if the .csv does not have headers ('TimeStamp ', 'sensor1_measurements ', 'sensor2_measurements ')?

I tried google searching and testing code but nothing worked... Any ideas???
Start by reading the documentation for Pandas.read_csv(). I find it very useful.

https://pandas.pydata.org/docs/reference...d_csv.html

Where you find this:
Quote:names: array-like, optional
List of column names to use. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.

This assumes that you know the column names and the order of columns in the CSV.
So, I cannot achieve what I want, right?
That depends on what you want. You can read a CSV file without a header, but your column index will be numbers, 0, 1, 2... just as you get for the row index. To read a CSV file that has no header, again we return to the excellent Pandas documentation.

https://pandas.pydata.org/docs/reference...d_csv.html

This time it is not quite as excellent as before.

Quote:header: int, list of int, None, default ‘infer’
Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.

Our choices are:
int : Use this line in the file as the column header.
list of int: Use these lines in the file as the column header (for multi-index headers).
None: Not described very well what happens, but sounds like it will use "names" provided. What does it do if we don't provide names?
default: Infer column header names from the file.

So if you have a CSV file with no header, and you want Pandas to generate column numbers instead of inferring names from the file, set "header = None".
df = pd.read_csv(filename, header=None)
Ok, I have read it! Thank you... I am experimenting now... However, what do I put here:

TimeStamp = ???
sensor1_measurements = ???
sensor2_measurements = ???
Thanks once again for your time...
(Aug-17-2023, 08:52 PM)hobbyist Wrote: [ -> ]And I get correctly each column of the .csv assigned to each variable. The problem is the following:
I think there are some confusion here,you shall not assigned to each variable in Pandas,when read from .csv it makes one DataFrame.
So your line 4-6 is not needed.
If i guess how your MyMeasurements.csv looks.
Output:
DATE,sensor1,sensor2 2023-08-01 00:00:00,25.6,60.2 2023-08-01 01:00:00,25.7,61.5 2023-08-01 02:00:00,25.9,62.8 2023-08-01 03:00:00,26.2,63.2 2023-08-01 04:00:00,26.5,63.8
import pandas as pd

df = pd.read_csv('data.csv')
df["DATE"] = pd.to_datetime(df["DATE"])
This in now a working DataFrame,no need to assigned variables.
Line 4 is not assign a variable,it convert DATE to pandas datetime64 type.
Look at and do something with DataFrame.
>>> df
                 DATE  sensor1  sensor2
0 2023-08-01 00:00:00     25.6     60.2
1 2023-08-01 01:00:00     25.7     61.5
2 2023-08-01 02:00:00     25.9     62.8
3 2023-08-01 03:00:00     26.2     63.2
4 2023-08-01 04:00:00     26.5     63.8

# Look Types they are now correct
>>> df.dtypes
DATE       datetime64[ns]
sensor1           float64
sensor2           float64
dtype: object

# Or can use df.info()
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype         
---  ------   --------------  -----         
 0   DATE     5 non-null      datetime64[ns]
 1   sensor1  5 non-null      float64       
 2   sensor2  5 non-null      float64       
dtypes: datetime64[ns](1), float64(2)
memory usage: 248.0 bytes

# Max value of both sensors 
>>> max_values = df[['sensor1', 'sensor2']].max()
>>> max_values
sensor1    26.5
sensor2    63.8
dtype: float64

# To csv
>>> df.to_csv(index=False, line_terminator='\n')
('DATE,sensor1,sensor2\n'
 '2023-08-01 00:00:00,25.6,60.2\n'
 '2023-08-01 01:00:00,25.7,61.5\n'
 '2023-08-01 02:00:00,25.9,62.8\n'
 '2023-08-01 03:00:00,26.2,63.2\n'
 '2023-08-01 04:00:00,26.5,63.8\n')
(Aug-18-2023, 09:41 AM)hobbyist Wrote: [ -> ]Ok, I have read it! Thank you... I am experimenting now... However, what do I put here:

TimeStamp = ???
sensor1_measurements = ???
sensor2_measurements = ???
Thanks once again for your time...
As I said in my first reply, if you know what columns are in the file you give them names:
df = pd.read_csv(filename, names=["date", "sensor1", "sensor2"])
If you don't know what the columns are the file is worthless, isn't it?