Python Forum

When I try to run the following code, I get an error.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose

import time
from tqdm import tqdm
from scipy import stats
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

from sklearn.feature_selection import RFE
from sklearn.ensemble import ExtraTreesClassifier

from sklearn.metrics import f1_score
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve, auc
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

from xgboost import XGBClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("sensor.csv")

print('here')

df.head()

# Find Duplicate Values
# Results will be the list of duplicate values
# If no duplicate values, nothing will list.
df[df['timestamp'].duplicated(keep=False)]

df.isnull().sum()

df['machine_status'].value_counts()

# Convert the timestamp column to the datetime data type
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Create a Series
time_period = pd.Series([])

# Assign values to series
for i in tqdm(range(df.shape[0])):
    if (df["timestamp"][i].hour >= 4) and (df[timestamp][i].hour < 10):
        time_period[i]="Morning"  
    elif (df["timestamp"][i].hour >= 10) and (df[timestamp][i].hour < 16):
        time_period[i]="Noon"  
    elif (df["timestamp"][i].hour >= 16) and (df[timestamp][i].hour < 22):   
        time_period[i]="Evening"
    else:    
        time_period[i]="Night"

# Insert new column time period
df.Insert(2, 'time_period', time_period)

The error is:

Error:
C:\Users\james\AppData\Local\Temp\ipykernel_24076\1118037779.py:50: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
  time_period = pd.Series([])
  0%|                                                                           | 240/220320 [00:00<01:59, 1848.32it/s]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [1], in <cell line: 53>()
     52 # Assign values to series
     53 for i in tqdm(range(df.shape[0])):
---> 54     if (df["timestamp"][i].hour >= 4) and (df[timestamp][i].hour < 10):
     55         time_period[i]="Morning"  
     56     elif (df["timestamp"][i].hour >= 10) and (df[timestamp][i].hour < 16):

NameError: name 'timestamp' is not defined

It says timestamp is not defined, but I think it is. This is not my code; somebody else wrote it.

I am not sure how to correct it. I believe it has something to do with these lines:

# Convert the timestamp column to the datetime data type
df['timestamp'] = pd.to_datetime(df['timestamp'])

How can I fix it?

Any help appreciated.

Respectfully,

LZ

The error message has an arrow that points directly at the line where the error occurs.

Fix it by reading your code carefully everywhere it says "timestamp". Not all of your timestamps are the same.

I agree that not all timestamps are the same. When the program runs the following code,

# Convert the timestamp column to the datetime data type
df['timestamp'] = pd.to_datetime(df['timestamp'])

things change. I just do not see how they change. It seems to be converting a scalar in timestamp to ...what?
I think that is the cause of this error, but removing the statement does not fix it. You just get a different error.
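
For reference, a small sketch of what pd.to_datetime does to a column. The two sample strings below are made up for illustration and are not taken from sensor.csv. The exact dtype names printed can differ between pandas versions, so no output is shown here.

```python
import pandas as pd

# Hypothetical sample data; the real sensor.csv is not shown in this thread.
df = pd.DataFrame({"timestamp": ["2018-04-01 00:00:00", "2018-04-01 05:30:00"]})

dtype_before = df["timestamp"].dtype   # text, as it would come out of read_csv
df["timestamp"] = pd.to_datetime(df["timestamp"])
dtype_after = df["timestamp"].dtype    # a datetime64 dtype

# Each element of the converted column is a pandas Timestamp,
# which is what makes attributes such as .hour available.
first = df["timestamp"][0]
print(dtype_before, "->", dtype_after)
print(type(first).__name__, first.hour)
```

The conversion changes the values stored in the column, not the column's name, so the column is still looked up as df["timestamp"] afterwards.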

Now the statement

# Assign values to series
for i in tqdm(range(df.shape[0])):
    if (df["timestamp"][i].hour >= 4) and (df[timestamp][i].hour < 10):
        time_period[i]="Morning"

will fail because timestamp has changed. But changed to what? If I knew the answer to the first question, I could make an educated guess.

Any help, such as a hint, would be appreciated.

Thanks in advance.

Respectfully,

LZ

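That is not why the statement failed. The statement failed because df[timestamp] is not the same as df['timestamp']. Without quotes, timestamp is a variable name, and no variable with that name has been defined. With quotes, 'timestamp' is a string that names the column.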
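
A minimal sketch of that difference, and of the loop with every column lookup quoted. The data is made up (it is not sensor.csv) and the sensor_00 column is hypothetical. Building the labels in a list, passing dtype=object, and calling lowercase df.insert are my own adjustments: the posted code calls df.Insert, which looks like a second typo, since the DataFrame method is insert. The np.select variant at the end is an alternative, not what the original author wrote.

```python
import numpy as np
import pandas as pd

# Made-up timestamps for illustration only.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-04-01 03:00:00",
        "2018-04-01 05:00:00",
        "2018-04-01 12:00:00",
        "2018-04-01 18:00:00",
        "2018-04-01 23:00:00",
    ]),
    "sensor_00": [1.0, 2.0, 3.0, 4.0, 5.0],  # hypothetical column
})

# Unquoted: Python looks for a variable called timestamp and finds none.
try:
    df[timestamp]
except NameError as exc:
    name_error_message = str(exc)

# Same hour ranges as the posted loop, with the column name quoted everywhere.
labels = []
for i in range(df.shape[0]):
    hour = df["timestamp"][i].hour
    if 4 <= hour < 10:
        labels.append("Morning")
    elif 10 <= hour < 16:
        labels.append("Noon")
    elif 16 <= hour < 22:
        labels.append("Evening")
    else:
        labels.append("Night")

# An explicit dtype avoids the empty-Series FutureWarning from the post.
time_period = pd.Series(labels, index=df.index, dtype=object)

# Lowercase insert; df.Insert would raise AttributeError.
df.insert(2, "time_period", time_period)

# Alternative without a Python-level loop over the rows.
hours = df["timestamp"].dt.hour
vectorized = np.select(
    [(hours >= 4) & (hours < 10),
     (hours >= 10) & (hours < 16),
     (hours >= 16) & (hours < 22)],
    ["Morning", "Noon", "Evening"],
    default="Night",
)
```

On a frame with a few hundred thousand rows, the np.select version should be noticeably faster than the row-by-row loop, though either gives the same labels.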

Led_Zeppelin

deanhystad
