Sep-17-2022, 04:23 PM
(This post was last modified: Sep-17-2022, 05:17 PM by deanhystad.)
You are not using loc correctly.
In the very good pandas documentation
https://pandas.pydata.org/pandas-docs/st...e.loc.html
They have a nice example that shows how to do what you want to do. Select items from a specified column using a condition on another column.
Quote:
Conditional that returns a boolean Series with column labels specified
df.loc[df['shield'] > 6, ['max_speed']]
            max_speed
sidewinder          7
For a test I made a little program to see how it worked:
import numpy as np
import pandas as pd

df = pd.DataFrame({"machine_status":[0,1,0,1,0,1,0,1], "time_period":[1,2,3,4,5,6,7,8]})
q = df.loc[df['machine_status']==0, ["time_period"]]
print(q)
Output:
   time_period
0            1
2            3
4            5
6            7
As desired, this returns a dataframe containing the time_period values for the rows where machine_status == 0.
When trying to solve the problem, did you try to write a small program like this? If not, why not? If you did write a small program, why didn't you post it? To test your code, I had to make a dataframe. It was not a big effort, but I don't know what the values should be for machine_status or time_period, so my results may not reflect yours. It would be nice if you had supplied a dataframe to save others the effort.
I tested the rest of your code snippet, and you also have an error in the prints. p and q are not numpy arrays; they are dataframes/series. Numpy knows how to work with pandas, so it can do log and sum, but the result is still a dataframe, not a numpy array or a float. Use to_numpy() to convert a dataframe/series to a numpy array.
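If you want to check the types yourself, here is a minimal sketch. It uses the same made-up test dataframe as my little program, not your real data, and the variable names logged, q_arr, q_flat and total are just ones I picked for the illustration.

```python
import numpy as np
import pandas as pd

# Made-up test data; the real machine_status/time_period values are unknown.
df = pd.DataFrame({"machine_status": [0, 1, 0, 1, 0, 1, 0, 1],
                   "time_period": [1, 2, 3, 4, 5, 6, 7, 8]})

mask = df["machine_status"] == 0

# A list of column labels gives back a DataFrame.
q = df.loc[mask, ["time_period"]]

# numpy accepts the DataFrame, but the result is still a DataFrame.
logged = np.log(q)
print(type(logged))

# Converting first gives a plain numpy array (2-D here: one column).
q_arr = q.to_numpy()
print(type(q_arr), q_arr.shape)

# A single column label (no list) gives a Series, and a 1-D array.
q_flat = df.loc[mask, "time_period"].to_numpy()
print(q_flat.shape)

# On the array, log followed by sum yields a numpy float scalar.
total = np.log(q_arr).sum()
print(type(total))
```

Whether you want the 2-D or the 1-D array depends on what the rest of your code expects, which I can't tell from the snippet.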
q = df.loc[df['machine_status']==0, ["time_period"]].to_numpy()
p = df.loc[df['machine_status']==1, ['time_period']].to_numpy()[:len(q)]
You should also start using f-strings to format your print output. String modulo formatting (%) is really getting old, and f-strings are easier to read and harder to get wrong.
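As a rough comparison of the two formatting styles, here is a small sketch. The array and the label text are placeholders I made up, since I don't know what your prints are supposed to show.

```python
import numpy as np

# Placeholder values standing in for whatever your script computes.
q = np.array([1, 3, 5, 7])
total = float(np.log(q).sum())

# Old style: string modulo formatting.
old_style = "sum of logs = %.3f" % total

# New style: f-string with the same format spec.
new_style = f"sum of logs = {total:.3f}"

print(old_style)
print(new_style)
```

Both lines print the same text. The f-string keeps the value next to its format spec, which makes longer print statements easier to follow.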