Coding Eror - Printable Version
+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Coding Eror (/thread-38217.html)
Coding Eror - Led_Zeppelin - Sep-17-2022

The following Python code generates an error:

    q = image_data_org.loc[image_data_org.loc['machine_status']==0]['time_period']
    p = image_data_org.loc[image_data_org.loc['machine_status']==1]['time_period'][:q.shape[0]]
    pq = np.sum(p * np.log(p/q))
    qp = np.sum(q * np.log(q/p))
    print('KL(P || Q) : %. pq)%.3f' % pq)
    print('KL(Q || P) : %. pq)%.3f' % qp)

The error is:

RE: Coding Eror - deanhystad - Sep-17-2022

You are not using loc correctly. The very good pandas documentation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) has a nice example that shows how to do what you want: select items from a specified column using a condition on another column.

Quote:
Conditional that returns a boolean Series with column labels specified

As a test, I made a little program to see how it works:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"machine_status":[0,1,0,1,0,1,0,1], "time_period":[1,2,3,4,5,6,7,8]})
    q = df.loc[df['machine_status']==0, ["time_period"]]
    print(q)

As desired, this returns a dataframe containing the time_period rows where machine_status == 0.

When trying to solve the problem, did you try to write a small program like this? If not, why not? If you did write a small program, why didn't you post it? To test your code, I had to make a dataframe. It was not a big effort, but I don't know what the values should be for machine_status or time_period, so my results may not reflect yours. It would be nice if you had supplied a dataframe to save others the effort.

I tested the rest of your code snippet, and you also have an error in the prints. p and q are not numpy arrays; they are dataframes/series. Numpy knows how to work with pandas, so it can do log and sum, but the result is still a dataframe, not a numpy array or a float. Use to_numpy() to convert a dataframe/series to a numpy array.

    q = df.loc[df['machine_status']==0, ["time_period"]].to_numpy()
    p = df.loc[df['machine_status']==1, ['time_period']].to_numpy()[:len(q)]

You should also start using f-strings to format your print output. String modulo formatting (%) is really getting old, and there are reasons why it is not used anymore.

RE: Coding Eror - Led_Zeppelin - Sep-18-2022

My first question is why you added a second dataframe, unless it was just for exposition purposes.
My second question is why the first piece of Python code in this form executes and the second one does not. The code that does not work is in an earlier section of this post. Here is the first code, which does work:

    q = df.loc[df2['machine_status']==0]['sensor_06']
    p = df.loc[df2['machine_status']==1]['sensor_06'][:q.shape[0]]

Notice the similarity of the two code sets. One works with two dataframes and the other works with one dataframe. They are almost the same, which is why I applied the same Python code (with minor modifications) in the second instance.

It is a mystery to me why the first set works and the second set does not. My book "Numerical Recipes in Statistics" gave me the code for the first instance, so I thought I would apply it to the second. But that did not work! There must have been a time when the second code set did work, and it is now out of date.

I am just interested in your opinion. This one has really stumped me. Help appreciated.

Respectfully,
LZ

RE: Coding Eror - deanhystad - Sep-19-2022

Quote:
My first question is why you added a second dataframe, unless it was just for exposition purposes.

Could you explain further, please? I do not understand this question.

Quote:
It just is a mystery to me why the first set works and the second set does not.

You are not seeing the problem. The problem has nothing to do with one dataframe or two dataframes. The problem is that you use loc where you shouldn't. To make this easier to see, I will show your two examples right next to each other. In this example I will use two dataframes, A and B:

    q = A.loc[B['machine_status']==0]['sensor_06']      # No error
    q = A.loc[B.loc['machine_status']==0]['sensor_06']  # Error

And again with only one dataframe, A:

    q = A.loc[A['machine_status']==0]['sensor_06']      # No error
    q = A.loc[A.loc['machine_status']==0]['sensor_06']  # Error

There may be errors in the book, but I think it more likely that your problems are caused by you not seeing the code.
It is very common to look at code and see what you expect to see. You will look at the example in the book and at what you typed, and they will look identical because that is what you expect. It happens to all programmers all the time.
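
For anyone landing on this thread later, here is a minimal sketch that pulls the points above together. It reuses the toy dataframe from the first reply, so the values are assumptions and not the original poster's data. Because the toy time_period values are not normalized to probabilities, the printed numbers are only illustrative and are not a meaningful KL divergence. The sketch shows that `df.loc['machine_status']` fails because loc with a single label looks for a row, that `df['machine_status']` gives the boolean mask loc needs, and that an f-string can do the formatting once the result is a plain float.

```python
import numpy as np
import pandas as pd

# Toy data borrowed from the reply above; the real values are unknown.
df = pd.DataFrame({
    "machine_status": [0, 1, 0, 1, 0, 1, 0, 1],
    "time_period": [1, 2, 3, 4, 5, 6, 7, 8],
})

# df.loc["machine_status"] looks for a ROW labelled "machine_status".
# The index here is 0..7, so the lookup raises KeyError.
try:
    df.loc[df.loc["machine_status"] == 0]["time_period"]
    loc_failed = False
except KeyError:
    loc_failed = True

# df["machine_status"] selects the COLUMN, which yields a boolean mask
# that .loc can use to pick rows; the second argument picks the column.
q = df.loc[df["machine_status"] == 0, "time_period"].to_numpy()
p = df.loc[df["machine_status"] == 1, "time_period"].to_numpy()[: len(q)]

# Same arithmetic as in the question. The toy values are not probability
# distributions, so treat the results as illustrative only.
pq = float(np.sum(p * np.log(p / q)))
qp = float(np.sum(q * np.log(q / p)))

print(f"row lookup with .loc failed: {loc_failed}")
print(f"KL(P || Q) : {pq:.3f}")
print(f"KL(Q || P) : {qp:.3f}")
```

Selecting the column by name in the second argument of loc (a string rather than a one-item list) returns a Series, so to_numpy() gives a one-dimensional array and np.sum returns a single number that formats cleanly.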