Python Forum
Coding Error
#1
The following Python code generates an error:

q = image_data_org.loc[image_data_org.loc['machine_status']==0]['time_period']
p = image_data_org.loc[image_data_org.loc['machine_status']==1]['time_period'][:q.shape[0]]

pq = np.sum(p * np.log(p/q))
qp = np.sum(q * np.log(q/p))
print('KL(P || Q) : %. pq)%.3f' % pq)
print('KL(Q || P) : %. pq)%.3f' % qp) 
The error is:

Error:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 q = image_data_org.loc[image_data_org.loc['machine_status']==0]['time_period']
      2 p = image_data_org.loc[image_data_org.loc['machine_status']==1]['time_period'][:q.shape[0]]
      4 pq = np.sum(p * np.log(p/q))

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\indexing.py:967, in _LocationIndexer.__getitem__(self, key)
    964 axis = self.axis or 0
    966 maybe_callable = com.apply_if_callable(key, self.obj)
--> 967 return self._getitem_axis(maybe_callable, axis=axis)

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\indexing.py:1202, in _LocIndexer._getitem_axis(self, key, axis)
   1200 # fall thru to straight lookup
   1201 self._validate_key(key, axis)
-> 1202 return self._get_label(key, axis=axis)

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\indexing.py:1153, in _LocIndexer._get_label(self, label, axis)
   1151 def _get_label(self, label, axis: int):
   1152 # GH#5667 this will fail if the label is not present in the axis.
-> 1153 return self.obj.xs(label, axis=axis)

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\generic.py:3864, in NDFrame.xs(self, key, axis, level, drop_level)
   3862 new_index = index[loc]
   3863 else:
-> 3864 loc = index.get_loc(key)
   3866 if isinstance(loc, np.ndarray):
   3867 if loc.dtype == np.bool_:

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\indexes\range.py:389, in RangeIndex.get_loc(self, key, method, tolerance)
    387 raise KeyError(key) from err
    388 self._check_indexing_error(key)
--> 389 raise KeyError(key)
    390 return super().get_loc(key, method=method, tolerance=tolerance)

KeyError: 'machine_status'

Reply
#2
You are not using loc correctly.

In the very good pandas documentation
https://pandas.pydata.org/pandas-docs/st...e.loc.html

They have a nice example that shows how to do what you want to do. Select items from a specified column using a condition on another column.

Quote:Conditional that returns a boolean Series with column labels specified

df.loc[df['shield'] > 6, ['max_speed']]
            max_speed
sidewinder          7
As a test, I made a little program to see how it worked:
import numpy as np
import pandas as pd

df = pd.DataFrame({"machine_status":[0,1,0,1,0,1,0,1], "time_period":[1,2,3,4,5,6,7,8]})

q = df.loc[df['machine_status']==0, ["time_period"]]
print(q)
Output:
   time_period
0            1
2            3
4            5
6            7
As desired, this returns a dataframe containing the time_period values for the rows where machine_status == 0.
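Passing a list of column labels, as above, gives back a dataframe, while passing a single label gives back a series. A minimal sketch of the difference, reusing the same made-up dataframe (the values are illustrative, not the original poster's data, and the names mask, as_frame and as_series are only for this example):

```python
import pandas as pd

# Same made-up data as the test program above; values are illustrative only.
df = pd.DataFrame({"machine_status": [0, 1, 0, 1, 0, 1, 0, 1],
                   "time_period": [1, 2, 3, 4, 5, 6, 7, 8]})

mask = df["machine_status"] == 0           # boolean Series, one entry per row

as_frame = df.loc[mask, ["time_period"]]   # list of labels -> DataFrame
as_series = df.loc[mask, "time_period"]    # single label   -> Series

print(type(as_frame).__name__, as_frame.shape)     # DataFrame (4, 1)
print(type(as_series).__name__, as_series.shape)   # Series (4,)
print(as_series.tolist())                          # [1, 3, 5, 7]
```

Either form works for selecting the rows; which one is more convenient depends on what is done with the result afterwards.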

When trying to solve the problem did you try to write a small program like this? If not, why? If you did write a small program, why didn't you post that? To test your code, I had to make a dataframe. It was not a big effort, but I don't know what the values should be for machine_status or time_period, so my results may not reflect yours. It would be nice if you had supplied a dataframe to save others the effort.

I tested the rest of your code snippet and you also have an error in the prints. p and q are not numpy arrays, they are dataframes/series. Numpy knows how to work with pandas, so it can do log and sum, but the result is still a dataframe, not a numpy array or a float. Use to_numpy() to convert a dataframe/series to a numpy array.
q = df.loc[df['machine_status']==0, ["time_period"]].to_numpy()
p = df.loc[df['machine_status']==1, ['time_period']].to_numpy()[:len(q)]
You should also start using f-strings to format your print output. String modulo formatting (%) is really getting old, and there are reasons why it has fallen out of favor.
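Putting these suggestions together, here is a rough sketch of the whole calculation on my made-up dataframe. It reuses the formula from the question unchanged and does not check that p and q are valid probability distributions. The names p_s and q_s are only for this example. It also shows one more reason to convert to numpy first: pandas lines values up by index label, and the two selections share no labels.

```python
import numpy as np
import pandas as pd

# Made-up data; the real machine_status / time_period values are unknown.
df = pd.DataFrame({"machine_status": [0, 1, 0, 1, 0, 1, 0, 1],
                   "time_period": [1, 2, 3, 4, 5, 6, 7, 8]})

# As pandas Series, division aligns on index labels. The two selections
# have no labels in common, so every element of the result is NaN.
q_s = df.loc[df["machine_status"] == 0, "time_period"]
p_s = df.loc[df["machine_status"] == 1, "time_period"]
print((p_s / q_s).isna().all())   # True

# As plain numpy arrays, the arithmetic is done position by position.
q = q_s.to_numpy()
p = p_s.to_numpy()[:len(q)]

pq = np.sum(p * np.log(p / q))
qp = np.sum(q * np.log(q / p))

# f-strings instead of % formatting.
print(f"KL(P || Q) : {pq:.3f}")
print(f"KL(Q || P) : {qp:.3f}")
```

With the real data the numbers will of course differ, and zeros in either column would need separate handling because of the division and the logarithm.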
Reply
#3
My first question is why you added a second dataframe, unless it was just for exposition purposes.

My second question is why the first Python code in this form executes and the second one does not.

The code that does not work is in an earlier section of this post. I will now list the first code that does work.

q = df.loc[df2['machine_status']==0]['sensor_06']
p = df.loc[df2['machine_status']==1]['sensor_06'][:q.shape[0]]
Notice the similarity of the two code sets. One works with two dataframes and the other works with one dataframe.

They are almost the same, which is why I applied the same Python code (with minor modifications) in the second instance.

It just is a mystery to me why the first set works and the second set does not.

My book "Numerical Recipes in Statistics" gave me the code for the first instance so I thought I would apply it to the second. But that did not work!

There must have been a time when the second code set did work, and it is now out of date.

I am just interested in your opinion.

This one has really stumped me.

Help appreciated.

Respectfully,

LZ
Reply
#4
Quote:My first question is why you added a second dataframe, unless it was just for exposition purposes.
Could you explain further please. I do not understand this question.

Quote:It just is a mystery to me why the first set works and the second set does not.
You are not seeing the problem. The problem has nothing to do with one dataframe or two dataframes. The problem is that you use loc where you shouldn't.

To make this easier to see, I will show your two examples right next to each other. In this example I will use two dataframes, A and B.
q = A.loc[B['machine_status']==0]['sensor_06']  # No error
q = A.loc[B.loc['machine_status']==0]['sensor_06']  # Error
And again with only one dataframe; A.
q = A.loc[A['machine_status']==0]['sensor_06']  # No error
q = A.loc[A.loc['machine_status']==0]['sensor_06']  # Error
There may be errors in the book, but I think it more likely that your problems are caused by you not seeing the code. It is very common to look at code and see what you expect to see. You will look at the example in the book and you will look at what you typed, and they will look identical because that is what you expect. It happens to all programmers all the time.
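To see the difference for yourself, here is a small sketch with a made-up stand-in for dataframe A (the values are invented for illustration). Square brackets directly on the dataframe select a column by name, while loc with a single label looks for a row with that label, and the default index only holds integers.

```python
import pandas as pd

# Hypothetical stand-in for dataframe A; the values are made up.
A = pd.DataFrame({"machine_status": [0, 1, 0, 1],
                  "sensor_06": [10.0, 11.0, 12.0, 13.0]})

# A['machine_status'] selects a column by name.
column = A["machine_status"]

# A.loc['machine_status'] looks for a ROW labelled 'machine_status'.
# The default index holds the integers 0..3, so the lookup fails.
try:
    A.loc["machine_status"]
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)

# The boolean mask built from the column works as a row selector.
q = A.loc[A["machine_status"] == 0]["sensor_06"]
print(q.tolist())   # [10.0, 12.0]
```

This is the same KeyError: 'machine_status' as in your traceback, which ends in RangeIndex.get_loc because the label is being searched for in the row index.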
Reply

