Coding Error

Led_Zeppelin

The following Python code generates an error:

q = image_data_org.loc[image_data_org.loc['machine_status']==0]['time_period']
p = image_data_org.loc[image_data_org.loc['machine_status']==1]['time_period'][:q.shape[0]]

pq = np.sum(p * np.log(p/q))
qp = np.sum(q * np.log(q/p))
print('KL(P || Q) : %. pq)%.3f' % pq)
print('KL(Q || P) : %. pq)%.3f' % qp)

The error is:

Error:---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 q = image_data_org.loc[image_data_org.loc['machine_status']==0]['time_period']
      2 p = image_data_org.loc[image_data_org.loc['machine_status']==1]['time_period'][:q.shape[0]]
      4 pq = np.sum(p * np.log(p/q))

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\indexing.py:967, in _LocationIndexer.__getitem__(self, key)
    964 axis = self.axis or 0
    966 maybe_callable = com.apply_if_callable(key, self.obj)
--> 967 return self._getitem_axis(maybe_callable, axis=axis)

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\indexing.py:1202, in _LocIndexer._getitem_axis(self, key, axis)
   1200 # fall thru to straight lookup
   1201 self._validate_key(key, axis)
-> 1202 return self._get_label(key, axis=axis)

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\indexing.py:1153, in _LocIndexer._get_label(self, label, axis)
   1151 def _get_label(self, label, axis: int):
   1152     # GH#5667 this will fail if the label is not present in the axis.
-> 1153     return self.obj.xs(label, axis=axis)

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\generic.py:3864, in NDFrame.xs(self, key, axis, level, drop_level)
   3862             new_index = index[loc]
   3863 else:
-> 3864     loc = index.get_loc(key)
   3866     if isinstance(loc, np.ndarray):
   3867         if loc.dtype == np.bool_:

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\indexes\range.py:389, in RangeIndex.get_loc(self, key, method, tolerance)
    387             raise KeyError(key) from err
    388     self._check_indexing_error(key)
--> 389     raise KeyError(key)
    390 return super().get_loc(key, method=method, tolerance=tolerance)

KeyError: 'machine_status'

**deanhystad** · (This post was last modified: Sep-17-2022, 05:17 PM by deanhystad.)

You are not using loc correctly.

The pandas documentation for loc is very good:
https://pandas.pydata.org/pandas-docs/st...e.loc.html

It has a nice example that shows how to do what you want: select items from a specified column using a condition on another column.

Quote: Conditional that returns a boolean Series with column labels specified

df.loc[df['shield'] > 6, ['max_speed']]
            max_speed
sidewinder          7

As a test, I wrote a little program to see how it works:

import numpy as np
import pandas as pd

df = pd.DataFrame({"machine_status":[0,1,0,1,0,1,0,1], "time_period":[1,2,3,4,5,6,7,8]})

q = df.loc[df['machine_status']==0, ["time_period"]]
print(q)

Output:
   time_period
0            1
2            3
4            5
6            7

As desired, this returns a dataframe containing the time_period values from the rows where machine_status == 0.

When trying to solve the problem did you try to write a small program like this? If not, why? If you did write a small program, why didn't you post that? To test your code, I had to make a dataframe. It was not a big effort, but I don't know what the values should be for machine_status or time_period, so my results may not reflect yours. It would be nice if you had supplied a dataframe to save others the effort.

I tested the rest of your code snippet and you also have an error in the prints. p and q are not numpy arrays; they are dataframes/series. Numpy knows how to work with pandas objects, so it can do log and sum, but with the selection above the result is still a pandas object, not a numpy array or a float. Use to_numpy() to convert a dataframe/series to a numpy array.

q = df.loc[df['machine_status']==0, ["time_period"]].to_numpy()
p = df.loc[df['machine_status']==1, ['time_period']].to_numpy()[:len(q)]
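A minimal sketch of why the conversion matters, reusing the made-up test dataframe from above (the values are placeholders, not the real data, and they are not normalized probabilities). Without to_numpy(), pandas lines up p and q by index label before dividing. The two selections share no labels, so every element of p / q comes out as NaN. With arrays, the arithmetic is done by position.

```python
import numpy as np
import pandas as pd

# Made-up test data; the real machine_status/time_period values are unknown.
df = pd.DataFrame({"machine_status": [0, 1, 0, 1, 0, 1, 0, 1],
                   "time_period": [1, 2, 3, 4, 5, 6, 7, 8]})

# As Series, the rows keep their original index labels (0,2,4,6 and 1,3,5,7).
q_series = df.loc[df["machine_status"] == 0, "time_period"]
p_series = df.loc[df["machine_status"] == 1, "time_period"]

# Division aligns on index labels; no labels match, so every result is NaN.
ratio = p_series / q_series
all_nan = bool(ratio.isna().all())
print("all NaN when dividing Series:", all_nan)

# As arrays, division is element by element, by position.
q = q_series.to_numpy()
p = p_series.to_numpy()[:len(q)]

pq = float(np.sum(p * np.log(p / q)))
qp = float(np.sum(q * np.log(q / p)))
print(pq, qp)
```

After the conversion, pq and qp are plain floats, so they can be passed straight to a format string.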

You should also start using f-strings to format your print output. String modulo formatting (%) is an older style, and f-strings are generally easier to read and harder to get wrong.
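As a sketch, the two prints could be written with f-strings like this. The values of pq and qp here are placeholders standing in for the computed sums.

```python
# Placeholder values standing in for the computed sums.
pq = 0.12345
qp = 0.54321

# {value:.3f} formats a float with three digits after the decimal point.
line_pq = f"KL(P || Q) : {pq:.3f}"
line_qp = f"KL(Q || P) : {qp:.3f}"
print(line_pq)
print(line_qp)
```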

Led_Zeppelin · (This post was last modified: Sep-18-2022, 10:18 PM by Led_Zeppelin.)

My first question is why you added a second dataframe, unless it was just for exposition purposes.

My second question is why the first Python code in this form executes and the second one does not.

The code that does not work is in an earlier section of this post. I will now list the first code that does work.

q = df.loc[df2['machine_status']==0]['sensor_06']
p = df.loc[df2['machine_status']==1]['sensor_06'][:q.shape[0]]

Notice the similarity of the two code sets. One works with two dataframes and the other works with one dataframe.

They are almost the same, which is why I applied the same Python code (with minor modifications) in the second instance.

It just is a mystery to me why the first set works and the second set does not.

My book "Numerical Recipes in Statistics" gave me the code for the first instance so I thought I would apply it to the second. But that did not work!

There must have been a time when the second code set did work, and it is now out of date.

I am just interested in your opinion.

This one has really stumped me.

Help appreciated.

Respectfully,

LZ

**deanhystad** · (This post was last modified: Sep-19-2022, 03:02 AM by deanhystad.)

Quote: My first question is why you added a second dataframe, unless it was just for exposition purposes.

Could you explain further please. I do not understand this question.

Quote:It just is a mystery to me why the first set works and the second set does not.

You are not seeing the problem. The problem has nothing to do with one dataframe or two dataframes. The problem is that you use loc where you shouldn't.

To make this easier to see, I will show your two examples right next to each other. In this example I will use two dataframes, A and B.

q = A.loc[B['machine_status']==0]['sensor_06']  # No error
q = A.loc[B.loc['machine_status']==0]['sensor_06']  # Error

And again with only one dataframe; A.

q = A.loc[A['machine_status']==0]['sensor_06']  # No error
q = A.loc[A.loc['machine_status']==0]['sensor_06']  # Error
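The difference can be reproduced with a small sketch. The dataframe A here is made up: the column names follow the thread, the values are placeholders. A['machine_status'] selects a column, while A.loc['machine_status'] asks for a row whose index label is 'machine_status'. With a default integer index there is no such row, which is where the KeyError comes from.

```python
import pandas as pd

# Made-up data; column names follow the thread, values are placeholders.
A = pd.DataFrame({"machine_status": [0, 1, 0, 1],
                  "sensor_06": [10.0, 11.0, 12.0, 13.0]})

# A['machine_status'] selects the column, so the comparison gives a boolean mask.
mask = A["machine_status"] == 0
q = A.loc[mask, "sensor_06"]
print(q)

# A.loc['machine_status'] looks for a ROW labelled 'machine_status'.
# The index here is 0..3, so the lookup raises KeyError.
try:
    A.loc["machine_status"]
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)
```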

There may be errors in the book, but I think it is more likely that your problems are caused by you not seeing the code. It is very common to look at code and see what you expect to see. You will look at the example in the book and you will look at what you typed, and they will look identical because that is what you expect. It happens to all programmers all the time.
