Unknown Python Command

Led_Zeppelin · (This post was last modified: Sep-02-2022, 02:31 PM by Led_Zeppelin.)

I need to know what this Python command does. Here it is:

indexData-org=pd.Dataframe(columns = ['Index', 'sensor_01', 'sensor_02', 'sensor_03', 'sensor_04', 'sensor_06', 'sensor_10', 'sensor_11', 'sensor_12', 'sensor_38', 'sensor_40', 'machine_status'])

I am attaching a screenshot of this Python code. Each sensor column, and machine_status, contains 220320 data points. I am assuming that this is a convenient way of moving, say, sensor_01 (or any other sensor, or machine_status) and all of its associated data points around in the Python program. I have just never seen this before, hence the post.

Any help appreciated.

Respectfully,

LZ

**deanhystad** · (This post was last modified: Sep-02-2022, 03:16 PM by deanhystad.)

indexData_org, not indexData-org. Glad you attached the thumbnail

import pandas as pd
x = pd.DataFrame(columns=("A", "B", "C"))
print(x)

Output:
Empty DataFrame
Columns: [A, B, C]
Index: []

It creates an empty dataframe where the columns are defined, but there are no rows.

Or were you wondering about "indexData_org["Index"] = range(indexData.shape[0])"?

That creates rows in indexData_org, one for each row in indexData. The Index column holds 0, 1, 2, ... and every other column is NaN.
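A minimal sketch of that assignment, assuming a three-row stand-in for indexData (the data and the shortened column list are made up for illustration; the real dataframe has 220320 rows):

```python
import pandas as pd

# Hypothetical stand-in for indexData.
indexData = pd.DataFrame({"sensor_01": [2.4, 2.5, 2.6]})

# Empty dataframe: the columns are defined, but there are no rows yet.
indexData_org = pd.DataFrame(columns=["Index", "sensor_01", "machine_status"])

# Assigning a list-like to one column of an empty dataframe creates the rows.
indexData_org["Index"] = range(indexData.shape[0])

# Index now holds 0, 1, 2; the columns that were not assigned are NaN.
print(indexData_org)
```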

I do not see any advantage to creating a dataframe this way

Led_Zeppelin · (This post was last modified: Sep-02-2022, 03:59 PM by Led_Zeppelin.)

Okay, I have attached the critical three pages that I believe will explain my situation better.

The indexData dataframe, which I assume comes from the command in my initial post, is now being used
as the input to, say, the Dickey-Fuller statistical test. This is clear from the three attached pages. It is also used
elsewhere on those pages.

The indexdata formula in my initial post is on the last of the three pages at the top.

Now I am hoping that is the case. It sure looks apt to use as input to the Dickey-Fuller test.

This would make it easier to run these tests. I am familiar with these tests, but not with the Python implementation.

If indexData can be used in that way, it would sure make life easier for me.

The three sheets are attached.

Respectfully,

LZ

**deanhystad** · (This post was last modified: Sep-03-2022, 03:45 AM by deanhystad.)

To make a new dataframe that has selected columns or reordered columns from an existing dataframe

image_data_org = imageData[['Index', 'sensor_01', 'machine_status']]

where "Index", "sensor_01" and "machine_status" are the only columns in the new dataframe image_data_org. Note the double brackets: the inner pair is a list of column names.
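A short sketch of column selection, assuming a made-up imageData with one extra column (the values are hypothetical). Passing a list of names picks those columns in the order given. Passing the names without the inner list, as in imageData['Index', 'sensor_01'], is treated as a single tuple key and raises a KeyError on an ordinary dataframe.

```python
import pandas as pd

# Hypothetical data standing in for the sensor dataframe.
imageData = pd.DataFrame(
    {
        "Index": [0, 1, 2],
        "sensor_01": [2.4, 2.5, 2.6],
        "sensor_02": [47.0, 47.1, 47.2],
        "machine_status": ["NORMAL", "NORMAL", "BROKEN"],
    }
)

# A list of column names selects (and reorders) those columns.
image_data_org = imageData[["machine_status", "Index", "sensor_01"]]

print(image_data_org.columns.tolist())
```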

To "shift" 10 rows.

image_data_org = image_data_org[10:].reset_index(drop=True)

reset_index(drop=True) resets the row indices to start at 0.
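To see what reset_index(drop=True) changes, here is a small sketch on a hypothetical 15-row dataframe. Slicing keeps the original row labels; reset_index(drop=True) renumbers them from 0 and discards the old labels instead of keeping them as a new column.

```python
import pandas as pd

# Hypothetical 15-row dataframe.
df = pd.DataFrame({"value": range(15)})

shifted = df[10:]                             # rows 10..14, still labelled 10..14
renumbered = shifted.reset_index(drop=True)   # same rows, labelled 0..4

print(shifted.index.tolist())
print(renumbered.index.tolist())
```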

And to compute a rolling window average of selected columns in an existing dataframe

indexData_avg = indexData[['sensor_01', 'sensor_40']].rolling(10).mean()[9:].reset_index(drop=True)

Need to use [9:] because the first 9 rows of the dataframe are NaN after rolling(10).mean()
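The rolling average can be checked on a tiny example. This is a sketch with a hypothetical 12-row dataframe, so a window of 10 leaves exactly three complete windows:

```python
import pandas as pd

# Hypothetical sensor data: 12 rows.
indexData = pd.DataFrame({"sensor_01": range(12), "sensor_40": range(100, 112)})

# Mean over a sliding window of 10 rows; rows 0..8 have no full window yet.
rolled = indexData[["sensor_01", "sensor_40"]].rolling(10).mean()

# Drop the 9 incomplete rows and renumber from 0.
indexData_avg = rolled[9:].reset_index(drop=True)

print(rolled.head(10))
print(indexData_avg)
```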

If the pandas code you are writing is something you've not seen elsewhere, that usually means there are better ways to do what you want.

Led_Zeppelin · Sep-02-2022, 05:32 PM

I will try it. Thank you!

Are there any additional libraries/modules that I must install and import?

Any help appreciated.

Respectfully,

LZ

**deanhystad** · (This post was last modified: Sep-02-2022, 05:35 PM by deanhystad.)

Install for doing what, the statistical analysis (Dickey-Fuller)?

Led_Zeppelin · (This post was last modified: Sep-02-2022, 06:52 PM by Led_Zeppelin.)

(Sep-02-2022, 05:35 PM)deanhystad Wrote: Install for doing what, the statistical analysis (Dickey-Fuller)?

No, Dickey-Fuller is in statsmodels. I will attach a pdf of my output
and show you what I mean.

It has no info on imageData. It is complaining.

Are ImageData and image_data_org reserved words as we used to call them? I believe ImageData must be.

I believe that a library is missing and adding it will correct this error.

Please see attached, and I think my error will be clear.

Respectfully,

LZ

**deanhystad** · (This post was last modified: Sep-03-2022, 03:43 AM by deanhystad.)

No. I used those words because they are in your document. I don't think either has any special meaning in Python. The error message confirms that is true. What makes you think imageData has some special meaning? From your code it looks like imageData is a variable that references a DataFrame. I think imageData should be "df" or a copy of "df".

Please, please, please stop using links in your posts. Copy/paste code unless it is hundreds of lines. Always copy/paste error messages.

In your link you have:

df[df['timestamp'].duplicated(keep=False)]

This doesn't do anything because you are not keeping the results.
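A small sketch of the difference, using a hypothetical four-row dataframe. Indexing with a boolean mask returns a new dataframe; if that result is not assigned to a name, df itself is unchanged.

```python
import pandas as pd

# Hypothetical data: timestamp 1 appears twice.
df = pd.DataFrame({"timestamp": [0, 1, 1, 2], "value": [10, 11, 12, 13]})

# The result is computed and immediately discarded; df still has 4 rows.
df[df["timestamp"].duplicated(keep=False)]

# Assigning the result keeps it.
dupes = df[df["timestamp"].duplicated(keep=False)]

print(len(df))
print(dupes)
```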

If you want to exclude duplicates you probably want to do something like this:

df = df[~df['timestamp'].duplicated(keep="last")].reset_index(drop=True)

This creates a Series of bools where the value is True if the row has the same timestamp as another row, EXCEPT for the last occurrence. You could also use "first", which does the same thing except that it spares the first occurrence. You don't want to use False, because that marks every occurrence as a duplicate, so no row at all would be kept for a duplicated timestamp.

df['timestamp'].duplicated(keep="last")
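The three keep options can be compared side by side. A sketch on a hypothetical Series of timestamps:

```python
import pandas as pd

# Hypothetical timestamps: 2 appears twice, 5 appears three times.
ts = pd.Series([2, 2, 5, 5, 5], name="timestamp")

first = ts.duplicated(keep="first")  # every occurrence but the first is True
last = ts.duplicated(keep="last")    # every occurrence but the last is True
none = ts.duplicated(keep=False)     # every occurrence is True

print(first.tolist())  # [False, True, False, True, True]
print(last.tolist())   # [True, False, True, True, False]
print(none.tolist())   # [True, True, True, True, True]
```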

This inverts the Series, so True becomes False and False becomes True.

~df['timestamp'].duplicated(keep="last")

This creates a new dataframe that contains only the rows where ~df['timestamp'].duplicated(keep="last") is True.

df = df[~df['timestamp'].duplicated(keep="last")]

Maybe it is easier to see if the steps are broken up in a runnable example.

import pandas as pd

df = pd.DataFrame(
    {"timestamp": [0, 1, 2, 2, 3, 4, 5, 5, 5], "value": [0, 1, 2, 3, 4, 5, 6, 7, 8]}
)
duplicates_to_remove = df["timestamp"].duplicated(keep="last")
values_to_keep = ~duplicates_to_remove
df_without_duplicates = df[values_to_keep].reset_index(drop=True)
print("Duplicates to remove", duplicates_to_remove, sep="\n")
print("\nValues to keep", values_to_keep, sep="\n")
print("\nDuplicates removed", df_without_duplicates, sep="\n")

Output:
Duplicates to remove
0    False
1    False
2     True
3    False
4    False
5    False
6     True
7     True
8    False
Name: timestamp, dtype: bool

Values to keep
0     True
1     True
2    False
3     True
4     True
5     True
6    False
7    False
8     True
Name: timestamp, dtype: bool

Duplicates removed
   timestamp  value
0          0      0
1          1      1
2          2      3
3          3      4
4          4      5
5          5      8
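As an aside, pandas also has a drop_duplicates method that should give the same result in one call. This is a sketch of an alternative, not what the code above uses; it compares the two approaches on the same example data.

```python
import pandas as pd

df = pd.DataFrame(
    {"timestamp": [0, 1, 2, 2, 3, 4, 5, 5, 5], "value": [0, 1, 2, 3, 4, 5, 6, 7, 8]}
)

# Boolean-mask approach from the example above.
via_mask = df[~df["timestamp"].duplicated(keep="last")].reset_index(drop=True)

# drop_duplicates looks only at the "timestamp" column and keeps the last occurrence.
via_method = df.drop_duplicates(subset="timestamp", keep="last").reset_index(drop=True)

print(via_mask.equals(via_method))
```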

Led_Zeppelin · Sep-02-2022, 08:46 PM

(Sep-02-2022, 07:46 PM)deanhystad Wrote: No. I used those words because they are in your document. I don't think either has any special meaning in Python. The error message confirms that is true. What makes you think imageData has some special meaning? From your code it looks like imageData is a variable that references a DataFrame. I think imageData should be "df" or a copy of "df".

Please, please, please stop using links in your posts. Copy/paste code unless it is hundreds of lines. Always copy/paste error messages.


Please tell me what you want me to stop doing in your posts.

R,

LZ

**deanhystad** · (This post was last modified: Sep-03-2022, 03:44 AM by deanhystad.)

Please don't quote long posts in full when you reply. Use the reply button, but cut out the unimportant parts of the post. Like this:

(Sep-02-2022, 08:46 PM)Led_Zeppelin Wrote: Please tell me what you want me to stop doing in your posts.

Instead of attaching a screenshot of an error message, cut and paste the text in your post.
