Posts: 164
Threads: 88
Joined: Feb 2021
Sep-02-2022, 02:31 PM
(This post was last modified: Sep-02-2022, 02:31 PM by Led_Zeppelin.)
I need to know what this Python statement does. Here it is:
indexData-org=pd.DataFrame(columns = ['Index', 'sensor_01', 'sensor_02', 'sensor_03', 'sensor_04', 'sensor_06', 'sensor_10', 'sensor_11', 'sensor_12', 'sensor_38', 'sensor_40', 'machine_status'])
I am attaching a screenshot of this Python code. Each sensor column and machine_status contains 220320 data points. I am assuming that this is a convenient way of moving, say, sensor_01 (or any sensor, or machine_status) and all of its associated data points around in the Python program. I have just never seen this before, hence the post.
Any help appreciated.
Respectfully,
LZ
Posts: 6,788
Threads: 20
Joined: Feb 2020
Sep-02-2022, 03:16 PM
(This post was last modified: Sep-02-2022, 03:16 PM by deanhystad.)
It's indexData_org, not indexData-org. Glad you attached the thumbnail.
import pandas as pd
x = pd.DataFrame(columns=("A", "B", "C"))
print(x)
Output:
Empty DataFrame
Columns: [A, B, C]
Index: []
It creates an empty dataframe where the columns are defined, but there are no rows.
Or were you wondering about "indexData_org["Index"] = range(indexData.shape[0])"?
That creates as many rows in indexData_org as there are rows in indexData. The "Index" column holds 0, 1, 2, ..., and all the other columns are NaN.
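A minimal sketch of that behavior, using a small made-up indexData with only two columns in place of the real 220320-row data (the column values here are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for indexData (the real one has 220320 rows).
indexData = pd.DataFrame(
    {"sensor_01": [2.5, 2.4, 2.6, 2.5, 2.4], "machine_status": ["NORMAL"] * 5}
)

# Column names only, no rows, as in the original code.
indexData_org = pd.DataFrame(columns=["Index", "sensor_01", "machine_status"])
print(len(indexData_org))  # 0

# Assigning a list-like to a column of an empty frame creates one row per value;
# the columns that were not assigned are filled with NaN.
indexData_org["Index"] = range(indexData.shape[0])
print(indexData_org)
```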
I do not see any advantage to creating a dataframe this way.
Posts: 164
Threads: 88
Joined: Feb 2021
Sep-02-2022, 03:59 PM
(This post was last modified: Sep-02-2022, 03:59 PM by Led_Zeppelin.)
Okay, I have attached the three critical pages that I believe will explain my situation better.
indexData, which I assume is derived from the code in my initial post, is then used
as the input to, say, the Dickey-Fuller statistical test. This is clear on the three attached pages. It is also used
elsewhere on those pages.
The indexData code from my initial post is at the top of the last of the three pages.
I hope that is the case; it certainly looks suitable as input to the Dickey-Fuller test.
That would make it easier to run these tests. I am familiar with these tests, but not with their Python implementation.
If indexData can be used in that way, it would sure make life easier for me.
The three sheets are attached.
Respectfully,
LZ
Posts: 6,788
Threads: 20
Joined: Feb 2020
Sep-02-2022, 05:01 PM
(This post was last modified: Sep-03-2022, 03:45 AM by deanhystad.)
To make a new dataframe that has selected columns or reordered columns from an existing dataframe
image_data_org = imageData[['Index', 'sensor_01', 'machine_status']]
where "Index", "sensor_01" and "machine_status" are the only columns in the new dataframe image_data_org. Note the double brackets: the inner brackets are a list of column names.
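A small sketch of the column selection, assuming a tiny made-up imageData (the values are hypothetical; the real frame comes from the data in the attached pages). It also shows why single brackets fail: pandas treats the comma-separated names as one tuple key, which is not a column name, and raises KeyError.

```python
import pandas as pd

# Hypothetical stand-in for imageData; values are made up for illustration.
imageData = pd.DataFrame(
    {
        "Index": [0, 1, 2],
        "sensor_01": [2.5, 2.4, 2.6],
        "sensor_02": [47.1, 47.0, 46.9],
        "machine_status": ["NORMAL", "NORMAL", "BROKEN"],
    }
)

# Double brackets: the inner list names the columns to keep, in that order.
image_data_org = imageData[["Index", "sensor_01", "machine_status"]]
print(image_data_org)

# Single brackets pass one tuple key, which is not a column name.
try:
    imageData["Index", "sensor_01", "machine_status"]
    single_bracket_failed = False
except KeyError:
    single_bracket_failed = True
print("single brackets raise KeyError:", single_bracket_failed)
```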
To "shift" by 10 rows (drop the first 10 rows):
image_data_org = image_data_org[10:].reset_index(drop=True)
reset_index(drop=True) renumbers the row indices to start at 0 and discards the old ones.
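A sketch of what the slice and reset_index(drop=True) do to the row labels, using a hypothetical 15-row frame in place of the real image_data_org:

```python
import pandas as pd

# Hypothetical 15-row frame standing in for image_data_org.
image_data_org = pd.DataFrame({"sensor_01": range(15)})

# Slicing [10:] keeps rows 10..14; their row labels are still 10..14.
sliced = image_data_org[10:]
print(sliced.index.tolist())  # [10, 11, 12, 13, 14]

# reset_index(drop=True) renumbers the rows from 0 and throws the old labels away
# (without drop=True the old labels would be kept as a new "index" column).
image_data_org = sliced.reset_index(drop=True)
print(image_data_org.index.tolist())  # [0, 1, 2, 3, 4]
```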
And to compute a rolling-window average of selected columns in an existing dataframe:
indexData_avg = indexData[['sensor_01', 'sensor_40']].rolling(10).mean()[9:].reset_index(drop=True)
You need [9:] because the first 9 rows of the result are NaN after rolling(10).mean(): a 10-row window isn't full until the 10th row.
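A sketch of the rolling average on a hypothetical 12-row indexData (two made-up sensor columns), showing the 9 leading NaN rows and the result after [9:] and reset_index:

```python
import pandas as pd

# Hypothetical data: 12 rows of two sensor columns.
indexData = pd.DataFrame({"sensor_01": range(12), "sensor_40": range(0, 24, 2)})

# Mean over a sliding window of 10 rows; the first 9 results are NaN
# because the window is not full yet.
rolled = indexData[["sensor_01", "sensor_40"]].rolling(10).mean()
print(rolled["sensor_01"].isna().sum())  # 9

# Drop the 9 incomplete rows and renumber from 0.
indexData_avg = rolled[9:].reset_index(drop=True)
print(indexData_avg)
```

Here sensor_01 is 0..11, so the three full windows average 0..9, 1..10 and 2..11, giving 4.5, 5.5 and 6.5.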
If the pandas code you are writing is something you've not seen elsewhere, that usually means there is a better way to do what you want.
Posts: 164
Threads: 88
Joined: Feb 2021
I will try it. Thank you!
Are there any additional libraries/modules that I must install and import?
Any help appreciated.
Respectfully,
LZ
Posts: 6,788
Threads: 20
Joined: Feb 2020
Sep-02-2022, 05:35 PM
(This post was last modified: Sep-02-2022, 05:35 PM by deanhystad.)
Install for doing what, the statistical analysis (Dickey-Fuller)?
Posts: 164
Threads: 88
Joined: Feb 2021
Sep-02-2022, 06:52 PM
(This post was last modified: Sep-02-2022, 06:52 PM by Led_Zeppelin.)
(Sep-02-2022, 05:35 PM)deanhystad Wrote: Install for doing what, the statistical analysis (Dickey-Fuller)?
No, Dickey-Fuller is in statsmodels. I will attach a PDF of my output
and show you what I mean.
Python has no information about imageData, and it is complaining.
Are imageData and image_data_org reserved words, as we used to call them? I believe imageData must be.
I believe that a library is missing and that adding it will correct this error.
Please see attached, and I think my error will be clear.
Respectfully,
LZ
Posts: 6,788
Threads: 20
Joined: Feb 2020
Sep-02-2022, 07:46 PM
(This post was last modified: Sep-03-2022, 03:43 AM by deanhystad.)
No. I used those words because they are in your document. I don't think either has any special meaning in Python. The error message confirms that is true. What makes you think imageData has some special meaning? From your code it looks like imageData is a variable that references a DataFrame. I think imageData should be "df" or a copy of "df".
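To double-check that neither name is reserved, Python's standard keyword module can test a name directly. A sketch; evaluating the bare name in an empty namespace shows the NameError Python raises when a variable is used before it is assigned (the thread doesn't quote the actual error text, so this is only the usual case, not necessarily the exact message in the attachment):

```python
import keyword

# keyword.iskeyword reports whether a string is a reserved word such as "class".
reserved = {name: keyword.iskeyword(name) for name in ("imageData", "image_data_org", "class")}
print(reserved)  # {'imageData': False, 'image_data_org': False, 'class': True}

# Referencing a name that was never assigned raises NameError.
# eval with an empty globals dict guarantees imageData is undefined here.
try:
    eval("imageData", {})
    error_message = ""
except NameError as err:
    error_message = str(err)
print(error_message)  # name 'imageData' is not defined
```

The fix is to assign the name before using it, for example imageData = df.copy(), as suggested above.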
Please, please, please stop using links in your posts. Copy/paste code unless it is hundreds of lines. Always copy/paste error messages.
In your link you have:
df[df['timestamp'].duplicated(keep=False)]
This doesn't change df because the result is never assigned to anything.
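A sketch using the same sample data as the runnable example in this post: the keep=False expression selects every row whose timestamp appears more than once, but df itself is untouched unless the result is assigned back.

```python
import pandas as pd

df = pd.DataFrame(
    {"timestamp": [0, 1, 2, 2, 3, 4, 5, 5, 5], "value": [0, 1, 2, 3, 4, 5, 6, 7, 8]}
)

# keep=False flags EVERY row whose timestamp appears more than once.
all_dupes = df[df["timestamp"].duplicated(keep=False)]
print(all_dupes["value"].tolist())  # [2, 3, 6, 7, 8]

# The expression returns a new DataFrame; df still has all 9 rows.
print(len(df))  # 9
```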
If you want to exclude duplicates you probably want to do something like this:
df = df[~df['timestamp'].duplicated(keep="last")].reset_index(drop=True)
Taking it one piece at a time:
df['timestamp'].duplicated(keep="last")
This creates a Series of bools where the value is True if the row has the same timestamp as another row, EXCEPT for the last occurrence. You could also use keep="first", which does the same thing except that the first occurrence is the one left False. You don't want to use keep=False because that marks every row of a duplicated timestamp as True, so after inverting no row is left for that timestamp.
~df['timestamp'].duplicated(keep="last")
This inverts the Series, so True becomes False and False becomes True.
df = df[~df['timestamp'].duplicated(keep="last")]
This creates a new dataframe that only contains the rows that are True in ~df['timestamp'].duplicated(keep="last"). Adding .reset_index(drop=True) then renumbers the rows starting at 0.
Maybe it is easier to see if the steps are broken up in a runnable example.
import pandas as pd
df = pd.DataFrame(
{"timestamp": [0, 1, 2, 2, 3, 4, 5, 5, 5], "value": [0, 1, 2, 3, 4, 5, 6, 7, 8]}
)
duplicates_to_remove = df["timestamp"].duplicated(keep="last")
values_to_keep = ~duplicates_to_remove
df_without_duplicates = df[values_to_keep].reset_index(drop=True)
print("Duplicates to remove", duplicates_to_remove, sep="\n")
print("\nValues to keep", values_to_keep, sep="\n")
print("\nDuplicates removed", df_without_duplicates, sep="\n")
Output:
Duplicates to remove
0 False
1 False
2 True
3 False
4 False
5 False
6 True
7 True
8 False
Name: timestamp, dtype: bool
Values to keep
0 True
1 True
2 False
3 True
4 True
5 True
6 False
7 False
8 True
Name: timestamp, dtype: bool
Duplicates removed
timestamp value
0 0 0
1 1 1
2 2 3
3 3 4
4 4 5
5 5 8
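The same result can also be had in one call with DataFrame.drop_duplicates, using its documented subset, keep and ignore_index parameters (ignore_index requires pandas 1.0 or later). A sketch on the same sample data, cross-checked against the boolean-mask version above:

```python
import pandas as pd

df = pd.DataFrame(
    {"timestamp": [0, 1, 2, 2, 3, 4, 5, 5, 5], "value": [0, 1, 2, 3, 4, 5, 6, 7, 8]}
)

# Keep the last row for each timestamp and renumber the rows from 0.
deduped = df.drop_duplicates(subset="timestamp", keep="last", ignore_index=True)

# Cross-check against the explicit mask approach.
mask_version = df[~df["timestamp"].duplicated(keep="last")].reset_index(drop=True)
print(deduped.equals(mask_version))  # True
print(deduped)
```

The mask version is still worth understanding, since the same boolean-indexing pattern works for any row condition, not just duplicates.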
Posts: 164
Threads: 88
Joined: Feb 2021
(Sep-02-2022, 07:46 PM)deanhystad Wrote: No. I used those words because they are in your document. I don't think either has any special meaning in Python. The error message confirms that is true. What makes you think imageData has some special meaning? From your code it looks like imageData is a variable that references a DataFrame. I think imageData should be "df" or a copy of "df".
Please tell me what you want me to stop doing in your posts.
R,
LZ
Posts: 6,788
Threads: 20
Joined: Feb 2020
Sep-02-2022, 09:08 PM
(This post was last modified: Sep-03-2022, 03:44 AM by deanhystad.)
Please don't quote entire long posts in your replies. Use the Reply button, but cut out the unimportant parts of the quoted post, like this:
(Sep-02-2022, 08:46 PM)Led_Zeppelin Wrote: Please tell me what you want me to stop doing in your posts.
Instead of attaching a screenshot of an error message, copy and paste the error text into your post.