Python Forum
Unknown Python Command
#1
I need to know what this Python command does. Here it is:

indexData-org=pd.Dataframe(columns = ['Index', 'sensor_01', 'sensor_02', 'sensor_03', 'sensor_04', 'sensor_06', 'sensor_10', 'sensor_11', 'sensor_12', 'sensor_38', 'sensor_40', 'machine_status'])
I am attaching a screenshot of this Python code. Each sensor column, and machine_status, contains 220320 data points. I assume this is a convenient way of moving, say, sensor_01 (or any sensor, or machine_status) and all of its associated data points around in the Python program. I have never seen this before, hence the post.

Any help appreciated.

Respectfully,

LZ

#2
It should be indexData_org, not indexData-org, and pd.DataFrame, not pd.Dataframe. Glad you attached the thumbnail.

import pandas as pd
x = pd.DataFrame(columns=("A", "B", "C"))
print(x)
Output:
Empty DataFrame
Columns: [A, B, C]
Index: []
It creates an empty dataframe where the columns are defined, but there are no rows.

Or were you wondering about "indexData_org["Index"] = range(indexData.shape[0])"?

That creates a bunch of rows in indexData_org, equal to the number of rows in indexData. The "Index" column holds 0, 1, 2, ... and the values in all the other columns are NaN.
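
Here is a minimal sketch of that behaviour. The column list is shortened and the small indexData frame is a made-up stand-in, since the real data is not in the thread.

```python
import pandas as pd

# Column names shortened for illustration; the original lists eleven columns.
indexData_org = pd.DataFrame(columns=["Index", "sensor_01", "machine_status"])
print(indexData_org.shape)  # no rows yet, three columns

# Hypothetical stand-in for indexData: any DataFrame with a few rows.
indexData = pd.DataFrame({"sensor_01": [2.4, 2.5, 2.6]})

# Assigning a sequence to a column of an empty frame creates the rows.
indexData_org["Index"] = range(indexData.shape[0])
print(indexData_org)
```

After the assignment only "Index" is filled in. The sensor columns still have to be copied over separately.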

I do not see any advantage to creating a dataframe this way
#3
Okay, I have attached the critical three pages that I believe will explain my situation better.

The indexData DataFrame, which I assume is built by the statement in my initial post, is now being used
as the input to, say, the Dickey-Fuller statistical test. This is clear from the three attached pages, where it is also used
elsewhere.

The indexData statement from my initial post is at the top of the last of the three pages.

Now I am hoping that is the case. It certainly looks apt as input to the Dickey-Fuller test.

This would make it easier to run these tests. I am familiar with these tests, but not with the Python implementation.

If indexData can be used in that way, it would sure make life easier for me.

The three sheets are attached.

Respectfully,

LZ

Attached Files

.pdf   Predict Failure Using Deep Learnig.pdf (Size: 233.87 KB / Downloads: 116)
#4
To make a new dataframe that has selected columns or reordered columns from an existing dataframe, pass a list of column names (note the double brackets):
image_data_org = imageData[['Index', 'sensor_01', 'machine_status']]
where "Index", "sensor_01" and "machine_status" are the only columns in the new dataframe image_data_org.

To "shift" 10 rows.
image_data_org = image_data_org[10:].reset_index(drop=True)
reset_index(drop=True) resets the row indices to start at 0.
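
A small runnable sketch of the selection and the "shift", assuming a made-up 12-row imageData (the real frame is not in the thread):

```python
import pandas as pd

# Hypothetical 12-row frame standing in for imageData.
imageData = pd.DataFrame({
    "Index": range(12),
    "sensor_01": [float(i) for i in range(12)],
    "sensor_02": [0.5] * 12,
    "machine_status": ["NORMAL"] * 12,
})

# A list of names inside the brackets selects several columns.
image_data_org = imageData[["Index", "sensor_01", "machine_status"]]

# Drop the first 10 rows, then renumber the remaining rows from 0.
image_data_org = image_data_org[10:].reset_index(drop=True)
print(image_data_org)
```

Only the last two rows survive, and their row labels are 0 and 1 instead of 10 and 11.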

And to compute a rolling window average of selected columns in an existing dataframe
indexData_avg = indexData[['sensor_01', 'sensor_40']].rolling(10).mean()[9:].reset_index(drop=True)
Need to use [9:] because the first 9 rows of the dataframe are NaN after rolling(10).mean()
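
To see where the 9 NaN rows come from, here is a sketch with a made-up 15-row indexData (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical 15-row frame standing in for indexData.
indexData = pd.DataFrame({
    "sensor_01": [float(i) for i in range(15)],
    "sensor_40": [float(2 * i) for i in range(15)],
})

rolled = indexData[["sensor_01", "sensor_40"]].rolling(10).mean()

# The first 9 rows are NaN because fewer than 10 values are available.
nan_rows = int(rolled["sensor_01"].isna().sum())
print(nan_rows)

# Skip those rows and renumber from 0.
indexData_avg = rolled[9:].reset_index(drop=True)
print(indexData_avg)
```

With 15 input rows and a window of 10, six averaged rows remain.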

If the pandas code you are writing is something you've not seen elsewhere, that usually means there are better ways to do what you want to do.
#5
I will try it. Thank you!

Are there any additional libraries/modules that I must install and import?

Any help appreciated.

Respectfully,

LZ
#6
Install for doing what, the statistical analysis (Dickey-Fuller)?
#7
(Sep-02-2022, 05:35 PM)deanhystad Wrote: Install for doing what, the statistical analysis (Dickey-Fuller)?

No, Dickey-Fuller is in statsmodels. I will attach a pdf of my output
and show you what I mean.

Python has no information about imageData, and it is complaining.

Are ImageData and image_data_org reserved words as we used to call them? I believe ImageData must be.

I believe that a library is missing and adding it will correct this error.

Please see attached, and I think my error will be clear.

Respectfully,

LZ

Attached Files

.pdf   new-format-test.pdf (Size: 177.49 KB / Downloads: 118)
#8
No. I used those words because they are in your document. I don't think either has any special meaning in Python. The error message confirms that is true. What makes you think imageData has some special meaning? From your code it looks like imageData is a variable that references a DataFrame. I think imageData should be "df" or a copy of "df".
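
If you want to check for yourself whether a name is reserved, the standard library's keyword module can tell you. This is only a sketch, and it assumes the error you got was a NameError (I can't tell from the attachment); undefined_frame is a made-up name.

```python
import keyword

# Neither name is a Python keyword, so both are ordinary identifiers.
names = ["imageData", "image_data_org"]
reserved = {name: keyword.iskeyword(name) for name in names}
print(reserved)

# Using a name before anything has been assigned to it raises NameError.
try:
    undefined_frame  # hypothetical name that was never assigned
except NameError as exc:
    message = str(exc)
    print(message)
```

No extra library fixes a NameError. The name has to be assigned before it is used.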

Please, please, please stop using links in your posts. Copy/paste code unless it is hundreds of lines. Always copy/paste error messages.

In your link you have:
df[df['timestamp'].duplicated(keep=False)]
This doesn't do anything because you are not keeping the results.

If you want to exclude duplicates you probably want to do something like this:
df = df[~df['timestamp'].duplicated(keep="last")].reset_index(drop=True)
This creates a Series of bools where the value is True if the row has the same timestamp as another row, EXCEPT for the last occurrence. You could also use "first", which does the same thing except that it spares the first occurrence. You don't want to use False because that marks every row of a duplicated timestamp, so none are left for that timestamp.
df['timestamp'].duplicated(keep="last")
This inverts the Series, so True becomes False and False becomes True.
~df['timestamp'].duplicated(keep="last")
This creates a new dataframe that only contains the rows that are True in ~df['timestamp'].duplicated(keep="last").
df = df[~df['timestamp'].duplicated(keep="last")]
Maybe it is easier to see if the steps are broken up in a runnable example.
import pandas as pd

df = pd.DataFrame(
    {"timestamp": [0, 1, 2, 2, 3, 4, 5, 5, 5], "value": [0, 1, 2, 3, 4, 5, 6, 7, 8]}
)
duplicates_to_remove = df["timestamp"].duplicated(keep="last")
values_to_keep = ~duplicates_to_remove
df_without_duplicates = df[values_to_keep].reset_index(drop=True)
print("Duplicates to remove", duplicates_to_remove, sep="\n")
print("\nValues to keep", values_to_keep, sep="\n")
print("\nDuplicates removed", df_without_duplicates, sep="\n")
Output:
Duplicates to remove
0    False
1    False
2     True
3    False
4    False
5    False
6     True
7     True
8    False
Name: timestamp, dtype: bool

Values to keep
0     True
1     True
2    False
3     True
4     True
5     True
6    False
7    False
8     True
Name: timestamp, dtype: bool

Duplicates removed
   timestamp  value
0          0      0
1          1      1
2          2      3
3          3      4
4          4      5
5          5      8
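
To compare the three keep options side by side, here is a short sketch on the same timestamps. It prints the row labels that survive in each case.

```python
import pandas as pd

timestamps = pd.Series([0, 1, 2, 2, 3, 4, 5, 5, 5], name="timestamp")

results = {}
for keep in ("first", "last", False):
    # True marks the rows that would be thrown away.
    flagged = timestamps.duplicated(keep=keep)
    results[keep] = timestamps[~flagged].index.tolist()
    print(f"keep={keep!r}: surviving rows {results[keep]}")
```

With keep=False, timestamps 2 and 5 disappear completely, which is usually not what you want. pandas also has drop_duplicates(subset="timestamp", keep="last"), which does the flag, invert and filter steps in one call.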
#9
(Sep-02-2022, 07:46 PM)deanhystad Wrote: No. I used those words because they are in your document. I don't think either has any special meaning in Python. The error message confirms that is true. What makes you think imageData has some special meaning? From your code it looks like imageData is a variable that references a DataFrame. I think imageData should be "df" or a copy of "df".

Please, please, please stop using links in your posts. Copy/paste code unless it is hundreds of lines. Always copy/paste error messages.

Please tell me what you want me to stop doing in your posts.

R,

LZ
#10
Please don't quote long posts in full. Use the reply button, but cut out the unimportant parts of the post. Like this:
(Sep-02-2022, 08:46 PM)Led_Zeppelin Wrote: Please tell me what you want me to stop doing in your posts.
Instead of attaching a screenshot of an error message, copy and paste the text into your post.