Posts: 164
Threads: 88
Joined: Feb 2021
Jun-30-2022, 01:11 PM
(This post was last modified: Jun-30-2022, 01:24 PM by Led_Zeppelin.)
I have spent a day trying to get the following code to work.
df2 = pd.DataFrame(df, columns = ['sensor_00', 'sensor_01', 'sensor_02', 'sensor_03', 'sensor_04', 'sensor_05', 'sensor_06', \
'sensor_07', 'sensor_08', 'sensor_09', 'sensor_10', 'sensor_11', 'sensor_12', 'sensor_13', 'sensor_14', 'sensor_15', \
'sensor_16', 'sensor_17', 'sensor_18', 'sensor_19', 'sensor_20', 'sensor_21', 'sensor_22', 'sensor_23', 'sensor_24', \
'sensor_25', 'sensor_26', 'sensor_27', 'sensor_28', 'sensor_29', 'sensor_30', 'sensor_31', 'sensor_32', 'sensor_33', \
'sensor_34', 'sensor_35', 'sensor_36', 'sensor_37', 'sensor_38', 'sensor_39', 'sensor_40', 'sensor_41', 'sensor_42', \
'sensor_43', 'sensor_44', 'sensor_45', 'sensor_46', 'sensor_47', 'sensor_48', 'sensor_49', 'sensor_50', 'sensor_51'])
It fails with a message that there is an error in line 5. The error reads:
Error: Input In [13]
'sensor_34', 'sensor_35', 'sensor_36', 'sensor_37', 'sensor_38', 'sensor_39', 'sensor_40', 'sensor_41', 'sensor_42', \
^
SyntaxError: unexpected character after line continuation character
Now, there is nothing that I can see after the line continuation character, but the error says there is.
What is going on here?
I just want the dataframe to look like this.
Unnamed: 0 sensor_00 sensor_01 sensor_02 sensor_03 sensor_04 sensor_05 sensor_06 sensor_07 sensor_08 ... sensor_42 sensor_43 sensor_44 sensor_45 sensor_46 sensor_47 sensor_48 sensor_49 sensor_50 sensor_51
0 0 2.465394 47.09201 53.2118 46.310760 634.3750 76.45975 13.41146 16.13136 15.56713 ... 31.770832 41.92708 39.641200 65.68287 50.92593 38.194440 157.9861 67.70834 243.0556 201.3889
1 1 2.465394 47.09201 53.2118 46.310760 634.3750 76.45975 13.41146 16.13136 15.56713 ... 31.770832 41.92708 39.641200 65.68287 50.92593 38.194440 157.9861 67.70834 243.0556 201.3889
2 2 2.444734 47.35243 53.2118 46.397570 638.8889 73.54598 13.32465 16.03733 15.61777 ... 31.770830 41.66666 39.351852 65.39352 51.21528 38.194443 155.9606 67.12963 241.3194 203.7037
3 3 2.460474 47.09201 53.1684 46.397568 628.1250 76.98898 13.31742 16.24711 15.69734 ... 31.510420 40.88541 39.062500 64.81481 51.21528 38.194440 155.9606 66.84028 240.4514 203.1250
4 4 2.445718 47.13541 53.2118 46.397568 636.4583 76.58897 13.35359 16.21094 15.69734 ... 31.510420 41.40625 38.773150 65.10416 51.79398 38.773150 158.2755 66.55093 242.1875 201.3889
with the headings in place and, of course, the numerical columns normalized and scaled.
But I keep getting this error.
What is wrong?
Any help appreciated.
Respectfully,
LZ
Posts: 8,167
Threads: 160
Joined: Sep 2016
Jun-30-2022, 01:46 PM
(This post was last modified: Jun-30-2022, 01:47 PM by buran.)
Try deleting everything between the \ and the first character on the next line; there may be a stray non-printable character there. In other words, join the two lines, then insert the line break again.
Even better, just create the column names dynamically:
columns = [f'sensor_{idx:02d}' for idx in range(52)]
df2 = pd.DataFrame(df, columns=columns)
By the way, why create a second dataframe from what looks like a DataFrame (df) itself? Just to set the column names?
Posts: 170
Threads: 43
Joined: May 2019
It looks like there are a lot of spaces after that line:
'sensor_34', 'sensor_35', 'sensor_36', 'sensor_37', 'sensor_38', 'sensor_39', 'sensor_40', 'sensor_41', 'sensor_42', \
'sensor_43', 'sensor_44', 'sensor_45', 'sensor_46', 'sensor_47', 'sensor_48', 'sensor_49', 'sensor_50', 'sensor_51'])
After the 'sensor_42', \ there are a lot of spaces.
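For what it's worth, a space still counts as a character to the Python tokenizer: a backslash only continues the line when it is the very last character before the newline. Here is a minimal sketch that reproduces the error with compile(). The tiny source strings are made up for illustration and are not the original notebook cell.

```python
# A backslash continues the line only if the newline follows it immediately.
clean_source = "values = [1, \\\n          2]\n"       # backslash, then newline
spaced_source = "values = [1, \\   \n          2]\n"   # backslash, spaces, then newline

# The clean version compiles without complaint.
compile(clean_source, "<clean>", "exec")

# Trailing spaces after the backslash raise a SyntaxError.
try:
    compile(spaced_source, "<spaced>", "exec")
    error_message = None
except SyntaxError as exc:
    error_message = exc.msg
print(error_message)

# Inside brackets no backslash is needed at all, so trailing spaces are harmless.
bracket_source = "values = [1,   \n          2]\n"
namespace = {}
exec(compile(bracket_source, "<brackets>", "exec"), namespace)
print(namespace["values"])
```

Since the column list sits inside square brackets, simply deleting every trailing backslash should be enough to make the trailing spaces irrelevant.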
Posts: 164
Threads: 88
Joined: Feb 2021
There are a lot of spaces after the line continuation character. But that is just it, they are spaces and not characters.
I created a new dataframe for the sole purpose of preserving those deleted columns and their content and their position in the dataframe.
Somehow, and I am not sure how, I plan to copy them from the complete dataframe to the slimmed down dataframe (with the newly added headers) and place them in the exact position that they were originally in the first dataframe.
That way I get the dataframe as it originally was, but with scaled and normalized numeric columns.
I know of no other way, but if there is one, then please let me know.
Respectfully,
LZ
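One possible alternative, sketched under assumptions: rather than removing the non-numeric columns and copying them back later, scale only the numeric columns and assign the result back to the same column labels, so every column keeps its original position. The column names timestamp and machine_status, the placeholder values, and the z-score scaling are all illustrative assumptions; the thread does not say which non-numeric columns exist or which scaler is used.

```python
import pandas as pd

# Hypothetical miniature frame; names and values are placeholders, not the real data.
df = pd.DataFrame({
    "timestamp": ["t0", "t1", "t2"],
    "sensor_00": [1.0, 2.0, 3.0],
    "machine_status": ["NORMAL", "NORMAL", "BROKEN"],
    "sensor_01": [10.0, 20.0, 30.0],
})

original_order = list(df.columns)

# Pick out only the numeric columns; the others are left untouched.
numeric_cols = df.select_dtypes(include="number").columns

# Work on a copy and write the scaled values back under the same labels,
# so the column order never changes.
scaled = df.copy()
scaled[numeric_cols] = (
    df[numeric_cols] - df[numeric_cols].mean()
) / df[numeric_cols].std()

print(scaled)
```

If a scikit-learn scaler is in use instead, the same idea should apply: fit and transform only df[numeric_cols] and assign the result back to those columns.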
Posts: 7,324
Threads: 123
Joined: Sep 2016
If I just copy the code you posted and run it, there is no SyntaxError.
import pandas as pd
df2 = pd.DataFrame(df, columns = ['sensor_00', 'sensor_01', 'sensor_02', 'sensor_03', 'sensor_04', 'sensor_05', 'sensor_06', \
'sensor_07', 'sensor_08', 'sensor_09', 'sensor_10', 'sensor_11', 'sensor_12', 'sensor_13', 'sensor_14', 'sensor_15', \
'sensor_16', 'sensor_17', 'sensor_18', 'sensor_19', 'sensor_20', 'sensor_21', 'sensor_22', 'sensor_23', 'sensor_24', \
'sensor_25', 'sensor_26', 'sensor_27', 'sensor_28', 'sensor_29', 'sensor_30', 'sensor_31', 'sensor_32', 'sensor_33', \
'sensor_34', 'sensor_35', 'sensor_36', 'sensor_37', 'sensor_38', 'sensor_39', 'sensor_40', 'sensor_41', 'sensor_42', \
'sensor_43', 'sensor_44', 'sensor_45', 'sensor_46', 'sensor_47', 'sensor_48', 'sensor_49', 'sensor_50', 'sensor_51'])
Error: Traceback (most recent call last):
File "<module2>", line 3, in <module>
NameError: name 'df' is not defined
To fix the NameError, define df first:
import pandas as pd
df = [[0 for i in range(52)] for j in range(52)]
df2 = pd.DataFrame(df, columns = ['sensor_00', 'sensor_01', 'sensor_02', 'sensor_03', 'sensor_04', 'sensor_05', 'sensor_06', \
'sensor_07', 'sensor_08', 'sensor_09', 'sensor_10', 'sensor_11', 'sensor_12', 'sensor_13', 'sensor_14', 'sensor_15', \
'sensor_16', 'sensor_17', 'sensor_18', 'sensor_19', 'sensor_20', 'sensor_21', 'sensor_22', 'sensor_23', 'sensor_24', \
'sensor_25', 'sensor_26', 'sensor_27', 'sensor_28', 'sensor_29', 'sensor_30', 'sensor_31', 'sensor_32', 'sensor_33', \
'sensor_34', 'sensor_35', 'sensor_36', 'sensor_37', 'sensor_38', 'sensor_39', 'sensor_40', 'sensor_41', 'sensor_42', \
'sensor_43', 'sensor_44', 'sensor_45', 'sensor_46', 'sensor_47', 'sensor_48', 'sensor_49', 'sensor_50', 'sensor_51'])
>>> df2
sensor_00 sensor_01 sensor_02 ... sensor_49 sensor_50 sensor_51
0 0 0 0 ... 0 0 0
1 0 0 0 ... 0 0 0
2 0 0 0 ... 0 0 0
3 0 0 0 ... 0 0 0
4 0 0 0 ... 0 0 0
5 0 0 0 ... 0 0 0
6 0 0 0 ... 0 0 0
7 0 0 0 ... 0 0 0
8 0 0 0 ... 0 0 0
..... etc.
Posts: 164
Threads: 88
Joined: Feb 2021
Jun-30-2022, 02:53 PM
(This post was last modified: Jun-30-2022, 02:53 PM by Led_Zeppelin.)
I just ran the two lines of code you suggested, and I got:
Error: ValueError Traceback (most recent call last)
Input In [15], in <cell line: 2>()
1 columns = [f'sensor_(idx:02d)' for idx in range(52)]
----> 2 df2 = pd.DataFrame(df, columns=columns)
File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\frame.py:694, in DataFrame.__init__(self, data, index, columns, dtype, copy)
684 mgr = dict_to_mgr(
685 # error: Item "ndarray" of "Union[ndarray, Series, Index]" has no
686 # attribute "name"
(...)
691 typ=manager,
692 )
693 else:
--> 694 mgr = ndarray_to_mgr(
695 data,
696 index,
697 columns,
698 dtype=dtype,
699 copy=copy,
700 typ=manager,
701 )
703 # For data is list-like, or Iterable (will consume into list)
704 elif is_list_like(data):
File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\internals\construction.py:351, in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
346 # _prep_ndarray ensures that values.ndim == 2 at this point
347 index, columns = _get_axes(
348 values.shape[0], values.shape[1], index=index, columns=columns
349 )
--> 351 _check_values_indices_shape_match(values, index, columns)
353 if typ == "array":
355 if issubclass(values.dtype.type, str):
File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\internals\construction.py:422, in _check_values_indices_shape_match(values, index, columns)
420 passed = values.shape
421 implied = (len(index), len(columns))
--> 422 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
ValueError: Shape of passed values is (220320, 53), indices imply (220320, 52)
My only guess is that the vector starts at 0 and not 1. Thus, it has 53 and not 52. How to fix?
My guess is to change 52 to 51 in the first line.
Respectfully,
LZ
Posts: 8,167
Threads: 160
Joined: Sep 2016
Jun-30-2022, 03:16 PM
(This post was last modified: Jun-30-2022, 06:10 PM by buran.)
Your columns list runs from 00 to 51, which means range(52). Note that the shape of the values is (220320, 53), so I guess you actually need range(53).
Also, the code that raised the error has a list of 52 sensors, but your "expected" result also shows a column Unnamed: 0.
So, again, you have 53 columns.
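A quick way to see the mismatch, sketched with a small stand-in frame (three rows of placeholder zeros instead of the real 220320 rows), is to compare the number of generated names with the number of columns actually present:

```python
import pandas as pd

# range(52) yields 0..51, i.e. exactly 52 names: sensor_00 through sensor_51.
sensor_names = [f"sensor_{idx:02d}" for idx in range(52)]

# Stand-in for the loaded data: an "Unnamed: 0" column plus the 52 sensors.
df = pd.DataFrame(0.0, index=range(3), columns=["Unnamed: 0"] + sensor_names)

print(len(sensor_names))  # number of generated names
print(df.shape[1])        # number of columns in the data

# Any column in the data that is not covered by the generated names.
extra = [name for name in df.columns if name not in sensor_names]
print(extra)
```

Running the same two checks on the real data should show which column accounts for the difference between 52 and 53.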
Posts: 6,809
Threads: 20
Joined: Feb 2020
Jun-30-2022, 03:45 PM
(This post was last modified: Jun-30-2022, 03:45 PM by deanhystad.)
According to the documentation, you should be able to make a new dataframe from an existing dataframe, but it does not work for me. Here is a scaled-down version:
import pandas as pd
data = [[i for i in range(1, 11)] for _ in range(5)]
df = pd.DataFrame(data, columns=[f"orig {i}" for i in range(10)])
print(df)
df2 = pd.DataFrame(df, columns=[f"copy {i}" for i in range(10)])
print(df2)
Output:
orig 0 orig 1 orig 2 orig 3 orig 4 orig 5 orig 6 orig 7 orig 8 orig 9
0 1 2 3 4 5 6 7 8 9 10
1 1 2 3 4 5 6 7 8 9 10
2 1 2 3 4 5 6 7 8 9 10
3 1 2 3 4 5 6 7 8 9 10
4 1 2 3 4 5 6 7 8 9 10
copy 0 copy 1 copy 2 copy 3 copy 4 copy 5 copy 6 copy 7 copy 8 copy 9
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
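A possible explanation, as far as I understand the constructor: when the data passed to pd.DataFrame is itself a DataFrame, the columns argument selects columns by label rather than renaming them, so labels that do not exist in the source come back filled with NaN. Here is a minimal sketch contrasting that with assigning new labels to a copy:

```python
import pandas as pd

data = [[i for i in range(1, 11)] for _ in range(5)]
df = pd.DataFrame(data, columns=[f"orig {i}" for i in range(10)])
new_names = [f"copy {i}" for i in range(10)]

# Passing a DataFrame plus unknown labels: columns are looked up by name,
# none are found, and every cell ends up NaN.
reindexed = pd.DataFrame(df, columns=new_names)

# Renaming instead: copy the frame, then overwrite its column labels.
renamed = df.copy()
renamed.columns = new_names

print(reindexed.isna().all().all())
print(renamed.head())
```

This also fits the ValueError earlier in the thread: a shape mismatch is what you get when the data is a plain array with 53 columns, where there are no labels to look up and the names must match the column count exactly.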
Posts: 164
Threads: 88
Joined: Feb 2021
Jun-30-2022, 06:00 PM
(This post was last modified: Jun-30-2022, 06:00 PM by Led_Zeppelin.)
It worked for 53 columns. I am not sure why these extra columns were added, or whether they are even needed. The magic number for me is 53 (found through trial and error), as in:
columns = [f'sensor_{idx:02d}' for idx in range(53)]
df2 = pd.DataFrame(df, columns=columns)
So, should I get rid of all columns that come before sensor_01? I cannot see any situation where I will need them.
I have not tried range(54). I will now, but as I said 53 worked.
My main concern now, as I explained previously, is putting the non-numeric columns back onto the slimmed-down dataframe in the right order and with all of the values they had when I loaded the CSV file at the beginning of the program. That is why I made a copy of the dataframe in a previous line.
So how to do that?
Respectfully,
LZ
Posts: 8,167
Threads: 160
Joined: Sep 2016
Jun-30-2022, 06:12 PM
(This post was last modified: Jun-30-2022, 06:15 PM by buran.)
(Jun-30-2022, 06:00 PM)Led_Zeppelin Wrote: I have not tried range(54). I will now, but as I said 53 worked.
Sorry, it was a typo, as I was replying on the phone. I fixed it. The point is: 52 columns means range(52); 53 columns means range(53). As I mentioned, I guess you have 53 columns.
You are not giving much information, so all we can say is: drop the columns you don't need. However, there might be other options; for example, if you read df from a file, you can skip the columns you don't want to include.
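Both options can be sketched as follows, using a tiny in-memory CSV as a stand-in for the real file (its name and contents are not given in the thread, so the text below is made up). An "Unnamed: 0" column typically appears when a file was saved together with its index and the header cell for that column is empty.

```python
import io
import pandas as pd

# Placeholder CSV text: an empty first header cell followed by two sensor columns.
csv_text = ",sensor_00,sensor_01\n0,1.0,2.0\n1,3.0,4.0\n"

# Option 1: read everything, then drop the column you do not need.
df_all = pd.read_csv(io.StringIO(csv_text))
df_dropped = df_all.drop(columns=["Unnamed: 0"])

# Option 2: skip unwanted columns while reading, keeping only the sensor columns.
df_skipped = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda name: name.startswith("sensor_"),
)

print(list(df_all.columns))
print(list(df_dropped.columns))
print(list(df_skipped.columns))
```

Note that the usecols filter above would also discard any non-sensor column you do want to keep, so the condition would need adjusting for the real file.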