cannot get code to work

Led_Zeppelin · (This post was last modified: Jun-30-2022, 01:24 PM by Led_Zeppelin.)

I have spent a day trying to get the following code to work.

df2 = pd.DataFrame(df, columns = ['sensor_00', 'sensor_01', 'sensor_02', 'sensor_03', 'sensor_04', 'sensor_05', 'sensor_06', \
'sensor_07', 'sensor_08', 'sensor_09', 'sensor_10', 'sensor_11', 'sensor_12', 'sensor_13', 'sensor_14', 'sensor_15', \
'sensor_16', 'sensor_17', 'sensor_18', 'sensor_19', 'sensor_20', 'sensor_21', 'sensor_22', 'sensor_23', 'sensor_24', \
'sensor_25', 'sensor_26', 'sensor_27', 'sensor_28', 'sensor_29', 'sensor_30', 'sensor_31', 'sensor_32', 'sensor_33', \
'sensor_34', 'sensor_35', 'sensor_36', 'sensor_37', 'sensor_38', 'sensor_39', 'sensor_40', 'sensor_41', 'sensor_42', \                                  
'sensor_43', 'sensor_44', 'sensor_45', 'sensor_46', 'sensor_47', 'sensor_48', 'sensor_49', 'sensor_50', 'sensor_51'])

It fails with a message saying there is an error in line 5. The error reads

Error:
Input In [13]
    'sensor_34', 'sensor_35', 'sensor_36', 'sensor_37', 'sensor_38', 'sensor_39', 'sensor_40', 'sensor_41', 'sensor_42', \
                                                                                                                                                            
^
SyntaxError: unexpected character after line continuation character

Now there is nothing that I can see after the line continuation character. But the error says there is.

What is going on here?

I just want the dataframe to look like this.

Unnamed: 0	sensor_00	sensor_01	sensor_02	sensor_03	sensor_04	sensor_05	sensor_06	sensor_07	sensor_08	...	sensor_42	sensor_43	sensor_44	sensor_45	sensor_46	sensor_47	sensor_48	sensor_49	sensor_50	sensor_51
0	0	2.465394	47.09201	53.2118	46.310760	634.3750	76.45975	13.41146	16.13136	15.56713	...	31.770832	41.92708	39.641200	65.68287	50.92593	38.194440	157.9861	67.70834	243.0556	201.3889
1	1	2.465394	47.09201	53.2118	46.310760	634.3750	76.45975	13.41146	16.13136	15.56713	...	31.770832	41.92708	39.641200	65.68287	50.92593	38.194440	157.9861	67.70834	243.0556	201.3889
2	2	2.444734	47.35243	53.2118	46.397570	638.8889	73.54598	13.32465	16.03733	15.61777	...	31.770830	41.66666	39.351852	65.39352	51.21528	38.194443	155.9606	67.12963	241.3194	203.7037
3	3	2.460474	47.09201	53.1684	46.397568	628.1250	76.98898	13.31742	16.24711	15.69734	...	31.510420	40.88541	39.062500	64.81481	51.21528	38.194440	155.9606	66.84028	240.4514	203.1250
4	4	2.445718	47.13541	53.2118	46.397568	636.4583	76.58897	13.35359	16.21094	15.69734	...	31.510420	41.40625	38.773150	65.10416	51.79398	38.773150	158.2755	66.55093	242.1875	201.3889

with the headings in place and of course the numerical columns normalized and scaled.
But I keep getting this error.

What is wrong?

Any help appreciated.

Respectfully,

LZ

**buran** · (This post was last modified: Jun-30-2022, 01:47 PM by buran.)

Try deleting everything between the \ and the first character on the next line; maybe there is some ghost non-printable character? I.e. join the two lines, then add the newline again.

And even better, just create the column names dynamically:

columns = [f'sensor_{idx:02d}' for idx in range(52)]
df2 = pd.DataFrame(df, columns=columns)

By the way, why create a second DataFrame from what looks like a DataFrame (df) itself? Just to set the column names?

cubangt · Jun-30-2022, 01:53 PM

Looks like there are a lot of spaces after that line:

'sensor_34', 'sensor_35', 'sensor_36', 'sensor_37', 'sensor_38', 'sensor_39', 'sensor_40', 'sensor_41', 'sensor_42', \                                  
'sensor_43', 'sensor_44', 'sensor_45', 'sensor_46', 'sensor_47', 'sensor_48', 'sensor_49', 'sensor_50', 'sensor_51'])

After the 'sensor_42', \ there are a lot of spaces.
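A small sketch of why those spaces matter, using made-up one-line sources rather than the original cell: Python expects the backslash to be the very last character on the line, and a space counts as a character. The variable names here are only for illustration.

```python
# Three tiny source strings compiled at runtime, so the trailing spaces are visible.
clean = "x = [1, \\\n     2]\n"         # backslash is the last character on the line
spaced = "x = [1, \\   \n     2]\n"     # three spaces follow the backslash
no_backslash = "x = [1,   \n     2]\n"  # inside brackets no backslash is needed at all

# These two compile without complaint.
compile(clean, "<clean>", "exec")
compile(no_backslash, "<no_backslash>", "exec")

# This one raises SyntaxError because of the spaces after the backslash.
try:
    compile(spaced, "<spaced>", "exec")
    message = None
except SyntaxError as exc:
    message = exc.msg

print(message)
```

Since the column list sits inside square brackets, simply removing every trailing \ should also avoid the problem, because lines inside brackets are joined implicitly.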

Led_Zeppelin · Jun-30-2022, 02:38 PM

There are a lot of spaces after the line continuation character. But that is just it, they are spaces and not characters.

I created a new dataframe for the sole purpose of preserving those deleted columns and their content and their position in the dataframe.

Somehow, and I am not sure how, I plan to copy them from the complete dataframe to the slimmed down dataframe (with the newly added headers) and place them in the exact position that they were originally in the first dataframe.

That way I get the dataframe as it originally was, but with scaled and normalized numeric columns.

I know of no other way, but if there is one, then please let me know.

Respectfully,

LZ

***snippsat*** · Jun-30-2022, 02:51 PM

If I just copy the code you posted and run it, there is no SyntaxError.

import pandas as pd

df2 = pd.DataFrame(df, columns = ['sensor_00', 'sensor_01', 'sensor_02', 'sensor_03', 'sensor_04', 'sensor_05', 'sensor_06', \
'sensor_07', 'sensor_08', 'sensor_09', 'sensor_10', 'sensor_11', 'sensor_12', 'sensor_13', 'sensor_14', 'sensor_15', \
'sensor_16', 'sensor_17', 'sensor_18', 'sensor_19', 'sensor_20', 'sensor_21', 'sensor_22', 'sensor_23', 'sensor_24', \
'sensor_25', 'sensor_26', 'sensor_27', 'sensor_28', 'sensor_29', 'sensor_30', 'sensor_31', 'sensor_32', 'sensor_33', \
'sensor_34', 'sensor_35', 'sensor_36', 'sensor_37', 'sensor_38', 'sensor_39', 'sensor_40', 'sensor_41', 'sensor_42', \
'sensor_43', 'sensor_44', 'sensor_45', 'sensor_46', 'sensor_47', 'sensor_48', 'sensor_49', 'sensor_50', 'sensor_51'])

Error:
Traceback (most recent call last):
  File "<module2>", line 3, in <module>
NameError: name 'df' is not defined

To fix the NameError:

import pandas as pd

df = [[0 for i in range(52)] for j in range(52)]
df2 = pd.DataFrame(df, columns = ['sensor_00', 'sensor_01', 'sensor_02', 'sensor_03', 'sensor_04', 'sensor_05', 'sensor_06', \
'sensor_07', 'sensor_08', 'sensor_09', 'sensor_10', 'sensor_11', 'sensor_12', 'sensor_13', 'sensor_14', 'sensor_15', \
'sensor_16', 'sensor_17', 'sensor_18', 'sensor_19', 'sensor_20', 'sensor_21', 'sensor_22', 'sensor_23', 'sensor_24', \
'sensor_25', 'sensor_26', 'sensor_27', 'sensor_28', 'sensor_29', 'sensor_30', 'sensor_31', 'sensor_32', 'sensor_33', \
'sensor_34', 'sensor_35', 'sensor_36', 'sensor_37', 'sensor_38', 'sensor_39', 'sensor_40', 'sensor_41', 'sensor_42', \
'sensor_43', 'sensor_44', 'sensor_45', 'sensor_46', 'sensor_47', 'sensor_48', 'sensor_49', 'sensor_50', 'sensor_51'])

>>> df2
    sensor_00  sensor_01  sensor_02  ...  sensor_49  sensor_50  sensor_51
0           0          0          0  ...          0          0          0
1           0          0          0  ...          0          0          0
2           0          0          0  ...          0          0          0
3           0          0          0  ...          0          0          0
4           0          0          0  ...          0          0          0
5           0          0          0  ...          0          0          0
6           0          0          0  ...          0          0          0
7           0          0          0  ...          0          0          0
8           0          0          0  ...          0          0          0
... etc.

Led_Zeppelin · (This post was last modified: Jun-30-2022, 02:53 PM by Led_Zeppelin.)

I just ran the two lines of code you gave me, and I got:

Error:
ValueError                                Traceback (most recent call last)
Input In [15], in <cell line: 2>()
      1 columns = [f'sensor_(idx:02d)' for idx in range(52)]
----> 2 df2 = pd.DataFrame(df, columns=columns)

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\frame.py:694, in DataFrame.__init__(self, data, index, columns, dtype, copy)
    684         mgr = dict_to_mgr(
    685             # error: Item "ndarray" of "Union[ndarray, Series, Index]" has no
    686             # attribute "name"
   (...)
    691             typ=manager,
    692         )
    693     else:
--> 694         mgr = ndarray_to_mgr(
    695             data,
    696             index,
    697             columns,
    698             dtype=dtype,
    699             copy=copy,
    700             typ=manager,
    701         )
    703 # For data is list-like, or Iterable (will consume into list)
    704 elif is_list_like(data):

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\internals\construction.py:351, in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
    346 # _prep_ndarray ensures that values.ndim == 2 at this point
    347 index, columns = _get_axes(
    348     values.shape[0], values.shape[1], index=index, columns=columns
    349 )
--> 351 _check_values_indices_shape_match(values, index, columns)
    353 if typ == "array":
    355     if issubclass(values.dtype.type, str):

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\internals\construction.py:422, in _check_values_indices_shape_match(values, index, columns)
    420 passed = values.shape
    421 implied = (len(index), len(columns))
--> 422 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")

ValueError: Shape of passed values is (220320, 53), indices imply (220320, 52)

My only guess is that the vector starts at 0 and not 1. Thus, it has 53 and not 52. How to fix?

My guess is to change 52 to 51 in the first line.

Respectfully,

LZ

**buran** · (This post was last modified: Jun-30-2022, 06:10 PM by buran.)

Your columns list runs from 00 to 51, which means range(52). Note that the shape of the values is (220320, 53), so I guess you actually need range(53).

Also, the code that raised the error has a list with 52 sensors, but your "expected" result also shows the column Unnamed: 0.
So, again, you have 53 columns.
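A minimal sketch of that count, assuming the 53rd column really is the Unnamed: 0 column from the expected output and that it comes first. The `values` array is only a zero-filled stand-in with three rows, not the real data.

```python
import numpy as np
import pandas as pd

# Stand-in for the real 220320 x 53 array of values (assumption: 53 columns wide).
values = np.zeros((3, 53))

# One label for the leading index-like column plus the 52 sensor labels.
columns = ["Unnamed: 0"] + [f"sensor_{idx:02d}" for idx in range(52)]

# The number of labels has to match the number of columns in the values,
# otherwise pandas raises the "Shape of passed values" ValueError.
assert len(columns) == values.shape[1]

df2 = pd.DataFrame(values, columns=columns)
print(df2.shape)
```

Naming the first column explicitly keeps sensor_00 to sensor_51 lined up with the original headings, whereas range(53) would shift every sensor label by one.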

**deanhystad** · (This post was last modified: Jun-30-2022, 03:45 PM by deanhystad.)

According to the documentation you should be able to make a new dataframe from an existing dataframe, but it does not work for me. Here is a scaled-down version.

import pandas as pd

data = [[i for i in range(1, 11)] for _ in range(5)]
df = pd.DataFrame(data, columns=[f"orig {i}" for i in range(10)])
print(df)

df2 = pd.DataFrame(df, columns=[f"copy {i}" for i in range(10)])
print(df2)

Output:
   orig 0  orig 1  orig 2  orig 3  orig 4  orig 5  orig 6  orig 7  orig 8  orig 9
0       1       2       3       4       5       6       7       8       9      10
1       1       2       3       4       5       6       7       8       9      10
2       1       2       3       4       5       6       7       8       9      10
3       1       2       3       4       5       6       7       8       9      10
4       1       2       3       4       5       6       7       8       9      10
   copy 0  copy 1  copy 2  copy 3  copy 4  copy 5  copy 6  copy 7  copy 8  copy 9
0     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
1     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
2     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
3     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
4     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
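As far as I understand it, when the data passed in is already a DataFrame, the columns argument selects columns by label instead of renaming them, so labels that do not exist in the source come back as NaN. A sketch of two ways to get renamed columns with the values intact (the names `selected`, `from_values` and `renamed` are just for this example):

```python
import pandas as pd

data = [[i for i in range(1, 11)] for _ in range(5)]
df = pd.DataFrame(data, columns=[f"orig {i}" for i in range(10)])
new_names = [f"copy {i}" for i in range(10)]

# Passing a DataFrame plus new labels looks the labels up in df; none exist, so all NaN.
selected = pd.DataFrame(df, columns=new_names)

# Option 1: pass the raw values, so the labels are simply attached by position.
from_values = pd.DataFrame(df.to_numpy(), columns=new_names)

# Option 2: copy the frame and overwrite its column labels.
renamed = df.copy()
renamed.columns = new_names

print(renamed.head())
```

This would also explain why the original code behaves differently when df is a plain array of values rather than a DataFrame.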

Led_Zeppelin · (This post was last modified: Jun-30-2022, 06:00 PM by Led_Zeppelin.)

It worked for 53 columns. I am not sure why these extra columns were added. I am not sure they are even needed. The magic number for me is 53 (through trial and error) as in:

columns = [f'sensor_{idx:02d}' for idx in range(53)]
df2 = pd.DataFrame(df, columns=columns)

So, should I get rid of all columns that come before sensor_01? I cannot see any situation where I will need them.

I have not tried range(54). I will now, but as I said 53 worked.

My main concern now, as I explained previously, is putting the nonnumeric columns onto the slimmed down dataframe in the right order and with all of the values they had when I uploaded the CSV file at the beginning of the program. That is why I made a copy of the dataframe in a previous line.

So how to do that?

Respectfully,

LZ

**buran** · (This post was last modified: Jun-30-2022, 06:15 PM by buran.)

(Jun-30-2022, 06:00 PM)Led_Zeppelin Wrote: I have not tried range(54). I will now, but as I said 53 worked.

Sorry, it was a typo, as I was replying on my phone. I fixed it. The point is -> 52 columns - range(52). 53 columns - range(53). As I mentioned - I guess you have 53 columns

You are not giving much information, so we can just say - drop the columns you don't need. However there might be other options - e.g. if you read df from file, you can skip columns you don't want to include
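A rough sketch of both options, plus one way to scale only the numeric columns in place so the other columns keep their position. Everything here is an assumption for illustration: the CSV text is a made-up three-row toy, `timestamp` and `machine_status` are hypothetical non-numeric column names, and min-max scaling stands in for whatever scaling is actually used.

```python
import io
import pandas as pd

# Made-up toy CSV with a blank first header, which pandas names "Unnamed: 0".
csv_text = """,timestamp,sensor_00,sensor_01,machine_status
0,t0,2.0,47.0,NORMAL
1,t1,3.0,48.0,NORMAL
2,t2,4.0,50.0,BROKEN
"""

# Option A: read everything, then drop the column you do not need.
raw = pd.read_csv(io.StringIO(csv_text))
dropped = raw.drop(columns=["Unnamed: 0"])

# Option B: treat the first column as the index while reading, so it never becomes a column.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Scale only the numeric columns, assigning back into the same frame.
# The non-numeric columns are untouched and the column order does not change.
numeric_cols = df.select_dtypes(include="number").columns
col_min = df[numeric_cols].min()
col_max = df[numeric_cols].max()
df[numeric_cols] = (df[numeric_cols] - col_min) / (col_max - col_min)

print(df)
```

Working on a subset of columns inside the one frame avoids having to copy the non-numeric columns back into a second dataframe afterwards.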
