Posts: 414
Threads: 111
Joined: Apr 2020
Greetings!
I need to rename Columns in the Dataframe. I found if the name has a space my script is failing.
I tried a lot of examples from the Net and nothing is working for some reason. Column Names are staying the same.
here is a "renaming" snippet:
exlf = pd.read_excel('C:/01/XLS_Files/SHORT.xlsx', keep_default_na=False)
exlf = exlf.rename(columns={"Visual ID": "Visual_ID","VR Duration(Days)": "VR_D"})
print(exlf) Thank you.
Posts: 6,800
Threads: 20
Joined: Feb 2020
You must have an error in your column names, cause it works fine for me.
import pandas as pd
df = pd.DataFrame({"Visual ID": ["A", "B", "C", "D"],"VR Duration(Days)": [1, 2, 3, 4]})
df = df.rename(columns={"Visual ID": "Visual_ID","VR Duration(Days)": "VR_D"})
print(df) Output: Visual_ID VR_D
0 A 1
1 B 2
2 C 3
3 D 4
Look for trailing spaces.
import pandas as pd
df = pd.DataFrame({"Visual ID ": ["A", "B", "C", "D"],"VR Duration(Days) ": [1, 2, 3, 4]})
df = df.rename(columns={"Visual ID": "Visual_ID","VR Duration(Days)": "VR_D"})
print(df) Output: Visual ID VR Duration(Days)
0 A 1
1 B 2
2 C 3
3 D 4
Posts: 414
Threads: 111
Joined: Apr 2020
You are right. I found there are blank rows at the beginning of the Excel file, Unnamed 0, Unnamed 1, and so on... that is a problem. I thought I could get rid of the "Unnamed" rows and then rename the Headers.
Apparently, that is a huge problem. I tried the following with no good results:
exlf = pd.read_excel('C:/01/XLS_Files/SHORT.xlsx', keep_default_na=False)
exlf = exlf.dropna(how='all').dropna(axis=1, how='all')
exlf = exlf.loc[:, ~exlf.columns.str.contains('^Unnamed')]
exlf = exlf.drop('Unnamed: 0', axis=1, inplace=True)
exlf = exlf.drop(exlf.filter(regex="Unname"),axis=1, inplace=True) Some of these also remove headers
I could probably just convert the Excel file to CSV, and work with it as a text file. Remove blank lines, rename the headers, and completely process it as I wanted, I thought I could do it all in Pandas, but even removing blank rows is a problem. What I'm doing wrong?
Thank you.
Posts: 7,320
Threads: 123
Joined: Sep 2016
(Aug-25-2023, 06:30 PM)tester_V Wrote: I found there are blank rows at the beginning of the Excel file, Unnamed 0, Unnamed 1, and so on. import pandas as pd
df = pd.read_excel('test1.xlsx'df) >>> df
Unnamed: 0 Unnamed: 1 datettime ... Hostname Vlan nonWPdschBler
0 NaN NaN 20221108.181509.814817 ... EPW ajjj 25
1 NaN NaN 20221108.181519.814948 ... ETW ab 50
[2 rows x 8 columns]
>>> df2 = df[df.columns[2:]]
>>> df2
datettime Tput Description Hostname Vlan nonWPdschBler
0 20221108.181509.814817 0.000091 router EPW ajjj 25
1 20221108.181519.814948 0.000131 WIDIS ETW ab 50 Remove when read in.
import pandas as pd
df = pd.read_excel('test1.xlsx', usecols=lambda x: 'Unnamed' not in x) >>> df
datettime Tput Description Hostname Vlan nonWPdschBler
0 20221108.181509.814817 0.000091 router EPW ajjj 25
1 20221108.181519.814948 0.000131 WIDIS ETW ab 50
Posts: 414
Threads: 111
Joined: Apr 2020
Thank you for looking into it!
The Excel file I'm reading has usually 2 first rows blank sometimes more. I'd like to completely remove blank rows.
When I used your code:
exlf = pd.read_excel('C:/01/XLS_Files/SHORT.xlsx', usecols=lambda x: 'Unnamed' not in x)
print(exlf) it produces:
Empty DataFrame
Columns: []
Index: []
I'm getting really frustrated with this, it is such a simple tusk  . Should I try different modules?
Thank you.
Posts: 7,320
Threads: 123
Joined: Sep 2016
Post SHORT.xlsx or sample if it's big.
Posts: 414
Threads: 111
Joined: Apr 2020
I attached a small part of the file.
Attached Files
To_F.xlsx (Size: 9.13 KB / Downloads: 77)
Posts: 7,320
Threads: 123
Joined: Sep 2016
Ok i missed that is was first rows.
import pandas as pd
#pd.set_option('display.max_columns', None)
df = pd.read_excel('To_F.xlsx', skiprows=2) >>> df
Visual ID IsIps ... Store Site Worker Site
0 175323 1 ... ORDA ORDA_BDC
1 175323 1 ... ORDA ORDA_BDC
2 175323 1 ... ORDN ORDA_BDC
3 175323 1 ... NMVC NMVC_BDC
4 175323 1 ... NMVC NMVC_BDC
5 175323 1 ... NMVC NMVC_BDC
[6 rows x 11 columns]
>>> df.dtypes
Visual ID int64
IsIps int64
Store Last Seen Date object
Store Duration object
Store Duration(Days) int64
QB Last Seen Date float64
QB Duration float64
QB Duration(Days) float64
Last Seen Location object
Store Site object
Worker Site object
dtype: object
# Fix Date
>>> df['Store Last Seen Date']= pd.to_datetime(df['Store Last Seen Date'])
>>> df.dtypes
Visual ID int64
IsIps int64
Store Last Seen Date datetime64[ns]
Store Duration object
Store Duration(Days) int64
QB Last Seen Date float64
QB Duration float64
QB Duration(Days) float64
Last Seen Location object
Store Site object
Worker Site object
dtype: object So now should the DataFame be ok,if need delete columns with only NaN.
>>> df = df.dropna(how='all', axis=1)
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Visual ID 6 non-null int64
1 IsIps 6 non-null int64
2 Store Last Seen Date 6 non-null datetime64[ns]
3 Store Duration 6 non-null object
4 Store Duration(Days) 6 non-null int64
5 Last Seen Location 6 non-null object
6 Store Site 6 non-null object
7 Worker Site 6 non-null object
dtypes: datetime64[ns](1), int64(3), object(4)
memory usage: 512.0+ bytes
Posts: 6,800
Threads: 20
Joined: Feb 2020
OP says the number of blank rows is not always the same. I think a two step approach is headed. First read in the excel file. Count the number of leading blanks, then read in the file again. Something like this:
import pandas as pd
df = pd.read_excel("To_f.xlsx")
for i, value in enumerate(df.iloc[:, 0]):
if type(value) is int:
if i > 0:
df = pd.read_excel("To_f.xlsx", skiprows=i)
break
print(df)
Posts: 414
Threads: 111
Joined: Apr 2020
Thank you!
I really appreciate your help!
|