Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Cannot rename Columns in df
#1
Greetings!
I need to rename Columns in the Dataframe. I found if the name has a space my script is failing.
I tried a lot of examples from the Net and nothing is working for some reason. Column Names are staying the same. Sad
here is a "renaming" snippet:
exlf = pd.read_excel('C:/01/XLS_Files/SHORT.xlsx', keep_default_na=False)
exlf = exlf.rename(columns={"Visual ID": "Visual_ID","VR Duration(Days)": "VR_D"})
print(exlf)
Thank you.
Reply
#2
You must have an error in your column names, cause it works fine for me.
import pandas as pd

df = pd.DataFrame({"Visual ID": ["A", "B", "C", "D"],"VR Duration(Days)": [1, 2, 3, 4]})
df = df.rename(columns={"Visual ID": "Visual_ID","VR Duration(Days)": "VR_D"})
print(df)
Output:
Visual_ID VR_D 0 A 1 1 B 2 2 C 3 3 D 4
Look for trailing spaces.
import pandas as pd

df = pd.DataFrame({"Visual ID ": ["A", "B", "C", "D"],"VR Duration(Days) ": [1, 2, 3, 4]})
df = df.rename(columns={"Visual ID": "Visual_ID","VR Duration(Days)": "VR_D"})
print(df)
Output:
Visual ID VR Duration(Days) 0 A 1 1 B 2 2 C 3 3 D 4
Reply
#3
You are right. I found there are blank rows at the beginning of the Excel file, Unnamed 0, Unnamed 1, and so on... that is a problem. I thought I could get rid of the "Unnamed" rows and then rename the Headers.
Apparently, that is a huge problem. I tried the following with no good results:
exlf = pd.read_excel('C:/01/XLS_Files/SHORT.xlsx', keep_default_na=False)
exlf = exlf.dropna(how='all').dropna(axis=1, how='all')
exlf = exlf.loc[:, ~exlf.columns.str.contains('^Unnamed')]
exlf = exlf.drop('Unnamed: 0', axis=1, inplace=True)
exlf = exlf.drop(exlf.filter(regex="Unname"),axis=1, inplace=True)
Some of these also remove headers
I could probably just convert the Excel file to CSV, and work with it as a text file. Remove blank lines, rename the headers, and completely process it as I wanted, I thought I could do it all in Pandas, but even removing blank rows is a problem. What I'm doing wrong?
Thank you.
Reply
#4
(Aug-25-2023, 06:30 PM)tester_V Wrote: I found there are blank rows at the beginning of the Excel file, Unnamed 0, Unnamed 1, and so on.
import pandas as pd

df = pd.read_excel('test1.xlsx'df)
>>> df
   Unnamed: 0  Unnamed: 1               datettime  ...  Hostname  Vlan nonWPdschBler
0         NaN         NaN  20221108.181509.814817  ...       EPW  ajjj            25
1         NaN         NaN  20221108.181519.814948  ...       ETW    ab            50

[2 rows x 8 columns]

>>> df2 = df[df.columns[2:]]
>>> df2
                datettime      Tput Description Hostname  Vlan  nonWPdschBler
0  20221108.181509.814817  0.000091      router      EPW  ajjj             25
1  20221108.181519.814948  0.000131       WIDIS      ETW    ab             50
Remove when read in.
import pandas as pd

df = pd.read_excel('test1.xlsx', usecols=lambda x: 'Unnamed' not in x)
>>> df
                datettime      Tput Description Hostname  Vlan  nonWPdschBler
0  20221108.181509.814817  0.000091      router      EPW  ajjj             25
1  20221108.181519.814948  0.000131       WIDIS      ETW    ab             50
tester_V likes this post
Reply
#5
Thank you for looking into it!
The Excel file I'm reading has usually 2 first rows blank sometimes more. I'd like to completely remove blank rows.
When I used your code:
exlf = pd.read_excel('C:/01/XLS_Files/SHORT.xlsx', usecols=lambda x: 'Unnamed' not in x)
print(exlf)
it produces:
Empty DataFrame
Columns: []
Index: []
I'm getting really frustrated with this, it is such a simple tusk Confused . Should I try different modules?
Thank you.
Reply
#6
Post SHORT.xlsx or sample if it's big.
Reply
#7
I attached a small part of the file.

Attached Files

.xlsx   To_F.xlsx (Size: 9.13 KB / Downloads: 38)
Reply
#8
Ok i missed that is was first rows.
import pandas as pd
#pd.set_option('display.max_columns', None)

df = pd.read_excel('To_F.xlsx', skiprows=2)
>>> df
   Visual ID  IsIps  ... Store Site Worker Site
0     175323      1  ...       ORDA    ORDA_BDC
1     175323      1  ...       ORDA    ORDA_BDC
2     175323      1  ...       ORDN    ORDA_BDC
3     175323      1  ...       NMVC    NMVC_BDC
4     175323      1  ...       NMVC    NMVC_BDC
5     175323      1  ...       NMVC    NMVC_BDC

[6 rows x 11 columns]
>>> df.dtypes
Visual ID                 int64
IsIps                     int64
Store Last Seen Date     object
Store Duration           object
Store Duration(Days)      int64
QB Last Seen Date       float64
QB Duration             float64
QB Duration(Days)       float64
Last Seen Location       object
Store Site               object
Worker Site              object
dtype: object

# Fix Date
>>> df['Store Last Seen Date']= pd.to_datetime(df['Store Last Seen Date'])
>>> df.dtypes
Visual ID                        int64
IsIps                            int64
Store Last Seen Date    datetime64[ns]
Store Duration                  object
Store Duration(Days)             int64
QB Last Seen Date              float64
QB Duration                    float64
QB Duration(Days)              float64
Last Seen Location              object
Store Site                      object
Worker Site                     object
dtype: object
So now should the DataFame be ok,if need delete columns with only NaN.
>>> df = df.dropna(how='all', axis=1)
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 8 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   Visual ID             6 non-null      int64         
 1   IsIps                 6 non-null      int64         
 2   Store Last Seen Date  6 non-null      datetime64[ns]
 3   Store Duration        6 non-null      object        
 4   Store Duration(Days)  6 non-null      int64         
 5   Last Seen Location    6 non-null      object        
 6   Store Site            6 non-null      object        
 7   Worker Site           6 non-null      object        
dtypes: datetime64[ns](1), int64(3), object(4)
memory usage: 512.0+ bytes
tester_V likes this post
Reply
#9
OP says the number of blank rows is not always the same. I think a two step approach is headed. First read in the excel file. Count the number of leading blanks, then read in the file again. Something like this:
import pandas as pd


df = pd.read_excel("To_f.xlsx")
for i, value in enumerate(df.iloc[:, 0]):
    if type(value) is int:
        if i > 0:
            df = pd.read_excel("To_f.xlsx", skiprows=i)
        break
print(df)
tester_V likes this post
Reply
#10
Thank you!
I really appreciate your help!
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  How to rename dataframe columns based on the content in an index? ar_mahdavi 2 2,492 Jun-07-2021, 06:09 AM
Last Post: ricslato

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020