Python Forum
Changing Column dtypes in DataFrame
#1
Hi everyone,

I have a DataFrame with roughly 700 columns. I'd like to write a loop that goes through the columns and converts every column of dtype category to dtype object. I have written the loop below, but it gives me an error. Can anyone help?

for col in df.columns:
    col_type = df[col].dtype
    
    if col_type = category:
        df[col] = df[col].astype('object')
Thanks
#2
And the error is.... ?
#3
File "<ipython-input-46-1b7f0f93a132>", line 5
if col_type = category:
^
SyntaxError: invalid syntax
#4
A comparison needs ==. A single = is assignment, which is not allowed in an if condition.
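To illustrate the difference, here is a minimal sketch (the Series s is made up for this example): a single = binds a value to a name, while == compares two values. A pandas dtype can be compared against its string name this way.

```python
import pandas as pd

s = pd.Series([1, 2, 3], dtype='int64')  # made-up example data

col_type = s.dtype                  # single "=" assigns a value to a name
is_int = (col_type == 'int64')      # double "==" compares and yields a bool
is_float = (col_type == 'float64')

print(is_int, is_float)
```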

EDIT:

There are other approaches. For example, you can use df.astype() together with df.select_dtypes(). Maybe it's overly complex, but still.

Below is a df with three columns; two of them are int64 and one is object:

>>> import pandas as pd
>>> df = pd.DataFrame({'nums': range(1, 4), 'more_nums': range(11, 14), 'chars': [*'abc']})
>>> print(df['nums'].dtype, df['more_nums'].dtype, df['chars'].dtype)
int64 int64 object
We want to use df.astype() to change the dtype of columns based on each column's current dtype. To create the mapping that df.astype() needs, we can:

- select the columns of a given dtype with df.select_dtypes(include='int64')
- get the names of those columns with df.select_dtypes(include='int64').columns
- build a dictionary for the desired mapping from those names with dict.fromkeys(df.select_dtypes(include='int64').columns, 'object')
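The three steps can be sketched one at a time to see the intermediate results. This is only an illustration; the names int_df, int_cols and mapping are made up here, and include is passed as a list because that form is accepted by older pandas versions as well.

```python
import pandas as pd

df = pd.DataFrame({'nums': range(1, 4), 'more_nums': range(11, 14), 'chars': [*'abc']})

# Step 1: keep only the columns whose dtype is int64
int_df = df.select_dtypes(include=['int64'])

# Step 2: take the names of those columns
int_cols = int_df.columns

# Step 3: map every selected column name to the target dtype
mapping = dict.fromkeys(int_cols, 'object')

print(list(int_cols))
print(mapping)
```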

Now that we have constructed the desired mapping, we can pass it to df.astype():

>>> df = df.astype(dict.fromkeys(df.select_dtypes(include='int64').columns, 'object'))
We can check whether it worked:

>>> print(df['nums'].dtype, df['more_nums'].dtype, df['chars'].dtype)
object object object
#5
Thanks perfringo.

I have tried == but I get this error:

---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-11-e749837193f3> in <module>()
3 col_type = df[col].dtype
4
----> 5 if col_type == category:
6 df[col] = df[col].astype('object')
7

NameError: name 'category' is not defined

I have tried your approach with category dtypes but am getting this error:

df.select_dtypes(include='category')
df.select_dtypes(include='category').columns
dict.fromkeys(df.select_dtypes(include='category').columns, 'object')

df.astype(dict.fromkeys(df.select_dtypes(include='category').columns, 'object'))

df.dtypes
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-13-d704a5f02358> in <module>()
----> 1 df.select_dtypes(include='category')
2 df.select_dtypes(include='category').columns
3 dict.fromkeys(df.select_dtypes(include='category').columns, 'object')
4
5 df.astype(dict.fromkeys(df.select_dtypes(include='category').columns, 'object'))

C:\Program Files\Anaconda3\lib\site-packages\pandas\core\frame.py in select_dtypes(self, include, exclude)
2257 include, exclude = include or (), exclude or ()
2258 if not (is_list_like(include) and is_list_like(exclude)):
-> 2259 raise TypeError('include and exclude must both be non-string'
2260 ' sequences')
2261 selection = tuple(map(frozenset, (include, exclude)))

TypeError: include and exclude must both be non-string sequences
#6
Since the code snippet in your first post did not run, it is very hard to predict which problems you may or may not encounter. Nevertheless:

(1) If you haven't assigned anything to the name category, Python cannot find it and raises a NameError. I assume you want to compare against a string, so you could write:

if col_type == 'category':
    # do some stuff
(2) df.astype() does not change the data type in place; it returns a new DataFrame. You should therefore assign the result back to the dataframe (as in the code I provided):

df = df.astype(dict.fromkeys(df.select_dtypes(include='int64').columns, 'object'))
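Putting both points together for the category case from the original question, here is a minimal sketch on made-up data (the column names and the variables looped, cat_cols and converted are assumptions for illustration). Note that the traceback in post #5 shows that pandas version rejecting a bare string for include, so the sketch passes a list, include=['category'], which should avoid that TypeError.

```python
import pandas as pd

# Made-up example data: two category columns and one integer column
df = pd.DataFrame({
    'colour': pd.Series(['red', 'blue', 'red'], dtype='category'),
    'grade': pd.Series(['a', 'b', 'a'], dtype='category'),
    'qty': [1, 2, 3],
})

# Option 1: the loop from the first post, comparing against the string 'category'
looped = df.copy()
for col in looped.columns:
    if looped[col].dtype == 'category':
        looped[col] = looped[col].astype('object')

# Option 2: select_dtypes() + astype(), with include given as a list
cat_cols = df.select_dtypes(include=['category']).columns
converted = df.astype(dict.fromkeys(cat_cols, 'object'))  # result is assigned, not in place

print(looped.dtypes)
print(converted.dtypes)
```

With 700 columns, the second option converts all category columns in a single astype() call, while the loop converts them one by one; both should end with the same dtypes.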