Python Forum
Changing Column dtypes in DataFrame
#1
Hi everyone,

I have a DF with roughly 700 columns. I'd like to write a loop that goes through the columns and converts every column of dtype category to dtype object. I have written the loop below, but it is giving me an error. Can anyone help?

for col in df.columns:
    col_type = df[col].dtype
    
    if col_type = category:
        df[col] = df[col].astype('object')
Thanks
#2
And the error is.... ?
#3
File "<ipython-input-46-1b7f0f93a132>", line 5
if col_type = category:
^
SyntaxError: invalid syntax
#4
Comparison needs ==. A single = is assignment, which is not allowed in an if condition, hence the SyntaxError.

EDIT:

There are other approaches, for example combining df.astype() with df.select_dtypes(). Maybe it's overly complex, but still.

Below is df with three columns, two of them are int64 and one is object:

>>> import pandas as pd
>>> df = pd.DataFrame({'nums': range(1, 4), 'more_nums': range(11, 14), 'chars': [*'abc']})
>>> print(df['nums'].dtype, df['more_nums'].dtype, df['chars'].dtype)
int64 int64 object
We want to use df.astype() to change the dtype of columns based on each column's current dtype. To build the mapping that df.astype() needs, we can:

- select the columns of a given dtype with df.select_dtypes(include='int64')
- get the names of those columns with df.select_dtypes(include='int64').columns
- build the mapping dictionary from those names with dict.fromkeys(df.select_dtypes(include='int64').columns, 'object')
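The three steps above can be traced one at a time. This is only a sketch: the intermediate names (int_frame, int_columns, mapping) are made up for illustration, the dtypes are set explicitly so the result does not depend on platform defaults, and include is given as a list because that form is also accepted by older pandas versions.

```python
import pandas as pd

# Same three-column frame as in the example, with explicit integer dtypes.
df = pd.DataFrame({
    'nums': pd.Series([1, 2, 3], dtype='int64'),
    'more_nums': pd.Series([11, 12, 13], dtype='int64'),
    'chars': ['a', 'b', 'c'],
})

# Step 1: keep only the int64 columns.
int_frame = df.select_dtypes(include=['int64'])

# Step 2: take the names of those columns.
int_columns = list(int_frame.columns)

# Step 3: map every selected name to the target dtype.
mapping = dict.fromkeys(int_columns, 'object')

print(int_columns)
print(mapping)
```

The resulting dictionary has one entry per selected column, which is the form df.astype() takes when different columns need different dtypes.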

Now we have constructed the desired mapping and can pass it to df.astype():

>>> df = df.astype(dict.fromkeys(df.select_dtypes(include='int64').columns, 'object'))
We can check whether it worked:

>>> print(df['nums'].dtype, df['more_nums'].dtype, df['chars'].dtype)
object object object
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
#5
Thanks perfringo.

I have tried == but I get this error:

---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-11-e749837193f3> in <module>()
3 col_type = df[col].dtype
4
----> 5 if col_type == category:
6 df[col] = df[col].astype('object')
7

NameError: name 'category' is not defined

I have tried your approach with category dtypes but am getting this error:

df.select_dtypes(include='category')
df.select_dtypes(include='category').columns
dict.fromkeys(df.select_dtypes(include='category').columns, 'object')

df.astype(dict.fromkeys(df.select_dtypes(include='category').columns, 'object'))

df.dtypes
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-13-d704a5f02358> in <module>()
----> 1 df.select_dtypes(include='category')
2 df.select_dtypes(include='category').columns
3 dict.fromkeys(df.select_dtypes(include='category').columns, 'object')
4
5 df.astype(dict.fromkeys(df.select_dtypes(include='category').columns, 'object'))

C:\Program Files\Anaconda3\lib\site-packages\pandas\core\frame.py in select_dtypes(self, include, exclude)
2257 include, exclude = include or (), exclude or ()
2258 if not (is_list_like(include) and is_list_like(exclude)):
-> 2259 raise TypeError('include and exclude must both be non-string'
2260 ' sequences')
2261 selection = tuple(map(frozenset, (include, exclude)))

TypeError: include and exclude must both be non-string sequences
#6
Since the code in your original post was a non-working snippet, it is hard to predict which problems you may or may not run into. Nevertheless:

(1) If you haven't assigned anything to the name category, Python cannot find it and raises a NameError. I assume you want to compare with a string, so you could write:

if col_type == 'category':
    # do some stuff
(2) This code does not change the data types in place. You therefore need to assign the result back to the dataframe (as in the code I provided):

df = df.astype(dict.fromkeys(df.select_dtypes(include='int64').columns, 'object'))
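For the category case specifically, both points can be put together roughly as follows. This is a sketch, not tested against your data: the sample frame and its column names are hypothetical. The TypeError in the traceback above says that include must be a non-string sequence on that pandas version, so the dtype name is wrapped in a list here.

```python
import pandas as pd

# Hypothetical sample frame; the column names are made up for illustration.
df = pd.DataFrame({
    'colour': pd.Series(['red', 'blue', 'red'], dtype='category'),
    'shirt_size': pd.Series(['S', 'M', 'L'], dtype='category'),
    'quantity': [1, 2, 3],
})

# Approach 1: loop over the columns and compare each dtype with the
# string 'category' (not with an undefined name).
looped = df.copy()
for col in looped.columns:
    if looped[col].dtype == 'category':
        looped[col] = looped[col].astype('object')

# Approach 2: build the mapping and convert in one call.
# include is a list, and the result of astype() is assigned to a name
# because astype() returns a new frame instead of changing df in place.
cat_columns = df.select_dtypes(include=['category']).columns
converted = df.astype(dict.fromkeys(cat_columns, 'object'))

print(looped.dtypes)
print(converted.dtypes)
```

Both approaches leave the non-category column untouched. With about 700 columns the single astype() call is probably the tidier option, but the loop should give the same dtypes.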