Python Forum
Changing Column dtypes in DataFrame - Printable Version



Changing Column dtypes in DataFrame - Scott - Jan-22-2020

Hi everyone,

I have a DataFrame with roughly 700 columns. I'd like to write a loop that goes through the columns and converts every column of dtype category to dtype object. I have written the loop below, but it gives me an error. Can anyone help?

for col in df.columns:
    col_type = df[col].dtype
    
    if col_type = category:
        df[col] = df[col].astype('object')
Thanks


RE: Changing Column dtypes in DataFrame - jefsummers - Jan-22-2020

And the error is.... ?


RE: Changing Column dtypes in DataFrame - Scott - Jan-22-2020

File "<ipython-input-46-1b7f0f93a132>", line 5
if col_type = category:
^
SyntaxError: invalid syntax


RE: Changing Column dtypes in DataFrame - perfringo - Jan-22-2020

A comparison needs ==, not =. A single = is assignment, which is not allowed inside an if condition.

EDIT:

There are other approaches as well, for example combining df.astype() with df.select_dtypes(). Maybe it's overly complex, but still.

Below is a DataFrame with three columns; two of them are int64 and one is object:

>>> import pandas as pd
>>> df = pd.DataFrame({'nums': range(1, 4), 'more_nums': range(11, 14), 'chars': [*'abc']})
>>> print(df['nums'].dtype, df['more_nums'].dtype, df['chars'].dtype)
int64 int64 object
We want to use df.astype() to change the dtype of columns based on each column's current dtype. To build the mapping that df.astype() needs, we can:

- select the columns of a given dtype with df.select_dtypes(include='int64')
- get the names of those columns with df.select_dtypes(include='int64').columns
- build a dictionary mapping each of those names to the desired dtype with dict.fromkeys(df.select_dtypes(include='int64').columns, 'object')

Now that we have constructed the desired mapping, we can pass it to df.astype():

>>> df = df.astype(dict.fromkeys(df.select_dtypes(include='int64').columns, 'object'))
We can check whether it worked:

>>> print(df['nums'].dtype, df['more_nums'].dtype, df['chars'].dtype)
object object object
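
For anyone who finds the one-liner hard to read, the three steps above can be sketched one at a time. This is only an illustration on the same toy DataFrame; the names int_cols, mapping and converted are made up for this example, and include is passed as a list, which select_dtypes accepts on any pandas version.

```python
import pandas as pd

df = pd.DataFrame({'nums': range(1, 4), 'more_nums': range(11, 14), 'chars': [*'abc']})

# Step 1 and 2: select the int64 columns and take their names
int_cols = df.select_dtypes(include=['int64']).columns

# Step 3: map every selected column name to the target dtype
mapping = dict.fromkeys(int_cols, 'object')

# astype returns a new DataFrame, so the result has to be assigned
converted = df.astype(mapping)

print(mapping)
print(converted.dtypes)
```

Keeping the mapping in its own variable also makes it easy to print and check which columns are about to be converted before calling astype.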



RE: Changing Column dtypes in DataFrame - Scott - Jan-22-2020

Thanks perfringo.

I have tried == but I get this error:

---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-11-e749837193f3> in <module>()
3 col_type = df[col].dtype
4
----> 5 if col_type == category:
6 df[col] = df[col].astype('object')
7

NameError: name 'category' is not defined

I have tried your approach with category dtypes but am getting this error:

df.select_dtypes(include='category')
df.select_dtypes(include='category').columns
dict.fromkeys(df.select_dtypes(include='category').columns, 'object')

df.astype(dict.fromkeys(df.select_dtypes(include='category').columns, 'object'))

df.dtypes
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-13-d704a5f02358> in <module>()
----> 1 df.select_dtypes(include='category')
2 df.select_dtypes(include='category').columns
3 dict.fromkeys(df.select_dtypes(include='category').columns, 'object')
4
5 df.astype(dict.fromkeys(df.select_dtypes(include='category').columns, 'object'))

C:\Program Files\Anaconda3\lib\site-packages\pandas\core\frame.py in select_dtypes(self, include, exclude)
2257 include, exclude = include or (), exclude or ()
2258 if not (is_list_like(include) and is_list_like(exclude)):
-> 2259 raise TypeError('include and exclude must both be non-string'
2260 ' sequences')
2261 selection = tuple(map(frozenset, (include, exclude)))

TypeError: include and exclude must both be non-string sequences


RE: Changing Column dtypes in DataFrame - perfringo - Jan-23-2020

Since the code snippet in your original post did not run, it is very hard to predict which problems you may or may not encounter. Nevertheless:

(1) If you haven't assigned anything to the name category, Python cannot find it and raises NameError. I assume you want to compare against a string, so you could write:

if col_type == 'category':
    # do some stuff
(2) df.astype() does not change the data type in place; it returns a new DataFrame. Therefore you should assign the result back to the DataFrame (as in the code I provided):

df = df.astype(dict.fromkeys(df.select_dtypes(include='int64').columns, 'object'))
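
Putting both points together for the category case, a minimal sketch could look like the code below. It makes a few assumptions: the sample DataFrame and its column names are made up to stand in for the real 700-column one; include is passed as a list, because the TypeError quoted above says that this pandas version wants non-string sequences; and the loop variant uses isinstance with pd.CategoricalDtype instead of the string comparison, as a more defensive check.

```python
import pandas as pd

# Hypothetical sample data standing in for the real DataFrame
df = pd.DataFrame({
    'colour': pd.Categorical(['red', 'green', 'red']),
    'size': pd.Categorical(['S', 'M', 'L']),
    'count': [1, 2, 3],
})

# Variant A: explicit loop over the columns
df_loop = df.copy()
for col in df_loop.columns:
    if isinstance(df_loop[col].dtype, pd.CategoricalDtype):
        df_loop[col] = df_loop[col].astype('object')

# Variant B: select_dtypes with a list, then a single astype call
cat_cols = df.select_dtypes(include=['category']).columns
df = df.astype(dict.fromkeys(cat_cols, 'object'))

print(df_loop.dtypes)
print(df.dtypes)
```

Both variants should leave the non-category columns untouched; variant B avoids the Python-level loop, which may matter with several hundred columns.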