Changing Column dtypes in DataFrame - Python Forum (https://python-forum.io), Data Science forum
Changing Column dtypes in DataFrame - Scott - Jan-22-2020

Hi everyone,

I have a DF with roughly 700 columns. I'd like to create a loop that goes through the columns and converts every category-dtype column to an object column. I have created this loop, but it is giving me an error. Can anyone help?

    for col in df.columns:
        col_type = df[col].dtype

        if col_type = category:
            df[col] = df[col].astype('object')

Thanks

RE: Changing Column dtypes in DataFrame - jefsummers - Jan-22-2020

And the error is.... ?

RE: Changing Column dtypes in DataFrame - Scott - Jan-22-2020

      File "<ipython-input-46-1b7f0f93a132>", line 5
        if col_type = category:
                    ^
    SyntaxError: invalid syntax

RE: Changing Column dtypes in DataFrame - perfringo - Jan-22-2020

For comparison, == is needed.

EDIT: There are other approaches as well, for example using df.astype() and df.select_dtypes(). Maybe it's overly complex, but still.

Below is a df with three columns; two of them are int64 and one is object:

    >>> df = pd.DataFrame({'nums': range(1, 4), 'more_nums': range(11, 14), 'chars': [*'abc']})
    >>> print(df['nums'].dtype, df['more_nums'].dtype, df['chars'].dtype)
    int64 int64 object

We want to use df.astype() to change the dtype of columns based on their current dtype. To create the mapping needed for df.astype() we can:

- select columns by data type using df.select_dtypes(include='int64')
- get the names of the columns with the specified type: df.select_dtypes(include='int64').columns
- use those column names to create a dictionary with the desired mapping: dict.fromkeys(df.select_dtypes(include='int64').columns, 'object')

Now we have constructed the desired mapping and can apply it with df.astype():

    >>> df = df.astype(dict.fromkeys(df.select_dtypes(include='int64').columns, 'object'))

We can check whether it worked:

    >>> print(df['nums'].dtype, df['more_nums'].dtype, df['chars'].dtype)
    object object object

RE: Changing Column dtypes in DataFrame - Scott - Jan-22-2020

Thanks perfringo.
I have tried == but I get this error:

    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    <ipython-input-11-e749837193f3> in <module>()
          3     col_type = df[col].dtype
          4 
    ----> 5     if col_type == category:
          6         df[col] = df[col].astype('object')
          7 

    NameError: name 'category' is not defined

I have tried your approach with category dtypes but am getting this error:

    df.select_dtypes(include='category')
    df.select_dtypes(include='category').columns
    dict.fromkeys(df.select_dtypes(include='category').columns, 'object')

    df.astype(dict.fromkeys(df.select_dtypes(include='category').columns, 'object'))
    df.dtypes

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-13-d704a5f02358> in <module>()
    ----> 1 df.select_dtypes(include='category')
          2 df.select_dtypes(include='category').columns
          3 dict.fromkeys(df.select_dtypes(include='category').columns, 'object')
          4 
          5 df.astype(dict.fromkeys(df.select_dtypes(include='category').columns, 'object'))

    C:\Program Files\Anaconda3\lib\site-packages\pandas\core\frame.py in select_dtypes(self, include, exclude)
       2257         include, exclude = include or (), exclude or ()
       2258         if not (is_list_like(include) and is_list_like(exclude)):
    -> 2259             raise TypeError('include and exclude must both be non-string'
       2260                             ' sequences')
       2261         selection = tuple(map(frozenset, (include, exclude)))

    TypeError: include and exclude must both be non-string sequences

RE: Changing Column dtypes in DataFrame - perfringo - Jan-23-2020

As you provided a non-working snippet of code in your problem statement, it is very hard to predict which problems you may or may not encounter. Nevertheless:

(1) If you haven't assigned the name category, then quite obviously Python will not find it and will raise NameError.
I assume that you want to compare with a string, and therefore you could write:

    if col_type == 'category':
        df[col] = df[col].astype('object')  # do some stuff

(2) df.astype() does not change the data type in place; it returns a new object. Therefore you should assign the result back to the dataframe (as in the code I provided):

    df = df.astype(dict.fromkeys(df.select_dtypes(include='int64').columns, 'object'))
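Putting both points together for category columns, a minimal sketch might look like this. The example DataFrame and its column names (letters, colors, nums) are made up for illustration. It passes include as a list, ['category'], rather than the bare string 'category': the TypeError in the traceback above comes from an older pandas version whose select_dtypes() only accepted non-string sequences, while newer versions accept either form.

```python
import pandas as pd

# Hypothetical example data: two categorical columns and one integer column
df = pd.DataFrame({
    'letters': pd.Series(list('abc'), dtype='category'),
    'colors': pd.Series(['red', 'green', 'red'], dtype='category'),
    'nums': range(1, 4),
})

# Approach 1: loop over the columns, comparing each dtype with the string 'category'
looped = df.copy()
for col in looped.columns:
    if looped[col].dtype == 'category':
        # astype() returns a new Series, so assign it back to the column
        looped[col] = looped[col].astype('object')

# Approach 2: build a {column_name: 'object'} mapping for every categorical column.
# include is given as a list, which both old and new pandas versions accept.
cat_cols = df.select_dtypes(include=['category']).columns
mapped = df.astype(dict.fromkeys(cat_cols, 'object'))  # astype() is not in place

print(looped.dtypes)
print(mapped.dtypes)
```

With roughly 700 columns, the second approach avoids reassigning columns one at a time, since a single astype() call converts every selected column.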