Python Forum
Does the order of columns in the DataFrame matter? - Printable Version


Does the order of columns in the DataFrame matter? - new_to_python - Feb-12-2020

Hi, I noticed that sometimes, when doing certain operations, the order of the columns in a DataFrame gets changed automatically. The same thing happened when I tried out some examples from books: using the same commands, my DataFrames show the columns in a different order from the one shown in the books. Does the order of the columns in a DataFrame matter in Python/pandas/NumPy?


RE: Does the order of columns in the DataFrame matter? - scidam - Feb-13-2020

Yes, it does.
Look at the following example:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 5)) 
print(df.values)  # print corresponding numpy array
print(df[[2,1,3,4,0]].values) # reorder columns and print
The answer depends on how you access the data in the DataFrame. If you always access columns by name, e.g. df.loc[:, 'some_name'], and never use position-based access such as df.iloc[:, 4], you do not need to worry about the order of the columns.
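
To make the difference concrete, here is a minimal sketch. The column names a, b and c are made up for illustration and do not come from the example above. It reorders the columns and compares name-based with position-based access:

```python
import numpy as np
import pandas as pd

# Hypothetical column names, chosen only for this illustration.
df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["a", "b", "c"])
reordered = df[["c", "a", "b"]]  # same data, different column order

# Label-based access returns the same column in both frames.
same_by_name = df.loc[:, "a"].equals(reordered.loc[:, "a"])

# Position-based access now picks a different column (a vs. c).
same_by_position = df.iloc[:, 0].equals(reordered.iloc[:, 0])

print(same_by_name, same_by_position)
```

Only the position-based lookup is affected by the reordering, which is why code that sticks to column names is insensitive to column order.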


RE: Does the order of columns in the DataFrame matter? - new_to_python - Feb-13-2020

Thanks. I came across the following example:

# Example 2

In [194]: lefth = pd.DataFrame({'key1': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'], 
     ...:                       'key2': [2000, 2001, 2002, 2001, 2002], 
     ...:                       'data': np.arange(5.)})  
In [196]: lefth                                                                                           
Out[196]: 
     key1  key2  data
0    Ohio  2000   0.0
1    Ohio  2001   1.0
2    Ohio  2002   2.0
3  Nevada  2001   3.0
4  Nevada  2002   4.0
As shown above, on my machine the columns are listed as key1, key2 and data, which seems to follow the order in which I entered them in the pd.DataFrame command. However, the person who made this example has the columns displayed as data followed by key1 and key2, using the same command. How come? I don't remember exactly, but I think somebody mentioned that, depending on which version of Python is used, the columns could be arranged differently. Is this true?

Does that mean it is always better to access the columns by name, because the columns could be ordered differently for unknown reasons and people could get different results, or even errors, when using the index-based access method?


RE: Does the order of columns in the DataFrame matter? - scidam - Feb-14-2020

(Feb-13-2020, 05:12 PM)new_to_python Wrote: As shown above, on my machine the columns are listed as key1, key2 and data, which seems to follow the order in which I entered them in the pd.DataFrame command. However, the person who made this example has the columns displayed as data followed by key1 and key2, using the same command. How come? I don't remember exactly, but I think somebody mentioned that, depending on which version of Python is used, the columns could be arranged differently. Is this true?

This depends on the implementation of the dict data structure in Python. Prior to Python 3.6, dict was an unordered collection of (key, value) pairs.
So, when iterating over a dict, you could in theory get the items in a different order (at least across different Python versions and implementations), and this in turn leads to a different order of columns in the pandas DataFrame. However, since CPython 3.6 (and Python 3.7 for any other implementation), dict preserves the insertion order of its items.
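
If you do not want to depend on dict ordering at all, one option is to pass the column order explicitly when building the frame. The sketch below reuses the lefth data from the question. As far as I know, older pandas releases also sorted dict keys alphabetically, which would match the data, key1, key2 order in the book, but treat that as an assumption rather than a confirmed explanation:

```python
import numpy as np
import pandas as pd

data = {
    "key1": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
    "key2": [2000, 2001, 2002, 2001, 2002],
    "data": np.arange(5.0),
}

# columns= pins the column order, independent of how the dict is iterated.
lefth = pd.DataFrame(data, columns=["key1", "key2", "data"])
print(list(lefth.columns))
```

With columns= given, the result should look the same on any Python or pandas version.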

In general, to be sure the order of columns is correct, you can always do:
df = df.loc[:, ['col_1', 'col_2', 'col_3']]
After that, you can rely on that particular order of columns and access them by integer indices.


RE: Does the order of columns in the DataFrame matter? - new_to_python - Feb-14-2020

Thank you very much.

By the way, what do you think is the cause of the re-ordering between Ohio and Colorado from Step 235 to 236?

In [234]: df                                                                                              
Out[234]: 
side             left  right
state    number             
Ohio     one        0      5
         two        1      6
         three      2      7
Colorado one        3      8
         two        4      9
         three      5     10

In [235]: df.unstack('state')                                                                                                                                                        
Out[235]: 
side   left          right         
state  Ohio Colorado  Ohio Colorado
number                             
one       0        3     5        8
two       1        4     6        9
three     2        5     7       10

In [236]: df.unstack('state').stack('side')                                                               
Out[236]: 
state         Colorado  Ohio
number side                 
one    left          3     0
       right         8     5
two    left          4     1
       right         9     6
three  left          5     2
       right        10     7



RE: Does the order of columns in the DataFrame matter? - scidam - Feb-14-2020

I cannot answer exactly, but I think this is because a sorting operation is applied to the index used in .unstack or .stack.
If you look at the _Unstacker implementation, you will find that sorting operations are applied in several places in the code.
This is likely the cause of the reordering.
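
A quick way to see where the order changes is to print the state labels after each step. This is only a sketch: the way df is rebuilt here is an assumption (the thread does not show how it was created), and the label order you get may differ between pandas versions, so no particular order is claimed.

```python
import numpy as np
import pandas as pd

# Assumed construction of a frame shaped like the one in In [234].
index = pd.MultiIndex.from_product(
    [["Ohio", "Colorado"], ["one", "two", "three"]], names=["state", "number"]
)
df = pd.DataFrame(
    {"left": np.arange(6), "right": np.arange(6) + 5}, index=index
)
df.columns.name = "side"

unstacked = df.unstack("state")
restacked = unstacked.stack("side")

# Order of the 'state' labels after each step (version dependent).
print(list(unstacked.columns.get_level_values("state").unique()))
print(list(restacked.columns))
```

Whatever the order of the labels, the values stay attached to the right labels, so only the layout changes.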


RE: Does the order of columns in the DataFrame matter? - new_to_python - Feb-14-2020

Thanks scidam. In this case, what is the best way to change the order back to the original? So, as long as I refer to the columns by name (the preferred method) or, for index-based access, first use something like:
df = df.loc[:, ['col_1', 'col_2', 'col_3']]
I will not need to worry about Python doing strange things automatically and unexpectedly behind my back?


RE: Does the order of columns in the DataFrame matter? - scidam - Feb-14-2020

(Feb-14-2020, 02:21 PM)new_to_python Wrote: I will not need to worry about python doing strange things automatically and unexpectedly behind my back?
I don't think you should regard this as mysterious Python behavior. Dictionaries were long treated as unordered key-value containers, so you cannot rely on item order in dictionaries prior to Python 3.7. In the case of the stack and unstack operations, it is a matter of pandas and how these methods are implemented. As you noted above, if you really need to, you can always reorder the columns manually.


RE: Does the order of columns in the DataFrame matter? - new_to_python - Feb-15-2020

Thanks. I am trying to manually reorder the columns of the DataFrame from In [236] of post #5. I did:

A = df.unstack('state').stack('side') 
A = A[['Ohio', 'Colorado']]
But I got a "['Colorado'] not in index" error. Could you please tell me how to fix it?


RE: Does the order of columns in the DataFrame matter? - scidam - Feb-15-2020

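(Feb-15-2020, 02:05 PM)new_to_python Wrote: But I got a "['Colorado'] not in index" error. Could you please tell me how to fix it?
You can reorder the columns with reindex along axis=1. Note that in the output of In [236] the MultiIndex is on the rows; the columns form a single-level index named state, so no level argument is needed (and the keyword would be level, not levels):

A_new = A.reindex(['Ohio', 'Colorado'], axis=1)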
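
When a "not in index" error shows up, it can help to inspect the column index before selecting from it. The sketch below rebuilds a frame like the one in In [234] (the construction is an assumption, since the thread does not show it), checks how many levels the columns have, and then reorders them with reindex:

```python
import numpy as np
import pandas as pd

# Assumed construction of a frame shaped like the one in In [234].
index = pd.MultiIndex.from_product(
    [["Ohio", "Colorado"], ["one", "two", "three"]], names=["state", "number"]
)
df = pd.DataFrame(
    {"left": np.arange(6), "right": np.arange(6) + 5}, index=index
)
df.columns.name = "side"

A = df.unstack("state").stack("side")

# Inspect the columns: number of levels and the actual labels.
print(A.columns.nlevels, list(A.columns))

# Reorder the columns by label along axis=1.
A_new = A.reindex(["Ohio", "Colorado"], axis=1)
print(list(A_new.columns))
```

If the printed labels differ from what you typed (extra spaces, tuples instead of strings), that mismatch is a likely source of the error.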