Does the order of columns in the DataFrame matter?
Python Forum (https://python-forum.io), Forum: Data Science (https://python-forum.io/forum-44.html), Thread: /thread-24407.html
Does the order of columns in the DataFrame matter? - new_to_python - Feb-12-2020

Hi, I noticed that certain operations sometimes change the order of the columns in a DataFrame automatically. The same thing happened when I tried some examples from books: using the same commands, my DataFrames show the columns in a different order from the one printed in the book. Does the order of the columns in a DataFrame matter in Python/pandas/numpy?

RE: Does the order of columns in the DataFrame matter? - scidam - Feb-13-2020

Yes, it does. Look at the following example:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.rand(10, 5))
    print(df.values)                     # print the corresponding numpy array
    print(df[[2, 1, 3, 4, 0]].values)    # reorder the columns and print again

The answer depends on how you access the data in the DataFrame. If you access columns by name, e.g. df.loc[:, 'some_name'], and never use position-based access, e.g. something like df.iloc[:, 4], you do not need to worry about the order of the columns.
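To make the difference between name-based and position-based access concrete, here is a minimal sketch. The column names "a", "b", "c" and the variable names are made up for illustration and are not from the thread. After reordering, a lookup by name returns the same column, while a lookup by position returns a different one.

```python
import numpy as np
import pandas as pd

# Small deterministic frame; the labels "a", "b", "c" are illustrative only.
df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=["a", "b", "c"])

# Same data, columns in a different order.
reordered = df[["c", "a", "b"]]

# Access by name: unaffected by the reordering.
same_by_name = df.loc[:, "a"].equals(reordered.loc[:, "a"])

# Access by position: position 0 is "a" in df but "c" in reordered.
same_by_position = df.iloc[:, 0].equals(reordered.iloc[:, 0])

print(same_by_name, same_by_position)
```

Code that only ever uses column names keeps working if the column order changes, whereas code that uses .iloc silently reads a different column.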
RE: Does the order of columns in the DataFrame matter? - new_to_python - Feb-13-2020

Thanks. I came across the following example:

    # Example 2
    In [194]: lefth = pd.DataFrame({'key1': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
         ...:                       'key2': [2000, 2001, 2002, 2001, 2002],
         ...:                       'data': np.arange(5.)})

    In [196]: lefth
    Out[196]:
         key1  key2  data
    0    Ohio  2000   0.0
    1    Ohio  2001   1.0
    2    Ohio  2002   2.0
    3  Nevada  2001   3.0
    4  Nevada  2002   4.0

As shown above, on my machine the columns are listed as key1, key2 and data, which seems to follow the order in which I entered them in the pd.DataFrame command. However, the person who wrote this example has the columns displayed as data, followed by key1 and key2, using the same command. How come? I don't remember exactly, but I think somebody mentioned that the columns can be arranged differently depending on the Python version in use. Is this true? Does that mean it is always better to access columns by name, because the column order could differ for reasons I cannot see, and people could get different results or even errors with position-based access?

RE: Does the order of columns in the DataFrame matter? - scidam - Feb-14-2020

(Feb-13-2020, 05:12 PM)new_to_python Wrote: As shown above, on my machine the columns are listed as key1, key2 and data, which seems to follow the order in which I entered them in the pd.DataFrame command. However, the person who wrote this example has the columns displayed as data, followed by key1 and key2, using the same command. How come? I don't remember exactly, but I think somebody mentioned that the columns can be arranged differently depending on the Python version in use. Is this true?

This depends on the implementation of the dict data structure in Python. Prior to Python 3.6, dict was an unordered collection of (key, value) pairs.
So, when iterating over a dict, you could in theory get the items in a different order (at least across different Python versions and implementations), and this leads to a different order of columns in the pandas DataFrame. However, since CPython 3.6 (and Python 3.7 for any other implementation), dict preserves the insertion order of its items. In general, to be sure the order of the columns is what you expect, you can always do:

    df = df.loc[:, ['col_1', 'col_2', 'col_3']]

After that, you can rely on that particular column order and access the columns by integer indices.

RE: Does the order of columns in the DataFrame matter? - new_to_python - Feb-14-2020

Thank you very much. By the way, what do you think causes the reordering of Ohio and Colorado between steps 235 and 236?

    In [234]: df
    Out[234]:
    side             left  right
    state    number
    Ohio     one        0      5
             two        1      6
             three      2      7
    Colorado one        3      8
             two        4      9
             three      5     10

    In [235]: df.unstack('state')
    Out[235]:
    side   left          right
    state  Ohio Colorado  Ohio Colorado
    number
    one       0        3     5        8
    two       1        4     6        9
    three     2        5     7       10

    In [236]: df.unstack('state').stack('side')
    Out[236]:
    state         Colorado  Ohio
    number side
    one    left          3     0
           right         8     5
    two    left          4     1
           right         9     6
    three  left          5     2
           right        10     7

RE: Does the order of columns in the DataFrame matter? - scidam - Feb-14-2020

I cannot answer exactly, but I think this happens because some sorting operation is applied to the index used in .unstack or .stack. If you look at the _Unstacker implementation, you will find that sorting is applied in several places in the code. This is likely the cause of the reordering.

RE: Does the order of columns in the DataFrame matter? - new_to_python - Feb-14-2020

Thanks scidam. In this case, what is the best way to change the order back to the original?
So as long as I refer to the columns by name (the preferred method) or, for position-based access, first use something like df = df.loc[:, ['col_1', 'col_2', 'col_3']], I will not need to worry about Python doing strange things automatically and unexpectedly behind my back?

RE: Does the order of columns in the DataFrame matter? - scidam - Feb-14-2020

(Feb-14-2020, 02:21 PM)new_to_python Wrote: I will not need to worry about Python doing strange things automatically and unexpectedly behind my back?

I don't think you should regard this as mysterious Python behavior. Dictionaries were long treated as unordered key/value containers, so you cannot rely on the order of items in a dictionary prior to Python 3.7. The reordering in the stack and unstack operations comes from pandas and how those methods are implemented. As you noted above, if you really need a particular order, you can always reorder the columns manually.
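As a hedged illustration of "reorder the columns manually", the sketch below fixes the column order in two ways: by passing columns= to the DataFrame constructor, and by selecting with .loc afterwards. The names implicit, explicit and pinned are invented for this example. As far as I know, older pandas releases also sorted dict keys when building a DataFrame, which would explain a book showing data, key1, key2, but treat that as an assumption rather than something established in this thread.

```python
import numpy as np
import pandas as pd

data = {
    "key1": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada"],
    "key2": [2000, 2001, 2002, 2001, 2002],
    "data": np.arange(5.0),
}

# Relies on whatever order the dict and the installed pandas produce.
implicit = pd.DataFrame(data)

# States the desired order up front, independent of dict ordering.
explicit = pd.DataFrame(data, columns=["key1", "key2", "data"])

# Pins the order after the fact, as suggested in the thread.
pinned = implicit.loc[:, ["key1", "key2", "data"]]

print(list(explicit.columns))
print(list(pinned.columns))
```

Either approach makes later position-based access such as .iloc[:, 2] refer to a known column, whatever Python or pandas version built the frame.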
RE: Does the order of columns in the DataFrame matter? - new_to_python - Feb-15-2020

Thanks. I am trying to manually reorder the columns of the DataFrame from line [236] of Post #5. I did:

    A = df.unstack('state').stack('side')
    A = A[['Ohio', 'Colorado']]

But I got a "['Colorado'] not in index" error. Could you please tell me how to fix it?

RE: Does the order of columns in the DataFrame matter? - scidam - Feb-15-2020

(Feb-15-2020, 02:05 PM)new_to_python Wrote: But I got a "['Colorado'] not in index" error. Could you please tell me how to fix it?

A has a MultiIndex for its columns, so you need something like this (note that the keyword is level, not levels):

    A_new = A.reindex(['Ohio', 'Colorado'], axis=1, level=1)
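The sketch below reconstructs the frame from Post #5 and restores the Ohio, Colorado order after the stack/unstack round trip. The construction of df is my own assumption, since the thread only shows its printed form. In this reconstruction the columns of A come out as a single-level Index named state, so reindex(columns=...) is enough. If your A really has MultiIndex columns, the level argument of reindex is the relevant tool instead.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of df from Post #5 (only its printed form is known).
index = pd.MultiIndex.from_product(
    [["Ohio", "Colorado"], ["one", "two", "three"]],
    names=["state", "number"],
)
df = pd.DataFrame(
    {"left": np.arange(6), "right": np.arange(6) + 5},
    index=index,
)
df.columns.name = "side"

A = df.unstack("state").stack("side")

# Inspect the column index before choosing how to reorder it.
print(A.columns.nlevels, list(A.columns))

# Put the columns into the wanted order, whatever order stack() produced.
A_new = A.reindex(columns=["Ohio", "Colorado"])
print(A_new)
```

Printing A.columns first is a cheap way to find out whether the labels you are about to select actually exist at the top level of the column index.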