Python Forum

English is not my native language; please excuse my typing errors.

Also, this is my first post. I hope my questions will get more precise soon.

I am manipulating Excel sheets using pandas, and the problem is that I noticed that my code changes the data types of the data frames by itself. I specified the reading format of the Excel sheet with:

import pandas as pd

form = {"DayofMonth": float, "Tail_Number": str, "CRSDepTime": float, "DepDelay": float, "ArrTime": float}

arr = pd.read_excel(ad_arr, dtype = form, engine='openpyxl', usecols=['DayofMonth', 'Tail_Number', 'CRSDepTime', 'DepDelay', 'ArrTime'])

So when I use print(arr.dtypes) the result is

Output:
DayofMonth     float64
Tail_Number     object
CRSDepTime     float64
DepDelay       float64
ArrTime        float64
ID              object
dtype: object

However, the code uses a command similar to this one:

line_arr = arr.loc[0].to_frame().T

Surprisingly, when I use print(line_arr.dtypes) the result is

Output:
DayofMonth     object
Tail_Number    object
CRSDepTime     object
DepDelay       object
ArrTime        object
ID             object
dtype: object

This causes trouble later when I compare data frames to check whether they are identical. One solution that I tried was forcing the data types back to what they should be, with:

dtypes = tot_tail.dtypes
line_arr = line_arr.astype(dtypes)

Is there any defensive programming strategy I could use to avoid this problem? I don't want to force data types back over and over again.

arr.loc[0] returns the row as a Series, and that row contains Tail_Number, which is not a float. A Series is backed by a single numpy array, which must be homogeneous, so all values are converted to object when the Series is created. There are ways around this, as discussed here:

https://stackoverflow.com/questions/6264...-dataframe
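To make the difference concrete, here is a minimal sketch. The small frame and its values are made up as a stand-in for the Excel data, and the variable names other than arr are only for illustration. It compares selecting a row with a scalar label against selecting it with a list of labels or a slice:

```python
import pandas as pd

# Hypothetical stand-in for the Excel data (values are made up).
arr = pd.DataFrame(
    {
        "DayofMonth": [1.0, 2.0],
        "Tail_Number": ["N101AA", "N202BB"],
        "DepDelay": [5.0, -3.0],
    }
)

# Scalar label: the row comes back as a Series, which has one dtype
# for all of its values, so floats and strings are stored as object.
row_as_series = arr.loc[0]
via_series = row_as_series.to_frame().T

# List of labels or a slice: the result is a one-row DataFrame,
# so every column keeps its own dtype.
via_list = arr.loc[[0]]
via_slice = arr.loc[0:0]

print(row_as_series.dtype)  # object
print(via_series.dtypes)    # every column is object
print(via_list.dtypes)      # same dtypes as arr
print(via_slice.dtypes)     # same dtypes as arr
```

So the dtype change happens at the step where the row becomes a Series, not in to_frame() or T.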

Maybe the slice approach will work for you as it sounds like you are reassembling the rows back into dataframes.

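arr_slice = arr.loc[0:0]  # a slice returns a one-row DataFrame, so column dtypes are kept
arr_slices = [arr_slice]  # collect any further one-row slices in this list
reassembled_df = pd.concat(arr_slices)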
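As a rough sketch of the whole round trip (again with made-up data, and assuming the goal is to rebuild a frame from single rows and then compare it with the original), something like this should keep the dtypes intact. The names row_frames and object_rows are just for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the Excel data (values are made up).
arr = pd.DataFrame(
    {
        "DayofMonth": [1.0, 2.0, 3.0],
        "Tail_Number": ["N101AA", "N202BB", "N303CC"],
        "DepDelay": [5.0, -3.0, 0.0],
    }
)

# One single-row DataFrame per row. iloc[[i]] selects by position,
# so it also works when the index labels are not 0, 1, 2, ...
row_frames = [arr.iloc[[i]] for i in range(len(arr))]
reassembled_df = pd.concat(row_frames)

# equals() requires matching values and matching column dtypes.
print(reassembled_df.equals(arr))

# For contrast: rows taken as Series and turned back into frames
# end up with object columns, so the comparison fails.
object_rows = pd.concat([arr.loc[i].to_frame().T for i in arr.index])
print(object_rows.equals(arr))

# Raises AssertionError describing the mismatch if the frames differ.
pd.testing.assert_frame_equal(reassembled_df, arr)
```

If a comparison fails unexpectedly, pd.testing.assert_frame_equal is handy as a defensive check because its error message says which column and attribute differ.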

(Oct-30-2023, 04:49 PM) deanhystad wrote: Maybe the slice approach will work for you as it sounds like you are reassembling the rows back into dataframes.

It worked, thank you for your help!

dimas

deanhystad

dimas