Python Forum

Data types changing by itself
English is not my native language; please excuse my typing errors.

Also, this is my first post; I hope my questions will become more precise soon.

I am manipulating Excel sheets with pandas, and I noticed that my code changes the data types of the DataFrames by itself. I specified the dtypes to use when reading the Excel sheet:

import pandas as pd

form = {"DayofMonth": float, "Tail_Number": str, "CRSDepTime": float, "DepDelay": float, "ArrTime": float}

arr = pd.read_excel(ad_arr, dtype = form, engine='openpyxl', usecols=['DayofMonth', 'Tail_Number', 'CRSDepTime', 'DepDelay', 'ArrTime'])
When I run print(arr.dtypes), the result is:

Output:
DayofMonth     float64
Tail_Number     object
CRSDepTime     float64
DepDelay       float64
ArrTime        float64
ID              object
dtype: object
However, the code later uses a command similar to this one:

line_arr = arr.loc[0].to_frame().T

Surprisingly, when I run print(line_arr.dtypes), the result is:

Output:
DayofMonth     object
Tail_Number    object
CRSDepTime     object
DepDelay       object
ArrTime        object
ID             object
dtype: object
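The behaviour can be reproduced without the Excel file. This sketch assumes a small hypothetical DataFrame standing in for arr (the column subset, tail numbers and values are made up). It shows that the round trip through loc[0] turns every column into object, and that DataFrame.equals then reports a mismatch, because it requires corresponding columns to have the same dtype:

```python
import pandas as pd

# Hypothetical stand-in for the sheet read with read_excel: floats plus one string column
arr = pd.DataFrame(
    {
        "DayofMonth": [1.0, 2.0],
        "Tail_Number": ["N123AA", "N456BB"],
        "DepDelay": [5.0, -3.0],
    }
)

# Scalar label -> Series -> one-row DataFrame, as in the original code
line_arr = arr.loc[0].to_frame().T

print(arr.dtypes)       # DayofMonth and DepDelay are float64
print(line_arr.dtypes)  # every column is object

# equals() requires matching dtypes, so the same values still compare as different
print(line_arr.equals(arr.loc[0:0]))  # False
```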
This causes trouble later when I compare DataFrames to check whether they are identical. One solution I tried was forcing the data types back to what they should be:

dtypes = tot_tail.dtypes
line_arr = line_arr.astype(dtypes)
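Both ways of undoing the conversion after the fact can be sketched on the same kind of hypothetical data. astype accepts a mapping of column name to dtype, and a reference frame's .dtypes Series works as that mapping, which is what the code above relies on (here arr plays the role of tot_tail). DataFrame.infer_objects() is a swapped-in alternative not mentioned in the thread: it needs no reference frame, but it can only recover dtypes that are inferable from the values themselves.

```python
import pandas as pd

# Hypothetical data standing in for the Excel sheet
arr = pd.DataFrame(
    {
        "DayofMonth": [1.0, 2.0],
        "Tail_Number": ["N123AA", "N456BB"],
        "DepDelay": [5.0, -3.0],
    }
)
line_arr = arr.loc[0].to_frame().T  # all columns are object here

# Option 1: cast back using a reference frame's dtypes (column name -> dtype)
restored = line_arr.astype(arr.dtypes)

# Option 2: let pandas infer better dtypes for the object columns
inferred = line_arr.infer_objects()

print(restored.dtypes)
print(inferred.dtypes)
```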
Is there a defensive programming strategy I could use to avoid this problem? I don't want to force the data types back over and over again.
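arr.loc[0] returns the row as a Series, and that row contains Tail_Number, which is a string, not a float. A Series holds a single array with one dtype (like a NumPy array, it must be homogeneous), so all the values are converted to object when the Series is created. There are ways around this, as discussed here: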

https://stackoverflow.com/questions/6264...-dataframe
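The homogeneity point can be checked directly. In this sketch (hypothetical data again), a row taken from the float columns alone stays float64, while a row that also includes the string column has to fall back to object, the only dtype that can hold both:

```python
import pandas as pd

# Hypothetical data: two float columns and one string column
arr = pd.DataFrame(
    {
        "DayofMonth": [1.0, 2.0],
        "Tail_Number": ["N123AA", "N456BB"],
        "DepDelay": [5.0, -3.0],
    }
)

# Only float columns selected: the row Series can stay float64
numeric_row = arr[["DayofMonth", "DepDelay"]].loc[0]
print(numeric_row.dtype)  # float64

# The string column is included: the values are upcast to object
mixed_row = arr.loc[0]
print(mixed_row.dtype)  # object
```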

Maybe the slice approach will work for you as it sounds like you are reassembling the rows back into dataframes.
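arr_slice = arr.loc[0:0]  # one-row DataFrame (not a Series), so column dtypes are kept
reassembled_df = pd.concat([arr.loc[0:0], arr.loc[1:1]])  # concat takes a list of slices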
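A sketch of the slice approach on hypothetical data: both a label slice (loc[0:0]) and a list of labels (loc[[0]], an alternative not mentioned in the reply) return a one-row DataFrame rather than a Series, so each column keeps its own dtype, and pd.concat of such slices keeps them too. The loop over arr.index is only illustrative and assumes unique index labels.

```python
import pandas as pd

# Hypothetical data standing in for the Excel sheet
arr = pd.DataFrame(
    {
        "DayofMonth": [1.0, 2.0],
        "Tail_Number": ["N123AA", "N456BB"],
        "DepDelay": [5.0, -3.0],
    }
)

# Both selections return a one-row DataFrame, not a Series
arr_slice = arr.loc[0:0]
same_row = arr.loc[[0]]
print(arr_slice.dtypes)
print(same_row.dtypes)

# Rebuild a frame from one-row slices; dtypes survive the round trip
slices = [arr.loc[i:i] for i in arr.index]
reassembled_df = pd.concat(slices)
print(reassembled_df.dtypes)
print(reassembled_df.equals(arr))
```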
(Oct-30-2023, 04:49 PM) deanhystad wrote: Maybe the slice approach will work for you as it sounds like you are reassembling the rows back into dataframes.

It worked, thank you for your help!