Data types changing by itself

dimas · (This post was last modified: Oct-30-2023, 02:47 PM by Larz60+.)

English is not my native language; please excuse my typing errors.

Also, this is my first post. I hope my questions will become more precise soon.

I am manipulating Excel sheets using pandas, and the problem is that I noticed that my code changes the data types of the data frames by itself. I specified the reading format of the Excel sheet with:

import pandas as pd

form = {"DayofMonth": float, "Tail_Number": str, "CRSDepTime": float, "DepDelay": float, "ArrTime": float}

arr = pd.read_excel(ad_arr, dtype = form, engine='openpyxl', usecols=['DayofMonth', 'Tail_Number', 'CRSDepTime', 'DepDelay', 'ArrTime'])

So when I use print(arr.dtypes) the result is

Output:
DayofMonth     float64
Tail_Number     object
CRSDepTime     float64
DepDelay       float64
ArrTime        float64
ID              object
dtype: object

However, the code uses a command similar to this one:

line_arr = arr.loc[0].to_frame().T

Surprisingly, when I use print(line_arr.dtypes) the result is

Output:
DayofMonth     object
Tail_Number    object
CRSDepTime     object
DepDelay       object
ArrTime        object
ID             object
dtype: object

This causes trouble later when I compare data frames to check whether they are identical. One solution that I tried was forcing the data types back to what they should be, with:

dtypes = tot_tail.dtypes
line_arr = line_arr.astype(dtypes)
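
A minimal, self-contained sketch of this workaround, for reference. It assumes a small made-up frame in place of the Excel data (the values are hypothetical), and it takes the target dtypes from arr itself rather than from tot_tail:

```python
import pandas as pd

# Hypothetical stand-in for the Excel data: column names from the post, made-up values.
arr = pd.DataFrame({
    "DayofMonth": [1.0, 2.0],
    "Tail_Number": ["N123AA", "N456BB"],
    "CRSDepTime": [900.0, 1330.0],
})

line_arr = arr.loc[0].to_frame().T      # every column ends up as object
restored = line_arr.astype(arr.dtypes)  # cast each column back to arr's dtype

# Check that the restored one-row frame has the same dtypes as the source frame.
print(restored.dtypes.equals(arr.dtypes))
```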

Is there any defensive programming strategy I could use to avoid this problem? I don't want to force data types back over and over again.

Larz60+ wrote Oct-30-2023, 02:47 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Modified for you this time. Please use BBCode tags on future posts.

deanhystad · (This post was last modified: Oct-30-2023, 04:49 PM by deanhystad.)

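loc[0] contains Tail_Number, which is not a float. loc[0] returns the row as a Series, and a Series holds its values in a single array that must be homogeneous, so all values are converted to object when you create the Series. to_frame().T then keeps that object dtype for every column. There are ways around this, as discussed here: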

https://stackoverflow.com/questions/6264...-dataframe
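
The conversion is easy to reproduce on a small frame. A minimal sketch, assuming made-up values in place of the Excel data (the values are hypothetical; only the column names come from the original post):

```python
import pandas as pd

# Hypothetical stand-in for the Excel data: made-up values.
arr = pd.DataFrame({
    "DayofMonth": [1.0, 2.0],
    "Tail_Number": ["N123AA", "N456BB"],
    "CRSDepTime": [900.0, 1330.0],
})

row = arr.loc[0]             # a Series: one dtype shared by all values
line_arr = row.to_frame().T  # transposed back into a one-row DataFrame

print(arr.dtypes)       # the numeric columns are float64 here
print(row.dtype)        # object, because floats and a str share one Series
print(line_arr.dtypes)  # every column is now object
```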

Maybe the slice approach will work for you as it sounds like you are reassembling the rows back into dataframes.

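arr_slice = arr.loc[0:0]  # a one-row DataFrame, so each column keeps its dtype
reassembled_df = pd.concat(list_of_arr_slices)  # list_of_arr_slices: placeholder for your own list of such slices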
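
A runnable sketch of the slice approach, assuming a small made-up frame in place of the Excel data (the values and the name rows are hypothetical). Selecting with a slice such as loc[0:0], or with a list such as loc[[0]], returns a DataFrame rather than a Series, so no common dtype has to be found:

```python
import pandas as pd

# Hypothetical stand-in for the Excel data: made-up values.
arr = pd.DataFrame({
    "DayofMonth": [1.0, 2.0],
    "Tail_Number": ["N123AA", "N456BB"],
    "CRSDepTime": [900.0, 1330.0],
})

# Each label slice is a one-row DataFrame whose columns keep their dtypes.
rows = [arr.loc[i:i] for i in arr.index]
reassembled_df = pd.concat(rows)

# A list selection gives the same kind of one-row DataFrame.
first_row = arr.loc[[0]]

print(reassembled_df.dtypes.equals(arr.dtypes))  # True
print(reassembled_df.equals(arr))                # True
```

Because the dtypes never change, comparing the reassembled frame with the original needs no astype call afterwards.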

dimas · Oct-30-2023, 06:11 PM

It worked, thank you for your help!
