Python Forum
Data types changing by itself
#1
English is not my native language; please excuse my typing errors.

Also, this is my first post. I hope my questions will become more precise over time.

I am manipulating Excel sheets using pandas, and the problem is that I noticed that my code changes the data types of the data frames by itself. I specified the reading format of the Excel sheet with:

import pandas as pd

form = {"DayofMonth": float, "Tail_Number": str, "CRSDepTime": float, "DepDelay": float, "ArrTime": float}

arr = pd.read_excel(ad_arr, dtype = form, engine='openpyxl', usecols=['DayofMonth', 'Tail_Number', 'CRSDepTime', 'DepDelay', 'ArrTime'])
So when I use print(arr.dtypes) the result is

Output:
DayofMonth     float64
Tail_Number     object
CRSDepTime     float64
DepDelay       float64
ArrTime        float64
ID              object
dtype: object
However, the code uses a command similar to this one:

line_arr = arr.loc[0].to_frame().T

Surprisingly, when I use print(line_arr.dtypes) the result is

Output:
DayofMonth     object
Tail_Number    object
CRSDepTime     object
DepDelay       object
ArrTime        object
ID             object
dtype: object
This causes trouble later when I compare data frames to check if they are identical. One solution that I tried was forcing the data types back to what they should be, with:

dtypes = tot_tail.dtypes
line_arr = line_arr.astype(dtypes)
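To make the workaround reproducible without my Excel file, here is a minimal sketch. The sample values are made up, and it takes the target dtypes from arr itself rather than from tot_tail, which is an assumption for illustration only:

```python
import pandas as pd

# Made-up sample values standing in for the Excel sheet
arr = pd.DataFrame(
    {
        "DayofMonth": [1.0, 2.0],
        "Tail_Number": ["N101AA", "N202BB"],
        "DepDelay": [5.0, -3.0],
    }
)

# Extracting one row this way turns every column into object
line_arr = arr.loc[0].to_frame().T
print(line_arr.dtypes)

# Workaround: cast each column back using the dtypes of the source frame
restored = line_arr.astype(arr.dtypes)
print(restored.dtypes)
```

This works, but the cast has to be repeated after every row extraction.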
Is there any defensive programming strategy I could use to avoid this problem? I don't want to force data types back over and over again.
Larz60+ wrote Oct-30-2023, 02:47 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Modified for you this time. Please use BBCode tags on future posts.
#2
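arr.loc[0] returns the row as a single Series. That row contains Tail_Number, which is a string, not a float. A Series holds one dtype for all of its values (the underlying numpy array must be homogeneous), so every value is converted to object when the Series is created. to_frame().T then builds the data frame from that object Series, which is why every column ends up as object. There are ways around this as discussed here: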

https://stackoverflow.com/questions/6264...-dataframe

Maybe the slice approach will work for you as it sounds like you are reassembling the rows back into dataframes.
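arr_slice = arr.loc[0:0]  # a slice returns a one-row DataFrame, so the column dtypes are kept
reassembled_df = pd.concat([arr_slice])  # pass the list of all your row slices here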
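A minimal sketch of the difference, using made-up values in place of your Excel sheet (the sample data and every name other than arr and reassembled_df are assumptions for illustration). A scalar label returns a Series, while a slice or a list of labels returns a one-row DataFrame that keeps each column's dtype:

```python
import pandas as pd

# Made-up sample values standing in for the Excel sheet
arr = pd.DataFrame(
    {
        "DayofMonth": [1.0, 2.0],
        "Tail_Number": ["N101AA", "N202BB"],
        "DepDelay": [5.0, -3.0],
    }
)

# Scalar label: the row becomes one Series, so mixed values are upcast to object
as_series = arr.loc[0].to_frame().T

# Slice or list of labels: the result is still a DataFrame with per-column dtypes
by_slice = arr.loc[0:0]
by_list = arr.loc[[0]]

# Reassembling rows from slices keeps the original dtypes as well
reassembled_df = pd.concat([arr.loc[0:0], arr.loc[1:1]])

print(as_series.dtypes)       # every column is object
print(by_slice.dtypes)        # same dtypes as arr
print(reassembled_df.dtypes)  # same dtypes as arr
```

Note that .loc slices are label-based and include the end label, so loc[0:0] selects exactly one row as long as the index label 0 is unique.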
#3
(Oct-30-2023, 04:49 PM)deanhystad Wrote: Maybe the slice approach will work for you as it sounds like you are reassembling the rows back into dataframes.

It worked, thank you for your help!