Jul-26-2021, 06:07 PM
I just learned that Python is slow with numbers because they are generally wrapped as objects with associated attributes. The memory burden can therefore be 100x what it might take to store the raw number itself. According to the video, NumPy is so fast because it optimizes numerical encoding (e.g. three bits for the number seven, which is 111 in binary) and because everything is processed as arrays (and therefore stored in contiguous memory locations).
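Here is a rough sketch for sanity-checking the memory part of that claim. It assumes CPython and an installed NumPy; the variable names are just for illustration, and the exact byte counts depend on the platform and Python version, so treat the numbers as indicative only.

```python
import sys
import numpy as np

n = 1000

# A plain Python list holds references to separate int objects.
py_list = list(range(n))

# A NumPy array stores fixed-width values side by side in one buffer.
np_arr = np.arange(n, dtype=np.int64)

# Size of a single small Python int object (CPython implementation detail).
int_obj_bytes = sys.getsizeof(7)

# Bytes used per element inside the NumPy array (8 for int64).
np_item_bytes = np_arr.itemsize

print("bytes per Python int object:", int_obj_bytes)
print("bytes per NumPy int64 element:", np_item_bytes)
print("total bytes in NumPy data buffer:", np_arr.nbytes)

# Seven in binary, for reference.
print("7 in binary:", bin(7))
```

Note that NumPy does not shrink each number to the fewest possible bits; every element uses the fixed width of the array's dtype (here 64 bits), which is still far smaller than a full Python object.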
Assuming this is all correct, I wonder: are pandas dataframes stored as arrays in the same manner? If not, would it be faster to store the data in arrays and process them through NumPy rather than in dataframes processed through pandas?
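One way to explore this is to pull a column out of a dataframe and inspect it. The sketch below assumes pandas and NumPy are installed and uses a made-up two-column frame; for plain numeric dtypes the column data is generally exposed as a NumPy array, though other dtypes (for example object or extension types) may behave differently.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with two numeric columns.
df = pd.DataFrame({
    "a": np.arange(5, dtype=np.int64),
    "b": np.linspace(0.0, 1.0, 5),
})

# Extract one column as a NumPy array.
col = df["a"].to_numpy()
print(type(col), col.dtype)

# The same reduction through pandas and through NumPy.
total_pandas = df["a"].sum()
total_numpy = col.sum()
print(total_pandas, total_numpy)
```

Both sums run over the same kind of fixed-width numeric data, so any speed difference on a real workload would need to be measured (for example with `timeit`) rather than assumed.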
Thanks!