Python Forum

Arrays faster than pandas?
I just learned that Python is slow with numbers because each one is generally wrapped as an object with associated attributes. The memory burden can therefore be 100x what it might take to store the raw number itself. According to the video, NumPy is so fast because it optimizes numerical encoding (e.g. three bits for the number seven as 111 in binary) and because everything is processed as arrays (and therefore stored in contiguous memory locations).
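The memory claim can be checked directly. The sketch below is only a rough illustration: it compares a plain list of Python ints with a NumPy array of the same values, assuming a 64-bit CPython build and the `int64` dtype (NumPy stores fixed-width elements, 8 bytes each here, rather than a variable number of bits per value). The variable names are made up for the example, and the exact byte counts depend on the Python version and platform.

```python
import sys

import numpy as np

n = 1_000
py_numbers = list(range(n))                # each element is a full Python int object
np_numbers = np.arange(n, dtype=np.int64)  # one contiguous buffer of 8-byte integers

# A list stores one pointer per element, and each int object carries its own overhead.
py_bytes = sys.getsizeof(py_numbers) + sum(sys.getsizeof(x) for x in py_numbers)

# The array's data buffer is just n * itemsize bytes.
np_bytes = np_numbers.nbytes

print(f"list of Python ints: about {py_bytes} bytes")
print(f"int64 NumPy array:   {np_bytes} bytes")
print(f"ratio: about {py_bytes / np_bytes:.1f}x")
print("contiguous:", np_numbers.flags["C_CONTIGUOUS"])
```

For plain integers the ratio printed here is likely to be far below 100x, so the figure from the video probably refers to a more extreme case, such as a smaller dtype or richer objects.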

Assuming this is all correct, I wonder: are pandas dataframes stored as arrays in the same manner? If not, then would it be faster to try and store the data in arrays and process them through numpy rather than in dataframes processed through pandas?

Thanks!
NumPy is generally faster than pandas, though for me it is easier to develop in pandas, and the speed difference is not noticeable with my relatively small datasets.

Speed test article
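On the storage question: for ordinary numeric columns, pandas keeps the data in NumPy arrays, so the two are closer than the question suggests. A minimal sketch, assuming a small made-up dataframe (the column names are hypothetical) and default NumPy-backed dtypes; extension dtypes such as nullable integers or Arrow-backed strings are stored differently:

```python
import numpy as np
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"height": [1.5, 1.7, 1.6], "count": [3, 5, 4]})

# Every column has a single dtype, much like a NumPy array.
print(df.dtypes)

# A numeric column can be handed to NumPy as an ndarray.
col = df["height"].to_numpy()
print(type(col), col.dtype)

# The whole frame can be converted too; mixed int/float columns are upcast to float64.
arr = df.to_numpy()
print(arr.shape, arr.dtype)
```

So it is possible to keep the labels in a dataframe during development and call `to_numpy()` only for the parts where raw array speed turns out to matter.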
(Jul-27-2021, 04:01 PM) jefsummers wrote: NumPy is generally faster than pandas, though for me it is easier to develop in pandas, and the speed difference is not noticeable with my relatively small datasets.

Suppose my data is 4000 rows by 15 columns. It would be easier (since I'm a beginner) for me to use a dataframe because then I could use labels that make sense to me. I could, however, load all that into an array. Should I expect to see a noticeable difference between the two or is that not a question I can really ask because it depends on the particularities of my machine?
It depends on the data, the machine, and what you are actually doing. That said, I'd be surprised if the difference were more than a few seconds, and the difference in the time you put into programming would be much greater. Now, if we are talking about something that will be executed many, many times on a production web server, then it may matter.
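Since the answer depends on the machine and the operation, the most reliable approach is to time it. The following is a rough sketch, not a benchmark: it assumes random float data of the size mentioned (4000 rows by 15 columns), hypothetical column labels, and a column mean as a stand-in for the real workload.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = rng.random((4000, 15))              # stand-in for the real dataset
columns = [f"col{i}" for i in range(15)]   # hypothetical labels
df = pd.DataFrame(data, columns=columns)


def numpy_means():
    # Column means computed directly on the array.
    return data.mean(axis=0)


def pandas_means():
    # Same computation through the dataframe; the result keeps the labels.
    return df.mean()


repeats = 200
np_time = timeit.timeit(numpy_means, number=repeats)
pd_time = timeit.timeit(pandas_means, number=repeats)

print(f"NumPy:  {np_time / repeats * 1e6:.1f} microseconds per call")
print(f"pandas: {pd_time / repeats * 1e6:.1f} microseconds per call")
```

Swapping in the actual operation for the two functions gives numbers specific to your machine and data.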
(Jul-31-2021, 09:00 PM) jefsummers wrote: It depends on the data, the machine, and what you are actually doing. That said, I'd be surprised if the difference were more than a few seconds, and the difference in the time you put into programming would be much greater. Now, if we are talking about something that will be executed many, many times on a production web server, then it may matter.

Thanks!