Python Forum
Arrays faster than pandas?
#1
I just learned that Python is slow with regard to numbers because they are generally wrapped as objects with associated attributes. The memory burden can therefore be 100x what it might take to store the raw number itself. According to the video, NumPy is so fast because it optimizes numerical encoding (e.g. three bits for the number seven, which is 111 in binary) and because everything is processed as arrays (and therefore stored in contiguous memory locations).
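A rough way to check the memory claim is to compare a list of Python ints with a NumPy array holding the same values. This is only a sketch: the size `n` and the variable names are made up for illustration, and `sys.getsizeof` figures are specific to CPython. Note that NumPy does not pack each number into the fewest possible bits; it uses a fixed-width dtype (8 bytes per element for `int64` here).

```python
import sys
import numpy as np

n = 1_000  # arbitrary size chosen for illustration
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Each Python int is a full object, and the list also stores one pointer per element.
# This is an estimate: CPython caches small ints, so some objects are shared.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# A NumPy array stores the raw values back to back in one contiguous buffer.
arr_bytes = np_arr.nbytes  # n elements * itemsize bytes each

print(f"list of Python ints: {list_bytes} bytes")
print(f"NumPy int64 array:   {arr_bytes} bytes")
print(f"bytes per array element: {np_arr.itemsize}")
```

On a typical CPython build the list tends to come out a few times larger than the array rather than 100x, though the exact ratio depends on the values and the Python version.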

Assuming this is all correct, I wonder: are pandas dataframes stored as arrays in the same manner? If not, would it be faster to store the data in arrays and process them through NumPy rather than in dataframes processed through pandas?
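For ordinary numeric columns with the default dtypes, pandas keeps the data in NumPy arrays under the hood, so the two are not really separate storage schemes. A minimal sketch, assuming a small made-up dataframe (the column names `price` and `qty` are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical example data, not from the thread.
df = pd.DataFrame({"price": [1.5, 2.5, 4.0], "qty": [10, 20, 30]})

# to_numpy() hands back the column as a plain NumPy array.
col = df["price"].to_numpy()
print(type(col), col.dtype)

# The labeled columns can be passed to NumPy whenever raw arrays are wanted.
total = np.sum(df["price"].to_numpy() * df["qty"].to_numpy())
print(total)
```

This means you can keep the labels for convenience and drop down to arrays only for the parts that turn out to be slow.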

Thanks!
#2
Numpy is generally faster than Pandas, though for me it is easier to develop in Pandas and the speed difference is not noticeable with my relatively smaller datasets.

Speed test article
#3
(Jul-27-2021, 04:01 PM)jefsummers Wrote: Numpy is generally faster than Pandas, though for me it is easier to develop in Pandas and the speed difference is not noticeable with my relatively smaller datasets.

Speed test article

Suppose my data is 4000 rows by 15 columns. It would be easier (since I'm a beginner) for me to use a dataframe because then I could use labels that make sense to me. I could, however, load all that into an array. Should I expect to see a noticeable difference between the two or is that not a question I can really ask because it depends on the particularities of my machine?
#4
It depends on the data, the machine, and what you are actually doing. That said, I'd be surprised if the difference were more than a few seconds, and the difference in the time you put into programming would be much greater. Now, if we are talking about something that will be executed many, many times on a production web server, then it may matter.
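Since the answer depends on the machine and the operation, the simplest thing is to time it yourself. Here is a minimal sketch for a 4000 x 15 dataset, assuming random data and a column-mean operation as a stand-in for whatever you actually compute (the column names, the operation, and the repeat count are all placeholders):

```python
import timeit
import numpy as np
import pandas as pd

# Placeholder data with the shape mentioned in the thread: 4000 rows by 15 columns.
rng = np.random.default_rng(0)
arr = rng.random((4000, 15))
df = pd.DataFrame(arr, columns=[f"col{i}" for i in range(15)])

def numpy_means():
    # Column means on the raw array.
    return arr.mean(axis=0)

def pandas_means():
    # Column means on the labeled dataframe.
    return df.mean()

# Run each version 200 times and report the total time in seconds.
t_np = timeit.timeit(numpy_means, number=200)
t_pd = timeit.timeit(pandas_means, number=200)
print(f"NumPy:  {t_np:.4f} s for 200 runs")
print(f"pandas: {t_pd:.4f} s for 200 runs")
```

Swap in your own operation for the two functions; the numbers only mean something for the work you actually do.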
#5
(Jul-31-2021, 09:00 PM)jefsummers Wrote: It depends on the data, the machine, and what you are actually doing. That said, I'd be surprised if the difference were more than a few seconds, and the difference in the time you put into programming would be much greater. Now, if we are talking about something that will be executed many, many times on a production web server, then it may matter.

Thanks!

