Arrays faster than pandas?

Mark17 · Jul-26-2021, 06:07 PM

I just learned that Python is slow with regard to numbers because they are generally wrapped as objects with associated attributes. The memory burden can therefore be 100x what it might take to store the raw number itself. According to the video, NumPy is so fast because it optimizes numerical encoding (e.g. three bits for the number seven as 111 in binary) and because everything is processed as arrays (and therefore stored in contiguous memory locations).

Assuming this is all correct, I wonder: are pandas DataFrames stored as arrays in the same manner? If not, would it be faster to store the data in arrays and process them with NumPy rather than in DataFrames processed with pandas?

Thanks!

jefsummers · Jul-27-2021, 04:01 PM

Numpy is generally faster than Pandas, though for me it is easier to develop in Pandas and the speed difference is not noticeable with my relatively smaller datasets.

Speed test article
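
On the storage question, here is a rough sketch of how one might check it directly. The column names and sizes are made up for illustration, and the exact size of a Python int depends on the interpreter build. For plain numeric columns, pandas holds the values in NumPy arrays, so the raw storage is the same; the extra cost in pandas is mostly the index and label handling wrapped around it.

```python
import sys

import numpy as np
import pandas as pd

# A Python int is a full object; a NumPy int64 element is 8 raw bytes.
py_int_bytes = sys.getsizeof(7)   # object header plus value; varies by build
arr = np.arange(4000, dtype=np.int64)
np_item_bytes = arr.itemsize      # bytes per element in the array

# Hypothetical frame with two numeric columns (names are illustrative only).
df = pd.DataFrame({
    "price": np.arange(4000, dtype=np.float64),
    "volume": np.arange(4000, dtype=np.int64),
})

# Pull the values of one column out as a NumPy array.
col_values = df["price"].to_numpy()
is_ndarray = isinstance(col_values, np.ndarray)

print("bytes per Python int object:", py_int_bytes)
print("bytes per int64 array element:", np_item_bytes)
print("column values are an ndarray:", is_ndarray, col_values.dtype)
```

If you want labels for development and raw arrays for a hot loop, calling to_numpy() on the columns you need gives you both.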

Mark17 · Jul-31-2021, 12:37 PM

(Jul-27-2021, 04:01 PM)jefsummers Wrote: Numpy is generally faster than Pandas, though for me it is easier to develop in Pandas and the speed difference is not noticeable with my relatively smaller datasets.

Speed test article

Suppose my data is 4000 rows by 15 columns. It would be easier (since I'm a beginner) for me to use a dataframe because then I could use labels that make sense to me. I could, however, load all that into an array. Should I expect to see a noticeable difference between the two or is that not a question I can really ask because it depends on the particularities of my machine?

jefsummers · Jul-31-2021, 09:00 PM

Depends on the data, the machine, and what you are actually doing. That said, I'd be surprised if the difference was more than a few seconds, and the difference in time you put into programming would be much greater. Now, if we are talking about something that will be executed many many times on a production web server, then it may matter.
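
The simplest way to find out on your own machine is to time it. Below is a minimal sketch, assuming random data of the size you mentioned (4000 rows by 15 columns) and a column mean as a stand-in for whatever you actually compute; the labels and the choice of operation are placeholders, not a benchmark of your workload.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed stand-in data: 4000 rows by 15 columns of random floats.
rng = np.random.default_rng(0)
data = rng.random((4000, 15))
columns = [f"col{i}" for i in range(15)]  # hypothetical labels
df = pd.DataFrame(data, columns=columns)


def numpy_means():
    # Column means on the raw array.
    return data.mean(axis=0)


def pandas_means():
    # Column means on the labelled DataFrame.
    return df.mean()


runs = 200
t_np = timeit.timeit(numpy_means, number=runs)
t_pd = timeit.timeit(pandas_means, number=runs)

print(f"NumPy:  {t_np / runs * 1e6:.1f} microseconds per call")
print(f"pandas: {t_pd / runs * 1e6:.1f} microseconds per call")
```

Swap in your real operation for the two functions and compare the per-call times; that tells you more than any general rule.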

Mark17 · Aug-02-2021, 03:14 PM

(Jul-31-2021, 09:00 PM)jefsummers Wrote: Depends on the data, the machine, and what you are actually doing. That said, I'd be surprised if the difference was more than a few seconds, and the difference in time you put into programming would be much greater. Now, if we are talking about something that will be executed many many times on a production web server, then it may matter.

Thanks!
