Fastest Method for Querying SQL Server with Python Pandas

BuJayBelvin · Jul-30-2020, 04:02 PM

I am querying a sql table with about 3.5 MM rows and 400 columns - I am estimating it is less than 5 GB in size.

If I use pd.read_sql to select * from this table, it takes many, many hours to execute. (Over 10 hours sometimes....)

I would like to read in the entire table and not cut it into pieces....I know folks will suggest otherwise, I get that. But I am still looking for a better solution that can handle reading in the entire table. We are not talking about terabytes here...

Also, even if I use chunks, the sum of the time the individual chunks take to process is similar amount of time to execute the entire table query.

Is there any alternative approach that would work faster?

Thanks,

Brian

**Gribouillis** · (This post was last modified: Jul-31-2020, 08:57 AM by Gribouillis.)

Have you tried to read the table through the DB-API instead of pandas for comparison?

BuJayBelvin · Jul-31-2020, 04:57 PM

(Jul-31-2020, 08:27 AM)Gribouillis Wrote: Have you tried to read the table through the DB-API instead of pandas for comparison?

I haven't. I will see what I can find based on that suggestion. Are you expecting it to be faster?

Thanks!

jefsummers · Jul-31-2020, 05:49 PM

Depends.
What SQL server are you using? Is it on the same machine or across the network?

By offloading some of the work and storage to the server, you can see a performance boost.

**Gribouillis** · (This post was last modified: Jul-31-2020, 06:01 PM by Gribouillis.)

BuJayBelvin Wrote:Are you expecting it to be faster?

Yes. In fact I'm a complete begginner in pandas but I read that pandas uses sqlalchemy and I have some bad memories of sqlalchemy being much slower than the DB-API in certain situations. It could really change something.

BuJayBelvin · Aug-01-2020, 02:06 PM

(Jul-31-2020, 06:00 PM)Gribouillis Wrote:
BuJayBelvin Wrote:Are you expecting it to be faster?
Yes. In fact I'm a complete begginner in pandas but I read that pandas uses sqlalchemy and I have some bad memories of sqlalchemy being much slower than the DB-API in certain situations. It could really change something.

Thanks! That's good to know. I am limited to SQL Server 2005 (which is insane) and I am querying the remove server from home. I assume the connection causes a portion of the delay but I think there may be more efficient solutions other than read_sql on pandas....hence, the inquiry. I need to look into the DB-API recommendation....

Thanks!

**Gribouillis** · (This post was last modified: Aug-01-2020, 03:35 PM by Gribouillis.)

You could perhaps use something such as pyodbc, there is an example here, even with a pandas dataframe. Again, I'm only guessing, but pyodbc implements the db-api.

jefsummers · Aug-02-2020, 06:21 PM

Different database than what you are using, but HEREis a performance comparison. Pandas is convenient but slow for such things.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Fastest way tkinter	Quatrixouuu	2	1,225	Feb-19-2024, 07:20 AM Last Post: Danishhafeez
	What is the fastest way to get all the frames from a video file?	glorsh66	3	4,875	May-26-2023, 04:41 AM Last Post: Gribouillis
	Method works as expected on host machine but not on server	gradlon93	4	2,027	Jan-05-2023, 10:41 AM Last Post: DeaD_EyE
	Basic SQL query using Py: Inserting or querying sqlite3 database not returning data	marlonbown	3	3,160	Nov-08-2022, 07:16 PM Last Post: marlonbown
	Fastest Way of Writing/Reading Data	JamesA	1	2,969	Jul-27-2021, 03:52 PM Last Post: Larz60+
	How to take the tar backup files form remote server to local server	sivareddy	0	2,609	Jul-14-2021, 01:32 PM Last Post: sivareddy
	Recommendations: Querying CSV / Database??	ironfelix717	2	2,249	Apr-01-2020, 12:13 PM Last Post: ndc85430
	Passing parameters to SQL Server using pandas	Ilangos	8	19,462	Jun-03-2019, 11:25 AM Last Post: Ilangos
	Fastest dict/map method when 'key' is already a hash?	tasket	6	5,285	Apr-20-2019, 06:40 PM Last Post: tasket
	fastest way to record values between quotes	paul18fr	5	4,300	Apr-15-2019, 01:51 PM Last Post: snippsat

Fastest Method for Querying SQL Server with Python Pandas

User Panel Messages

Announcements