Python Forum
Fastest Method for Querying SQL Server with Python Pandas
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Fastest Method for Querying SQL Server with Python Pandas
#1
I am querying a sql table with about 3.5 MM rows and 400 columns - I am estimating it is less than 5 GB in size.

If I use pd.read_sql to select * from this table, it takes many, many hours to execute. (Over 10 hours sometimes....)

I would like to read in the entire table and not cut it into pieces....I know folks will suggest otherwise, I get that. But I am still looking for a better solution that can handle reading in the entire table. We are not talking about terabytes here...

Also, even if I use chunks, the sum of the time the individual chunks take to process is similar amount of time to execute the entire table query.

Is there any alternative approach that would work faster?

Thanks,

Brian
Reply
#2
Have you tried to read the table through the DB-API instead of pandas for comparison?
Reply
#3
(Jul-31-2020, 08:27 AM)Gribouillis Wrote: Have you tried to read the table through the DB-API instead of pandas for comparison?

I haven't. I will see what I can find based on that suggestion. Are you expecting it to be faster?

Thanks!
Reply
#4
Depends.
What SQL server are you using? Is it on the same machine or across the network?

By offloading some of the work and storage to the server, you can see a performance boost.
Reply
#5
BuJayBelvin Wrote:Are you expecting it to be faster?
Yes. In fact I'm a complete begginner in pandas but I read that pandas uses sqlalchemy and I have some bad memories of sqlalchemy being much slower than the DB-API in certain situations. It could really change something.
Reply
#6
(Jul-31-2020, 06:00 PM)Gribouillis Wrote:
BuJayBelvin Wrote:Are you expecting it to be faster?
Yes. In fact I'm a complete begginner in pandas but I read that pandas uses sqlalchemy and I have some bad memories of sqlalchemy being much slower than the DB-API in certain situations. It could really change something.

Thanks! That's good to know. I am limited to SQL Server 2005 (which is insane) and I am querying the remove server from home. I assume the connection causes a portion of the delay but I think there may be more efficient solutions other than read_sql on pandas....hence, the inquiry. I need to look into the DB-API recommendation....

Thanks!
Reply
#7
You could perhaps use something such as pyodbc, there is an example here, even with a pandas dataframe. Again, I'm only guessing, but pyodbc implements the db-api.
Reply
#8
Different database than what you are using, but HEREis a performance comparison. Pandas is convenient but slow for such things.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Fastest way tkinter Quatrixouuu 2 406 Feb-19-2024, 07:20 AM
Last Post: Danishhafeez
  What is the fastest way to get all the frames from a video file? glorsh66 3 1,079 May-26-2023, 04:41 AM
Last Post: Gribouillis
  Method works as expected on host machine but not on server gradlon93 4 1,080 Jan-05-2023, 10:41 AM
Last Post: DeaD_EyE
  Basic SQL query using Py: Inserting or querying sqlite3 database not returning data marlonbown 3 1,373 Nov-08-2022, 07:16 PM
Last Post: marlonbown
  Fastest Way of Writing/Reading Data JamesA 1 2,193 Jul-27-2021, 03:52 PM
Last Post: Larz60+
  How to take the tar backup files form remote server to local server sivareddy 0 1,900 Jul-14-2021, 01:32 PM
Last Post: sivareddy
  Recommendations: Querying CSV / Database?? ironfelix717 2 1,657 Apr-01-2020, 12:13 PM
Last Post: ndc85430
  Passing parameters to SQL Server using pandas Ilangos 8 16,484 Jun-03-2019, 11:25 AM
Last Post: Ilangos
  Fastest dict/map method when 'key' is already a hash? tasket 6 3,993 Apr-20-2019, 06:40 PM
Last Post: tasket
  fastest way to record values between quotes paul18fr 5 3,296 Apr-15-2019, 01:51 PM
Last Post: snippsat

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020