Python Forum
python multiprocessing to download sql table
#1
Hi Team,

Is it possible to use multiprocessing in the scenario below — downloading a SQL table into CSV files — to speed up the download?


import csv
import time

# conn and cursor are assumed to be already created with your database driver (e.g. pyodbc)
initial = time.time()
query = "select * from UK_Table"
cursor.execute(query)
x = 0

with open("E:\\backup\\output.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerow([col[0] for col in cursor.description])
    while True:
        rows = cursor.fetchmany(10000)
        if not rows:
            break
        x = x + len(rows)
        print(x)
        writer.writerows(rows)

conn.close()
print("success")
print("time taken", time.time() - initial, "seconds")



1)  Example of multiprocessing.

import multiprocessing
import os

def square(n):
    print("Worker process id for {0}: {1}".format(n, os.getpid()))
    return n * n

if __name__ == "__main__":
    # input list
    arr = [1, 2, 3, 4, 5]

    # create a pool object
    p = multiprocessing.Pool()

    # map the list to the target function
    result = p.map(square, arr)

    print("Square of each element:")
    print(result)
#2
FYI, a possible alternative solution:

You might want to try pandas with SQLAlchemy.
I have found this method to be very fast, and multiprocessing may not be necessary.

The code is written as an example only; I don't expect it to run without adding the necessary schema, etc.
I have left notes where modifications are required.
The amount of code needed is very small.

It's easiest to create the ORM schema by first dumping your DBMS schema as a guide.
For example, in sqlite this can be done from the command line using sqlite3 as follows:
$ sqlite3
sqlite> .schema AD # your tablename here
with output similar to:
Output:
CREATE TABLE IF NOT EXISTS "AD" ( "RecordType" VARCHAR NOT NULL, "UniqueSystemIdentifier" VARCHAR NOT NULL, "ULSFileNumber" VARCHAR, "EBF_Number" VARCHAR, "ApplicationPurpose" VARCHAR, "ApplicationStatus" VARCHAR, "ApplicationFeeExempt" VARCHAR, "RegulatoryFeeExempt" VARCHAR,
Other DBMSs have similar utilities.

Note: Code below will not run without modification
Example code:
from pathlib import Path
import pandas as pd
from sqlalchemy.ext.declarative import declarative_base
# you can add or remove datatypes from import as needed
from sqlalchemy import (Column, String, Integer, Date, create_engine,
    DateTime, ForeignKey)

# replace with the actual path to your database, and the DBMS type, in the engine statement below
engine = create_engine(f"sqlite:///{path_to_database}")
Base = declarative_base()

# You will need to replace the schema below with the actual schema for UK_Table
# (keep the engine and Base statements above)
# I added the following as an example only
class UK_Table(Base):

    __tablename__ = 'UK_Table'

    # --------------------------------------------------------
    # create schema here -- replace example with your schema
    # Following is only an example schema
    # --------------------------------------------------------

    RecordType = Column(String, primary_key=True, nullable=False)
    UniqueSystemIdentifier = Column(String, primary_key=True, nullable=False)
    ULSFileNumber = Column(String, nullable=True)
    EBF_Number = Column(String, index=True, nullable=True)
    ApplicationPurpose = Column(String, index=True, nullable=True)
    ApplicationStatus = Column(String, index=True, nullable=True)
    ApplicationFeeExempt = Column(String, nullable=True)
    RegulatoryFeeExempt = Column(String, nullable=True)


class CsvFromSQL:
    def __init__(self):
        self.csvfile = Path("E:\\backup\\output.csv")

    def save_table_to_csv(self):
        df = pd.read_sql(UK_Table.__tablename__, con=engine)
        # Note: see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
        # for csv write options.
        df.to_csv(self.csvfile)


def main():
    cfs = CsvFromSQL()
    cfs.save_table_to_csv()


if __name__ == '__main__':
    main()
#3
# same idea as the Pool example above, using the standard-library
# concurrent.futures wrapper; square and arr are as defined there
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor() as executor:
    results = executor.map(square, arr)
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#4
Hi Larz60,

thanks for your help.

The SQL table's data size is 100 GB after saving to CSV.

Can pandas handle a big SQL table? The table also has 350 columns.

Can I go with the above approach for a big table?


Thanks
mg
#5
Multiprocessing doesn't speed things up unless they are processor bound (doing lots of calculations). Querying a database is not a processor-bound application. You also need to come up with a way to make queries so each process retrieves some well-defined part of the database, and a way to stitch the results together. Asking for every row in the one table in your database is not going to work.

That said, you should try it to see what really happens.
#6
You may be paging the data if it's that large. Also, I'm not sure whether you can process in chunks using SQLAlchemy; I'll see if I can find anything in the docs.

Do you notice an inordinate amount of disk activity shortly after starting (assuming you are using a mechanical storage device)?
If so, perhaps 10,000 records per fetch is too large.

Python.org has a pretty good multiprocessing tutorial here: https://docs.python.org/3/library/multiprocessing.html
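For what it's worth, pandas can read in chunks: read_sql accepts a chunksize argument and yields DataFrames one piece at a time, so the whole table never has to fit in memory. A minimal sketch against an invented sqlite demo table (an SQLAlchemy engine can be passed in place of the raw connection):

```python
import sqlite3
import pandas as pd

# Tiny demo table so the sketch runs end to end; point this at your
# real DBMS instead.
conn = sqlite3.connect("demo.db")
conn.execute("DROP TABLE IF EXISTS UK_Table")
conn.execute("CREATE TABLE UK_Table (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO UK_Table VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(30)])
conn.commit()

# Stream the table in 10-row chunks, appending each to one CSV;
# write the header only with the first chunk.
first = True
for chunk in pd.read_sql("SELECT * FROM UK_Table", conn, chunksize=10):
    chunk.to_csv("output.csv", mode="w" if first else "a",
                 header=first, index=False)
    first = False
conn.close()
```

With a chunksize tuned to your memory, this sidesteps the 100 GB problem without multiprocessing at all.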

