Python Forum
Create multiple/single CSV files for each SQL record
#1
Hi Team,

I am creating multiple CSV files, one CSV per record, with a header. The code is working as expected.

I want to avoid fetchall, as the SQL data at the office is big (60 GB) and a memory issue might come up.

How can I optimize the code below? I am using these two lines multiple times:
writer = csv.writer(outfile,delimiter="|",quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(col[0] for col in cursor.description)
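One way to stop repeating those two lines is to wrap them in a small helper (a sketch; the function name `write_csv` is mine, not part of pyodbc or csv):

```python
import csv

def write_csv(path, header, rows, delimiter="|"):
    """Open path, write the header row, then all data rows,
    with the writer set up in exactly one place."""
    with open(path, "w", newline="") as outfile:
        writer = csv.writer(outfile, delimiter=delimiter,
                            quoting=csv.QUOTE_NONNUMERIC)
        writer.writerow(header)
        writer.writerows(rows)
```

Each per-record file then becomes a single call like `write_csv(fullpath, header, [row])`.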


import pyodbc
import csv
import os

connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=DESKTOP-GQK64O6;DATABASE=Customer;Trusted_Connection=yes;')
cursor = connection.cursor()

qry = "Select * from employee"
cursor.execute(qry)

data = cursor.fetchall()
print(len(data))



folderPath = "C:\\Users\\xyz\\OneDrive\\Desktop\\C\\test_data\\output"
count = 0
# for reply in cursor.execute(qry):
for x in data:
    count = count + 1
    print(count)
    fname = "Row"+str(count)+".csv"
    fullpath = os.path.join(folderPath,fname)
    print(fullpath)
    with open(fullpath,"w",newline="") as outfile:
        writer = csv.writer(outfile,delimiter="|",quoting=csv.QUOTE_NONNUMERIC)
        writer.writerow(col[0] for col in cursor.description)
        writer.writerow(x)
        print(f"I am row {count}",x)
cursor.close()
connection.close()
#2
(Sep-28-2022, 07:49 PM)mg24 Wrote: I want to avoid fetch all , as office SQL Data is big 60gb . memory issue might come.
There is no need to do cursor.fetchall(). You can iterate over the cursor directly.
There is even an example in the docs:
Quote:
# standard
cursor.execute("select a from tbl where b=? and c=?", (x, y))
# pyodbc extension
cursor.execute("select a from tbl where b=? and c=?", x, y)
The return value is always the cursor itself:

for row in cursor.execute("select user_id, user_name from users"):
    print(row.user_id, row.user_name)


Then (not tested code):
import pyodbc
import csv
import os
 
connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=DESKTOP-GQK64O6;DATABASE=Customer;Trusted_Connection=yes;')
with connection.cursor() as crsr:
 
    qry = "Select * from employee"
    crsr.execute(qry)
    
    folderPath = "C:\\Users\\xyz\\OneDrive\\Desktop\\C\\test_data\\output"
    header = next(zip(*crsr.description))

    for idx, row in enumerate(crsr, start=1):
        fname = f"Row{idx}.csv"
        fullpath = os.path.join(folderPath,fname)
        print(fullpath)
        with open(fullpath, "w", newline="") as outfile:
            writer = csv.writer(outfile, delimiter="|", quoting=csv.QUOTE_NONNUMERIC)
            writer.writerows([header, row])
            print(f"I am row {idx} --> {row}")
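For reference, the `next(zip(*crsr.description))` header trick works because `description` is a sequence of 7-item tuples, one per column, with the column name first. A stand-in demo (the sample tuples below are made up, not from a real cursor):

```python
# pyodbc-style description: one 7-tuple per column, name in position 0
description = [
    ("id", int, None, 10, 10, 0, False),
    ("name", str, None, 50, 50, 0, True),
]
# zip(*description) regroups the tuples element-wise;
# next() takes the first group, i.e. all the column names
header = next(zip(*description))
# header == ("id", "name")
```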
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#3
Hi Buran,

Superb! Thank you so much, I liked your code. Whistle

Similar situation here: extracting all records into a single CSV file.
Using fetchmany, I extracted 60 GB of data.
My code is working; can we make any improvement to the code below, please?

import pyodbc
import csv

connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=DESKTOP-GQK64O6;DATABASE=Customer;Trusted_Connection=yes;')
cursor = connection.cursor()


qry = "select * from employee"

cursor.execute(qry) 
data = cursor.fetchall()


With open("E:\\backup\\output.csv","w","newline="") as outfile
	writer = csv.writer(outfile,quoting = csv.QUOTE_NONNUMERIC)
	writer.writerows(col[0] for col in cursor.description)
	While True:
		rows = cursor.fetchmany(10000)
		if len(rows) ==0:
			print("no records found")
			break
		else:
	          x = x+len(rows)
               print(x)
               for row in rows:
                  writer.writerows(row)


conn.close()
print("success")
print('time taken' . time.time()-initial,"Seconds")
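As an aside, the two-argument form of `iter()` can replace the explicit `while True` / `break` dance around `fetchmany` (a sketch; the helper name `dump_in_batches` is mine):

```python
import csv

def dump_in_batches(cursor, writer, batch_size=10_000):
    """Write all remaining cursor rows to writer in fixed-size batches."""
    total = 0
    # iter(callable, sentinel) keeps calling fetchmany until it
    # returns the sentinel (an empty list), then stops cleanly
    for rows in iter(lambda: cursor.fetchmany(batch_size), []):
        writer.writerows(rows)
        total += len(rows)
    return total
```

Only one batch is ever held in memory, so the 60 GB table never has to fit in RAM.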
#4
Hi Team,

I am using the code below to extract a big SQL table (60 GB).
How do I extract the SQL table to CSV with headers using SQLAlchemy?

(same pyodbc/fetchmany code as in post #3 above)
Below is my attempted code. How do I extract the SQL table data with a header (I am running it in PyCharm) without a memory-full issue?

from sqlalchemy import create_engine
import pandas as pd

DRIVER = "ODBC Driver 17 for SQL Server"
server = "DESKTOP-GQK64O6"
DATABASE = "Customer"

str_conn = f'mssql://{server}/{DATABASE}?Driver={DRIVER}'
connection = create_engine(str_conn).connect()

outpath = "C:\\C_Programming\\output.csv"

# Stream the table chunk by chunk straight to disk instead of
# appending every chunk to a list and concatenating -- that would
# hold all 60 GB in memory at once.
for i, chunk in enumerate(pd.read_sql_table('employee', connection, chunksize=1000)):
    # write the header only with the first chunk, then append
    chunk.to_csv(outpath, mode="a" if i else "w", header=(i == 0), index=False)
Thanks
mg
#5
I appreciate that you like my code, but at the same time it's disappointing that nothing I showed in my code found a place in your updated code that extracts the whole table into a single CSV file - the context manager, not using fetch, etc. - and you have basically the same question. You even introduce pure SyntaxErrors, e.g. a capital first char in While, With... and then ask us to make "improvements".
All you had to do was write to a single file: open the file before iterating over the cursor. In fact, after you write the header, you can use writer.writerows(cursor) to write all the data to the file in a single line.

I/we can update the code for you, but this site is focused on education, so better to try to implement the changes I show you.
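The writer.writerows(cursor) idea works because writerows accepts any iterable of rows and consumes it lazily; a small stand-in demo (a plain iterator plays the role of the cursor here):

```python
import csv
import io

rows = iter([("a", 1), ("b", 2)])          # stands in for an open cursor
buf = io.StringIO()
writer = csv.writer(buf, delimiter="|", quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(("col1", "col2"))          # header first, with writerow
writer.writerows(rows)                     # then every remaining row in one call
# buf.getvalue().splitlines() == ['"col1"|"col2"', '"a"|1', '"b"|2']
```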

#6
Hi Buran,

I did what you suggested and the code is working fine. Thanks again.
I have two questions:

1) If there are no records in SQL, does it go into an infinite loop? We are not using a break condition or checking whether there are any records.
2) Is there any chance of a memory-full issue if the SQL table is a big one?

Below is my attempted code. Is this correct now?

import pyodbc
import csv
import os

connection = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};SERVER=DESKTOP-GQK64O6;DATABASE=Customer;Trusted_Connection=yes;')
with connection.cursor() as crsr:
    qry = "Select * from employee"
    crsr.execute(qry)

    folderPath = "C:\\Users\\malle\\OneDrive\\Desktop\\C\\test_data\\output"
    header = next(zip(*crsr.description))

    fname = "main.csv"
    fullpath = os.path.join(folderPath, fname)
    with open(fullpath, "w", newline="") as outfile:
        writer = csv.writer(outfile, delimiter="|", quoting=csv.QUOTE_NONNUMERIC)
        writer.writerows([header])
        for idx, row in enumerate(crsr, start=1):
            writer.writerows(crsr)
#7
This
        for idx, row in enumerate(crsr, start=1):
            writer.writerows(crsr)
should be just

writer.writerows(crsr)
With your current code it would write n*n rows, where n is the number of rows returned. Actually, it will write them only once (i.e. n rows), because crsr will be exhausted after the first pass. Still, better to fix it.
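The exhaustion point is just standard Python iterator behaviour:

```python
it = iter([1, 2, 3])
first_pass = list(it)     # consumes everything
second_pass = list(it)    # nothing left on a second pass
# first_pass == [1, 2, 3]; second_pass == []
```

A cursor behaves the same way: once the inner writerows(crsr) has drained it, the enclosing for loop finds no further rows.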

If there are no rows returned, it will write an empty file with just the header.
There is an option to use crsr.rowcount, BUT note there are limitations:

Quote:rowcount

The number of rows modified by the last SQL statement.

This is -1 if no SQL has been executed or if the number of rows is unknown. Note that it is not uncommon for databases to report -1 immediately after a SQL select statement for performance reasons. (The exact number may not be known before the first records are returned to the application.)

You can try to fetch one record, then check whether anything was returned, write it to the file, and then write all the remaining records in one go (i.e. the crsr is not yet consumed).
Not tested:

import pyodbc
import csv
import os
 
connection = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};SERVER=DESKTOP-GQK64O6;DATABASE=Customer;Trusted_Connection=yes;')
with connection.cursor() as crsr:
    qry = "Select * from employee"
    crsr.execute(qry)
    first_record = crsr.fetchone()

    if first_record:
        folderPath = "C:\\Users\\malle\\OneDrive\\Desktop\\C\\test_data\\output"
        header = next(zip(*crsr.description))

        fname = "main.csv"
        fullpath = os.path.join(folderPath, fname)
        with open(fullpath, "w", newline="") as outfile:
            writer = csv.writer(outfile, delimiter="|", quoting=csv.QUOTE_NONNUMERIC)
            writer.writerows([header, first_record])
            writer.writerows(crsr)
    else:
        print('No records')
