Python Forum
Create multiple/single CSV files for each SQL record
#1
Hi Team,

I am creating multiple CSV files, one CSV per record, with a header. The code is working as expected.

I want to avoid fetchall, as the SQL data at the office is big (60 GB) and a memory issue might come up.

How can I optimize the code below? I am using these two lines multiple times:
writer = csv.writer(outfile,delimiter="|",quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(col[0] for col in cursor.description)
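One way to stop repeating those two lines is to wrap them in a small helper (a sketch; the function name `write_csv` is mine, not part of pyodbc or csv):

```python
import csv

def write_csv(path, header, rows, delimiter="|"):
    """Open path, write the header row, then all data rows,
    with the writer set up in exactly one place."""
    with open(path, "w", newline="") as outfile:
        writer = csv.writer(outfile, delimiter=delimiter,
                            quoting=csv.QUOTE_NONNUMERIC)
        writer.writerow(header)
        writer.writerows(rows)
```

Each per-record file then becomes a single call like `write_csv(fullpath, header, [row])`.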


import pyodbc
import csv
import os

connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=DESKTOP-GQK64O6;DATABASE=Customer;Trusted_Connection=yes;')
cursor = connection.cursor()

qry = "Select * from employee"
cursor.execute(qry)

data = cursor.fetchall()
print(len(data))



folderPath = "C:\\Users\\xyz\\OneDrive\\Desktop\\C\\test_data\\output"
count = 0
# for reply in cursor.execute(qry):
for x in data:
    count = count + 1
    print(count)
    fname = "Row"+str(count)+".csv"
    fullpath = os.path.join(folderPath,fname)
    print(fullpath)
    with open(fullpath,"w",newline="") as outfile:
        writer = csv.writer(outfile,delimiter="|",quoting=csv.QUOTE_NONNUMERIC)
        writer.writerow(col[0] for col in cursor.description)
        writer.writerow(x)
        print(f"I am row {count}",x)
cursor.close()
connection.close()
#2
(Sep-28-2022, 07:49 PM)mg24 Wrote: I want to avoid fetch all , as office SQL Data is big 60gb . memory issue might come.
There is no need to do cursor.fetchall(). You can iterate over the cursor directly.
There is even an example in the docs:
Quote:
# standard
cursor.execute("select a from tbl where b=? and c=?", (x, y))
# pyodbc extension
cursor.execute("select a from tbl where b=? and c=?", x, y)
The return value is always the cursor itself:

for row in cursor.execute("select user_id, user_name from users"):
    print(row.user_id, row.user_name)


Then (not tested code):
import pyodbc
import csv
import os
 
connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=DESKTOP-GQK64O6;DATABASE=Customer;Trusted_Connection=yes;')
with connection.cursor() as crsr:
 
    qry = "Select * from employee"
    crsr.execute(qry)
    
    folderPath = "C:\\Users\\xyz\\OneDrive\\Desktop\\C\\test_data\\output"
    header = next(zip(*crsr.description))

    for idx, row in enumerate(crsr, start=1):
        fname = f"Row{idx}.csv"
        fullpath = os.path.join(folderPath,fname)
        print(fullpath)
        with open(fullpath, "w", newline="") as outfile:
            writer = csv.writer(outfile, delimiter="|", quoting=csv.QUOTE_NONNUMERIC)
            writer.writerows([header, row])
            print(f"I am row {idx} --> {row}")
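For reference, the `next(zip(*crsr.description))` header trick works because `description` is a sequence of 7-item tuples, one per column, with the column name first. A stand-in demo (the sample tuples below are made up, not from a real cursor):

```python
# pyodbc-style description: one 7-tuple per column, name in position 0
description = [
    ("id", int, None, 10, 10, 0, False),
    ("name", str, None, 50, 50, 0, True),
]
# zip(*description) regroups the tuples element-wise;
# next() takes the first group, i.e. all the column names
header = next(zip(*description))
# header == ("id", "name")
```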
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#3
Hi Buran,

Superb! Thank you so much, I liked your code. Whistle

Similar situation here: extracting all records into a single CSV file.
Using fetchmany, I extracted 60 GB of data.
My code is working; can we make any improvement to the code below, please?

import pyodbc
import csv

connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=DESKTOP-GQK64O6;DATABASE=Customer;Trusted_Connection=yes;')
cursor = connection.cursor()


qry = "select * from employee"

cursor.execute(qry) 
data = cursor.fetchall()


With open("E:\\backup\\output.csv","w","newline="") as outfile
	writer = csv.writer(outfile,quoting = csv.QUOTE_NONNUMERIC)
	writer.writerows(col[0] for col in cursor.description)
	While True:
		rows = cursor.fetchmany(10000)
		if len(rows) ==0:
			print("no records found")
			break
		else:
	          x = x+len(rows)
               print(x)
               for row in rows:
                  writer.writerows(row)


conn.close()
print("success")
print('time taken' . time.time()-initial,"Seconds")
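As an aside, the two-argument form of `iter()` can replace the explicit `while True` / `break` dance around `fetchmany` (a sketch; the helper name `dump_in_batches` is mine):

```python
import csv

def dump_in_batches(cursor, writer, batch_size=10_000):
    """Write all remaining cursor rows to writer in fixed-size batches."""
    total = 0
    # iter(callable, sentinel) keeps calling fetchmany until it
    # returns the sentinel (an empty list), then stops cleanly
    for rows in iter(lambda: cursor.fetchmany(batch_size), []):
        writer.writerows(rows)
        total += len(rows)
    return total
```

Only one batch is ever held in memory, so the 60 GB table never has to fit in RAM.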
#4
Hi Team,

I am using the code below to extract a big SQL table (60 GB).
How do I extract the SQL table to CSV with headers using SQLAlchemy?

(same pyodbc/fetchmany code as in post #3 above)
Below is my attempted code. How do I extract the SQL table data with a header (I am running it in PyCharm) without a memory-full issue?

from sqlalchemy import create_engine
import pandas as pd

DRIVER = "ODBC Driver 17 for SQL Server"
server = "DESKTOP-GQK64O6"
DATABASE = "Customer"

str_conn = f'mssql://{server}/{DATABASE}?Driver={DRIVER}'
connection = create_engine(str_conn).connect()

outpath = "C:\\C_Programming\\output.csv"

# Stream the table chunk by chunk straight to disk instead of
# appending every chunk to a list and concatenating -- that would
# hold all 60 GB in memory at once.
for i, chunk in enumerate(pd.read_sql_table('employee', connection, chunksize=1000)):
    # write the header only with the first chunk, then append
    chunk.to_csv(outpath, mode="a" if i else "w", header=(i == 0), index=False)
Thanks
mg
#5
I appreciate that you like my code, but at the same time it's disappointing that nothing I showed in my code found a place in your updated code that extracts the whole table into a single CSV file - the context manager, not using fetch, etc. - and you have basically the same question. You even introduce pure SyntaxErrors, e.g. a capital first char in While, With... and then ask us to make "improvements".
All you had to do was write to a single file: open the file before iterating over the cursor. In fact, after you write the header, you can use writer.writerows(cursor) to write all the data to the file in a single line.

I/we can update the code for you, but this site is focused on education, so better to try to implement the changes I show you.
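The writer.writerows(cursor) idea works because writerows accepts any iterable of rows and consumes it lazily; a small stand-in demo (a plain iterator plays the role of the cursor here):

```python
import csv
import io

rows = iter([("a", 1), ("b", 2)])          # stands in for an open cursor
buf = io.StringIO()
writer = csv.writer(buf, delimiter="|", quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(("col1", "col2"))          # header first, with writerow
writer.writerows(rows)                     # then every remaining row in one call
# buf.getvalue().splitlines() == ['"col1"|"col2"', '"a"|1', '"b"|2']
```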

#6
Hi Buran,

I did what you suggested and the code is working fine. Thanks again.
I have two questions:

1) If there are no records in SQL, does it go into an infinite loop? We are not using a break condition or checking whether there are any records.
2) Is there any chance of a memory-full issue if the SQL table is a big one?

Below is my attempted code. Is this correct now?

import pyodbc
import csv
import os

connection = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};SERVER=DESKTOP-GQK64O6;DATABASE=Customer;Trusted_Connection=yes;')
with connection.cursor() as crsr:
    qry = "Select * from employee"
    crsr.execute(qry)

    folderPath = "C:\\Users\\malle\\OneDrive\\Desktop\\C\\test_data\\output"
    header = next(zip(*crsr.description))

    fname = "main.csv"
    fullpath = os.path.join(folderPath, fname)
    with open(fullpath, "w", newline="") as outfile:
        writer = csv.writer(outfile, delimiter="|", quoting=csv.QUOTE_NONNUMERIC)
        writer.writerows([header])
        for idx, row in enumerate(crsr, start=1):
            writer.writerows(crsr)
#7
This
        for idx, row in enumerate(crsr, start=1):
            writer.writerows(crsr)
should be just

writer.writerows(crsr)
With your current code it would write n*n rows, where n is the number of rows returned. Actually, it will write them only once (i.e. n rows), because crsr will be exhausted after the first pass. Still, better to fix it.
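The exhaustion point is just standard Python iterator behaviour:

```python
it = iter([1, 2, 3])
first_pass = list(it)     # consumes everything
second_pass = list(it)    # nothing left on a second pass
# first_pass == [1, 2, 3]; second_pass == []
```

A cursor behaves the same way: once the inner writerows(crsr) has drained it, the enclosing for loop finds no further rows.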

If there are no rows returned, it will write an empty file with just the header.
There is an option to use crsr.rowcount, BUT note there are limitations:

Quote:rowcount

The number of rows modified by the last SQL statement.

This is -1 if no SQL has been executed or if the number of rows is unknown. Note that it is not uncommon for databases to report -1 immediately after a SQL select statement for performance reasons. (The exact number may not be known before the first records are returned to the application.)

You can try to fetch one record, then check whether anything was returned, write it to the file, and then write all the remaining records in one go (i.e. the crsr is not yet consumed).
Not tested:

import pyodbc
import csv
import os
 
connection = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};SERVER=DESKTOP-GQK64O6;DATABASE=Customer;Trusted_Connection=yes;')
with connection.cursor() as crsr:
    qry = "Select * from employee"
    crsr.execute(qry)
    first_record = crsr.fetchone()

    if first_record:
        folderPath = "C:\\Users\\malle\\OneDrive\\Desktop\\C\\test_data\\output"
        header = next(zip(*crsr.description))

        fname = "main.csv"
        fullpath = os.path.join(folderPath, fname)
        with open(fullpath, "w", newline="") as outfile:
            writer = csv.writer(outfile, delimiter="|", quoting=csv.QUOTE_NONNUMERIC)
            writer.writerows([header, first_record])
            writer.writerows(crsr)
    else:
        print('No records')
