Posts: 119
Threads: 66
Joined: Sep 2022
Sep-28-2022, 07:49 PM
(This post was last modified: Sep-29-2022, 06:48 AM by buran.)
Hi Team,
I am creating multiple CSV files, one CSV per record, each with a header. The code works as expected.
I want to avoid fetchall(), as the office SQL data is about 60 GB and a memory issue might come up.
How can I optimize my code below? I am repeating these two lines multiple times:
writer = csv.writer(outfile,delimiter="|",quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(col[0] for col in cursor.description)
import pyodbc
import csv
import os

connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=DESKTOP-GQK64O6;DATABASE=Customer;Trusted_Connection=yes;')
cursor = connection.cursor()
qry = "Select * from employee"
cursor.execute(qry)
data = cursor.fetchall()
print(len(data))
folderPath = "C:\\Users\\xyz\\OneDrive\\Desktop\\C\\test_data\\output"
count = 0
# for reply in cursor.execute(qry):
for x in data:
    count = count + 1
    print(count)
    fname = "Row" + str(count) + ".csv"
    fullpath = os.path.join(folderPath, fname)
    print(fullpath)
    with open(fullpath, "w", newline="") as outfile:
        writer = csv.writer(outfile, delimiter="|", quoting=csv.QUOTE_NONNUMERIC)
        writer.writerow(col[0] for col in cursor.description)
        writer.writerow(x)
    print(f"I am row {count}", x)
cursor.close()
connection.close()
Posts: 8,151
Threads: 160
Joined: Sep 2016
Sep-28-2022, 08:22 PM
(This post was last modified: Sep-28-2022, 08:22 PM by buran.)
(Sep-28-2022, 07:49 PM)mg24 Wrote: I want to avoid fetch all , as office SQL Data is big 60gb . memory issue might come.
There is no need to do cursor.fetchall(). You can iterate over the cursor directly.
There is even an example in the docs:
Quote:
# standard
cursor.execute("select a from tbl where b=? and c=?", (x, y))
# pyodbc extension
cursor.execute("select a from tbl where b=? and c=?", x, y)
The return value is always the cursor itself:
for row in cursor.execute("select user_id, user_name from users"):
    print(row.user_id, row.user_name)
Then (not tested code):
import pyodbc
import csv
import os

connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=DESKTOP-GQK64O6;DATABASE=Customer;Trusted_Connection=yes;')
with connection.cursor() as crsr:
    qry = "Select * from employee"
    crsr.execute(qry)
    folderPath = "C:\\Users\\xyz\\OneDrive\\Desktop\\C\\test_data\\output"
    header = next(zip(*crsr.description))
    for idx, row in enumerate(crsr, start=1):
        fname = f"Row{idx}.csv"
        fullpath = os.path.join(folderPath, fname)
        print(fullpath)
        with open(fullpath, "w", newline="") as outfile:
            writer = csv.writer(outfile, delimiter="|", quoting=csv.QUOTE_NONNUMERIC)
            writer.writerows([header, row])
        print(f"I am row {idx} --> {row}")
Posts: 119
Threads: 66
Joined: Sep 2022
Hi Buran,
Superb! Thank you so much, I liked your code.
Similar situation here: extracting all records into a single CSV file.
Using fetchmany I extracted 60 GB of data.
My code is working; can we make any improvement in the code below, please?
import pyodbc
import csv

connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=DESKTOP-GQK64O6;DATABASE=Customer;Trusted_Connection=yes;')
cursor = connection.cursor()
qry = "select * from employee"
cursor.execute(qry)
data = cursor.fetchall()
With open("E:\\backup\\output.csv","w","newline="") as outfile
    writer = csv.writer(outfile,quoting = csv.QUOTE_NONNUMERIC)
    writer.writerows(col[0] for col in cursor.description)
    While True:
        rows = cursor.fetchmany(10000)
        if len(rows) ==0:
            print("no records found")
            break
        else:
            x = x+len(rows)
            print(x)
            for row in rows:
                writer.writerows(row)
conn.close()
print("success")
print('time taken' . time.time()-initial,"Seconds")
Posts: 119
Threads: 66
Joined: Sep 2022
Hi Team,
Below is the code I am using to extract a big SQL table (60 GB).
How can I extract a SQL table to CSV, with headers, using SQLAlchemy?
import pyodbc
import csv

connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=DESKTOP-GQK64O6;DATABASE=Customer;Trusted_Connection=yes;')
cursor = connection.cursor()
qry = "select * from employee"
cursor.execute(qry)
#data = cursor.fetchall()
With open("E:\\backup\\output.csv","w","newline="") as outfile
    writer = csv.writer(outfile,quoting = csv.QUOTE_NONNUMERIC)
    writer.writerows(col[0] for col in cursor.description)
    While True:
        rows = cursor.fetchmany(10000)
        if len(rows) ==0:
            print("no records found")
            break
        else:
            x = x+len(rows)
            print(x)
            for row in rows:
                writer.writerows(row)
conn.close()
print("success")
print('time taken' . time.time()-initial,"Seconds")
Below is my attempted code. How can I extract SQL table data with a header (I am using PyCharm), without running into a memory-full issue?
from sqlalchemy import create_engine
import pandas as pd

DRIVER = "ODBC Driver 17 for SQL Server"
server = "DESKTOP-GQK64O6"
DATABASE = "Customer"
str_conn = f'mssql://{server}/{DATABASE}?Driver={DRIVER}'
connection = create_engine(str_conn).connect()

mylist = []
# Read in chunks
for chunk in pd.read_sql_table('employee', connection, chunksize=1000):
    mylist.append(chunk)
df = pd.concat(mylist)
ndf = pd.DataFrame(df)
ndf.to_parquet("C:\\C_Programming\\output.csv")
Thanks
mg
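For reference, the chunked read above still concatenates every chunk into one DataFrame, so the full table ends up in memory before anything is written. Writing each chunk straight to CSV as it arrives avoids that. A minimal sketch, using an in-memory SQLite table with made-up columns as a stand-in for the SQL Server source:

```python
import os
import tempfile

import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite with a tiny made-up table stands in for SQL Server.
engine = create_engine("sqlite://")
pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]}).to_sql(
    "employee", engine, index=False)

outpath = os.path.join(tempfile.mkdtemp(), "output.csv")
with engine.connect() as connection:
    first = True
    # Each chunk is a small DataFrame; append it to the CSV and drop it,
    # so memory use stays at one chunk regardless of table size.
    for chunk in pd.read_sql_table("employee", connection, chunksize=2):
        chunk.to_csv(outpath, mode="w" if first else "a",
                     header=first, index=False)
        first = False
```

Only the first chunk writes the header (`header=first`); later chunks append rows only.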
Posts: 8,151
Threads: 160
Joined: Sep 2016
Sep-29-2022, 06:56 AM
(This post was last modified: Sep-29-2022, 06:56 AM by buran.)
I appreciate that you like my code, but at the same time it's disappointing that nothing I showed in my code found a place in your updated code that extracts the whole table into a single CSV file - the context manager, not using fetch, etc. - and you have basically the same question. You even introduce pure SyntaxErrors - e.g. the capital first char in While, With... And you ask us to make "improvements".
All you had to do was write to a single file. Open the file before iterating over the cursor. In fact, after you write the header, you can use writer.writerows(cursor) to write all the data to the file in a single line.
I/we can update the code for you, but this site is focused on education, so better to try to implement the changes I show you.
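A minimal sketch of that single-file pattern - header first, then writer.writerows(cursor) streaming the rest - with the stdlib sqlite3 module standing in for pyodbc and a made-up employee table:

```python
import csv
import os
import sqlite3
import tempfile

# In-memory SQLite stands in for the pyodbc connection; its cursors
# also iterate row by row, so nothing is buffered with fetchall().
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE employee (id INTEGER, name TEXT)")
connection.executemany("INSERT INTO employee VALUES (?, ?)",
                       [(1, "Ann"), (2, "Bob")])

outpath = os.path.join(tempfile.mkdtemp(), "output.csv")
cursor = connection.cursor()
cursor.execute("SELECT * FROM employee")

with open(outpath, "w", newline="") as outfile:
    writer = csv.writer(outfile, delimiter="|", quoting=csv.QUOTE_NONNUMERIC)
    writer.writerow([col[0] for col in cursor.description])  # header row
    writer.writerows(cursor)  # stream every row straight to the file

cursor.close()
connection.close()
```

Memory use stays at one row at a time, however large the table.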
Posts: 119
Threads: 66
Joined: Sep 2022
Hi Buran,
I did what you suggested, and the code is working fine. Thanks again.
I have one question:
1) If there are no records in SQL, does it go into an infinite loop? We are not using a break condition / checking whether there are any records.
2) Is there any chance of a memory-full issue if the SQL table is a big one?
Below is my attempted code. Is this correct now?
import pyodbc
import csv
import os

connection = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};SERVER=DESKTOP-GQK64O6;DATABASE=Customer;Trusted_Connection=yes;')
with connection.cursor() as crsr:
    qry = "Select * from employee"
    crsr.execute(qry)
    folderPath = "C:\\Users\\malle\\OneDrive\\Desktop\\C\\test_data\\output"
    header = next(zip(*crsr.description))
    fname = "main.csv"
    fullpath = os.path.join(folderPath, fname)
    with open(fullpath, "w", newline="") as outfile:
        writer = csv.writer(outfile, delimiter="|", quoting=csv.QUOTE_NONNUMERIC)
        writer.writerows([header])
        for idx, row in enumerate(crsr, start=1):
            writer.writerows(crsr)
Posts: 8,151
Threads: 160
Joined: Sep 2016
Sep-29-2022, 08:06 AM
(This post was last modified: Sep-29-2022, 08:08 AM by buran.)
This:
for idx, row in enumerate(crsr, start=1):
    writer.writerows(crsr)
should be just:
writer.writerows(crsr)
With your current code it would write n*n rows, where n is the number of rows returned. Actually it will write them only once (i.e. n rows), because crsr will be exhausted after the first run. Still, better fix it.
If there are no rows returned, it will write an empty file with just the header.
There is the option to use crsr.rowcount, BUT note there are limitations:
Quote:
rowcount
The number of rows modified by the last SQL statement.
This is -1 if no SQL has been executed or if the number of rows is unknown. Note that it is not uncommon for databases to report -1 immediately after a SQL select statement for performance reasons. (The exact number may not be known before the first records are returned to the application.)
You can fetchone record, then check if anything is returned, write it to the file, and then write all the remaining records in one go (i.e. the part of crsr not yet consumed).
not tested:
import pyodbc
import csv
import os

connection = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};SERVER=DESKTOP-GQK64O6;DATABASE=Customer;Trusted_Connection=yes;')
with connection.cursor() as crsr:
    qry = "Select * from employee"
    crsr.execute(qry)
    first_record = crsr.fetchone()
    if first_record:
        folderPath = "C:\\Users\\malle\\OneDrive\\Desktop\\C\\test_data\\output"
        header = next(zip(*crsr.description))
        fname = "main.csv"
        fullpath = os.path.join(folderPath, fname)
        with open(fullpath, "w", newline="") as outfile:
            writer = csv.writer(outfile, delimiter="|", quoting=csv.QUOTE_NONNUMERIC)
            writer.writerows([header, first_record])
            writer.writerows(crsr)
    else:
        print('No records')
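That fetchone-first pattern can be exercised end to end against an in-memory SQLite table (a stand-in for SQL Server; the table, data, and paths here are made up), covering both the data branch and the no-records branch:

```python
import csv
import os
import sqlite3
import tempfile

def export(connection, qry, fullpath):
    # Same shape as the pyodbc version above; sqlite3 cursors do not
    # support the `with` statement, hence the plain cursor() call.
    crsr = connection.cursor()
    crsr.execute(qry)
    first_record = crsr.fetchone()
    if first_record:
        header = next(zip(*crsr.description))
        with open(fullpath, "w", newline="") as outfile:
            writer = csv.writer(outfile, delimiter="|",
                                quoting=csv.QUOTE_NONNUMERIC)
            writer.writerows([header, first_record])
            writer.writerows(crsr)  # remaining rows, cursor not yet consumed
        return True
    return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Ann')")

folder = tempfile.mkdtemp()
wrote = export(conn, "Select * from employee",
               os.path.join(folder, "main.csv"))
empty = export(conn, "Select * from employee WHERE id > 9",
               os.path.join(folder, "empty.csv"))
```

With data present the file gets header plus rows; with an empty result set no file is created at all.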