Hi Team,
I am using the piece of code below to write SQL data into a CSV file. The data is big: about 60 GB.
I want to create a generic function for dynamically writing query results to a CSV file.
import csv

cursor = connection.cursor()
cursor.execute('SELECT * FROM employee')
filename = "abc.csv"

# writing to csv file
with open(filename, 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    # header row from the cursor metadata
    csvwriter.writerow(col[0] for col in cursor.description)
    while True:
        rows = cursor.fetchmany(10000)
        if len(rows) == 0:
            break
        for row in rows:
            csvwriter.writerow(row)

cursor.close()
connection.close()
something like:

import pandas as pd

def query_to_csv(query, csvfilename, conn):
    df = pd.read_sql(query, conn)
    df.to_csv(csvfilename)
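With 60 GB of data, pd.read_sql will try to load the whole result into memory, so a chunked version of the same idea may work better. This is only a minimal sketch, assuming a DB-API connection (e.g. pyodbc); the function name and chunk size are my own choices, not an established API:

import csv

def query_to_csv(query, csvfilename, conn, chunk_size=10000):
    """Stream the result of `query` into `csvfilename` in chunks."""
    cursor = conn.cursor()
    cursor.execute(query)
    with open(csvfilename, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        # header row from the cursor metadata
        writer.writerow(col[0] for col in cursor.description)
        while True:
            rows = cursor.fetchmany(chunk_size)
            if not rows:
                break
            writer.writerows(rows)
    cursor.close()

# usage (hypothetical connection object):
# query_to_csv('SELECT * FROM employee', 'abc.csv', connection)

Because only chunk_size rows are held in memory at any time, this works the same way for a small table or a 60 GB one.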
Recently there was another question about a 60 GB CSV. Are you in the same class? What is a 60 GB CSV for?
How many rows would such a file have? Must be millions and millions! How many columns?
Not many people have 64 GB of RAM, so get the data in little bits at a time.
Assuming your database table has a column id that is a unique key, you can use it to get chunks of the table and write them to a text file. If you get interrupted, or want to stop when the number of lines reaches a very large number, you can; next time, open the file again in append mode, or create another text file.
This just gets one row at a time, but you could change that to fetchmany(size=1000) or whatever (see the sketch after the code below). You can open the resulting text file directly in LibreOffice Calc as a spreadsheet.
import pymysql

savepath = '/home/pedro/myPython/pymysql/results.txt'

def mysqlRemoteHW(clas, weeknr, idnum):
    # To connect to the remote MySQL database
    conn = pymysql.connect(
        host='123.456.789.123',
        user='me',
        password='secret',
        db='allstudentsdb',
    )
    cur = conn.cursor()
    # Select query
    #cur.execute(f"SELECT studentnr, score FROM allstudentsAnswers{clas} WHERE weeknr = '{weeknr}'")
    #cursor.execute("SELECT spam FROM eggs WHERE lumberjack = ?", (lumberjack,))
    sql = f"SELECT id, studentnr, score FROM allstudentsAnswers{clas}HW WHERE weeknr = %s AND id = %s"
    cur.execute(sql, (weeknr, idnum))
    output = cur.fetchone()
    #for i in output:
    #    print(i)
    # To close the connection
    conn.close()
    return output

clas = input('Enter 21BE ')
print('What week number scores are we getting from MySQL?')
weeknumber = input('Just enter a week number, like: Week1 or Week4 or Week11 ')
with open(savepath, 'a') as res:
    for i in range(510, 521):
        results = mysqlRemoteHW(clas, weeknumber, i)
        string = str(results[0]) + ',' + str(results[1]) + ',' + str(results[2]) + '\n'
        res.write(string)
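As mentioned above, the one-row-at-a-time version can be adapted to fetch and write bigger batches instead of one id per connection. A minimal sketch of that idea, reusing the same allstudentsAnswers{clas}HW table and pymysql connection details; the batch size, function name, and use of the csv module are my additions:

import csv
import pymysql

def table_to_csv_batches(clas, weeknr, savepath, batch_size=1000):
    # same connection details as in mysqlRemoteHW above
    conn = pymysql.connect(host='123.456.789.123', user='me',
                           password='secret', db='allstudentsdb')
    cur = conn.cursor()
    sql = f"SELECT id, studentnr, score FROM allstudentsAnswers{clas}HW WHERE weeknr = %s"
    cur.execute(sql, (weeknr,))
    with open(savepath, 'a', newline='') as res:
        writer = csv.writer(res)
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)
    conn.close()

This opens the connection once and streams the whole week's rows, instead of reconnecting for every id.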
Just a little calculation:
1 GB = 1 073 741 824 bytes
Using the above code, one line of the text file (id, studentnumber, score + a newline character) is about 18 bytes.
That means, if I had 1 GB of results, I would need 59 652 323 lines, approx 60 million lines.
If I had 60 GB of data, I would need about 3 600 million (3.6 billion) lines to write each data set on a new line, CSV style!!
That is a lot of lines!
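The same back-of-the-envelope figures in Python, for anyone who wants to plug in their own row size (18 bytes per line is the assumption from above):

GB = 1_073_741_824          # bytes in 1 GB (binary)
bytes_per_line = 18         # id, studentnumber, score + newline, as above

lines_per_gb = GB // bytes_per_line
print(lines_per_gb)         # 59_652_323 -> roughly 60 million lines per GB
print(60 * lines_per_gb)    # about 3.6 billion lines for 60 GB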
Thank you so much for your help.
Actually my question was not clear: here 60 GB of data means SQL table data, and I want it in CSV.
The intention of taking it into CSV is to later compress it with gzip and move it to the cloud.
At the office I have 124 GB of RAM.
Using bcp, I extracted the 60 GB of data in 32 minutes.
Via pyodbc, it took 7 hours to get the SQL data into CSV.
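Since the plan is to gzip the CSV anyway, one option is to compress while writing, so there is no separate 60 GB intermediate file on disk. A minimal sketch assuming a pyodbc connection; the function name, file name, and chunk size are illustrative only:

import csv
import gzip

def query_to_csv_gz(query, gzfilename, conn, chunk_size=10000):
    """Stream query results straight into a gzip-compressed CSV."""
    cursor = conn.cursor()
    cursor.execute(query)
    with gzip.open(gzfilename, 'wt', newline='') as gzfile:
        writer = csv.writer(gzfile)
        # header row from the cursor metadata
        writer.writerow(col[0] for col in cursor.description)
        while True:
            rows = cursor.fetchmany(chunk_size)
            if not rows:
                break
            writer.writerows(rows)
    cursor.close()

# usage (hypothetical): query_to_csv_gz('SELECT * FROM employee', 'abc.csv.gz', connection)

Whether this beats bcp + a separate gzip pass would need testing; it mainly saves the intermediate uncompressed file.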
Thanks
mg