Hi Team,
I am using the piece of code below to write SQL data into a CSV file. The data is big: about 60 GB.
I want to create a generic function for dynamically writing query results to a CSV file.
import csv

cursor = connection.cursor()
cursor.execute('SELECT * FROM employee')
filename = "abc.csv"

# writing to csv file
with open(filename, 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    # header row from the cursor metadata
    csvwriter.writerow(col[0] for col in cursor.description)
    while True:
        rows = cursor.fetchmany(10000)
        if len(rows) == 0:
            break
        for row in rows:
            csvwriter.writerow(row)

cursor.close()
connection.close()
something like:

import pandas as pd

def query_to_csv(query, csvfilename, conn):
    df = pd.read_sql(query, conn)
    df.to_csv(csvfilename)
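With 60 GB of data, pd.read_sql will try to load the whole result into memory, so a chunked version of the same idea may work better. This is only a minimal sketch, assuming a DB-API connection (e.g. pyodbc); the function name and chunk size are my own choices, not an established API:

import csv

def query_to_csv(query, csvfilename, conn, chunk_size=10000):
    """Stream the result of `query` into `csvfilename` in chunks."""
    cursor = conn.cursor()
    cursor.execute(query)
    with open(csvfilename, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        # header row from the cursor metadata
        writer.writerow(col[0] for col in cursor.description)
        while True:
            rows = cursor.fetchmany(chunk_size)
            if not rows:
                break
            writer.writerows(rows)
    cursor.close()

# usage (hypothetical connection object):
# query_to_csv('SELECT * FROM employee', 'abc.csv', connection)

Because only chunk_size rows are held in memory at any time, this works the same way for a small table or a 60 GB one.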
Recently there was another question about a 60 GB CSV. Are you in the same class? What is a 60 GB CSV for?
How many rows would such a file have? Must be millions and millions! How many columns?
Not many people have 64 GB of RAM, so get the data in little bits at a time.
Assuming your database table has a column id that is a unique key, you can use it to get chunks of the table and write them to a text file. If you get interrupted, or want to stop when the number of lines reaches a very large number, you can; next time, open the file again in append mode, or create another text file.
This just gets one row at a time, but you could change that to fetchmany(size=1000) or whatever (see the sketch after the code below). You can open the resulting text file directly in LibreOffice Calc as a spreadsheet.
import pymysql

savepath = '/home/pedro/myPython/pymysql/results.txt'

def mysqlRemoteHW(clas, weeknr, idnum):
    # To connect to the remote MySQL database
    conn = pymysql.connect(
        host='123.456.789.123',
        user='me',
        password='secret',
        db='allstudentsdb',
    )
    cur = conn.cursor()
    # Select query
    #cur.execute(f"SELECT studentnr, score FROM allstudentsAnswers{clas} WHERE weeknr = '{weeknr}'")
    #cursor.execute("SELECT spam FROM eggs WHERE lumberjack = ?", (lumberjack,))
    sql = f"SELECT id, studentnr, score FROM allstudentsAnswers{clas}HW WHERE weeknr = %s AND id = %s"
    cur.execute(sql, (weeknr, idnum))
    output = cur.fetchone()
    #for i in output:
    #    print(i)
    # To close the connection
    conn.close()
    return output

clas = input('Enter 21BE ')
print('What week number scores are we getting from MySQL?')
weeknumber = input('Just enter a week number, like: Week1 or Week4 or Week11 ')
with open(savepath, 'a') as res:
    for i in range(510, 521):
        results = mysqlRemoteHW(clas, weeknumber, i)
        string = str(results[0]) + ',' + str(results[1]) + ',' + str(results[2]) + '\n'
        res.write(string)
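As mentioned above, the one-row-at-a-time version can be adapted to fetch and write bigger batches instead of one id per connection. A minimal sketch of that idea, reusing the same allstudentsAnswers{clas}HW table and pymysql connection details; the batch size, function name, and use of the csv module are my additions:

import csv
import pymysql

def table_to_csv_batches(clas, weeknr, savepath, batch_size=1000):
    # same connection details as in mysqlRemoteHW above
    conn = pymysql.connect(host='123.456.789.123', user='me',
                           password='secret', db='allstudentsdb')
    cur = conn.cursor()
    sql = f"SELECT id, studentnr, score FROM allstudentsAnswers{clas}HW WHERE weeknr = %s"
    cur.execute(sql, (weeknr,))
    with open(savepath, 'a', newline='') as res:
        writer = csv.writer(res)
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)
    conn.close()

This opens the connection once and streams the whole week's rows, instead of reconnecting for every id.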
Just a little calculation:
1 GB = 1 073 741 824 bytes
Using the above code, one line of the text file (id, studentnumber, score + a newline character) is about 18 bytes.
That means, if I had 1 GB of results, I would need 59 652 323 lines, approx 60 million lines.
If I had 60 GB of data, I would need about 3 600 million (3.6 billion) lines to write each data set on a new line, CSV style!!
That is a lot of lines!
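The same back-of-the-envelope figures in Python, for anyone who wants to plug in their own row size (18 bytes per line is the assumption from above):

GB = 1_073_741_824          # bytes in 1 GB (binary)
bytes_per_line = 18         # id, studentnumber, score + newline, as above

lines_per_gb = GB // bytes_per_line
print(lines_per_gb)         # 59_652_323 -> roughly 60 million lines per GB
print(60 * lines_per_gb)    # about 3.6 billion lines for 60 GB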
Thank you so much for your help.
Actually my question was not clear: here 60 GB of data means SQL table data, and I want it in CSV.
The intention of taking it into CSV is to later compress it with gzip and move it to the cloud.
At the office I have 124 GB of RAM.
Using bcp, I extracted the 60 GB of data in 32 minutes.
Via pyodbc, it took 7 hours to get the SQL data into CSV.
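Since the plan is to gzip the CSV anyway, one option is to compress while writing, so there is no separate 60 GB intermediate file on disk. A minimal sketch assuming a pyodbc connection; the function name, file name, and chunk size are illustrative only:

import csv
import gzip

def query_to_csv_gz(query, gzfilename, conn, chunk_size=10000):
    """Stream query results straight into a gzip-compressed CSV."""
    cursor = conn.cursor()
    cursor.execute(query)
    with gzip.open(gzfilename, 'wt', newline='') as gzfile:
        writer = csv.writer(gzfile)
        # header row from the cursor metadata
        writer.writerow(col[0] for col in cursor.description)
        while True:
            rows = cursor.fetchmany(chunk_size)
            if not rows:
                break
            writer.writerows(rows)
    cursor.close()

# usage (hypothetical): query_to_csv_gz('SELECT * FROM employee', 'abc.csv.gz', connection)

Whether this beats bcp + a separate gzip pass would need testing; it mainly saves the intermediate uncompressed file.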
Thanks
mg