May-06-2019, 06:43 PM
I am trying to speed up loading a large CSV file into a MySQL database. With the code below, it takes about 4 hours to load a 4 GB file with millions of rows:
    with open(source) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        next(csv_reader)
        insert_sql = """
            INSERT INTO billing_info_test (InvoiceId, PayerAccountId, LinkedAccountId)
            VALUES (%s, %s, %s)
        """
        for row in csv_reader:
            cursor.execute(insert_sql, row)
            print(cursor.rowcount, 'inserted with LinkedAccountId', row[2], 'at', datetime.now().isoformat())

    print("Committing the DB")
    mydb.commit()
    cursor.close()
    mydb.close()

That code does work! However, it takes many hours to complete, between 4 and 5 hours to ingest the file. This code is abbreviated, and does not show the try statements I have in the original.
I want to use the executemany() statement to make this faster. For that, you have to pass a list of tuples as the second argument.
If I build the list across all row iterations, it grows too large, I get out-of-memory errors, and the script crashes.
I am not able to get the length of csv_reader or csv_file to use in a range statement.
How can I loop through the CSV file 1000 rows at a time, store those rows in a list, pass the list to executemany(), then read the next 1000 rows, and so on until the end of the CSV file?
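One way to sketch the batching logic, without needing the length of the reader: itertools.islice pulls at most N rows from the csv reader per pass, so only one batch is ever held in memory. The demo below uses a small in-memory CSV and a BATCH_SIZE of 2 so it runs standalone; in the real script you would open the actual file, set BATCH_SIZE to 1000, and replace the batches.append() call with cursor.executemany(insert_sql, batch) using the connection and INSERT statement from the code above (this is an untested sketch, not verified against a live MySQL server).

```python
import csv
from itertools import islice
from io import StringIO

BATCH_SIZE = 2  # 1000 in the real script

# Stand-in for the real 4 GB file; same three columns as billing_info_test.
sample = StringIO(
    "InvoiceId,PayerAccountId,LinkedAccountId\n"
    "1,100,200\n"
    "2,101,201\n"
    "3,102,202\n"
)

csv_reader = csv.reader(sample, delimiter=',')
next(csv_reader)  # skip the header row

batches = []
while True:
    # islice stops after BATCH_SIZE rows (or sooner at end of file),
    # so memory use is bounded by one batch.
    batch = list(islice(csv_reader, BATCH_SIZE))
    if not batch:
        break
    # Real script: cursor.executemany(insert_sql, batch)
    batches.append(batch)
```

With three data rows and a batch size of 2, this yields two batches (2 rows, then 1 row), and the loop ends cleanly when islice returns an empty list, so no row count is needed up front.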