Read CSV file into MySQL and use executemany instead of execute - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Read CSV file into MySQL and use executemany instead of execute (/thread-18057.html)
Read CSV file into MySQL and use executemany instead of execute - bluethundr - May-04-2019

I have a Python script that reads a large (4GB!!!) CSV file into MySQL. It works as is, but is DOG slow. The CSV file has over 4 million rows, and it is taking forever to insert all the records into the database. Could I get an example of how I would use executemany in this situation?

    source = os.path.join('source_files', 'aws_bills', 'march-bill-original-2019.csv')
    try:
        with open(source) as csv_file:
            csv_reader = csv.reader(csv_file, delimiter=',')
            next(csv_reader)
            insert_sql = """ INSERT INTO billing_info (InvoiceId, PayerAccountId,
                LinkedAccountId, RecordType, RecordId, ProductName, RateId, SubscriptionId,
                PricingPlanId, UsageType, Operation, AvailabilityZone, ReservedInstance,
                ItemDescription, UsageStartDate, UsageEndDate, UsageQuantity, BlendedRate,
                BlendedCost, UnBlendedRate, UnBlendedCost, ResourceId, Engagement, Name,
                Owner, Parent)
                VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s,
                        %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s) """
            #for row in csv_reader:
            for row_idx, row in enumerate(csv_reader):
                try:
                    cursor.execute(insert_sql, row)
                    #cursor.executemany(insert_sql, 100)
                    mydb.commit()
                    print('row', row_idx, 'inserted with LinkedAccountId', row[2],
                          'at', datetime.now().isoformat())
                except Exception as e:
                    print("MySQL Exception:", e)
        print("Done importing data.")

Again, that code works to insert the records into the database. But I am hoping to speed this up with executemany if I can get an example of how to do that.

RE: Read CSV file into MySQL and use executemany instead of execute - MvGulik - May-04-2019

For (MySQL) executemany one needs to pre-collect the data that one wants to commit. In such a case one would loop to pre-collect multiple records, after which the pre-collected data would be committed (executemany) after the loop. Completely loading/pre-processing a large amount of data at once is generally a bad idea.
Processing the source data/file in chunks generally works better with large amounts of data (i.e., read N records, pre-process as you go, commit, do the next N records, ...). Not sure here (I have not played around with DBs for a relatively long time), but I think one could try to delay the actual cursor.execute() and only commit after processing N records. (Make sure to do a final commit check after the loop ends normally, in case a partial batch remains.)
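A minimal sketch of that chunked approach, assuming a mysql.connector-style connection named mydb with a cursor named cursor and the insert_sql statement from the original post (the chunk size of 1000 is an arbitrary starting point to tune):

```python
import csv
import itertools

CHUNK_SIZE = 1000  # arbitrary batch size; tune against your data and server


def insert_in_chunks(csv_path, mydb, cursor, insert_sql):
    """Read the CSV and insert rows in batches via executemany."""
    with open(csv_path, newline='') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        next(csv_reader)  # skip the header row
        while True:
            # Pre-collect up to CHUNK_SIZE rows for this batch.
            batch = list(itertools.islice(csv_reader, CHUNK_SIZE))
            if not batch:
                break  # normal loop ending: nothing left to commit
            cursor.executemany(insert_sql, batch)
            mydb.commit()  # one commit per batch, not one per row
```

executemany takes the SQL statement once and a sequence of parameter rows, so one round trip covers the whole batch; the final partial batch (fewer than CHUNK_SIZE rows) is committed on the last pass before the loop exits, which covers the "commit-check after normal loop ending" point above.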