Python Forum
Read CSV file into MySQL and use executemany instead of execute
#1
I have a Python script that reads a large (4 GB!) CSV file into MySQL. It works as-is, but it is dog slow. The CSV file has over 4 million rows, and it is taking forever to insert all the records into the database.

Could I get an example of how I would use executemany in this situation?

import csv
import os
from datetime import datetime

# mydb is assumed to be an open MySQL connection created earlier in the script,
# with cursor = mydb.cursor()
source = os.path.join('source_files', 'aws_bills', 'march-bill-original-2019.csv')

insert_sql = """INSERT INTO billing_info (InvoiceId, PayerAccountId, LinkedAccountId,
    RecordType, RecordId, ProductName, RateId, SubscriptionId, PricingPlanId, UsageType,
    Operation, AvailabilityZone, ReservedInstance, ItemDescription, UsageStartDate,
    UsageEndDate, UsageQuantity, BlendedRate, BlendedCost, UnBlendedRate, UnBlendedCost,
    ResourceId, Engagement, Name, Owner, Parent)
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s,
            %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"""

try:
    with open(source) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        next(csv_reader)  # skip the header row
        for row_idx, row in enumerate(csv_reader):
            try:
                cursor.execute(insert_sql, row)
                mydb.commit()  # committing every single row
                print('row', row_idx, 'inserted with LinkedAccountId', row[2],
                      'at', datetime.now().isoformat())
            except Exception as e:
                print("MySQL Exception:", e)
        print("Done importing data.")
except OSError as e:
    print("File error:", e)
Again, that code works to insert the records into the database. But I am hoping to speed this up with executemany if I can get an example of how to do that.
#2
For (MySQL) executemany you need to pre-collect the data you want to commit: loop to collect a batch of records, then pass the collected batch to executemany after the loop.

Completely loading/pre-processing a huge amount of data up front is generally a bad idea. Processing the source file in chunks works better with large amounts of data (i.e. read N records, pre-process as you go, commit, then do the next N records, and so on).

Not sure here (I have not played with databases for a relatively long time), but you could also simply delay the commit: keep calling cursor.execute() and only commit after every N records (and make sure to commit any leftover records after the loop ends normally). A rough sketch of the chunked executemany approach is below.
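
Not tested against your table, but a minimal sketch of what that could look like, assuming mydb is an open MySQL connection (mysql.connector or pymysql style, with %s placeholders), cursor = mydb.cursor(), insert_sql and source are the same as in your post, and BATCH_SIZE is just a starting value to tune:

import csv
from datetime import datetime

BATCH_SIZE = 1000  # rows per executemany()/commit; tune this for your setup

with open(source) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)  # skip the header row

    batch = []
    total = 0
    for row in csv_reader:
        batch.append(row)
        if len(batch) >= BATCH_SIZE:
            cursor.executemany(insert_sql, batch)  # one statement for the whole batch
            mydb.commit()                          # one commit per batch, not per row
            total += len(batch)
            print(total, 'rows inserted at', datetime.now().isoformat())
            batch = []

    # commit whatever is left over after the loop ends
    if batch:
        cursor.executemany(insert_sql, batch)
        mydb.commit()
        total += len(batch)

    print("Done importing data,", total, "rows total.")

The speed-up comes mostly from committing once per batch instead of once per row; executemany also lets the driver send the rows in fewer round trips. If a single bad row should not abort a whole batch, you would need to catch the exception around executemany and fall back to per-row inserts for that batch.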