Python Forum

how to process 12 million records
I need to read 12 million records from a SQL table and store them in another database after pivoting. It works, but it's taking a lot of time; any ideas to improve the performance would be a great help.

Here are the table details:

The source table with 12 million records is structured like this:
mysourcetable (f_name varchar(100), l_name varchar(100), grade varchar(10), year varchar(10), age int, dob datetime)
Example rows:
john, abc, 2nd, 2020, 7, 00-00-0000
rj, bca, 5th, 2020, 10, 00-00-0000

Here is the structure of the target table, where all 12 million records will be saved:
mytargettable (column_name varchar(100), column_value varchar(100), field_type varchar(100))

Example rows:
f_name, john
l_name, abc
grade, 2nd
year, 2020
age, 7
dob, 00-00-0000

f_name, rj
l_name, bca
grade, 5th
year, 2020
age, 10
dob, 00-00-0000


Here is the sample Python code:

for row in df.itertuples(index=True):
    # Build and run an insert statement for each column in the row
    try:
        counts += 1   # running count of processed rows (initialised earlier)
        # Iterate over each column in the row
        for column, value in row._asdict().items():
            # Look up the target field type for this column
            col_type = lookup_key(tbl_col_type, column)

            # Create the insert statement
            statement = f"""INSERT INTO [mytargettable] ([Column_Name], [Column_Value], [Dictionary_Id])
                VALUES ('{column}', '{value}', {col_type})"""

            cursor.execute(statement)
            cursor.commit()

    except Exception as e:
        print(f"Error inserting data: {e}")
Looking forward to getting some help to improve the performance.
(Jul-30-2024, 09:56 PM)pymach Wrote: I need to read 12 million records from a SQL table and store them in another database after pivoting. It works, but it's taking a lot of time; any ideas to improve the performance would be a great help.

Looking forward to getting some help to improve the performance.

If your data matches the table you are trying to update, look into the executemany method: ExecuteMany
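For example, a rough sketch with pyodbc, reusing lookup_key/tbl_col_type from your snippet (the connection string is a placeholder):

# A rough sketch, assuming pyodbc; the connection string is a placeholder and
# lookup_key/tbl_col_type are taken from the snippet above.
import pyodbc

conn = pyodbc.connect(target_connection_string)
cursor = conn.cursor()
cursor.fast_executemany = True   # lets pyodbc send the parameter sets in batches

params = []
for row in df.itertuples(index=False):
    for column, value in row._asdict().items():
        params.append((column, str(value), lookup_key(tbl_col_type, column)))

cursor.executemany(
    "INSERT INTO [mytargettable] ([Column_Name], [Column_Value], [Dictionary_Id]) VALUES (?, ?, ?)",
    params,
)
conn.commit()

Parameterised statements also avoid the quoting problems of building the INSERT with an f-string, and committing once per batch instead of once per row saves a round trip for every record. With 12 million source rows you will still want to flush params and commit in chunks (say every 50,000 rows) rather than holding everything in memory.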
why not just dump the entire table? select * from tablename
You can also use pandas to do this (rough sketch below):
Read a table
Select subset of dataframe
Create new table
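A minimal sketch along those lines, assuming SQLAlchemy engines for both databases (the engine URLs and chunk sizes are placeholders):

# A minimal sketch, assuming SQLAlchemy engines for the source and target
# databases; engine URLs and chunk sizes are placeholders.
import pandas as pd
from sqlalchemy import create_engine

source_engine = create_engine("mysql+pymysql://user:pass@host/sourcedb")
target_engine = create_engine("mysql+pymysql://user:pass@host/targetdb")

# Read the source table in manageable chunks instead of all 12 million rows at once
for chunk in pd.read_sql("SELECT * FROM mysourcetable", source_engine, chunksize=50_000):
    # Unpivot: one (column_name, column_value) row per original cell
    melted = chunk.astype(str).melt(var_name="column_name", value_name="column_value")
    # Append the pivoted chunk to the target table
    melted.to_sql("mytargettable", target_engine, if_exists="append",
                  index=False, method="multi", chunksize=10_000)

This leaves the field_type column to be filled in (for example by mapping column_name through a dict before to_sql), but it keeps memory use bounded and lets the database do multi-row inserts.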
If it should be a copy, then use the right tools. For MySQL/MariaDB, mysqldump is used to export your data as plain text. Later the mysql command can be used to load the dumped data into the other database. The tables are also created from the dump.
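If you want to drive that from Python, here is a rough sketch with subprocess (user, database names and the dump file are placeholders; -p makes the tools prompt for the password):

# A rough sketch of calling mysqldump/mysql from Python; user, database
# names and the dump file path are placeholders.
import subprocess

# Export the source table as plain-text SQL (CREATE TABLE + INSERT statements)
with open("mysourcetable.sql", "w") as dump_file:
    subprocess.run(["mysqldump", "-u", "myuser", "-p", "sourcedb", "mysourcetable"],
                   stdout=dump_file, check=True)

# Load the dump into the other database; the table is created from the dump
with open("mysourcetable.sql") as dump_file:
    subprocess.run(["mysql", "-u", "myuser", "-p", "targetdb"],
                   stdin=dump_file, check=True)

Note this copies the table as-is; the pivoting would still have to happen before the dump or after the load.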
Or get handy-sized chunks by using LIMIT:

From MySQL docs:

Quote:[LIMIT {[offset,] row_count | row_count OFFSET offset}]

From stackoverflow:

Quote:SELECT * FROM table limit 100 -- get 1st 100 records
SELECT * FROM table limit 100, 200 -- get 200 records beginning with row 101

Put that in a generator somehow and you will never overtax your computer's memory.
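For example, a sketch of such a generator, assuming a DB-API connection (connection object, table name and chunk size are placeholders; the table name comes from your own code, not user input):

# A sketch of a LIMIT/OFFSET generator; connection object, table name and
# chunk size are placeholders.
def chunked_rows(conn, table, chunk_size=100_000):
    """Yield rows from `table` one LIMIT/OFFSET chunk at a time."""
    offset = 0
    cursor = conn.cursor()
    while True:
        cursor.execute(f"SELECT * FROM {table} LIMIT {chunk_size} OFFSET {offset}")
        rows = cursor.fetchall()
        if not rows:
            break
        yield from rows
        offset += chunk_size

# Only one chunk is ever held in memory at a time
for row in chunked_rows(source_conn, "mysourcetable"):
    ...  # pivot the row and buffer it for a bulk insert into mytargettable

One caveat: large OFFSET values get slower because the server still has to skip over all the earlier rows, so for 12 million records keyset pagination on an indexed column (WHERE id > last_seen ORDER BY id LIMIT n) scales better.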