Frequency and timing of psycopg2 commits - Printable Version
Python Forum (https://python-forum.io) > Python Coding > General Coding Help
Thread: Frequency and timing of psycopg2 commits (/thread-22162.html)
Frequency and timing of psycopg2 commits - acecase - Nov-01-2019

I'm building a tool to do the initial import of data for another app. The data will be read or inferred from the contents of a JSON file. The total number of rows inserted will be in the tens of thousands in our use case, but I would like it to be capable of handling at least ten times that for larger deployments. I'm looking for best practices or general advice on how often to commit. This is roughly what the code will look like:

```python
import json
from os import path

import psycopg2

with open(path.join(json_dir, 'base_data.json'), mode='r', encoding='utf-8-sig') as base_data_file:
    base_data = json.load(base_data_file)

with psycopg2.connect(**db_connection_params) as db_conn:
    db_cursor = db_conn.cursor()

    # Import tags (TODO: See if this can be passed in as a single list of all tags)
    for tag in base_data['tags']:
        db_cursor.execute(
            """
            INSERT INTO api_tag (name, description)
            VALUES (%s, %s);
            """,
            (tag['name'], tag['description'])
        )
    db_conn.commit()  # <- commit or wait?

    # Import brands
    # for brand in base_data['brands']...

    # Import DHCPServers

    # Import Subnets/Addresses
```

I don't expect a typical deployment to generate a great number of rows in the tags, brands, or DHCPServers sections, but Subnets/Addresses will generate a row for every subnet and a row for every IP address that exists on-site. For my use, which is likely the only place it will ever be used, it will generate ~10,000 rows.

My thought is to commit after each block (tags, brands, DHCPServers) and then after each subnet in base_data['subnets']; each subnet should generate no more than a couple thousand rows in most realistic environments (~250 for a typical 24-bit subnet).

Ideally, I would like to wait to commit until all queries have succeeded. That would be the safest way to ensure that I don't end up with a half-baked database after a query fails. However, I don't have a good enough low-level understanding of PostgreSQL or the library to know where all this data sits before a commit, how much can sit there, and how much can be committed at one time. I would appreciate any input from anyone who has a better grip on PostgreSQL or psycopg2.

Just after posting this I found the best-practices section of the psycopg2 FAQ, which reads:

Quote:
When should I save and re-use a connection as opposed to creating a new one as needed?

I would still appreciate input, but I think my solution is frequent commits plus more checks, so the import can continue seamlessly on successive runs against a half-built database.
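
For what it's worth, here is a minimal sketch of the "commit only once everything has succeeded" variant, combined with a batched insert for the tags via psycopg2.extras.execute_values (which also answers the TODO about passing the tags in as a single list). It assumes base_data and db_connection_params are defined as in the post. Rows inserted inside an open transaction live on the server, not in client memory, so a single transaction covering tens of thousands of small rows is normally not a problem for PostgreSQL.

```python
# Sketch: one transaction for the whole import, batched tag insert.
# Assumes base_data and db_connection_params are defined as in the post above.
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect(**db_connection_params)
try:
    with conn:                      # commits on normal exit, rolls back on any exception
        with conn.cursor() as cur:
            # One round trip for all tags instead of one INSERT per tag.
            execute_values(
                cur,
                "INSERT INTO api_tag (name, description) VALUES %s",
                [(tag['name'], tag['description']) for tag in base_data['tags']],
            )
            # ... brands, DHCPServers, subnets/addresses would follow here,
            #     still inside the same transaction ...
finally:
    conn.close()                    # the with-block ends the transaction but does not close the connection
```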
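
And a sketch of the "frequent commits, tolerate re-runs" plan: if the tables have a natural unique key, PostgreSQL's ON CONFLICT DO NOTHING lets a later run skip rows that an earlier, partially committed run already inserted. The UNIQUE constraint on api_tag.name below is an assumption for illustration and may not match the real schema.

```python
# Sketch: per-block commits that stay safe on re-runs against a half-built database.
# ASSUMPTION: api_tag.name has a UNIQUE constraint; adjust the conflict target to the real schema.
import psycopg2

conn = psycopg2.connect(**db_connection_params)
cur = conn.cursor()

for tag in base_data['tags']:
    cur.execute(
        """
        INSERT INTO api_tag (name, description)
        VALUES (%s, %s)
        ON CONFLICT (name) DO NOTHING;
        """,
        (tag['name'], tag['description']),
    )
conn.commit()   # commit per block; a failure later only loses the uncommitted block

# ... repeat the same pattern for brands, DHCPServers, and each subnet ...
cur.close()
conn.close()
```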