Python Forum
Frequency and timing of psycopg2 commits
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Frequency and timing of psycopg2 commits
#1
I'm building a tool to do the initial import of data for another app. The data will be read or inferred from the contents of a json file.

The total number of rows inserted will be in the 10s of thousands total in our use-case, but I would like for it to be capable of handling at least 10 times that for larger deployments.

I'm looking for best-practices or general advice on how often to do a commit.

This is roughly what the code will look like:

with open(path.join(json_dir, 'base_data.json'), mode='r', encoding='utf-8-sig') as base_data_file:
    base_data = json.load(base_data_file)

    with psycopg2.connect(**db_connection_params) as db_conn:
        db_cursor = db_conn.cursor()
        
        # Import tags (TODO: See if this can be passed in as a single list of all tags)
        for tag in base_data['tags']:
            db_cursor.execute("""
                INSERT INTO api_tag (name, description) VALUES (%s, %s);
                """,
                (tag['name'], tag['description'])
            )
        db_conn.commit() # <- commit or wait?

        # Import brands
        # for brand in base_data['brands']...

        # Import DHCPServers

        # Import Subnets/Addresses


I don't expect a typical deployment to generate a great number of rows in the tags, brands, or DHCPServers sections, but Subnets/Addresses will generate a row for every subnet, and a row for every ip address that exists on-site. For my use, which is likely the only place it will ever be used, it will generate ~10,000 rows.

My thoughts are to commit after each block (tags, brands, DHCPServers) and then after each subnet in base_data['subnets'], which should each generate no more than a couple thousand rows for most realistic environments (~250 for a typical 24-bit subnet)

Ideally, I would like to be able to wait to commit until all queries have been successful. That would be the safest way to insure that I don't end up with a half-baked database after a query fails. However, I don't have a good enough low-level understanding of postgresql or the library to know where all this data sits before a commit, how much can sit there, and how much can be committed at one time.

I would appreciate any input from anyone who has a better grip on postgresql, or psycopg2.

Just after posting this I found the best practices section of the psycopg2 FAQ that reads...
Quote:When should I save and re-use a connection as opposed to creating a new one as needed?
Creating a connection can be slow (think of SSL over TCP) so the best practice is to create a single connection and keep it open as long as required. It is also good practice to rollback or commit frequently (even after a single SELECT statement) to make sure the backend is never left “idle in transaction”. See also psycopg2.pool for lightweight connection pooling.

Still would appreciate input, but I think my solution is frequent commits and just do more checks to allow it to continue seamlessly on successive runs against a half-built database.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
Question Timing actions with Python dangermaus33 0 980 Apr-19-2022, 10:08 PM
Last Post: dangermaus33
  Inconsistent counting / timing with threading rantwhy 1 1,719 Nov-24-2021, 04:04 AM
Last Post: deanhystad
  How psycopg2 autocommit works ? johntay 3 10,206 Oct-08-2021, 11:22 AM
Last Post: Larz60+
  Psycopg2 doesn't work with python2 MedianykEugene 3 2,888 Aug-10-2021, 07:00 AM
Last Post: ndc85430
  Synchronization/Timing Problem quest 5 2,930 Mar-31-2021, 10:26 PM
Last Post: quest
  Timing of a while loop stylingpat 4 6,673 Mar-31-2021, 10:48 AM
Last Post: stylingpat
  PostgreSQL psycopg2.errors.DuplicateColumn: column specified more than once rajnish_nationfirst 2 3,693 Jun-21-2020, 08:17 AM
Last Post: ibreeden
  Assigning Data from one column to another with different associated timing interval alexafshari 1 1,924 Apr-30-2020, 03:59 PM
Last Post: pyzyx3qwerty
  tornado psycopg2 Nikosznb 1 2,282 Feb-23-2020, 10:49 PM
Last Post: scidam
  Perpetual timing Mark17 3 2,838 Oct-24-2019, 03:46 PM
Last Post: Gribouillis

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020