Python Forum
Comparing two Pandas df’s and returning only changed records
#1
df2 contains roughly 1M rows and 9 columns. It starts life as a copy of df1, and only values are changed; no rows or columns are added or deleted.

What’s the most efficient way of creating df3, containing only the rows of df2 whose values have changed compared with the same row in df1?

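import pandas as pd

def compare_large_dataframes(df1, df2):
    # Both frames must have the same number of rows and columns.
    if df1.shape != df2.shape:
        raise ValueError("DataFrames must have the same number of rows and columns")

    # Outer-merge on all shared columns and keep the rows found only in df2.
    merged_df = pd.merge(df1, df2, how='outer', indicator=True).query('_merge == "right_only"').drop('_merge', axis=1)

    return merged_df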

df1 and df2 have the same shape. I was using the function above, but it now raises an error that I’m having a hard time getting to the bottom of:
df3 = compare_large_dataframes(df1, df2)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/x/ztm.py", line 4586, in compare_large_dataframes
    merged_df = pd.merge(df1, df2, how='outer', indicator=True).query('_merge == "right_only"').drop('_merge', axis=1)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 184, in merge
    return op.get_result(copy=copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 886, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
                                              ^^^^^^^^^^^^^^^^^^^^^
  File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 1151, in _get_join_info
    (left_indexer, right_indexer) = self._get_join_indexers()
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 1125, in _get_join_indexers
    return get_join_indexers(
           ^^^^^^^^^^^^^^^^^^
  File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 1740, in get_join_indexers
    zipped = zip(*mapped)
             ^^^^^^^^^^^^
  File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 1737, in <genexpr>
    _factorize_keys(left_keys[n], right_keys[n], sort=sort)
  File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 2570, in _factorize_keys
    llab, rlab = _sort_labels(uniques, llab, rlab)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 2631, in _sort_labels
    _, new_labels = algos.safe_sort(uniques, labels, use_na_sentinel=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/algorithms.py", line 1543, in safe_sort
    raise ValueError("values should be unique if codes is not None")
ValueError: values should be unique if codes is not None
#2
Please provide an example input and the output you expect. DataFrame.compare will give you the differences between two dataframes, but its output may not look the same as what you got using merge.

https://pandas.pydata.org/docs/reference...mpare.html
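
As a rough illustration of the DataFrame.compare suggestion, here is a minimal sketch on a tiny made-up pair of frames. It assumes df1 and df2 share identical index and column labels, as described in the first post. The helper name changed_rows and the sample data are hypothetical, and this has not been timed on 1M rows. It also shows an elementwise mask, which returns whole rows of df2 rather than the side-by-side layout that compare produces.

```python
import numpy as np
import pandas as pd


def changed_rows(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of df2 that differ from the same-labelled rows of df1.

    Hypothetical helper; assumes both frames have identical index and columns.
    """
    if df1.shape != df2.shape:
        raise ValueError("DataFrames must have the same number of rows and columns")
    # Elementwise inequality between cells with the same row and column label.
    differs = df1.ne(df2)
    # A missing value never compares equal to another missing value,
    # so ignore cells that are missing in both frames.
    both_missing = df1.isna() & df2.isna()
    changed = (differs & ~both_missing).any(axis=1)
    return df2.loc[changed]


# Made-up sample data: df2 is a copy of df1 with a single value changed.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", None], "c": [1.0, np.nan, 3.0]})
df2 = df1.copy()
df2.loc[1, "a"] = 20

# Whole rows of df2 that contain at least one changed value.
df3 = changed_rows(df1, df2)

# Only the changed cells, shown side by side as "self" (df1) and "other" (df2).
diff = df1.compare(df2)

print(df3)
print(diff)
```

Unlike the merge approach, neither version matches rows by their values, so duplicate rows are not an issue; both rely on the row labels lining up between the two frames.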