Python Forum
Validating Dataframe Using Second Dataframe
#1
Hello all,

I am new to Python but not new to programming. I have a data file of 28k rows x 18 columns that I am loading into a dataframe (skudata) in order to run various data quality checks. One of the checks compares the Category and Subcategory columns of the skudata dataframe to ensure that each Category/Subcategory combination is a valid entry stored in the cat_master dataframe. The cat_master dataframe has only these two columns (also named Category and Subcategory).

The result I want is the rows in skudata whose Category and Subcategory do NOT match the master list in the cat_master dataframe. Keep in mind that it is the combination of Category and Subcategory in skudata that needs to match a combination of Category and Subcategory in cat_master for the row to be considered valid.
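Because validity depends on the pair rather than on either column alone, one way to express the check is to turn each row into a (Category, Subcategory) tuple and test whether that tuple appears in the master list. Below is a minimal sketch, assuming both frames have columns named exactly Category and Subcategory. The small frames are made-up stand-ins for the two CSV files, and the SKU column is hypothetical.

```python
import pandas as pd

# Made-up stand-ins for the two CSV files; only the column names
# Category and Subcategory come from the post.
skudata = pd.DataFrame({
    "SKU": ["A1", "A2", "A3", "A4"],  # hypothetical ID column
    "Category": ["Drinks", "Drinks", "Snacks", "Snacks"],
    "Subcategory": ["Soda", "Chips", "Chips", "Candy"],
})
cat_master = pd.DataFrame({
    "Category": ["Drinks", "Snacks"],
    "Subcategory": ["Soda", "Chips"],
})

keys = ["Category", "Subcategory"]

# One (Category, Subcategory) tuple per row of each frame.
sku_pairs = pd.MultiIndex.from_frame(skudata[keys])
valid_pairs = pd.MultiIndex.from_frame(cat_master[keys])

# True where the row's pair appears anywhere in the master list.
is_valid = sku_pairs.isin(valid_pairs)

# Keep only the rows whose pair is NOT in the master list.
invalid = skudata[~is_valid]
print(invalid)
```

Checking the pair as a unit is what matters here: the Drinks/Chips row is reported as invalid even though "Drinks" and "Chips" each appear somewhere in cat_master.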

Here's my setup so far, but I need help with the actual "selection" of invalid rows in skudata:
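import pandas as pd
# SKU data to validate (28k rows x 18 columns)
skudata = pd.read_csv("S&OP SKU Data.csv")
# Master list of valid Category/Subcategory pairs
# (file name as given; read_csv needs the exact name, including any .csv extension)
cat_master = pd.read_csv("Valid Categories & Subcategories")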
What do I need to do now in order to select and display only the rows in skudata where the category & subcategory combo does not exist in cat_master?

Thank you!
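Another option for the selection step is a left merge against the master list with `indicator=True`. This adds a `_merge` column in which `"left_only"` marks rows that found no matching pair. The sketch below makes the same assumptions as above: made-up stand-in data, a hypothetical SKU column, and a hypothetical helper name `invalid_rows`. With the real files, you would pass in the skudata and cat_master frames returned by `read_csv`.

```python
import pandas as pd

def invalid_rows(data: pd.DataFrame, master: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of data whose (Category, Subcategory) pair is not in master."""
    keys = ["Category", "Subcategory"]
    # Drop duplicate pairs so the left merge yields exactly one row per data row.
    valid = master[keys].drop_duplicates()
    merged = data[keys].merge(valid, on=keys, how="left", indicator=True)
    # "left_only" means the pair had no match in the master list.
    mask = (merged["_merge"] == "left_only").to_numpy()
    # Index the original frame so all its columns and row labels are kept.
    return data[mask]

# Made-up stand-in data; replace with the frames loaded from the CSV files.
skudata = pd.DataFrame({
    "SKU": ["A1", "A2", "A3", "A4"],  # hypothetical ID column
    "Category": ["Drinks", "Drinks", "Snacks", "Snacks"],
    "Subcategory": ["Soda", "Chips", "Chips", "Candy"],
})
cat_master = pd.DataFrame({
    "Category": ["Drinks", "Snacks"],
    "Subcategory": ["Soda", "Chips"],
})

print(invalid_rows(skudata, cat_master))
```

A left merge keeps the left frame's row order. Because duplicates are removed from the master list, the mask lines up one-to-one with the rows of skudata, and the returned rows keep their original index labels. The comparison is exact, so values that differ only in case or in leading/trailing spaces are reported as invalid.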
Yoriz wrote Dec-05-2022, 06:15 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.