Python Forum
Validating Dataframe Using Second Dataframe
#1
Hello all,

I am new to Python but not new to programming. I have a data file of 28k rows x 18 columns that I am loading into a dataframe (skudata) in order to run various data quality checks. One of the checks is to compare the Category and Subcategory columns from the skudata dataframe against a second dataframe (cat_master) to ensure that each combination of Category and Subcategory is a valid entry stored in cat_master. The cat_master dataframe has only these two columns (also named Category and Subcategory).

The result I want is the rows in skudata whose Category and Subcategory do NOT match the master list in the cat_master dataframe. Keep in mind that it is the combination of Category and Subcategory in skudata that needs to match a combination of Category and Subcategory in cat_master for a row to be considered valid.

Here's what I have in terms of setup but need help in doing the actual "selection" of invalid rows in skudata.
import pandas as pd
skudata = pd.read_csv("S&OP SKU Data.csv")
cat_master = pd.read_csv("Valid Categories & Subcategories")
What do I need to do now in order to select and display only the rows in skudata where the category & subcategory combo does not exist in cat_master?

Thank you!
Yoriz wrote Dec-05-2022, 06:15 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
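One way to approach the selection described above is an "anti-join": left-merge skudata against cat_master on both columns and keep the rows that found no partner. The following is a minimal sketch, not a definitive answer. The small frames stand in for the two CSV files, and the SKU column and all sample values are made up for illustration; only the Category and Subcategory column names come from the post.

```python
import pandas as pd

# Stand-in data; in the thread these frames come from pd.read_csv(...).
# The "SKU" column and all values below are hypothetical.
skudata = pd.DataFrame({
    "SKU": ["A1", "A2", "A3", "A4"],
    "Category": ["Tools", "Tools", "Garden", "Toys"],
    "Subcategory": ["Hammers", "Hoses", "Hoses", "Puzzles"],
})
cat_master = pd.DataFrame({
    "Category": ["Tools", "Garden", "Toys"],
    "Subcategory": ["Hammers", "Hoses", "Puzzles"],
})

keys = ["Category", "Subcategory"]

# Left-merge on BOTH columns so the combination is what gets matched.
# drop_duplicates() keeps a repeated master pair from duplicating SKU rows.
# indicator=True adds a "_merge" column: "both" or "left_only".
merged = skudata.merge(
    cat_master[keys].drop_duplicates(),
    on=keys,
    how="left",
    indicator=True,
)

# Rows marked "left_only" have no matching pair in cat_master.
invalid = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(invalid)
```

Here the Tools/Hoses row is flagged even though "Tools" and "Hoses" each appear in cat_master, because the pair itself is not listed. The match is exact, so differences in case or stray whitespace between the two files would also show up as invalid rows.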