Pandas dataframe comparing - Printable Version
Python Forum (https://python-forum.io), Forum: Data Science (https://python-forum.io/forum-44.html), Thread: Pandas dataframe comparing (/thread-36228.html)
Pandas dataframe comparing - anto5 - Jan-30-2022

Hello,

I am reading a CSV file using pandas. My dataframe consists of 3.7 million records and has two columns: date and Subscriber_ID. The data is the list of active subscribers per day.

I want to check which Subscriber_IDs exist on day X and do not exist on day X + 1, so that I get a list of the Subscriber_IDs that are no longer active on day X + 1. I want to do that for each of the existing days.

Is there a comparison function that does this, or do I have to create a new dataframe for each day and compare the dataframes to each other? I have more than 75 days.

Here is a sample of my data and what I want as the result:

```python
import pandas as pd

# Sample data: one row per active subscriber per day
data = {
    'date': ['22-Jan-22', '22-Jan-22', '22-Jan-22', '22-Jan-22',
             '23-Jan-22', '23-Jan-22', '23-Jan-22', '23-Jan-22',
             '23-Jan-22', '23-Jan-22', '23-Jan-22',
             '24-Jan-22', '24-Jan-22', '24-Jan-22', '24-Jan-22',
             '24-Jan-22', '24-Jan-22'],
    'Subscriber_ID': ['a', 'b', 'c', 'd',
                      'e', 'f', 'b', 'c', 'd', 'h', 'g',
                      'c', 'd', 'h', 'j', 'i', 'k'],
}
df = pd.DataFrame(data)
print(df)
```

I want to have the following result:

Subscribers_ID lost in 23-Jan-22 is/are: a
Subscribers_ID lost in 24-Jan-22 is/are: e, f, b, g
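
One possible way to do this without building a separate dataframe per day is to collect the IDs of each day into a set and take the set difference between each day and the next one. The following is only a minimal sketch on the sample data above. It assumes that the dates always use the `22-Jan-22` format and that "day X + 1" means the next day present in the data rather than the next calendar day. The names `day`, `ids_per_day` and `lost` are made up for illustration.

```python
import pandas as pd

data = {
    'date': ['22-Jan-22'] * 4 + ['23-Jan-22'] * 7 + ['24-Jan-22'] * 6,
    'Subscriber_ID': ['a', 'b', 'c', 'd',
                      'e', 'f', 'b', 'c', 'd', 'h', 'g',
                      'c', 'd', 'h', 'j', 'i', 'k'],
}
df = pd.DataFrame(data)

# Parse the date strings so that days sort chronologically, not alphabetically
# (assumed format: day-abbreviated month-two digit year)
df['day'] = pd.to_datetime(df['date'], format='%d-%b-%y')

# One set of subscriber IDs per day; groupby sorts the days in ascending order
ids_per_day = df.groupby('day')['Subscriber_ID'].apply(set)

# Compare each day with the following day found in the data
lost = {}
pairs = list(ids_per_day.items())
for (_, prev_ids), (next_day, next_ids) in zip(pairs, pairs[1:]):
    # IDs present on the earlier day but missing on the later day
    lost[next_day.strftime('%d-%b-%y')] = sorted(prev_ids - next_ids)

for day, ids in lost.items():
    print(f"Subscribers_ID lost in {day} is/are: {', '.join(ids)}")
```

Because sets are unordered, the IDs are sorted before printing, so the second line lists b, e, f, g rather than e, f, b, g. Each day is reduced to a set only once, so this should scale to 75+ days and a few million rows, although that has not been measured here.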