Python Forum

Hello,

I am reading a CSV file using pandas.

My dataframe consists of 3.7 million records and has two columns: Date and Subscribers_ID.

The data is the list of active subscribers per day.

I want to check which Subscribers_ID values exist on day X but do not exist on day X + 1, so that I can get a list of the subscribers that are no longer active on day X + 1. I want to do that for each of the existing days.

Is there a comparison function that does this, or do I have to create a new dataframe for each day and compare the dataframes to each other? I would like to avoid that, because I have more than 75 days.

Here is a sample of my data and what I want as a result:

import pandas as pd

data = {'date':['22-Jan-22', '22-Jan-22', '22-Jan-22', '22-Jan-22', '23-Jan-22', '23-Jan-22', '23-Jan-22', '23-Jan-22', '23-Jan-22', '23-Jan-22', '23-Jan-22', '24-Jan-22', '24-Jan-22', '24-Jan-22', '24-Jan-22', '24-Jan-22', '24-Jan-22'], 'Subscriber_ID':['a', 'b', 'c', 'd', 'e', 'f', 'b', 'c', 'd', 'h', 'g', 'c', 'd', 'h', 'j', 'i', 'k']}

df = pd.DataFrame(data)

print(df)

I want to have the following result:

Subscribers_ID lost in 23-Jan-22 is/are: a
Subscribers_ID lost in 24-Jan-22 is/are: e, f, b, g
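
One possible way to get this without building a separate dataframe per day is to collect the subscribers of each day into a set and take the set difference between each day and the one before it. This is only a minimal sketch on the sample data, not a tested solution for the full 3.7 million rows. It assumes the column names from the sample ('date', 'Subscriber_ID'), the date format '%d-%b-%y', and that "day X + 1" means the next day present in the data rather than the next calendar day. The names `active` and `lost` are made up for illustration.

```python
import pandas as pd

data = {
    "date": ["22-Jan-22"] * 4 + ["23-Jan-22"] * 7 + ["24-Jan-22"] * 6,
    "Subscriber_ID": ["a", "b", "c", "d",
                      "e", "f", "b", "c", "d", "h", "g",
                      "c", "d", "h", "j", "i", "k"],
}
df = pd.DataFrame(data)

# Parse the dates so the days sort chronologically, not alphabetically.
df["date"] = pd.to_datetime(df["date"], format="%d-%b-%y")

# One set of active subscribers per day; groupby sorts the days by default.
active = df.groupby("date")["Subscriber_ID"].apply(set)

# Pair each day with the previous day found in the data and keep the
# subscribers that were active on the previous day but not on this one.
lost = {}
items = list(active.items())
for (prev_day, prev_ids), (day, ids) in zip(items, items[1:]):
    lost[day] = prev_ids - ids

for day, ids in lost.items():
    print(f"Subscribers_ID lost in {day:%d-%b-%y} is/are: {', '.join(sorted(ids))}")
```

On the sample this gives a for 23-Jan-22 and b, e, f, g for 24-Jan-22 (the IDs are sorted alphabetically here, so the order differs from the listing above). If some calendar days are missing from the file, the comparison is made against the last day that is present, which may or may not be what is wanted.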

anto5