Python Forum

Full Version: How to add data to the categorical index of dataframe as data arrives?
Python 3.7.3, pandas 0.25.1

I need to collect per-client statistics, so I created a DataFrame in which I accumulate the necessary data for each client:

import csv
import numpy as np
import pandas as pd

# create dataframe
dtype = np.dtype([('day_begin', 'u4'), ('day_end', 'u4'), ('price_begin', 'f4'), ('price_end', 'f4'), ('Client', 'U13')])
auxiliary_array = np.empty(0, dtype=dtype)
periods_clients = pd.DataFrame(auxiliary_array)
periods_clients.set_index(['Client'], inplace=True)

# fill dataframe from file (path_file is the path to the input CSV)
with open(path_file) as csv_file:
    fieldnames = ['Date', 'Client', 'Price']
    reader = csv.DictReader(csv_file, fieldnames=fieldnames, delimiter=';')
    for dict_str in reader:
        Client = dict_str['Client']
        # assumption: the current values come from the row being read,
        # with Date stored as an integer and Price as a float
        current_date = int(dict_str['Date'])
        current_price = float(dict_str['Price'])

        if Client not in periods_clients.index:
            periods_clients.loc[Client] = [current_date, current_date, current_price, current_price]
        else:
            # select row and column in a single .loc call;
            # chained assignment may write to a temporary copy
            periods_clients.loc[Client, 'day_end'] = current_date
            periods_clients.loc[Client, 'price_end'] = current_price
The client is a string field, so the program runs for a very long time. My attempt to replace this field with a categorical variable failed because I could not add values while reading data from the file (and I do not know all the clients in advance).

What should I do?
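
One possible direction, as a minimal sketch rather than a definitive fix: the slowness may come less from the string index itself than from enlarging the DataFrame one row at a time with .loc. The sketch below accumulates the per-client values in a plain dict and builds the DataFrame once at the end. The helper name collect_periods, the in-memory sample data, and the integer Date format are assumptions made for illustration; they are not from the original post.

```python
import csv
import io

import pandas as pd

# Hypothetical sample standing in for the real file (Date;Client;Price).
sample = io.StringIO(
    "20190901;alice;10.5\n"
    "20190902;bob;7.0\n"
    "20190903;alice;11.25\n"
)


def collect_periods(csv_file):
    """Return one row per client with the first and last date and price seen."""
    periods = {}  # client -> [day_begin, day_end, price_begin, price_end]
    reader = csv.DictReader(
        csv_file, fieldnames=['Date', 'Client', 'Price'], delimiter=';'
    )
    for row in reader:
        # Assumption: Date is an integer and Price is a float.
        day, price = int(row['Date']), float(row['Price'])
        record = periods.get(row['Client'])
        if record is None:
            # First time this client is seen: begin and end coincide.
            periods[row['Client']] = [day, day, price, price]
        else:
            # Client already known: only the end values move forward.
            record[1] = day
            record[3] = price
    # Build the DataFrame once, after all rows have been read.
    df = pd.DataFrame.from_dict(
        periods,
        orient='index',
        columns=['day_begin', 'day_end', 'price_begin', 'price_end'],
    )
    df.index.name = 'Client'
    return df


periods_clients = collect_periods(sample)
```

If a categorical index is still wanted afterwards, it can be created once from the finished index, for example with pd.CategoricalIndex(periods_clients.index), since all clients are known by then.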

If I create a categorical index, I get an error:

#create dataframe
dtype=np.dtype([('day_begin','u4'), ('day_end','u4'), ('price_begin','f4'), ('price_end','f4')])
auxiliary_array = np.empty(0, dtype=dtype)
periods_clients = pd.DataFrame(auxiliary_array)

periods_clients['Client'] = pd.Series('Client', dtype='category')
periods_clients.set_index(['Client'], inplace=True)
....
#fill dataframe from file
...
if Client not in periods_clients.index:
    periods_clients.index.add_categories(Client, inplace=True) #ERROR!!!
Error:
ValueError: cannot use inplace with CategoricalIn
Therefore, I can't add values to the categorical index as they become available.
Is there some way to do this?
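
Regarding the error itself, a small sketch that may help: CategoricalIndex.add_categories returns a new index instead of modifying the existing one, so the result can be assigned back to the DataFrame without inplace=True. The tiny frame and the client names 'alice' and 'bob' below are made up for illustration.

```python
import pandas as pd

# Hypothetical one-row frame with a categorical index.
df = pd.DataFrame(
    {'day_begin': [1], 'day_end': [2]},
    index=pd.CategoricalIndex(['alice'], name='Client'),
)

# add_categories returns a new CategoricalIndex, so assign it back
# instead of passing inplace=True.
df.index = df.index.add_categories(['bob'])

# 'bob' is now an allowed category, but no row has been added for it yet.
print(list(df.index.categories))
```

Note that adding a category does not by itself add a row, so the row for the new client still has to be inserted separately, and that insertion is likely to remain the slow part when done once per input line.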