How to add data to the categorical index of dataframe as data arrives? - Python Forum (https://python-forum.io), Forum: Data Science, Thread: /thread-21611.html
How to add data to the categorical index of dataframe as data arrives? - AlekseyPython - Oct-07-2019

Python 3.7.3, pandas 0.25.1

I need to collect per-client statistics, so I created a DataFrame in which I group the necessary data for each client:

    import csv
    import numpy as np
    import pandas as pd

    # create dataframe
    dtype = np.dtype([('day_begin', 'u4'), ('day_end', 'u4'),
                      ('price_begin', 'f4'), ('price_end', 'f4'),
                      ('Client', 'U13')])
    auxiliary_array = np.empty(0, dtype=dtype)
    periods_clients = pd.DataFrame(auxiliary_array)
    periods_clients.set_index(['Client'], inplace=True)

    # fill dataframe from file
    with open(path_file) as csv_file:
        fieldnames = ['Date', 'Client', 'Price']
        reader = csv.DictReader(csv_file, fieldnames=fieldnames, delimiter=';')
        for dict_str in reader:
            Client = dict_str['Client']
            if Client not in periods_clients.index:
                # first row for this client: begin and end values coincide
                periods_clients.loc[Client] = [current_date, current_date, current_price, current_price]
            else:
                # write through .loc[row, column]; chained access may update a copy
                periods_clients.loc[Client, 'day_end'] = current_date
                periods_clients.loc[Client, 'price_end'] = current_price

Client is a string field, so the program runs for a very long time. My attempt to replace this field with a categorical variable failed, because I could not add values while reading data from the file (and I do not know all the clients in advance). What should I do?

If I create a categorical index, I get an error:

    # create dataframe
    dtype = np.dtype([('day_begin', 'u4'), ('day_end', 'u4'),
                      ('price_begin', 'f4'), ('price_end', 'f4')])
    auxiliary_array = np.empty(0, dtype=dtype)
    periods_clients = pd.DataFrame(auxiliary_array)
    periods_clients['Client'] = pd.Series('Client', dtype='category')
    periods_clients.set_index(['Client'], inplace=True)
    ....
    # fill dataframe from file
    ...
    if Client not in periods_clients.index:
        periods_clients.index.add_categories(Client, inplace=True)  # ERROR!!!

Therefore, I can't add values to the categorical index as they become available.
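A note on the failing call: pandas index objects are immutable, and `CategoricalIndex.add_categories` returns a new index rather than changing the existing one. The sketch below shows the non-inplace pattern of reassigning the result. The frame contents, the client names, and the helper `register_client` are hypothetical and made up for illustration. It only registers a new category; it does not by itself make row-by-row insertion fast.

```python
import pandas as pd

# Hypothetical small frame; the client names are invented for this sketch.
periods_clients = pd.DataFrame(
    {"day_begin": [1, 2], "day_end": [3, 4]},
    index=pd.CategoricalIndex(["client_a", "client_b"], name="Client"),
)


def register_client(frame, client):
    """Make `client` a known category of the frame's categorical index."""
    if client not in frame.index.categories:
        # add_categories returns a new index; the old one is left unchanged,
        # so the result has to be assigned back to the frame.
        frame.index = frame.index.add_categories([client])
    return frame


register_client(periods_clients, "client_c")
print(list(periods_clients.index.categories))
```

The existing rows are untouched; only the set of allowed categories grows.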
RE: How to add data to the categorical index of dataframe as data arrives? - AlekseyPython - Oct-16-2019

Maybe there is some way to do this?
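One possible way, sketched under assumptions rather than taken from the thread: growing a DataFrame one row at a time is usually the slow part, so the per-client state can be kept in a plain dict while reading and converted to a DataFrame with a categorical index once at the end. The function name `collect_periods`, the sample data, and the assumption that `Date` parses as an integer and `Price` as a float are all hypothetical.

```python
import csv
import io

import pandas as pd


def collect_periods(rows):
    """Track the first and last (date, price) per client in a plain dict."""
    periods = {}
    for row in rows:
        client = row["Client"]
        date = int(row["Date"])        # assumption: dates are stored as integers
        price = float(row["Price"])    # assumption: prices parse as floats
        if client not in periods:
            # first row for this client: begin and end values coincide
            periods[client] = [date, date, price, price]
        else:
            record = periods[client]
            record[1] = date           # day_end
            record[3] = price          # price_end
    frame = pd.DataFrame.from_dict(
        periods,
        orient="index",
        columns=["day_begin", "day_end", "price_begin", "price_end"],
    )
    # All clients are known by now, so the categorical index is built once.
    frame.index = pd.CategoricalIndex(frame.index, name="Client")
    return frame


# Made-up sample in the same 'Date;Client;Price' layout as the original file.
sample = io.StringIO("1;client_a;10.0\n2;client_b;20.0\n3;client_a;12.5\n")
reader = csv.DictReader(sample, fieldnames=["Date", "Client", "Price"], delimiter=";")
periods_clients = collect_periods(reader)
print(periods_clients)
```

Dict lookups and updates take constant time on average, and the categories never have to be extended while the file is being read.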