Python Forum
How to speed up work with pandas index?
#1
Python 3.7.3, pandas 0.25.1

I wrote a program that uses a pandas DataFrame, but it runs tens (or even a hundred) times slower than the same logic written with dict and tuple. The index is built on the lightest data type, an 8-bit unsigned int ('u1'). When adding new data I always sort the index (although without sorting the performance is just as poor). Moreover, the file contains about 6,000,000 lines but only 2 different clients, so the index is rebuilt very rarely and the DataFrame holds very few rows.

Why?

import csv

import numpy as np
import pandas as pd

# create dataframe
dtype = np.dtype([('day_begin', 'u4'), ('day_end', 'u4'), ('price_begin', 'f4'), ('price_end', 'f4'), ('Client', 'u1')])
auxiliary_array = np.empty(0, dtype=dtype)
periods_clients = pd.DataFrame(auxiliary_array)
periods_clients.set_index(['Client'], inplace=True)

# fill dataframe from file (path_file is defined elsewhere)
with open(path_file, newline='') as csv_file:
    fieldnames = ['Date', 'Client', 'Price']
    reader = csv.DictReader(csv_file, fieldnames=fieldnames, delimiter=';')
    for dict_str in reader:
        Client = int(dict_str['Client'])
        # current_date and current_price were not defined in the original snippet;
        # assumed here to be an integer day and a float price taken from the row
        current_date = int(dict_str['Date'])
        current_price = float(dict_str['Price'])

        if Client not in periods_clients.index:
            periods_clients.loc[Client] = [current_date, current_date, current_price, current_price]
            periods_clients.sort_index(level=0, inplace=True)
        else:
            # one .loc[row, column] call; chained .loc[Client].day_end = ... would assign to a copy
            periods_clients.loc[Client, 'day_end'] = current_date
            periods_clients.loc[Client, 'price_end'] = current_price
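For comparison, here is a minimal sketch of the dict-based approach mentioned above, converted to a DataFrame only once at the end. Every `.loc` call on a DataFrame carries label lookup and dtype handling overhead, so repeating it for each of millions of rows is likely where most of the time goes. The helper names `collect_periods` and `to_frame`, the sample rows, and the integer date format are assumptions made for illustration, not part of the original program.

```python
import csv
import io

import pandas as pd


def collect_periods(rows):
    """Track first/last date and price per client in a plain dict."""
    periods = {}
    for row in rows:
        client = int(row['Client'])
        date = int(row['Date'])        # assumption: dates are integer day numbers
        price = float(row['Price'])
        record = periods.get(client)
        if record is None:
            # first time this client is seen: begin and end are the same
            periods[client] = [date, date, price, price]
        else:
            # later rows only move the end of the period
            record[1] = date
            record[3] = price
    return periods


def to_frame(periods):
    """Build the DataFrame once, then cast columns to the compact dtypes."""
    columns = ['day_begin', 'day_end', 'price_begin', 'price_end']
    frame = pd.DataFrame.from_dict(periods, orient='index', columns=columns)
    frame.index.name = 'Client'
    frame = frame.astype({'day_begin': 'u4', 'day_end': 'u4',
                          'price_begin': 'f4', 'price_end': 'f4'})
    return frame.sort_index()


# hypothetical sample data standing in for the real file
sample = "20190101;0;10.5\n20190102;1;20.0\n20190103;0;11.0\n"
reader = csv.DictReader(io.StringIO(sample),
                        fieldnames=['Date', 'Client', 'Price'], delimiter=';')
periods_clients = to_frame(collect_periods(reader))
print(periods_clients)
```

The loop touches only Python lists and a dict; pandas is involved once, when the result is small, so sorting and dtype casting also happen a single time.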
#2
When I create the index, its dtype changes by itself from uint8 to uint64:
periods_clients.set_index(['Client'], inplace=True)
and when I add the first value:

periods_clients.loc[Client] = [current_date, current_date, current_price, current_price]

it changes to float64. Client equals 0, and I tried converting it to int, int8, int64, uint8, and uint64. Why?
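Some background that may help here: before pandas 2.0, numeric indexes were always stored as 64-bit (Int64Index, UInt64Index, Float64Index), which would explain the uint8 to uint64 change. The float64 result after the first insert is probably a side effect of enlarging an empty frame with a single list: a list that mixes ints and floats is converted to one array with a common dtype. The sketch below only mimics that conversion outside of `.loc` (an assumption about the mechanism, not something confirmed in this thread) and shows casting back afterwards as a workaround.

```python
import pandas as pd

columns = ['day_begin', 'day_end', 'price_begin', 'price_end']

# One row given as a single list: ints and floats share one dtype.
row = [20190101, 20190101, 10.5, 10.5]
row_series = pd.Series(row, index=columns, name=0)
print(row_series.dtype)          # common dtype of the mixed list

# Turning that row into a one-row frame keeps the common dtype in every column.
frame = row_series.to_frame().T
frame.index.name = 'Client'
print(frame.dtypes)

# Workaround: cast the columns (and the index) back explicitly.
restored = frame.astype({'day_begin': 'uint32', 'day_end': 'uint32',
                         'price_begin': 'float32', 'price_end': 'float32'})
restored.index = restored.index.astype('uint64')
restored.index.name = 'Client'
print(restored.dtypes)
```

Casting after each insert would be costly in a loop, so this is best done once, after all rows have been added.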