I want to vectorize a really long for loop to improve performance
#1
I have a bunch of mouse neural spike data. The neurons activate when the mouse is in certain locations, from which I calculate the average firing rate of each neuron at each position along a 2 meter track (saved as a matrix in the variable "placefields", which contains, for every neuron, an array of 100 firing-rate values, one for each 2 cm position bin). Then, by analysing the neural spikes and taking the placefields into account, I can estimate the probability of where the mouse is along the 2 m track. For every 10 millisecond time period, an array of probabilities is created, one probability for every 2 cm of the track. So, for instance, the probability that in the first 10 ms the mouse is somewhere between 14 and 16 cm would be something like 0.00005. This is done through the following function:

import numpy as np
from scipy import stats

def compute_probability(data, session_number, positionbinsize=2):
    spikes = np.array(data[session_number]["spiketrain"])
    spikes = np.swapaxes(spikes, 0, 1)  # array of spike counts per 10 ms bin, like [[0,0,0,1,0,0,2,0,0,0...],[0,1,0,0...]]
    # every row of the spikes array is the data of another neuron; there are around 20
    placefields = full_session_curves(data, session_number, positionbinsize)

    xybins = len(placefields[0])  # number of position bins (100)
    nTimebins = len(spikes[0])    # number of time bins (length of the recording session)

    probability = np.zeros((nTimebins, xybins))  # zeros that will be replaced by probabilities

    # The following loop is what takes a LONG time (60 seconds!) and needs to be vectorized.
    # It takes the spikes of all neurons in each time bin (spikes[:, i]) and the placefields of every neuron,
    # and uses the Poisson probability distribution to estimate probabilities.
    # It does this for every single time bin, then puts that vector of probabilities into a row of the probability matrix.
    for i in range(nTimebins):
        nspikes = np.tile(spikes[:, i], (xybins, 1))
        nspikes = np.swapaxes(nspikes, 0, 1)
        maxL = stats.poisson.pmf(nspikes, placefields)
        maxL = maxL.prod(axis=0)
        probability[i, :] = maxL
    return probability
Unfortunately I am so used to for loops that I can't seem to wrap my head around vectorising, but the postdoc I am working with said that if I vectorised this it would be faster. I see how, if I didn't iterate through every single 10 ms time bin in the session (of which there are 200,000+) and instead calculated everything at once, it would be more efficient, but I have no clue how. I am asking for help with vectorising that loop, or just general advice on how I would go about making it calculate all the probabilities at once.
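The furthest I have got is guessing that NumPy broadcasting might replace the tile/swapaxes step, so that scipy evaluates the pmf for every time bin and position bin at once. Here is an untested sketch of what I mean, assuming spikes is (nNeurons, nTimebins) and placefields is (nNeurons, xybins) as in my function above; I chunk over time bins because a full (nNeurons, 200000+, 100) intermediate array would eat a lot of memory. Is something like this the right direction?

import numpy as np
from scipy import stats

def compute_probability_broadcast(spikes, placefields, chunksize=10000):
    # spikes: (nNeurons, nTimebins) spike counts; placefields: (nNeurons, xybins) firing rates
    nTimebins = spikes.shape[1]
    xybins = placefields.shape[1]
    probability = np.zeros((nTimebins, xybins))
    # work through the time axis in chunks so the 3D intermediate stays manageable
    for start in range(0, nTimebins, chunksize):
        chunk = spikes[:, start:start + chunksize]
        # (nNeurons, chunk, 1) broadcast against (nNeurons, 1, xybins)
        pmf = stats.poisson.pmf(chunk[:, :, None], placefields[:, None, :])
        # multiply across neurons, exactly like maxL.prod(axis=0) in the loop
        probability[start:start + chunksize, :] = pmf.prod(axis=0)
    return probability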
#2
My advice depends largely on how many times you plan to run this. If it's just once, it will take you more time to learn and rewrite the code than to just let it run for the 33 minutes in the estimate you quote. However, if this is to be done multiple times, I would use Pandas. It's not a bad idea to learn Pandas anyway; there are lots of YouTube videos and other sources on the net. Pandas is not only fast but also opens up a range of functions that make things (relatively) easy.

Pandas supports an object called a DataFrame. Think of this as a spreadsheet or table where you can reference entire columns by name. It won't eliminate the need for for loops, but it comes close.
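For example, a minimal sketch (the column names are made up, nothing to do with your actual data):

import pandas as pd

# toy DataFrame: each row is a time bin, each column a named series
df = pd.DataFrame({
    "neuron_1": [0, 1, 0, 2],
    "neuron_2": [1, 0, 0, 1],
})
# arithmetic on whole columns at once, no explicit loop
df["total_spikes"] = df["neuron_1"] + df["neuron_2"]
print(df)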

