Python Forum

Full Version: nanmean - np - np.mean - error
I have fairly simple code to calculate a mean. Basically I am reading the MNIST data, computing the PCA, and finding the mean of PCA1 and PCA2. The first few rows of PCA1 and PCA2 are shown below. I am trying to find the mean of that data, and I am confused how the mean could be something on the order of 1e-14. Something is going wrong. I tried plain np.mean with axis=0, then np.nanmean, and the mean still looks way off. If I chop off the last row it seems to work at times, but not consistently. The mean seems far too small for the range of the data: column 1 runs from about -1550 to 1400 and column 2 from about -1300 to +2200, so I did not expect a mean as small as 1e-14.

Please help.
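import numpy as np

def findMean(X):
    # Work in double precision so the sums are not accumulated in a narrow dtype
    X = X.astype('float64')
    # Column-wise mean (axis=0), ignoring any NaN entries
    badMU = np.nanmean(X, axis=0)
    print("from inside findMean print two rows of mean")
    # Show the means of the first two columns
    print(badMU[0:2])

    return badMU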
Here is the output - excerpt only.
Output:
Print columns 1 and 2 of 5 rows of P i.e., P[:5,0:2]
[[ -177.01262488 -1123.38369215]
 [ -486.95502974   953.72782595]
 [ -701.57667656   722.54538116]
 [ -739.68606132   625.34125527]
 [  496.35283147  -371.99788225]]
Recovering X
Test overall mean of P - just print a few cols
from inside findMean print two rows of mean
[ -3.54114684e-14  -1.08559058e-14]
get min and max for col0 and col1 from P
Num entries in P is = 12593
Not sure if I understand, but the "mean" is nothing more than the sum of a list of numbers divided by the number of items in the list.  So for example, taking your column 1 numbers we can use something simple like:

col_1 = [-177.01262488, -486.95502974, -701.57667656, -739.68606132, 496.35283147]
total = 0.0

# Column 1 mean
for n in col_1:
    total += n
print("Total = ", total)
mean_1 = total / len(col_1)
print("Mean of column 1 = ", mean_1)
and end up with:

Output:
Total =  -1608.8775610300002
Mean of column 1 =  -321.77551220600003
The same could be done for column 2. Not as fancy as numpy, but it will at least give you something to compare outputs.
Sparkz_alot,
Thanks for the response. Yes, that is exactly right. There are roughly 13,000 rows. I have to use numpy, though I do use code similar to yours as a cross-check. I am learning numpy and scipy for machine learning in Python, hence my use of them. I am trying to see whether I am using numpy.mean wrong, or whether there are some additional parameters that I don't know about. I have lots of other programming experience, but not specifically in Python.

One other thing. I use the same findMean function to find the mean of other matrices with more than 700 columns. I could try an inner loop for each column, but that would not be efficient for anything beyond 5 or 10 columns.

Thanks
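For the cross-check described above, here is a minimal sketch of comparing np.mean with axis=0 against a plain Python loop on a wide matrix. The random matrix X and its shape are made up for illustration and stand in for the real data; only a few columns are looped over, since the vectorized call already covers all of them.

```python
import numpy as np

# Hypothetical stand-in for the real data: 1000 rows, 700 columns
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=100.0, size=(1000, 700))

# One call gives the mean of every column (result has one entry per column)
col_means = np.mean(X, axis=0)

# Cross-check the first three columns with an explicit loop
for j in range(3):
    total = 0.0
    for value in X[:, j]:
        total += value
    loop_mean = total / X.shape[0]
    print("column", j, "loop:", loop_mean, "numpy:", col_means[j])

# With no NaN values present, nanmean and mean should agree
print(np.allclose(np.nanmean(X, axis=0), col_means))
```

If the two agree on a handful of columns, the numpy call itself is unlikely to be the problem and the inputs are the next thing to check.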
Works fine. It was due to my incorrect inputs to the PCA routine. The PCA components do have a mean of zero over the whole dataset you give it, so a mean around 1e-14 is just zero up to floating-point rounding. You have to select the rows for the correct label to get the two different PCA combos. I understand now what I was doing wrong.
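To illustrate the point, here is a rough sketch using synthetic two-class data rather than MNIST. It assumes PCA is done by centering the data and taking an SVD, which may differ from the routine used in the original code. All names here (labels, centers, P) are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical two-class data in 10 dimensions with different class centers
labels = rng.integers(0, 2, size=n)
centers = np.array([[-300.0] * 10, [400.0] * 10])
X = centers[labels] + rng.normal(scale=200.0, size=(n, 10))

# PCA via SVD of the centered data; keep the first two components
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
P = X_centered @ Vt[:2].T

# Over the whole dataset the projected columns average to (numerically) zero
overall_mean = P.mean(axis=0)
print("overall mean:", overall_mean)

# Restricting to one label gives a mean that is clearly non-zero
mean_label0 = P[labels == 0].mean(axis=0)
mean_label1 = P[labels == 1].mean(axis=0)
print("label 0 mean:", mean_label0)
print("label 1 mean:", mean_label1)
```

The overall mean comes out as tiny round-off values, while the per-label means are far from zero, which matches what was seen in this thread.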
Glad to hear of your success and how you resolved it. Not enough people do that.