Python Forum

nanmean - np - np.mean - error
I have fairly simple code to calculate a mean. Basically, I am reading the MNIST data, computing the PCA, and finding the mean of PCA1 and PCA2. The first few rows of PCA1 and PCA2 are shown below. I am confused about how the mean could be something on the order of e-14. Something is going wrong. I tried plain np.mean with axis=0, then np.nanmean, and the mean still looks way off. Chopping off the last row seemed to work at times, but not consistently. The mean seems far too small: column 1 ranges from about -1550 to 1400 and column 2 from about -1300 to +2200, so I would not expect the mean to be as small as e-14.

Please help.
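import numpy as np

def findMean(X):
    X = X.astype('float64')
    # Column-wise mean, ignoring NaNs (np.mean(X, axis=0) was tried as well)
    badMU = np.nanmean(X, axis=0)
    print("from inside findMean print two rows of mean")
    print(badMU[:2])
    return badMU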
Here is the output - excerpt only.
Print columns 1 and 2 of 5 rows of P i.e., P[:5,0:2]
[[ -177.01262488 -1123.38369215]
 [ -486.95502974   953.72782595]
 [ -701.57667656   722.54538116]
 [ -739.68606132   625.34125527]
 [  496.35283147  -371.99788225]]
Recovering X
Test overall mean of P - just print a few cols
from inside findMean print two rows of mean
[ -3.54114684e-14  -1.08559058e-14]
get min and max for col0 and col1 from P
Num entries in P is = 12593
Not sure if I understand, but the "mean" is nothing more than the sum of a list of numbers divided by the number of items in the list.  So for example, taking your column 1 numbers we can use something simple like:

col_1 = [-177.01262488, -486.95502974, -701.57667656, -739.68606132, 496.35283147]
total = 0.0

# Column 1 mean
for n in col_1:
    total += n
print("Total = ", total)
mean_1 = total / len(col_1)
print("Mean of column 1 = ", mean_1)
and end up with:

Total =  -1608.8775610300002
Mean of column 1 =  -321.77551220600003
The same could be done for column 2. Not as fancy as numpy, but it will at least give you something to compare outputs.
Thanks for the response. Yes, that is exactly right. There are about 13,000 rows. I have to use numpy, and I do use code similar to yours as a cross-check. I am learning numpy and scipy for machine learning in Python, hence my use of them. I am trying to see whether I am using numpy.mean incorrectly, or whether there are additional parameters that I don't know about. I have a lot of other programming experience, but not specifically in Python.

One other thing: I use the same findMean function to find the mean of other matrices with more than 700 columns. I could write an inner loop over each column, but that would not be efficient for anything beyond 5 or 10 columns.
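For reference, here is a minimal sketch of how axis=0 handles many columns in a single call, and where np.mean and np.nanmean actually differ. The matrix below is made-up random data standing in for the real one, and the sizes are arbitrary.

```python
import numpy as np

# Hypothetical stand-in data (not the poster's matrix): 200 rows, 50 columns.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 50))

# axis=0 averages down the rows, giving one mean per column in a single call.
col_means = np.mean(X, axis=0)

# Cross-check with an explicit loop over the columns.
loop_means = np.array([sum(X[:, j]) / X.shape[0] for j in range(X.shape[1])])

print(col_means.shape)                     # one entry per column
print(np.allclose(col_means, loop_means))  # the two approaches agree

# np.nanmean only behaves differently from np.mean when NaNs are present.
X_nan = X.copy()
X_nan[0, 0] = np.nan
mean_with_nan = np.mean(X_nan, axis=0)[0]        # NaN propagates into the result
nanmean_with_nan = np.nanmean(X_nan, axis=0)[0]  # NaN entry is skipped
print(mean_with_nan, nanmean_with_nan)
```

If the data contains no NaNs, np.mean and np.nanmean should return the same values, so switching between them would not be expected to change a result like the one above.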

It works fine now. The problem was the incorrect inputs I was passing to my PCA routine. The PCA scores have a mean of zero over the whole dataset you give it, so a value like e-14 is just zero up to floating-point rounding. You have to select the rows for the correct label to get the two different PCA combinations. I understand now what I was doing wrong.
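A small sketch of that effect, assuming PCA is computed by centering the data and taking an SVD (the thread does not show the actual PCA routine). The two-class data, the labels array, and all variable names are invented for illustration.

```python
import numpy as np

# Hypothetical data: two classes of 500 rows each, 5 features,
# with class 1 shifted away from class 0.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 500)
X = rng.normal(size=(1000, 5)) + labels[:, None] * 10.0

# PCA via SVD of the mean-centered data (one common formulation).
mu = X.mean(axis=0)
Z = X - mu
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
P = Z @ Vt.T  # principal component scores, one row per sample

# Over the whole dataset the scores average to zero up to rounding error,
# because the data was centered before projecting.
overall_mean = P[:, :2].mean(axis=0)
print("overall mean of PC1, PC2:", overall_mean)

# Restricting to the rows of one label gives a non-zero mean per class.
class_means = {k: P[labels == k, :2].mean(axis=0) for k in (0, 1)}
for k, m in class_means.items():
    print("label", k, "mean of PC1, PC2:", m)
```

The overall mean prints as tiny numbers of the same flavour as the e-14 values in the original output, while the per-label means are far from zero.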
Glad to hear of your success :) and thanks for explaining how you resolved it; not enough people do that.