Python Forum

I have fairly simple code to calculate a mean. Basically, I am reading MNIST data, computing the PCA, and finding the means of PCA1 and PCA2. The first few rows of PCA1 and PCA2 are shown below. I am confused about how the mean could be something like 1e-14. Something is going wrong. I tried plain np.mean with axis=0, then np.nanmean, and the mean still looks way off. If I chop off the last row it seems to work at times, but not consistently. The minimum is about -1550 and the maximum about 1400 for column 1, and about -1300 to +2200 for column 2, so the mean should not be as small as 1e-14.

Please help.

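import numpy as np

def findMean(X):
    # Work in double precision so rounding error stays small
    X = X.astype('float64')
    # Column-wise mean (axis=0), ignoring any NaN entries
    badMU = np.nanmean(X, axis=0)
    print("from inside findMean print two rows of mean")
    print(badMU[0:2])

    return badMU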

Here is the output - excerpt only.

Output:
Print columns 1 and 2 of 5 rows of P i.e., P[:5,0:2]
[[ -177.01262488 -1123.38369215]
 [ -486.95502974   953.72782595]
 [ -701.57667656   722.54538116]
 [ -739.68606132   625.34125527]
 [  496.35283147  -371.99788225]]
Recovering X
Test overall mean of P - just print a few cols
from inside findMean print two rows of mean
[ -3.54114684e-14  -1.08559058e-14]

get min and max for col0 and col1 from P
Num entries in P is = 12593

Not sure if I understand, but the "mean" is nothing more than the sum of a list of numbers divided by the number of items in the list. So for example, taking your column 1 numbers we can use something simple like:

col_1 = [-177.01262488, -486.95502974, -701.57667656, -739.68606132, 496.35283147]
total = 0.0

# Column 1 mean
for n in col_1:
    total += n
print("Total = ", total)
mean_1 = total / len(col_1)
print("Mean of column 1 = ", mean_1)

and end up with:

Output:
Total =  -1608.8775610300002
Mean of column 1 =  -321.77551220600003

The same could be done for column 2. Not as fancy as numpy, but it will at least give you something to compare outputs.
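As a cross-check in numpy itself, here is a minimal sketch that takes only the five rows printed in the question and averages both columns at once. The array name P_head is made up for this example; the numbers are copied from the output above. Note that this is the mean of five rows only, not of the full 12593-row matrix.

```python
import numpy as np

# The five rows of P printed in the question (columns 1 and 2).
P_head = np.array([
    [-177.01262488, -1123.38369215],
    [-486.95502974,   953.72782595],
    [-701.57667656,   722.54538116],
    [-739.68606132,   625.34125527],
    [ 496.35283147,  -371.99788225],
])

# axis=0 averages down the rows, giving one mean per column.
col_means = np.mean(P_head, axis=0)
print("Column means =", col_means)
```

The first entry should agree with the hand-rolled loop above, which gives a quick sanity check that np.mean with axis=0 is being called correctly.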

Sparkz_alot,
Thanks for the response. Yes, that is exactly right. There are about 13,000 rows. I have to use numpy, though I do use code similar to yours as a cross-check. I am learning numpy and scipy for machine learning in Python, hence my use of them. I am trying to see whether I am using numpy.mean wrong, or whether there are some additional parameters that I don't know about. I have lots of other programming experience, but not specifically in Python.

One other thing. I use the same findMean function to find the mean of other matrices with more than 700 columns. I could try an explicit loop over the columns, but that would not be efficient for anything more than 5 or 10 columns.
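On the efficiency concern: a single np.mean call with axis=0 already covers every column, so no explicit loop is needed even at 700 columns. The sketch below uses made-up random data (the shape and the seed are assumptions for illustration, not the poster's matrices) and checks the vectorized result against a per-column loop.

```python
import numpy as np

# Synthetic stand-in data: 1000 rows, 700 columns (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=100.0, size=(1000, 700))

# One call computes the mean of every column.
vectorized = np.mean(X, axis=0)

# Slow cross-check: plain Python sum over each column, divided by row count.
n_rows, n_cols = X.shape
looped = np.array([sum(X[:, j]) / n_rows for j in range(n_cols)])

print(vectorized.shape)
print(np.allclose(vectorized, looped))
```

The loop is only there as a comparison; in practice the axis=0 call is the one to keep.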

Thanks

Works fine. It was due to my incorrect inputs to the PCA-finding method. The PCAs do give a mean of zero over the whole dataset you give them, so a value like 1e-14 is just zero up to floating-point rounding. You have to select the correct label to get the two different PCA combos. I understand now what I was doing wrong.
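For anyone who lands here with the same confusion, here is a small sketch of the effect. It assumes PCA is done by centering the data and taking an SVD, which may differ from the method used in the original code, and it uses synthetic two-class data rather than MNIST. All names (labels, P, and so on) are chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two classes of 500 rows each, 10 features (illustrative).
labels = np.repeat([0, 1], 500)
X = rng.normal(size=(1000, 10)) * 50.0
X[labels == 1] += 200.0  # shift class 1 so the classes are separated

# PCA via SVD of the centered data.
mu = X.mean(axis=0)
Z = X - mu
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
P = Z @ Vt[:2].T  # scores on the first two principal components

# Over the whole dataset the scores average to zero up to rounding error.
overall_mean = P.mean(axis=0)

# Selecting rows by label gives means that are clearly non-zero.
mean_label0 = P[labels == 0].mean(axis=0)
mean_label1 = P[labels == 1].mean(axis=0)

print("overall:", overall_mean)
print("label 0:", mean_label0)
print("label 1:", mean_label1)
```

The overall mean prints as tiny values of the same flavor as the e-14 numbers in the question, while the per-label means are large, because centering forces the projection of the full dataset to average to zero.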

Glad to hear of your success :) and of how you resolved it. Not enough people do that.
