Python Forum
nanmean - np - np.mean - error
#1
I have fairly simple code to calculate a mean. I read the MNIST data, compute the PCA, and then take the mean of PCA1 and PCA2. The first few rows of PCA1 and PCA2 are shown below.

I am confused about how the mean could be on the order of 1e-14. Something is going wrong. I tried plain np.mean with axis=0, then np.nanmean, and the mean still looks way off. If I chop off the last row it seems to work at times, but not consistently.

Column 1 ranges from about -1550 to 1400 and column 2 from about -1300 to +2200, so I would not expect a mean as small as 1e-14.

Please help.
import numpy as np

def findMean(X):
    # Column-wise mean (axis=0), ignoring any NaN entries
    X = X.astype('float64')
    badMU = np.nanmean(X, axis=0)
    print("from inside findMean print two rows of mean")
    print(badMU[0:2])

    return badMU
Here is the output - excerpt only.
Output:
Print columns 1 and 2 of 5 rows of P i.e., P[:5,0:2]
[[ -177.01262488 -1123.38369215]
 [ -486.95502974   953.72782595]
 [ -701.57667656   722.54538116]
 [ -739.68606132   625.34125527]
 [  496.35283147  -371.99788225]]
Recovering X
Test overall mean of P - just print a few cols
from inside findMean print two rows of mean
[ -3.54114684e-14  -1.08559058e-14]
get min and max for col0 and col1 from P
Num entries in P is = 12593
#2
Not sure if I understand, but the "mean" is nothing more than the sum of a list of numbers divided by the number of items in the list.  So for example, taking your column 1 numbers we can use something simple like:

col_1 = [-177.01262488, -486.95502974, -701.57667656, -739.68606132, 496.35283147]
total = 0.0

# Column 1 mean
for n in col_1:
    total += n
print("Total = ", total)
mean_1 = total / len(col_1)
print("Mean of column 1 = ", mean_1)
and end up with:

Output:
Total =  -1608.8775610300002
Mean of column 1 =  -321.77551220600003
The same could be done for column 2. Not as fancy as numpy, but it will at least give you something to compare outputs.
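As a cross-check in numpy itself, here is a minimal sketch that compares a plain-Python column mean against np.mean with axis=0. It uses only the five rows quoted in the first post; the array name P5 is made up for illustration.

```python
import numpy as np

# The five rows of P shown in the first post (P5 is a hypothetical name).
P5 = np.array([
    [-177.01262488, -1123.38369215],
    [-486.95502974,   953.72782595],
    [-701.57667656,   722.54538116],
    [-739.68606132,   625.34125527],
    [ 496.35283147,  -371.99788225],
])

# Plain-Python mean of each column: sum divided by the number of rows.
n_rows, n_cols = P5.shape
loop_means = [sum(P5[:, j]) / n_rows for j in range(n_cols)]

# numpy equivalent: axis=0 collapses the rows, leaving one mean per column.
np_means = P5.mean(axis=0)

print("loop :", loop_means)
print("numpy:", np_means)
```

If the two lines agree on a small slice like this, the call to np.mean is probably fine and the surprise is more likely in the data being passed in.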
If it ain't broke, I just haven't gotten to it yet.
OS: Windows 10, openSuse 42.3, freeBSD 11, Raspian "Stretch"
Python 3.6.5, IDE: PyCharm 2018 Community Edition
#3
Sparkz_alot,
Thanks for the response. Yes, that is exactly right. There are about 13,000 rows. I have to use numpy, and I do use code similar to yours as a cross-check. I am learning numpy and scipy for machine learning in Python, hence my use of them. I am trying to see whether I am using numpy.mean wrong, or whether there are some additional parameters that I don't know about. I have lots of other programming experience, but not specifically in Python.

One other thing. I use the same findMean function to find the mean of other matrices with more than 700 columns. I could try an inner loop over each column, but that would not be efficient for anything beyond 5 or 10 columns.

Thanks
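For reference, a small sketch of how axis=0 handles a wide matrix without any per-column loop, and how np.nanmean differs from np.mean when a NaN is present. The 1000 x 784 random array is a made-up stand-in, not the actual data from this thread.

```python
import numpy as np

# Hypothetical stand-in for a wide data matrix (rows = samples, cols = features).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 784))

# One call gives the mean of every column; no explicit loop is needed.
col_means = np.mean(X, axis=0)
print(col_means.shape)

# Put a single NaN into column 0 to show the difference between the two functions.
X_nan = X.copy()
X_nan[0, 0] = np.nan

plain = np.mean(X_nan, axis=0)      # NaN propagates into the mean of column 0
ignoring = np.nanmean(X_nan, axis=0)  # NaN entries are skipped

print(np.isnan(plain[0]))
print(np.isclose(ignoring[0], X[1:, 0].mean()))
```

With no NaN values in the input, np.mean and np.nanmean return the same result, so switching between them would not be expected to change a mean that is already near zero.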
#4
Works fine. The problem was my incorrect inputs to the PCA routine. The PCA scores have a mean of zero over the whole dataset you give it, so a value like 1e-14 is just floating-point round-off around zero. You have to select the rows for the correct label to get the two different PCA combos. I understand now what I was doing wrong.
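To illustrate the point, here is a rough sketch on synthetic data (not the MNIST data from this thread). It assumes PCA is done by centering the data and projecting onto the leading right singular vectors from np.linalg.svd, which may differ from the routine used in the original code. All variable names are hypothetical.

```python
import numpy as np

# Synthetic data with a large, non-zero mean in every column.
rng = np.random.default_rng(42)
X = rng.normal(loc=50.0, scale=20.0, size=(500, 10))

# PCA via SVD: center the columns, then project onto the top two components.
mu = X.mean(axis=0)
Xc = X - mu
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Xc @ Vt[:2].T  # scores on the first two principal components

# The scores can span a wide range, yet their column means are
# zero up to floating-point round-off, because Xc itself has zero column means.
score_means = P.mean(axis=0)
print("min :", P.min(axis=0))
print("max :", P.max(axis=0))
print("mean:", score_means)
```

Taking the mean over only a subset of rows of P (for example, the rows belonging to one label) will generally not be zero, which matches the resolution described above.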
#5
Glad to hear of your success, and that you posted how you resolved it. Not enough people do that.