Thread: nanmean - np - np.mean - errror (Python Forum, Data Science, /thread-1030.html)
nanmean - np - np.mean - errror - venkim_python - Nov-28-2016

I have fairly simple code to calculate a mean. Basically I am reading the MNIST data, computing the PCA, and finding the mean of PCA1 and PCA2. The first few entries of the PCA1 and PCA2 columns are shown below. I am trying to find the mean of that data, and I am confused about how the mean could be something like e to the power of -14. Something is going wrong.

I tried plain np.mean with axis=0, then nanmean; the mean still looks way off. If I chop off the last row it seems to work at times, but not consistently. The mean seems to be way out of range: for column 1 the minimum is something like -1550 and the maximum about 1400, and for column 2 the range is about -1300 to +2200. The mean should not be as small as e to the power of -14. Please help.

    import numpy as np

    def findMean(X):
        # Work in double precision, then average each column, ignoring NaNs.
        X = X.astype('float64')
        badMU = np.nanmean(X, axis=0)
        print("from inside findMean print two rows of mean")
        print(badMU[0:2])
        return badMU

Here is the output - excerpt only.
RE: nanmean - np - np.mean - errror - sparkz_alot - Nov-28-2016

Not sure if I understand, but the "mean" is nothing more than the sum of a list of numbers divided by the number of items in the list. So for example, taking your column 1 numbers, we can use something simple like:

    col_1 = [-177.01262488, -486.95502974, -701.57667656, -739.68606132, 496.35283147]
    total = 0.0

    # Column 1 mean: accumulate the sum, then divide by the count
    for n in col_1:
        total += n

    print("Total = ", total)
    mean_1 = total / len(col_1)
    print("Mean of column 1 = ", mean_1)

and end up with the total and the mean of column 1. The same could be done for column 2. Not as fancy as numpy, but it will at least give you something to compare outputs against.
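As a rough sketch of that cross-check, the loop above can be compared directly against numpy on the same five column 1 values. The helper names `loop_mean` and `column_means` are made up for this illustration and are not from the thread; `np.nanmean` with `axis=0` is the call used in the original post.

```python
import numpy as np

# Column 1 values quoted in the post above.
col_1 = [-177.01262488, -486.95502974, -701.57667656, -739.68606132, 496.35283147]


def loop_mean(values):
    """Plain-Python mean: sum of the values divided by their count."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


def column_means(X):
    """Per-column mean with numpy; axis=0 averages down the rows."""
    return np.nanmean(np.asarray(X, dtype="float64"), axis=0)


manual = loop_mean(col_1)

# Reshape to one column so the numpy call sees a (rows, columns) matrix.
vectorized = column_means(np.array(col_1).reshape(-1, 1))[0]

print("loop mean :", manual)
print("numpy mean:", vectorized)
print("agree     :", np.isclose(manual, vectorized))
```

Because `axis=0` handles every column in one call, the same `column_means` helper should work unchanged on a matrix with many columns, without an explicit loop per column.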
RE: nanmean - np - np.mean - errror - venkim_python - Nov-28-2016

Sparkz_alot, thanks for the response. Yes, that is exactly right. There are about 13,000+ rows. I have to use numpy; I do use code similar to yours as a cross-check. I am learning numpy and scipy for machine learning with Python, hence my use of them. I am trying to see whether I am using numpy.mean wrong, or whether there are some additional parameters that I don't know about. I have lots of other programming experience, but not specifically in Python.

One other thing: I use the same findMean function to find the mean of other matrices with more than 700 columns. I could try an inner loop over each column, but that would not be efficient for anything beyond 5 or 10 columns. Thanks.

RE: nanmean - np - np.mean - errror - venkim_python - Nov-30-2016

It works fine. The problem was my incorrect inputs to the PCA-finding method. The PCAs do give a mean of zero over the whole dataset you give them, so a mean around e to the power of -14 is just zero up to floating-point rounding. You have to select the rows for the correct label to get the two different PCA combos. I understand what I was doing wrong.

RE: nanmean - np - np.mean - errror - sparkz_alot - Nov-30-2016

Glad to hear of your success and how you resolved it; not enough people do that.
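The resolution reported in this thread (PCA scores average to zero over the whole dataset, and a meaningful mean only appears once rows are selected by label) can be illustrated with a small sketch. Everything here is an assumption for demonstration: the data is synthetic rather than MNIST, the PCA is done with a plain SVD in numpy rather than whatever PCA routine the original poster used, and the names `X`, `labels`, and `scores` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real features: two classes with different offsets.
n_per_class = 500
X = np.vstack([
    rng.normal(loc=50.0, scale=20.0, size=(n_per_class, 10)),
    rng.normal(loc=120.0, scale=20.0, size=(n_per_class, 10)),
])
labels = np.repeat([0, 1], n_per_class)

# PCA via SVD: center the data, then project onto the top two right singular vectors.
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
scores = X_centered @ Vt[:2].T  # columns play the role of PCA1 and PCA2

# Mean over ALL rows: zero up to floating-point rounding, because of the centering.
overall_mean = scores.mean(axis=0)

# Mean over the rows of one label at a time: generally not zero.
mean_label_0 = scores[labels == 0].mean(axis=0)
mean_label_1 = scores[labels == 1].mean(axis=0)

print("overall mean :", overall_mean)
print("label 0 mean :", mean_label_0)
print("label 1 mean :", mean_label_1)
```

In this sketch the overall mean prints as tiny values near machine precision, which matches the kind of number that looked wrong in the first post, while the per-label means are far from zero along the first component.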