# Python Forum

Full Version: Estimating standard deviation from DataSet
I am having a hard time calculating the standard deviation for the data shown in the graph, and I was wondering what the steps are. Here is the code I have so far, but it is not producing the right output. The DataSet contains thousands of random numbers such as:
9.8254457980e-1 1.0293906530e+0 8.6314178340e-1 8.7754757930e-1 8.2216021950e-1 9.8155318390e-1 1.0215753050e+0 1.0064994180e+0 1.0300426240e+0 8.7195144970e-1 9.4140464140e-1 1.0811751280e+0 8.5982980390e-1

#Worksheet 1.3A - make plot of DataSet vs row Nums
from numpy import *
from matplotlib.pyplot import *
import matplotlib.pyplot as plt

--------------------------------------------------------------

# Import data as a list of numbers
with open("DataSet1.dat", "r") as textFile:
    data = textFile.read().split() # split based on spaces
    data = [float(point) for point in data] # convert strings to floats
rms = 0
#This is my code for calculating the square of the deviation
sqrt_rms = square(rms)
variance = average([square(i) for i in textFile]) - average1**2
print("The square of the standard deviation is:", variance)
print("The standard deviation is:", sqrt(variance))
#^Would square root of variance calculate the standard deviation?

plt.xlabel('x axis')
plt.ylabel('y axis')
plt.title('Plot of DataSet vs Row Numbers')
plt.plot(data)
plt.show()
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.

Also do you have more information on the data set?
Is there an assignment sheet, or data specification?
The data doesn't really look random from the small sample that you are showing.
Most of the data points are between 8 e-1 and 9 e-1 with some 'noise' as low readings.
You use the term row in your code, so I am guessing that the data set is an array, correct?
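
A quick way to check a range like that against the posted numbers is to count how many values fall inside it. This is only a sketch: `sample` holds the 13 values copied from the first post (the full DataSet1.dat is not shown), and the names `low`, `high` and `in_range` are made up for illustration.

```python
# The 13 values quoted in the first post; the real file has thousands more.
sample = [
    9.8254457980e-1, 1.0293906530e+0, 8.6314178340e-1, 8.7754757930e-1,
    8.2216021950e-1, 9.8155318390e-1, 1.0215753050e+0, 1.0064994180e+0,
    1.0300426240e+0, 8.7195144970e-1, 9.4140464140e-1, 1.0811751280e+0,
    8.5982980390e-1,
]

low, high = 0.8, 0.9  # the range mentioned above (8e-1 to 9e-1)

# Keep only the values that fall inside the half-open range [low, high).
in_range = [v for v in sample if low <= v < high]
fraction = len(in_range) / len(sample)

print(f"{len(in_range)} of {len(sample)} values lie in [{low}, {high})")
print(f"min = {min(sample):.4f}, max = {max(sample):.4f}")
```

The same count on the full file would show whether the short sample is representative.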

I'm thinking you may need something like numpy.std: https://docs.scipy.org/doc/numpy-1.13.0/...y.std.html
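
A minimal sketch of that suggestion, assuming NumPy is installed and using the 13 values from the first post as stand-in data. By default `numpy.std` divides by N (population form); passing `ddof=1` divides by N - 1 instead (sample form). The variable names are only for illustration.

```python
import numpy as np

# The 13 values quoted in the first post; the real file has thousands more.
sample = np.array([
    9.8254457980e-1, 1.0293906530e+0, 8.6314178340e-1, 8.7754757930e-1,
    8.2216021950e-1, 9.8155318390e-1, 1.0215753050e+0, 1.0064994180e+0,
    1.0300426240e+0, 8.7195144970e-1, 9.4140464140e-1, 1.0811751280e+0,
    8.5982980390e-1,
])

pop_std = np.std(sample)           # divides by N (ddof=0 is the default)
samp_std = np.std(sample, ddof=1)  # divides by N - 1

print("population std:", pop_std)
print("sample std:    ", samp_std)
```

For thousands of points the two values are nearly identical, so either one works as a check on a hand-written calculation.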
Here is the code; apologies for the delay. Yes, the data is mostly noise. The problem asks:

Make a plot of your data in DataSet1 vs. row number. From your plot, estimate the standard deviation of your data, i.e. how far the points scatter from the average and record your value. Now write a program to calculate the standard deviation (don’t use built-in functions this time). Discuss how well your estimate matches your calculated value. Every time we make a measurement, a value of the noise gets determined at random. This is a bit like quantum mechanics, where when we make a measurement of, say, position, the value is determined randomly. In quantum mechanics the probability of measuring different positions is drawn from a probability distribution given by the complex square of the wave function, P(x) = |Ψ|^2, so that the probability of measuring a position between x1 and x2 is given by the area under the P(x) curve between x1 and x2.
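
For reference, the calculation the assignment describes can be sketched roughly as below, without `numpy.std` or `sum`. It assumes the population form of the variance (divide by N) and uses the 13 values from the first post as stand-in data, since the full DataSet1.dat is not shown. The function name `std_dev` is made up for illustration. The standard deviation is the square root of the variance, and taking the mean of the squares minus the square of the mean gives the same variance.

```python
def std_dev(values):
    """Return (mean, variance, standard deviation) using explicit loops."""
    n = len(values)

    # Step 1: the mean is the sum of the values divided by how many there are.
    total = 0.0
    for v in values:
        total += v
    mean = total / n

    # Step 2: the variance is the average squared distance from the mean.
    sq_dev_total = 0.0
    for v in values:
        sq_dev_total += (v - mean) ** 2
    variance = sq_dev_total / n  # population form; use n - 1 for the sample form

    # Step 3: the standard deviation is the square root of the variance.
    return mean, variance, variance ** 0.5


# The 13 values quoted in the first post; the real file has thousands more.
sample = [
    9.8254457980e-1, 1.0293906530e+0, 8.6314178340e-1, 8.7754757930e-1,
    8.2216021950e-1, 9.8155318390e-1, 1.0215753050e+0, 1.0064994180e+0,
    1.0300426240e+0, 8.7195144970e-1, 9.4140464140e-1, 1.0811751280e+0,
    8.5982980390e-1,
]

mean, variance, sigma = std_dev(sample)
print("The mean is:", mean)
print("The square of the standard deviation is:", variance)
print("The standard deviation is:", sigma)
```

For the eyeball estimate, if the noise is roughly normal then about two thirds of the points in the plot should sit within one standard deviation of the average, which gives something to compare the calculated value against.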

```
#Worksheet 1.3A - make plot of DataSet vs row Nums
from numpy import *
from matplotlib.pyplot import *
import matplotlib.pyplot as plt

variance = 0.0
standDev = 0.0
summ = 0
sum_sq = 0
average = 0

textFile = open('DataSet1.dat','r')
dataSet = [float(i) for i in file]

for i in range(0, len(dataSet)):