Python Forum
Why is my gradient descent algorithm requiring such a small alpha? - Printable Version




Why is my gradient descent algorithm requiring such a small alpha? - JoeB - Dec-08-2017

I am trying to implement a gradient descent algorithm for linear regression. Unfortunately I can't attach the data set. My algorithm is shown below:

import numpy as np
import csv
import matplotlib.pyplot as plt

path = ''
with open(path + 'ex1data1.txt', 'r') as f:
    results = list(csv.reader(f))

population = []
profit = []
for row in results:
    population.append(float(row[0]))
    profit.append(float(row[1]))

# This function implements gradient descent to find a regression line of
# the form y = theta0 + theta1*x
def gradientDescent(xData, yData, a, iterations, theta0, theta1):
    J = []
    it = []
    for i in range(iterations):
        # Begin calculation of cost function
        cost0 = 0
        cost1 = 0
        for j in range(len(xData)):
            cost0 += (theta0 + theta1*xData[j] - yData[j]) ** 2
            cost1 += (theta0 + theta1*xData[j] - yData[j]) ** 2 * xData[j]
        # End calculation of cost function

        # Update theta0 and theta1 simultaneously
        tmp0 = theta0 - a*cost0
        tmp1 = theta1 - a*cost1
        theta0 = tmp0
        theta1 = tmp1

        J.append(cost1)
        it.append(i)

    return theta0, theta1, J, it

result = gradientDescent(population, profit, 0.000000005, 8000, 1, 2)

print('y = %s + %sx' % (result[0], result[1]))

plt.plot(result[3], result[2])
plt.show()

x = np.arange(0, 30, 1)
y = result[0] + result[1]*x
plt.plot(x, y)
plt.plot(population, profit, 'rx')
plt.show()
I keep track of the cost function value at each iteration so that I can plot it at the end and judge whether I have a good value of alpha. By varying the number of iterations, I can see from the plot that the cost only begins to converge somewhere between 7000 and 8000 iterations, which is far more than it should take.

Also, the value of alpha required is far too small. A more typical value of alpha, such as 0.005 or 0.0005, doesn't work at all.

Does anyone have any ideas on how I can fix this? I can't see why it isn't working as well as it should.
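
For comparison, below is a minimal sketch of the textbook batch gradient-descent update for the usual mean-squared-error cost J = (1/2m) * sum((theta0 + theta1*x - y)**2). The function name, vectorized style, and example hyperparameters here are illustrative rather than taken from the thread, and it assumes the same population and profit lists loaded above. Note two places where this differs from the accumulators in the code above: the gradient uses the plain residual (theta0 + theta1*x - y), not its square, and each sum is averaged over the m training examples. Without those two things the update steps become very large, which would explain why alpha has to be pushed down to values like 5e-9.

import numpy as np

def gradient_descent(x, y, a, iterations, theta0=0.0, theta1=0.0):
    # Batch gradient descent for the model y = theta0 + theta1*x,
    # minimizing J = (1/2m) * sum((theta0 + theta1*x - y)**2).
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(x)
    J = []                                      # cost history, one entry per iteration
    for _ in range(iterations):
        residual = theta0 + theta1*x - y        # h(x_j) - y_j for every example
        J.append((residual**2).sum() / (2*m))   # cost before this update
        grad0 = residual.sum() / m              # dJ/dtheta0: plain residual, averaged
        grad1 = (residual * x).sum() / m        # dJ/dtheta1
        theta0 -= a * grad0                     # simultaneous update of both parameters
        theta1 -= a * grad1
    return theta0, theta1, J

# Illustrative call: with the 1/m averaging, learning rates around
# 0.01 to 0.02 typically converge on data of this scale, e.g.
# theta0, theta1, J = gradient_descent(population, profit, 0.01, 1500)

With an update of this form, the plotted cost curve should drop smoothly within the first few hundred iterations rather than only converging near iteration 7000 to 8000.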