Machine Learning – Linear Regression (Part 2)

Continuing from where we left last post, we were discussing the batch gradient algorithm. The iterative formula is
\theta_{i} = \theta_{i} - \frac{\alpha}{m}\sum_{i = 1}^{m}(\theta'X^i - y^i)*X_{i}
The most obvious way to do this in normal Python would be to run a double nested loop, one from 1 to m to sum up the partial derivative term, and the other from 1 to n. to update the \theta_{i} term.
Something like this.

theta_init = np.zeros(n + 1)
for i in xrange(n):
    sum = 0
    for j in xrange(m - 1):
        sum += (, X[j]) - y[j])*(X[i][j])
    theta_init[i] -= (1/m)*sum

This loop has to be repeates till theta converges to a certain tolerance or a certain number of iterations. So basically there are three nested for loops.

However, NumPy makes things easier. Lets look at the vectorised way of doing this.
Consider theta to be a matrix of size (n + 1, 1)
X is a matrix of size (m, n + 1)
y is a matrix of size (n + 1, 1)
The inner product \theta'X^i is computed as

theta.T * X[i]

Instead of doing it individually for each example, we could do it for all vectors and store it in a matrix of size (m+1, 1), X.T) - y

This gives a matrix like this.
[\theta_0*X_0^1 + \theta_1*X_1^1 + .. \theta_n*X_n^1 - y^1
\theta_0*X_0^2 + \theta_1*X_1^2 + .. \theta_n*X_n^2 - y^2   ...
\theta_0*X_0^m + \theta_1*X_1^m + .. \theta_n*X_n^1m - y^m]
Lets call this Error matrix where each element is the error in predicting the output of the corresponding example.
[E1  E2  E3 ...  Em]
Now each error term E_{j} $has to be multiplied with X_i^j to get (\theta'X - y^i)*X_{i}. Instead of doing this.
1] Simply multiply the error matrices, with X to get the above mentioned term. How does this work (Figure it out yourself)?
2] Then divide by \frac{\alpha}{m} and subtract it from theta_{i}, to get the updated value of theta_{i}
3] Repeat till convergence.

Stochastic gradient descent
Sometimes when you have a large number of examples then \frac{\alpha}{m}\sum_{i = 1}^{m}(\theta'X^i - y^i)*X_{i} becomes computationally expensive, In that case you replace it with this, \frac{\alpha}{m}(\theta'X^i - y^i)*X_{i}
so that theta is updated for each example, (i.e run the loop m times).

That’s it for now. (I tried implementing it on my own, and can be seen here,


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: