# Machine Learning – Linear Regression (Part 2)

Continuing from where we left last post, we were discussing the batch gradient algorithm. The iterative formula is
$\theta_{i} = \theta_{i} - \frac{\alpha}{m}\sum_{i = 1}^{m}(\theta'X^i - y^i)*X_{i}$
The most obvious way to do this in normal Python would be to run a double nested loop, one from 1 to m to sum up the partial derivative term, and the other from 1 to n. to update the $\theta_{i}$ term.
Something like this.

theta_init = np.zeros(n + 1)
for i in xrange(n):
sum = 0
for j in xrange(m - 1):
sum += (np.dot(theta_init, X[j]) - y[j])*(X[i][j])
theta_init[i] -= (1/m)*sum


This loop has to be repeates till theta converges to a certain tolerance or a certain number of iterations. So basically there are three nested for loops.

However, NumPy makes things easier. Lets look at the vectorised way of doing this.
Consider theta to be a matrix of size (n + 1, 1)
X is a matrix of size (m, n + 1)
y is a matrix of size (n + 1, 1)
The inner product $\theta'X^i$ is computed as

theta.T * X[i]


Instead of doing it individually for each example, we could do it for all vectors and store it in a matrix of size (m+1, 1)

np.dot(theta.T, X.T) - y


This gives a matrix like this.
$[\theta_0*X_0^1 + \theta_1*X_1^1 + .. \theta_n*X_n^1 - y^1$
$\theta_0*X_0^2 + \theta_1*X_1^2 + .. \theta_n*X_n^2 - y^2 ...$
$\theta_0*X_0^m + \theta_1*X_1^m + .. \theta_n*X_n^1m - y^m]$
Lets call this Error matrix where each element is the error in predicting the output of the corresponding example.
$[E1 E2 E3 ... Em]$
Now each error term $E_{j}$ \$has to be multiplied with $X_i^j$ to get $(\theta'X - y^i)*X_{i}$. Instead of doing this.
1] Simply multiply the error matrices, with $X$ to get the above mentioned term. How does this work (Figure it out yourself)?
2] Then divide by $\frac{\alpha}{m}$ and subtract it from $theta_{i}$, to get the updated value of $theta_{i}$
3] Repeat till convergence.

Sometimes when you have a large number of examples then $\frac{\alpha}{m}\sum_{i = 1}^{m}(\theta'X^i - y^i)*X_{i}$ becomes computationally expensive, In that case you replace it with this, $\frac{\alpha}{m}(\theta'X^i - y^i)*X_{i}$