Optimization using Gradient Descent - Least squares

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data. The simplest form of the linear equation with one dependent and one independent variable is represented as y = mx + b, where m is the slope of the line and b is the y-intercept. This method is widely used in predictive modeling and quantitative forecasting.

Gradient Descent for Linear Regression

Gradient Descent is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient. In the context of linear regression, gradient descent is used to find the values of m and b that minimize the cost function, which is typically the mean of the squared differences between the observed values and the values predicted by the model.

Cost Function

The cost function for linear regression, often referred to as the mean squared error, quantifies the difference between the observed values and the values predicted by the linear model. It is given by:

E(m, b) = \frac{1}{n} \sum_{i=1}^{n} (y_i - (mx_i + b))^2

where:

  • n is the number of observations,
  • y_i is the observed value,
  • x_i is the independent variable,
  • m is the slope, and
  • b is the y-intercept.
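The cost function above can be sketched directly in Python with NumPy; the function name `cost` and the sample data are just for illustration:

```python
import numpy as np

def cost(m, b, x, y):
    """Mean squared error E(m, b) of the line y = m*x + b on data (x, y)."""
    residuals = y - (m * x + b)
    return np.mean(residuals ** 2)

# Points lying exactly on y = 2x + 1 give zero cost; any other line gives a
# strictly positive cost.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
zero_cost = cost(2.0, 1.0, x, y)
other_cost = cost(1.0, 1.0, x, y)
```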

Gradient Descent Algorithm

The gradient descent algorithm updates the parameters m and b iteratively to minimize the cost function E(m, b). The update rules for m and b at each iteration are:

m_{k+1} = m_k - \alpha \frac{\partial E}{\partial m}

b_{k+1} = b_k - \alpha \frac{\partial E}{\partial b}

where α is the learning rate, a hyperparameter that controls the size of the steps taken towards the minimum of the cost function.
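The update rules translate to a one-line step in Python; the helper name `gradient_step` is just for illustration:

```python
def gradient_step(m, b, grad_m, grad_b, alpha=0.01):
    """One gradient descent update: move (m, b) against the gradient,
    scaled by the learning rate alpha."""
    return m - alpha * grad_m, b - alpha * grad_b

# With alpha = 0.5 and gradients (2, 4), the point (1, 1) moves to (0, -1).
new_m, new_b = gradient_step(1.0, 1.0, 2.0, 4.0, alpha=0.5)
```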

Partial Derivatives of the Cost Function

The partial derivatives of E(m, b) with respect to m and b are:

\frac{\partial E}{\partial m} = -\frac{2}{n} \sum_{i=1}^{n} x_i (y_i - (mx_i + b))

\frac{\partial E}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - (mx_i + b))

These gradients are used to update the values of m and b iteratively, moving in the direction that decreases the cost function most rapidly.
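The two partial derivatives can be computed in a few lines of NumPy; the function name `gradients` and the sample data are just for illustration:

```python
import numpy as np

def gradients(m, b, x, y):
    """Partial derivatives of the mean squared error E(m, b)
    with respect to m and b."""
    residuals = y - (m * x + b)
    grad_m = -2.0 / len(x) * np.sum(x * residuals)
    grad_b = -2.0 / len(x) * np.sum(residuals)
    return grad_m, grad_b

# At the parameters that generated the data, all residuals are zero,
# so both partial derivatives vanish.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
grad_m, grad_b = gradients(2.0, 1.0, x, y)
```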

Implementation Steps

  1. Initialization: Start with initial guesses for the values of m and b.
  2. Gradient Calculation: Compute the gradients of the cost function with respect to m and b.
  3. Update Parameters: Update the values of m and b using the gradient descent update rules.
  4. Iteration: Repeat steps 2 and 3 until the cost function converges to a minimum value.

This process results in finding the values of m and b that best fit the linear regression model to the observed data, thereby solving the linear regression problem through gradient descent.
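The four steps above can be assembled into a complete sketch in Python with NumPy; the function name `fit_line` and the hyperparameter values (learning rate, iteration cap, convergence tolerance) are illustrative choices, not part of the original text:

```python
import numpy as np

def fit_line(x, y, alpha=0.05, max_iters=20000, tol=1e-12):
    """Fit y ≈ m*x + b by gradient descent on the mean squared error."""
    m, b = 0.0, 0.0                                      # step 1: initialize
    prev_cost = np.inf
    for _ in range(max_iters):
        residuals = y - (m * x + b)
        grad_m = -2.0 / len(x) * np.sum(x * residuals)   # step 2: gradients
        grad_b = -2.0 / len(x) * np.sum(residuals)
        m -= alpha * grad_m                              # step 3: update
        b -= alpha * grad_b
        current_cost = np.mean(residuals ** 2)
        if abs(prev_cost - current_cost) < tol:          # step 4: convergence
            break
        prev_cost = current_cost
    return m, b

# Recover the slope and intercept of y = 2x + 1 from noiseless data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1
m, b = fit_line(x, y)
```

Note that the learning rate must be small enough for the iteration to converge; too large a value makes the updates overshoot and diverge, while too small a value makes convergence needlessly slow.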