Optimization using Gradient Descent - Least squares

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data. The simplest form of the linear equation with one dependent and one independent variable is represented as y = mx + b, where m is the slope of the line and b is the y-intercept. This method is widely used in predictive modeling and quantitative forecasting.

Gradient Descent for Linear Regression

Gradient Descent is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient. In the context of linear regression, gradient descent is used to find the values of m and b that minimize the cost function, which is typically the mean of the squared differences between the observed values and the values predicted by the model.

Cost Function

The cost function for linear regression, often referred to as the mean squared error, quantifies the difference between the observed values and the values predicted by the linear model. It is given by:

E(m, b) = \frac{1}{n} \sum_{i=1}^{n} (y_i - (mx_i + b))^2

where:

  • n is the number of observations,
  • y_i is the observed value,
  • x_i is the independent variable,
  • m is the slope, and
  • b is the y-intercept.
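The cost function above can be sketched directly in Python with NumPy; the function name `cost` and the sample data are just for illustration:

```python
import numpy as np

def cost(m, b, x, y):
    """Mean squared error E(m, b) of the line y = m*x + b on data (x, y)."""
    residuals = y - (m * x + b)
    return np.mean(residuals ** 2)

# Points lying exactly on y = 2x + 1 give zero cost; any other line gives a
# strictly positive cost.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
zero_cost = cost(2.0, 1.0, x, y)
other_cost = cost(1.0, 1.0, x, y)
```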

Gradient Descent Algorithm

The gradient descent algorithm updates the parameters m and b iteratively to minimize the cost function E(m, b). The update rules for m and b at each iteration are:

m_{k+1} = m_k - \alpha \frac{\partial E}{\partial m}

b_{k+1} = b_k - \alpha \frac{\partial E}{\partial b}

where α is the learning rate, a hyperparameter that controls the size of the steps taken towards the minimum of the cost function.
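The update rules translate to a one-line step in Python; the helper name `gradient_step` is just for illustration:

```python
def gradient_step(m, b, grad_m, grad_b, alpha=0.01):
    """One gradient descent update: move (m, b) against the gradient,
    scaled by the learning rate alpha."""
    return m - alpha * grad_m, b - alpha * grad_b

# With alpha = 0.5 and gradients (2, 4), the point (1, 1) moves to (0, -1).
new_m, new_b = gradient_step(1.0, 1.0, 2.0, 4.0, alpha=0.5)
```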

Partial Derivatives of the Cost Function

The partial derivatives of E(m, b) with respect to m and b are:

\frac{\partial E}{\partial m} = -\frac{2}{n} \sum_{i=1}^{n} x_i (y_i - (mx_i + b))

\frac{\partial E}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - (mx_i + b))

These gradients are used to update the values of m and b iteratively, moving in the direction that decreases the cost function most rapidly.
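The two partial derivatives can be computed in a few lines of NumPy; the function name `gradients` and the sample data are just for illustration:

```python
import numpy as np

def gradients(m, b, x, y):
    """Partial derivatives of the mean squared error E(m, b)
    with respect to m and b."""
    residuals = y - (m * x + b)
    grad_m = -2.0 / len(x) * np.sum(x * residuals)
    grad_b = -2.0 / len(x) * np.sum(residuals)
    return grad_m, grad_b

# At the parameters that generated the data, all residuals are zero,
# so both partial derivatives vanish.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
grad_m, grad_b = gradients(2.0, 1.0, x, y)
```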

Implementation Steps

  1. Initialization: Start with initial guesses for the values of m and b.
  2. Gradient Calculation: Compute the gradients of the cost function with respect to m and b.
  3. Update Parameters: Update the values of m and b using the gradient descent update rules.
  4. Iteration: Repeat steps 2 and 3 until the cost function converges to a minimum value.

This process results in finding the values of m and b that best fit the linear regression model to the observed data, thereby solving the linear regression problem through gradient descent.
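The four steps above can be assembled into a complete sketch in Python with NumPy; the function name `fit_line` and the hyperparameter values (learning rate, iteration cap, convergence tolerance) are illustrative choices, not part of the original text:

```python
import numpy as np

def fit_line(x, y, alpha=0.05, max_iters=20000, tol=1e-12):
    """Fit y ≈ m*x + b by gradient descent on the mean squared error."""
    m, b = 0.0, 0.0                                      # step 1: initialize
    prev_cost = np.inf
    for _ in range(max_iters):
        residuals = y - (m * x + b)
        grad_m = -2.0 / len(x) * np.sum(x * residuals)   # step 2: gradients
        grad_b = -2.0 / len(x) * np.sum(residuals)
        m -= alpha * grad_m                              # step 3: update
        b -= alpha * grad_b
        current_cost = np.mean(residuals ** 2)
        if abs(prev_cost - current_cost) < tol:          # step 4: convergence
            break
        prev_cost = current_cost
    return m, b

# Recover the slope and intercept of y = 2x + 1 from noiseless data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1
m, b = fit_line(x, y)
```

Note that the learning rate must be small enough for the iteration to converge; too large a value makes the updates overshoot and diverge, while too small a value makes convergence needlessly slow.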