Optimization using Gradient Descent in two variables

Gradient descent extends to functions of multiple variables by utilizing gradients instead of derivatives. The process involves iteratively moving in the direction opposite to the gradient of the function at the current point, to find a local minimum.

Conceptual Overview

  • Gradient Descent: An optimization algorithm that minimizes a function by iteratively moving in the direction of steepest descent, defined by the negative of the gradient.
  • Gradient: A vector that represents both the magnitude and the direction of the highest rate of increase of a function. For a function $f(x, y)$, the gradient is $\nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)$.
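To make the definition concrete, here is a small Python sketch using the illustrative function $f(x, y) = x^2 y$ (not from the text), whose gradient is $(2xy, x^2)$; the finite-difference comparison is just a sanity check on the hand-computed partial derivatives.

```python
# Analytic gradient of the illustrative function f(x, y) = x^2 * y
def grad(x, y):
    return 2 * x * y, x ** 2

# Central finite-difference approximation of the same partial
# derivatives at the point (2, 3), to confirm the formulas above
f = lambda x, y: x ** 2 * y
h = 1e-6
approx = ((f(2 + h, 3) - f(2 - h, 3)) / (2 * h),
          (f(2, 3 + h) - f(2, 3 - h)) / (2 * h))
# grad(2, 3) gives (12, 4), and `approx` agrees to high precision
```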

Algorithm for Two Variables

Given a function $f(x, y)$, the goal of gradient descent is to find the minimum value of $f$. The steps are as follows:

  1. Initialize: Start with an initial guess $(x_0, y_0)$ for the minimum.
  2. Compute Gradient: Calculate the gradient $\nabla f(x, y) = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)$ at the current location $(x_k, y_k)$.
  3. Update Position: Move in the direction opposite to the gradient (since we're seeking a minimum) by a step size determined by the learning rate $\alpha$. The update rule is: $(x_{k+1}, y_{k+1}) = (x_k, y_k) - \alpha \nabla f(x_k, y_k)$.
  4. Repeat: Continue the process until the changes are sufficiently small, indicating convergence to a minimum.
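The four steps above can be sketched as a short Python routine; the function name, convergence tolerance, and iteration cap are illustrative choices, not part of the algorithm itself.

```python
# A minimal sketch of gradient descent in two variables, following
# the steps above: initialize, compute gradient, update, repeat.
def gradient_descent(grad, x0, y0, alpha=0.05, tol=1e-6, max_iters=10_000):
    """Minimize a function of (x, y) given its gradient function `grad`."""
    x, y = x0, y0                                  # step 1: initial guess
    for _ in range(max_iters):
        gx, gy = grad(x, y)                        # step 2: gradient at (x_k, y_k)
        x_new = x - alpha * gx                     # step 3: move opposite
        y_new = y - alpha * gy                     #         to the gradient
        if abs(x_new - x) < tol and abs(y_new - y) < tol:
            return x_new, y_new                    # step 4: changes small enough
        x, y = x_new, y_new
    return x, y
```

For example, on the bowl $f(x, y) = x^2 + y^2$ with gradient $(2x, 2y)$, the iterates shrink toward the minimum at the origin.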

Practical Example

For a function representing temperature in a room, $f(x, y) = 85 - \frac{1}{90}x^2(x - 6)^2 - y^2(y - 6)^2$, gradient descent aims to find the coolest point in the room.

  • Starting with an initial guess, e.g., $(0.5, 0.6)$.
  • Calculate the gradient at this point.
  • Update the position according to the gradient and a chosen learning rate, e.g., $\alpha = 0.05$.
  • Iteratively apply these steps to converge to a point close to the minimum temperature.
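Putting the example together, here is a minimal sketch that differentiates the temperature function by hand and iterates the update from $(0.5, 0.6)$ with $\alpha = 0.05$; the iteration count is an illustrative choice.

```python
# Gradient of f(x, y) = 85 - (1/90) x^2 (x - 6)^2 - y^2 (y - 6)^2,
# using d/dx [x^2 (x - 6)^2] = 2x(x - 6)(2x - 6) and similarly in y.
def grad_f(x, y):
    dfdx = -(1.0 / 90.0) * 2 * x * (x - 6) * (2 * x - 6)
    dfdy = -2 * y * (y - 6) * (2 * y - 6)
    return dfdx, dfdy

x, y = 0.5, 0.6              # initial guess from the text
alpha = 0.05                 # learning rate from the text
for _ in range(2000):
    gx, gy = grad_f(x, y)
    x, y = x - alpha * gx, y - alpha * gy
# The iterates settle near (3, 3), a critical point where both
# partial derivatives vanish and the temperature is locally lowest.
```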

Mathematical Formulation

The mathematical representation of gradient descent for a function $f(x, y)$ is summarized as:

$$(x_{k+1}, y_{k+1}) = (x_k, y_k) - \alpha \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)_{(x_k, y_k)}$$

where $\alpha$ is the learning rate, and $\left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$ is the gradient of $f$ at $(x_k, y_k)$.
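In code, this update rule translates directly into a one-line vector operation. The NumPy form below is a sketch; the quadratic bowl $f(x, y) = x^2 + y^2$ serves only as a stand-in gradient for illustration.

```python
import numpy as np

def step(p, grad, alpha):
    """One gradient-descent update: p_{k+1} = p_k - alpha * grad(p_k)."""
    return p - alpha * grad(p)

p = np.array([0.5, 0.6])
g = lambda p: 2 * p            # gradient of the stand-in f(x, y) = x^2 + y^2
p = step(p, g, alpha=0.05)     # scales the point by (1 - 2 * 0.05) = 0.9
```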

Challenges and Considerations

  • Local vs. Global Minimum: Gradient descent may converge to a local minimum instead of the global minimum. Starting the algorithm from different initial points can mitigate this risk.
  • Learning Rate: The choice of learning rate $\alpha$ is crucial; too large a step can overshoot the minimum, while too small a step leads to slow convergence.
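The learning-rate trade-off can be seen on the simple bowl $f(x, y) = x^2 + y^2$, whose gradient is $(2x, 2y)$; the specific values of $\alpha$ below are illustrative choices.

```python
# Run gradient descent on f(x, y) = x^2 + y^2 from (3, 4) with a
# given learning rate, and return the final point.
def run(alpha, steps=50):
    x, y = 3.0, 4.0
    for _ in range(steps):
        gx, gy = 2 * x, 2 * y
        x, y = x - alpha * gx, y - alpha * gy
    return x, y

small = run(alpha=0.01)   # converges, but slowly
good = run(alpha=0.4)     # converges quickly
large = run(alpha=1.1)    # diverges: each step scales the error by |1 - 2*1.1| = 1.2
```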

Gradient descent for multivariate functions follows the same principles as in the single-variable case, but it uses vectors to navigate the multi-dimensional space towards the function's minimum.