
Gradient and Multidimensional Optimization

Definition of Gradient

For a function $f(x, y)$ of two variables, the gradient, denoted $\nabla f$, is defined as the vector of its partial derivatives. Formally,

$$\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$$

For $f(x, y) = x^2 + y^2$, the gradient is $\nabla f = (2x, 2y)$. This vector points in the direction in which the function increases most rapidly, and its magnitude gives that rate of increase. The components of $\nabla f$ are the slopes of $f$ along the coordinate axes; the gradient itself is perpendicular to the level curves of $f$ at any given point.

Gradient at a Specific Point

To find the gradient of $f(x, y) = x^2 + y^2$ at a particular point, say $(2, 3)$, we substitute $x = 2$ and $y = 3$ into the gradient formula:

$$\nabla f = (2 \cdot 2, 2 \cdot 3) = (4, 6)$$

This is the gradient of $f$ at the point $(2, 3)$: the vector pointing in the direction of greatest increase of $f$ from that point.
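The computation above can be sketched in a few lines of Python; the helper names (`f`, `grad_f`, `numerical_grad`) are illustrative, and the central-difference check is just a sanity test of the analytic gradient.

```python
def f(x, y):
    return x**2 + y**2

def grad_f(x, y):
    # Analytic gradient of f: (2x, 2y)
    return (2 * x, 2 * y)

def numerical_grad(func, x, y, h=1e-6):
    # Central-difference approximation of the partial derivatives
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (dfdx, dfdy)

print(grad_f(2, 3))             # (4, 6)
print(numerical_grad(f, 2, 3))  # close to (4.0, 6.0)
```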

Application in Optimization

The concept of the gradient extends naturally to the optimization of functions. For a function of a single variable, such as $f(x) = x^2$, finding its minimum involves solving $f'(x) = 0$. The solution indicates a point where the function has a horizontal tangent line, suggesting a local minimum or maximum.

For functions of two or more variables, such as $f(x, y) = x^2 + y^2$, the optimization process seeks points where the gradient is zero, i.e., $\nabla f = \vec{0}$. This condition means that both partial derivatives, $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$, are zero. Solving the system of equations

\begin{align*}
\frac{\partial f}{\partial x} = 2x &= 0 \\
\frac{\partial f}{\partial y} = 2y &= 0
\end{align*}

yields the point $(0, 0)$ as the location of the minimum. This methodology generalizes to functions of any number of variables, where a zero gradient marks a critical point, a candidate for a local extremum.
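For $f(x, y) = x^2 + y^2$ the system above gives $(0, 0)$, and we can spot-check that this is indeed a minimum by comparing $f$ there against nearby points. This is just a numerical sanity check, not a proof:

```python
import itertools

def f(x, y):
    return x**2 + y**2

critical = (0.0, 0.0)  # solution of 2x = 0 and 2y = 0

# f at the critical point should be no larger than f at nearby points
for dx, dy in itertools.product((-0.1, 0.0, 0.1), repeat=2):
    assert f(*critical) <= f(critical[0] + dx, critical[1] + dy)

print("minimum at", critical, "with value", f(*critical))
```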

In summary, the gradient is a powerful tool in calculus and analysis, facilitating the understanding and optimization of multivariable functions. Its computation and the conditions for optimization form the basis for many applications in mathematics, physics, engineering, and machine learning.

Optimization in a Two-Dimensional Sauna

Modeling a sauna as a two-dimensional space, we can move in any direction within a 5x5 room, aiming to find the coldest point given the temperature distribution.

  • Temperature Function: For a given position $(x, y)$, the temperature is a function $T(x, y)$ represented by the height in a three-dimensional plot. Hot areas are indicated by red (higher values) and cold areas by blue (lower values).

  • Objective: To locate the coldest area (the minimum of $T(x, y)$), where moving in any direction increases the temperature.

  • Mathematical Approach: Calculate the partial derivatives $\frac{\partial T}{\partial x}$ and $\frac{\partial T}{\partial y}$, set them to zero, and solve for $x$ and $y$ to find candidate minimum points.

  • Example Function: $T(x, y) = 85 - \frac{1}{90}x^2(x - 6)y^2(y - 6)$.

  • Partial Derivatives:

    $$\frac{\partial T}{\partial x} = -\frac{1}{90}x(3x - 12)y^2(y - 6), \qquad \frac{\partial T}{\partial y} = -\frac{1}{90}x^2(x - 6)y(3y - 12)$$
  • Finding the Minimum: Solving $\frac{\partial T}{\partial x} = 0$ and $\frac{\partial T}{\partial y} = 0$ yields several candidate points. The coldest point within the sauna's bounds is identified by evaluating $T$ at each candidate.
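The last step can be sketched directly: the factored partials vanish when $x \in \{0, 4\}$ or $y \in \{0, 6\}$ (for $\partial T/\partial x$) and when $y \in \{0, 4\}$ or $x \in \{0, 6\}$ (for $\partial T/\partial y$), and we evaluate $T$ at the resulting candidates. The candidate list below is a representative sample, not an exhaustive enumeration:

```python
def T(x, y):
    # Temperature function T(x, y) = 85 - (1/90) x^2 (x - 6) y^2 (y - 6)
    return 85 - (1 / 90) * x**2 * (x - 6) * y**2 * (y - 6)

# Candidate critical points from the factored partial derivatives
candidates = [(0, 0), (0, 4), (4, 0), (4, 4), (6, 6)]
coldest = min(candidates, key=lambda p: T(*p))
print(coldest, round(T(*coldest), 2))  # (4, 4) 73.62
```

Along the lines $x = 0$ or $y = 0$ the second term vanishes and $T = 85$; only the interior point $(4, 4)$ actually lowers the temperature.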

Linear Regression Optimization

Linear regression, a fundamental machine learning model, is optimized through a similar multidimensional calculus approach: finding the best-fit line for a set of data points.

  • Problem Statement: With given coordinates of power lines, the task is to minimize the total cost of connecting these to a main fiber line. The cost is proportional to the square of the distances from the power lines to the fiber line.
  • Mathematical Formulation: The line equation $y = mx + b$ represents the fiber line, with $m$ and $b$ being the slope and y-intercept, respectively. The optimization goal is to minimize the total cost function $E(m, b)$, which depends on $m$ and $b$.
  • Cost Function: $E(m, b) = 14m^2 + 3b^2 + 38 + 12mb - 42m - 20b$.
  • Partial Derivatives and Optimization:
    • $\frac{\partial E}{\partial m} = 28m + 12b - 42$
    • $\frac{\partial E}{\partial b} = 6b + 12m - 20$
  • Solution: Setting $\frac{\partial E}{\partial m} = 0$ and $\frac{\partial E}{\partial b} = 0$ and solving for $m$ and $b$ yields the optimal line parameters minimizing the cost.
  • Result: The optimal values are $m = \frac{1}{2}$ and $b = \frac{7}{3}$, with a minimum cost of $\frac{25}{6} \approx 4.167$.
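Setting the two partials to zero gives the linear system $28m + 12b = 42$ and $12m + 6b = 20$, which a short sketch can solve exactly with Cramer's rule (using exact fractions to avoid rounding):

```python
from fractions import Fraction as F

# System from the vanishing partials of E(m, b):
#   28m + 12b = 42
#   12m +  6b = 20
a11, a12, c1 = F(28), F(12), F(42)
a21, a22, c2 = F(12), F(6), F(20)

det = a11 * a22 - a12 * a21      # 28*6 - 12*12 = 24
m = (c1 * a22 - a12 * c2) / det  # (252 - 240)/24 = 1/2
b = (a11 * c2 - c1 * a21) / det  # (560 - 504)/24 = 7/3

def E(m, b):
    return 14*m**2 + 3*b**2 + 38 + 12*m*b - 42*m - 20*b

print(m, b, E(m, b))  # 1/2 7/3 25/6
```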

Gradient Descent: An Efficient Optimization Method

The traditional approach of solving for the zeros of the partial derivatives can become cumbersome as the number of variables grows. Gradient descent offers a more efficient way to find minima of multidimensional functions by iteratively stepping in the direction of steepest descent, i.e., along the negative gradient. This method is pivotal in machine learning for optimizing complex models beyond linear regression.
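As a minimal sketch, gradient descent can be applied to the linear regression cost $E(m, b)$ from the previous section; the learning rate, starting point, and iteration count are illustrative choices, not prescribed values.

```python
def grad_E(m, b):
    # Gradient of E(m, b) = 14m^2 + 3b^2 + 38 + 12mb - 42m - 20b
    return (28*m + 12*b - 42, 12*m + 6*b - 20)

m, b = 0.0, 0.0  # arbitrary starting point
lr = 0.02        # step size (learning rate)

for _ in range(5000):
    gm, gb = grad_E(m, b)
    m -= lr * gm  # step opposite the gradient: steepest descent
    b -= lr * gb

print(round(m, 4), round(b, 4))  # approaches 0.5 and 2.3333
```

Each iteration moves the parameters a small step against the gradient, so the cost decreases until the iterates settle near the analytic minimum $m = \frac{1}{2}$, $b = \frac{7}{3}$.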