Classification with a Neural Network

Neural networks are computational models loosely inspired by the structure of the brain, which lets them capture complex, non-linear relationships between inputs and outputs. They consist of layers of neurons (perceptrons) that process inputs through weighted connections.

Structure of Neural Networks

[Figure: network structure, from "What is a Neural Network?" (IBM)]

  • Input Layer: The first layer that receives the input.
  • Hidden Layers: One or more layers that process the inputs from the previous layer and pass the output to the next layer. Each neuron in a hidden layer transforms its inputs with a weighted sum followed by a non-linear activation function (see the sketch after this list).
  • Output Layer: The final layer that produces the network's output.
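
As a concrete starting point, the sketch below allocates parameters for a tiny network of this shape in NumPy. The layer sizes (4 inputs, 5 hidden neurons, 1 output) are made-up values for illustration, not anything prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4 input features, 5 hidden neurons, 1 output neuron.
n_input, n_hidden, n_output = 4, 5, 1

# Each layer transition has a weight matrix and a bias vector.
W1 = rng.normal(size=(n_hidden, n_input))   # input -> hidden
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_output, n_hidden))  # hidden -> output
b2 = np.zeros(n_output)

print(W1.shape, b1.shape, W2.shape, b2.shape)  # (5, 4) (5,) (1, 5) (1,)
```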

Mathematical Model

Notations

  • $w_{ij}$: Weight from neuron $i$ in the previous layer to neuron $j$ in the current layer.
  • $b_i$: Bias term for neuron $i$.
  • $z_i$: Weighted sum of inputs for neuron $i$.
  • $\sigma(z_i)$: Activation function applied to $z_i$, often the sigmoid function.
  • $a_i$: Output of neuron $i$ after applying the activation function (computed concretely in the snippet after this list).
  • $L(y, \hat{y})$: Loss function measuring the prediction error, where $y$ is the actual target and $\hat{y}$ is the predicted output.
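
To make the notation concrete, here is a minimal Python sketch that computes $z_i$ and $a_i$ for a single neuron; the input values and parameters are invented for the example.

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Outputs of the previous layer and made-up parameters for one neuron.
a_prev = np.array([0.5, -1.0, 2.0])   # previous-layer activations
w      = np.array([0.1,  0.4, -0.3])  # incoming weights for this neuron
b      = 0.2                          # bias term

z = w @ a_prev + b   # weighted sum of inputs plus bias
a = sigmoid(z)       # neuron output after the activation function
print(z, a)          # -0.75 0.3208...
```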

Forward Propagation

  • Input to Hidden Layer: Calculates the weighted sum of inputs and applies the activation function to each neuron in the hidden layer.
  • Hidden to Output Layer: Processes the outputs from the hidden layer to produce the final prediction, as shown in the sketch below.
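
A minimal forward pass for a network with one hidden layer might look like the following sketch; the function and variable names are illustrative, and sigmoid activations are assumed in both layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    # Input to hidden layer: weighted sums, then the activation function.
    z1 = W1 @ x + b1
    a1 = sigmoid(z1)
    # Hidden to output layer: the same pattern produces the prediction.
    z2 = W2 @ a1 + b2
    y_hat = sigmoid(z2)
    return y_hat
```

With the parameter shapes from the earlier sketch, `forward(x, W1, b1, W2, b2)` maps a 4-feature input to a single probability-like output.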

Backpropagation and Gradient Descent

Derivatives for Gradient Descent

The partial derivative of the loss function with respect to a weight wijw_{ij} is given by the chain rule:

$$\frac{\partial L}{\partial w_{ij}} = \frac{\partial z_i}{\partial w_{ij}} \cdot \frac{\partial a_i}{\partial z_i} \cdot \frac{\partial z}{\partial a_i} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial L}{\partial \hat{y}}$$

  • $\frac{\partial z_i}{\partial w_{ij}}$: Derivative of the weighted sum with respect to the weight; this is simply the input carried by that weight.
  • $\frac{\partial a_i}{\partial z_i} = \sigma(z_i)(1 - \sigma(z_i))$: Derivative of the sigmoid activation function (derived in the math block after this list).
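
For reference, the sigmoid derivative quoted above follows directly from differentiating $\sigma(z) = 1/(1 + e^{-z})$:

$$\sigma'(z) = \frac{e^{-z}}{(1 + e^{-z})^{2}} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = \sigma(z)\bigl(1 - \sigma(z)\bigr)$$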

The weights are updated using the rule:

$$w_{ij} := w_{ij} - \alpha \frac{\partial L}{\partial w_{ij}}$$

where $\alpha$ is the learning rate.
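
Putting the chain rule and the update rule together for a single output-layer weight, under an assumed squared-error loss $L = \frac{1}{2}(\hat{y} - y)^2$ (the text does not fix a particular loss, so this choice and all numeric values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up values for one output neuron.
a_in, w, b, y = 0.8, 0.5, 0.1, 1.0
alpha = 0.1  # learning rate

z = w * a_in + b
y_hat = sigmoid(z)

# Chain-rule factors, matching the bullets above:
dL_dyhat = y_hat - y               # derivative of the squared-error loss
dyhat_dz = y_hat * (1.0 - y_hat)   # sigmoid derivative
dz_dw    = a_in                    # the input carried by this weight

dL_dw = dL_dyhat * dyhat_dz * dz_dw
w = w - alpha * dL_dw              # gradient descent update
print(w)
```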

Training Multi-Layer Networks

For networks with multiple hidden layers, the notation includes superscripts to denote layer numbers ($w^{(l)}_{ij}$, $b^{(l)}_i$, etc.). The training process involves a forward pass to compute activations and a backward pass (backpropagation) to compute gradients and update weights and biases using gradient descent.
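
The sketch below ties these pieces together in a small end-to-end training loop with one hidden layer. The XOR data, layer sizes, squared-error loss, learning rate, and iteration count are all assumptions made for illustration, not prescriptions from the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Toy binary classification data (XOR).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Layer-numbered parameters: W1, b1 play the role of w^(1), b^(1), etc.
W1 = rng.normal(size=(2, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))
alpha = 0.5

for _ in range(5000):
    # Forward pass: compute activations layer by layer.
    z1 = X @ W1 + b1; a1 = sigmoid(z1)
    z2 = a1 @ W2 + b2; y_hat = sigmoid(z2)

    # Backward pass: error terms dL/dz for each layer via the chain rule,
    # assuming squared-error loss L = 0.5 * (y_hat - y)^2.
    delta2 = (y_hat - y) * y_hat * (1 - y_hat)
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)

    # Gradient descent updates for weights and biases.
    W2 -= alpha * a1.T @ delta2; b2 -= alpha * delta2.sum(axis=0)
    W1 -= alpha * X.T @ delta1;  b1 -= alpha * delta1.sum(axis=0)

print(np.round(y_hat.ravel(), 2))  # should approach [0, 1, 1, 0]
```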

Loss Function and Derivatives

The loss function $L(y, \hat{y})$ quantifies the difference between actual and predicted outputs. The gradients of the loss with respect to the weights and biases ($\frac{\partial L}{\partial w_{ij}}$, $\frac{\partial L}{\partial b_i}$) guide the updates during training.
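
As one concrete (assumed) choice, the squared-error loss and its derivative with respect to the prediction look like this; that derivative is the factor $\frac{\partial L}{\partial \hat{y}}$ that starts off backpropagation.

```python
def loss(y, y_hat):
    # Squared-error loss: L = 0.5 * (y_hat - y)^2
    return 0.5 * (y_hat - y) ** 2

def dloss_dyhat(y, y_hat):
    # dL/d(y_hat) = y_hat - y
    return y_hat - y
```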

Chain Rule Application

The chain rule is applied to compute derivatives across multiple layers, enabling the calculation of how changes in weights and biases affect the overall loss.
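
One standard way to organize this (a common formulation, not spelled out in the text above) is through per-neuron error terms $\delta^{(l)}_i = \frac{\partial L}{\partial z^{(l)}_i}$, which propagate backward layer by layer under this document's index convention:

$$\delta^{(l)}_i = \Bigl(\sum_j w^{(l+1)}_{ij} \delta^{(l+1)}_j\Bigr) \sigma'\bigl(z^{(l)}_i\bigr), \qquad \frac{\partial L}{\partial w^{(l)}_{ij}} = a^{(l-1)}_i \delta^{(l)}_j, \qquad \frac{\partial L}{\partial b^{(l)}_i} = \delta^{(l)}_i$$

Each layer's error terms are a weighted combination of the next layer's, scaled by the local activation derivative, which is exactly the chain rule applied one layer at a time.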

Updating Weights and Biases

Weight and Bias Update Formula

  • For weights: $w_{ij} := w_{ij} - \alpha \frac{\partial L}{\partial w_{ij}}$
  • For biases: $b_i := b_i - \alpha \frac{\partial L}{\partial b_i}$

Simplification for Biases

The update for biases is simpler: because the bias enters the weighted sum directly, $\frac{\partial z_i}{\partial b_i} = 1$, so the bias gradient is the backpropagated error term itself, with no input value multiplied in (see the snippet below).
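
In code, that simplification looks like the following; the values are made up for the example.

```python
# For a neuron with z = w @ a_prev + b, we have dz/db = 1, so the
# bias gradient equals the error term delta = dL/dz directly.
delta = 0.12   # made-up backpropagated error term for one neuron
alpha = 0.1    # learning rate
b = 0.3

dL_db = delta          # no input factor, unlike dL/dw = delta * a_prev
b = b - alpha * dL_db
print(b)  # 0.288
```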

Conclusion

Neural networks model complex patterns through their layered structure and non-linear transformations of their inputs. Training involves adjusting weights and biases to minimize the loss, a process carried out by backpropagation and gradient descent.