7 docs tagged with "basic-models"

Attention Mechanism

Attention mechanisms address challenges in traditional neural network models like CNNs and RNNs, which require fixed input sizes. They offer a flexible approach to handling inputs of varying size and content, such as long text sequences. This flexibility is achieved through mechanisms that enable dynamic focus on different parts of the input.

Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are specialized neural networks designed primarily for processing structured grid data such as images. CNNs leverage the inherent properties of data like spatial relationships and locality to reduce the complexity and computational cost associated with learning from high-dimensional data.

Linear Regression

Regression analysis is a statistical method used for predicting numerical values based on input features. Common applications include predicting home prices, stock values, patient hospital stays, and retail sales forecasts.

Multilayer Perceptron

Multilayer Perceptrons (MLPs) are a class of deep neural networks characterized by their layered structure, which includes an input layer, one or more hidden layers, and an output layer. Each layer comprises neurons that are fully connected to neurons in the subsequent layer through weighted connections.

Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are a type of neural network designed for processing sequences by leveraging hidden states to capture temporal information. They are particularly well-suited for tasks like language modeling, where the goal is to predict the next token based on the historical sequence of previous tokens.

Softmax Regression

In previous sections, we explored linear regression and its implementations, both from scratch and using high-level APIs. Regression models are typically used for quantitative outputs such as predicting prices, number of wins, or the number of days a patient might stay in the hospital. However, not all problems are best served by regression models due to the nature of their outputs. This leads to special cases like logarithmic regression or survival modeling.

Transformers

The Transformer model, introduced by Vaswani et al. (2017), is a deep architecture solely based on attention mechanisms, omitting traditional convolutional or recurrent layers. It is designed for sequence-to-sequence learning and has been widely applied in language, vision, speech, and reinforcement learning applications. The architecture supports parallel computation and features a short path length between input and output, making it highly efficient for tasks involving sequential data.