Ordinary Least Squares | $\min_w \lVert Xw - y \rVert_2^2$ | Fits a linear model with coefficients $w$ to minimize the residual sum of squares between observed and predicted targets. Sensitive to outliers; not robust if features are correlated (multicollinearity). Usage sketches for these estimators follow the table. |
Ridge Regression | $\min_w \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_2^2$ | Adds L2 regularization to the model to address some of the problems of Ordinary Least Squares. More robust to multicollinearity; has a bias-variance trade-off controlled by α. |
Lasso Regression | $\min_w \frac{1}{2 n_{\text{samples}}} \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_1$ | Adds L1 regularization to enforce sparsity of the coefficient vector. Useful for feature selection; produces models with fewer non-zero coefficients. |
Elastic Net | $\min_w \frac{1}{2 n_{\text{samples}}} \lVert Xw - y \rVert_2^2 + \alpha \rho \lVert w \rVert_1 + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2$ | Combines L1 and L2 regularization, with model complexity controlled by the two parameters α and ρ. Balances between Ridge and Lasso; useful when there are correlations among features. |
Logistic Regression | $\min_{w,c} \sum_{i=1}^{n} \log\bigl(1 + \exp\bigl(-y_i (X_i^T w + c)\bigr)\bigr)$ | Used for binary classification problems; estimates class probabilities using the logistic function. Provides a probabilistic interpretation for binary classification tasks. |
Polynomial Regression | Depends on the degree of the polynomial features created from X. | Extends linear models by adding polynomial terms, which allows fitting a broader range of data. Can fit non-linear patterns; beware of overfitting with high-degree polynomials. |
RidgeCV | Same as Ridge, with α optimized by CV. | Ridge regression with built-in cross-validation of the alpha parameter to determine the best regularization. Convenient for automating the choice of α. |
LassoCV | Same as Lasso, with α optimized by CV. | Lasso regression with built-in cross-validation for selecting the best value of α. Efficient for high-dimensional data; automates α selection. |
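A minimal sketch (not from the original text; the synthetic data and hyperparameter values are illustrative assumptions) comparing Ordinary Least Squares, Ridge, Lasso, and Elastic Net. It highlights the sparsity induced by the L1 penalty by counting non-zero coefficients.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet

# Synthetic regression problem: only 3 of the 10 features are informative.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=1.0),
    "ElasticNet": ElasticNet(alpha=1.0, l1_ratio=0.5),  # l1_ratio plays the role of rho
}

for name, model in models.items():
    model.fit(X, y)
    n_nonzero = np.sum(model.coef_ != 0)
    print(f"{name:10s} non-zero coefficients: {n_nonzero}/{X.shape[1]}")
```

Lasso (and, to a lesser extent, Elastic Net) typically drives the coefficients of uninformative features to exactly zero, while OLS and Ridge keep all of them non-zero.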
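A sketch of binary classification with LogisticRegression, again on assumed synthetic data, showing the probabilistic output mentioned in the table via `predict_proba`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression()
clf.fit(X_train, y_train)

print("accuracy:", clf.score(X_test, y_test))
# Probability estimate for the positive class of the first test sample.
print("P(class=1):", clf.predict_proba(X_test[:1])[0, 1])
```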
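For Polynomial Regression, the objective depends on the generated features, so a common pattern is to chain PolynomialFeatures with a linear model in a pipeline. The cubic target function and degree below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(100, 1)), axis=0)
y = 0.5 * X.ravel() ** 3 - X.ravel() + rng.normal(scale=1.0, size=100)

# Expand X into polynomial terms, then fit an ordinary linear model on them.
poly_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
poly_model.fit(X, y)
print("R^2 on training data:", poly_model.score(X, y))
```

Increasing the degree fits more complex curves but raises the risk of overfitting noted in the table.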
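Finally, a sketch of the cross-validated variants; the alpha grid and data are assumptions. The selected regularization strength is exposed through the `alpha_` attribute after fitting.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, LassoCV

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# RidgeCV searches over an explicit grid of alphas;
# LassoCV builds its own regularization path and uses 5-fold CV here.
ridge_cv = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)
lasso_cv = LassoCV(cv=5, random_state=0).fit(X, y)

print("RidgeCV selected alpha:", ridge_cv.alpha_)
print("LassoCV selected alpha:", lasso_cv.alpha_)
```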