You Don’t Need to Learn All the Weights on tabular data: The Case for rvflnet (a nonlinear expressive glmnet) on regression, classification and survival analysis

[This article was first published on T. Moudiki's Webpage - R, and kindly contributed to R-bloggers.]

Introduction

Random Vector Functional Link (RVFL) networks offer a simple yet powerful alternative to traditional neural networks for tabular data. Instead of learning hidden layers through backpropagation, RVFL generates them randomly (or deterministically, using a sequence of quasi-random numbers) and focuses all learning effort on a final, regularized linear model.

Formally, let

\[X \in \mathbb{R}^{n \times p}\]

be the input data. The RVFL networks described in this blog post construct a set of nonlinear features by projecting \(X\) onto a random matrix

\[W \in \mathbb{R}^{p \times m},\]

and applying an activation function \(g(\cdot)\) to the standardized, projected inputs:

\[H = g\left( \frac{X - \mu}{\sigma} \, W \right).\]

These random nonlinear features are then concatenated with the original inputs to form an augmented design matrix:

\[Z = [X \,|\, H].\]

The model prediction is obtained by fitting a linear model on this expanded space (hence, a nonlinear GLM):

\[\hat{y} = Z \beta.\]

Because \(Z\) can be high-dimensional and highly redundant, these RVFL networks rely on Elastic Net regularization (glmnet) to estimate the coefficients:

\[\hat{\beta} = \arg\min_{\beta} \mathcal{L}(y, Z\beta) + \lambda \left(\alpha ||\beta||_1 + (1-\alpha) ||\beta||_2^2\right).\]

In this framework, randomness creates a rich pool of nonlinear transformations, while regularization selects and stabilizes the most useful ones. The result is a nonlinear model that combines the flexibility of neural networks with the efficiency and robustness of linear methods.

Of course, this blog post is not a proof of the title. It's about the R package rvflnet.
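To make the construction concrete, here is a minimal sketch of the RVFL idea using only base R and glmnet. This is an illustration, not rvflnet's internals: the Gaussian random matrix `W`, the ReLU activation, and `m = 100` hidden features are all assumptions made for the example.

```r
library(glmnet)

set.seed(123)
n <- 100; p <- 5; m <- 100
X <- matrix(rnorm(n * p), n, p)
y <- sin(X[, 1]) + X[, 2]^2 + rnorm(n, sd = 0.1)

# Standardize X, project onto a random matrix W, apply an activation g
X_scaled <- scale(X)
W <- matrix(rnorm(p * m), p, m)   # random hidden weights, never learned
H <- pmax(X_scaled %*% W, 0)      # g = ReLU (illustrative choice)

# Augmented design matrix Z = [X | H], then Elastic Net on Z
Z <- cbind(X, H)
fit <- cv.glmnet(Z, y, alpha = 0.5)
preds <- predict(fit, newx = Z, s = "lambda.min")
```

All the "learning" happens in the final `cv.glmnet` call: the random features are fixed, and the penalty selects which of the `p + m` columns of `Z` are actually useful.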
But you can appreciate the high performance of RVFLs on regression, classification and survival analysis, and notably on the controversial Boston dataset (where they perform on par with Random Forest or Gradient Boosting).

0 – Install package

```r
install.packages("survival", repos = "https://cran.r-project.org") # survival analysis
install.packages("remotes", repos = "https://cran.r-project.org")
remotes::install_github('thierrymoudiki/rvflnet') # Nonlinear glm (RVFL networks)
```

1 – Regression

```r
set.seed(123)
library(glmnet)
data(Boston, package = "MASS")

# -------------------------
# Data
# -------------------------
X
```