I am trying to wrap my head around methods that select features at random and then fit them to data by solving a linear optimization problem.

I’ve seen a bunch of method names in the literature that look essentially the same. Random vector functional link (RVFL) networks use a neural-network-type architecture, and similar methods were developed for radial basis functions. Extreme learning machines are basically the same as these and borrowed their ideas.
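To make concrete what I mean by this first family, here is a minimal sketch of how I understand the shared recipe: draw the hidden-layer weights at random, freeze them, and solve a linear least-squares problem for the output weights only. The function names, the tanh activation, the weight distributions, and the small ridge term are my own assumptions for illustration, not taken from any particular paper.

```python
import numpy as np

def fit_random_hidden_layer(X, y, n_hidden=200, ridge=1e-6, seed=0):
    """Hypothetical helper: random fixed hidden layer + linear output fit."""
    rng = np.random.default_rng(seed)
    # Hidden weights and biases are sampled once and never trained.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X @ W + b)  # random nonlinear features
    # Only the output weights are fitted, via ridge-regularized least squares.
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy problem (assumed for illustration): fit sin(x) on [-3, 3].
rng = np.random.default_rng(1)
X_train = rng.uniform(-3.0, 3.0, size=(500, 1))
y_train = np.sin(X_train[:, 0])
W, b, beta = fit_random_hidden_layer(X_train, y_train)

X_test = np.linspace(-3.0, 3.0, 100)[:, None]
test_mse = np.mean((predict(X_test, W, b, beta) - np.sin(X_test[:, 0])) ** 2)
print(f"test MSE: {test_mse:.2e}")
```

The only training step is the linear solve for `beta`; everything before it is random and fixed, which is what makes these methods look alike to me.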

Then there are random feature methods. This branch of the literature seems to begin with Rahimi and Recht, and it seems to come from kernel methods and random Fourier features. Should I think of "random features" as an umbrella term over all the kernel setups, including single-hidden-layer neural networks and radial basis functions?
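For comparison, this is a minimal sketch of the random Fourier feature construction as I understand it, assuming a Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2). The function name, `gamma`, and the number of features are my own choices for illustration. The frequencies are drawn from the kernel's spectral density, and the inner product of the resulting features approximates the kernel.

```python
import numpy as np

def random_fourier_features(X, n_features=2000, gamma=0.5, seed=0):
    """Hypothetical helper: cosine features whose inner products
    approximate the Gaussian kernel exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    # Frequencies from N(0, 2 * gamma * I), the kernel's spectral density.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Compare the approximate and exact kernel matrices on random points.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
Z = random_fourier_features(X)
K_approx = Z @ Z.T

sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_exact = np.exp(-0.5 * sq_dists)  # gamma = 0.5, matching the default above

max_abs_err = np.max(np.abs(K_approx - K_exact))
print(f"max abs kernel error: {max_abs_err:.3f}")
```

Structurally this is again a random, fixed first layer (here with a cosine activation) followed by a linear model on `Z`, which is why I am asking whether the two families are really one.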