RKHS


Reproducing kernel Hilbert space

aka RKHS

Video, Video2, video3

Definition: A Hilbert space of functions with one extra condition: the evaluation linear functional at each point x is continuous (equivalently, bounded). Properties of RKHS and connections
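In symbols, the standard statement of this condition (quantifiers filled in here for reference, not copied from the video):

$$ \text{for every } x \text{ there exists } M_x < \infty \text{ such that } \quad |f(x)| \;\le\; M_x \, \|f\|_{\mathcal{H}} \quad \text{for all } f \in \mathcal{H}. $$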

Video on the condition for being a reproducing kernel Hilbert space (he misplaced the x, and omitted the "for all f"; see here for the right definition. It is written correctly in the next lecture).

Reproducing kernel perspective

The above condition (the evaluation linear functional at each point x is continuous) is equivalent to having a Reproducing kernel. That is, the Hilbert space has a kernel (a function from two copies of the input space to the reals) such that

  • the function you get from fixing one argument of the kernel must be a member of the Hilbert space we are considering
  • the kernel is reproducing, that is, evaluation at a point is done by taking the inner product with the kernel function with one argument fixed at that point (written out below); see Theorem.
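Written out in standard notation (not taken verbatim from the linked theorem): a kernel $k$ on the input space $X$ is reproducing for $\mathcal{H}$ when

$$ k(\cdot, x) \in \mathcal{H} \quad \text{for every } x \in X, \qquad\text{and}\qquad f(x) = \langle f,\, k(\cdot, x) \rangle_{\mathcal{H}} \quad \text{for all } f \in \mathcal{H},\; x \in X. $$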

Note: feature vectors (see below) and functions in the RKHS are in 1-to-1 correspondence (so they can be identified). Both can thus be seen as the elements of the Hilbert space that is the RKHS.

Positive definite kernel perspective

An RKHS can be constructed given a kernel, defined to be just a function of two arguments from an input space to the reals, with some properties (positive definiteness); this is the Moore–Aronszajn theorem. See here and here
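For reference, the positive-definiteness condition on such a kernel $k$ (the standard statement; the note above only names it): $k$ is symmetric, and for any finite set of points and real coefficients,

$$ \sum_{i=1}^{n} \sum_{j=1}^{n} c_i \, c_j \, k(x_i, x_j) \;\ge\; 0 \qquad \text{for all } n \in \mathbb{N},\; x_1,\dots,x_n \in X,\; c_1,\dots,c_n \in \mathbb{R}. $$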

Feature maps perspective

A feature map is just a map from an input set to a Hilbert space!

Video, notes

Idea: you map inputs to an Inner product space, the feature space (a vector of numbers, which could also be a function). This is the feature map. Then you can define functions on the inputs by taking them to be vectors in this same space, and defining their evaluation at x to be the inner product of that vector with $\phi(x)$. The kernel function just gives the inner product of the feature map at one point with the feature map at another point.

This means that, just by giving a feature map, we get an induced RKHS, where the functions are represented by (in 1-to-1 correspondence with) linear combinations of feature vectors. The function is really defined by taking the inner product of the feature vector of x with the feature vector representing the function (a small numerical sketch follows below).

See here: the RKHS is obtained by taking linear combinations of feature functions – see also here
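A minimal numerical sketch of this correspondence (the toy feature map here is my own illustrative choice, not from the linked notes): a finite-dimensional feature map, the kernel it induces, and a function in the induced space represented as a linear combination of feature vectors.

```python
import numpy as np

def phi(x):
    """Toy feature map from R to R^3: x |-> (1, x, x^2)."""
    return np.array([1.0, x, x**2])

def k(x, y):
    """Induced kernel: the inner product of the two feature vectors."""
    return phi(x) @ phi(y)

# A function in the induced RKHS: f = sum_i alpha_i * phi(x_i)
anchors = [0.0, 1.0, 2.0]
alphas  = [0.5, -1.0, 0.25]
w = sum(a * phi(x) for a, x in zip(alphas, anchors))  # feature vector representing f

def f(x):
    """Evaluate f at x: inner product of its feature vector with phi(x)."""
    return w @ phi(x)

def f_via_kernel(x):
    """Equivalent evaluation purely through the kernel (no explicit feature vectors)."""
    return sum(a * k(xi, x) for a, xi in zip(alphas, anchors))

print(f(1.5), f_via_kernel(1.5))  # the two agree
```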

A typical kernel is the Gaussian kernel, and in that case the feature map $\phi(a)$ is just a Gaussian centered around $a$, namely the function $A e^{-(x-a)^2}$.

Indeed, the values of a reproducing kernel can be seen as the inner products of basis functions/vectors, so the basis functions can be seen as just the kernel with one argument fixed. One can check that the inner product of two Gaussian functions (using the usual inner product for functions, which can be obtained by viewing functions as vectors) is again a Gaussian kernel, and one can also check that this is a reproducing kernel.
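A quick check of that claim, using unnormalized Gaussians $\phi(a)(x) = e^{-(x-a)^2}$ and the usual $L^2$ inner product (this computation is added here for concreteness, it is not in the linked video):

$$ \langle \phi(a), \phi(b) \rangle = \int_{-\infty}^{\infty} e^{-(x-a)^2} e^{-(x-b)^2}\, dx = e^{-\frac{(a-b)^2}{2}} \int_{-\infty}^{\infty} e^{-2\left(x-\frac{a+b}{2}\right)^2} dx = \sqrt{\tfrac{\pi}{2}}\; e^{-\frac{(a-b)^2}{2}}, $$

which is a Gaussian kernel in $a-b$ (with a different bandwidth and scale than the feature functions themselves).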

Consider a weird example of the above explanation. We can map elements of R^2 to functions on R^2 that look like pyramids centered around the input point. The space of such functions has a standard inner product. Then, for each point x, we define a new function on R^2 by taking the inner product of the pyramid associated with x with the pyramid associated with y, for any y. Finally, we define the inner product of two such new functions to be the inner product of the original pyramids, which makes taking inner products with them act like evaluation functionals. We have therefore defined an RKHS (this space of functions over R^2, with the inner product just defined), via an intermediate Hilbert space that is not necessarily an RKHS itself.
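A throwaway numerical version of that construction (my own illustrative discretization; the grid, the pyramid width, and the tent shape are arbitrary choices): pyramids become vectors on a grid, the L^2 inner product becomes a scaled dot product, and the induced kernel and new functions fall out directly.

```python
import numpy as np

# Discretize R^2 on a grid so functions become arrays and the L^2
# inner product becomes a (scaled) dot product.
grid = np.linspace(-3.0, 3.0, 61)
xx, yy = np.meshgrid(grid, grid)        # grid points of the plane
cell_area = (grid[1] - grid[0]) ** 2    # area element for the inner product

def pyramid(center, width=1.0):
    """Feature map: a pyramid ("tent") of height 1 centered at `center`."""
    cx, cy = center
    return np.maximum(0.0, 1.0 - (np.abs(xx - cx) + np.abs(yy - cy)) / width)

def inner(f, g):
    """Discretized L^2 inner product of two functions on the grid."""
    return np.sum(f * g) * cell_area

def k(p, q):
    """Kernel induced by the pyramid feature map: k(p, q) = <pyramid(p), pyramid(q)>."""
    return inner(pyramid(p), pyramid(q))

# The new function associated with a point x0 is y |-> k(x0, y).
x0 = (0.5, -0.25)
f_x0 = lambda p: k(x0, p)
print(f_x0((0.5, -0.25)), f_x0((2.0, 2.0)))  # large near x0, zero far away
```

Defining the inner product of two such induced functions to be the inner product of the underlying pyramids then gives exactly the reproducing property.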

Connection with Regularization

—> Nice. This whole class of RKHSs turns out to be basically spaces where we bound a norm different from the L^2 norm, which can be interpreted as Regularization! (see video). The regularization term can be, for instance, the norm of the derivative!

Norm of a function in the space is like a measure of "Complexity" of that function
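One standard way this is used (the usual Tikhonov / kernel-ridge formulation, stated here for concreteness rather than taken from the video): fit data by trading off the fit against the RKHS norm,

$$ \min_{f \in \mathcal{H}} \;\sum_{i=1}^{n} \big(y_i - f(x_i)\big)^2 \;+\; \lambda\, \|f\|_{\mathcal{H}}^2 , \qquad f^\star = \sum_{i=1}^{n} \alpha_i\, k(\cdot, x_i), \quad \alpha = (K + \lambda I)^{-1} y, $$

where $K_{ij} = k(x_i, x_j)$; the representer theorem is what guarantees the finite form of the solution, and functions with larger RKHS norm ("more complex" functions) are penalized more.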


Applied to Kernel methods

Examples

Space of functions in Dictionary learning

Band-limited functions, and some relaxations of them where the kernel is Gaussian or exponential. These generalizations are just RKHSs with translation-invariant kernels, very common in signal analysis, etc. These are Sobolev spaces!
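For the band-limited case specifically, the standard fact (stated here from memory; conventions may differ from the linked material): functions in $L^2(\mathbb{R})$ whose Fourier transform is supported in $[-\Omega, \Omega]$ form an RKHS whose reproducing kernel is the translation-invariant sinc kernel,

$$ k(x, y) \;=\; \frac{\sin\!\big(\Omega\,(x - y)\big)}{\pi\,(x - y)} . $$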

Splines are also a special case! noice

Properties of Kernels

Sums and products of kernels are still kernels
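A small numerical sanity check of that closure property (my own illustrative snippet, with arbitrary example kernels): build Gram matrices for two kernels, for their sum, and for their pointwise product, and verify that each is positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)                       # some 1-D sample points

def gram(kernel, pts):
    """Gram matrix K_ij = kernel(x_i, x_j)."""
    return np.array([[kernel(a, b) for b in pts] for a in pts])

k_gauss = lambda a, b: np.exp(-(a - b) ** 2)  # Gaussian kernel
k_lin   = lambda a, b: a * b + 1.0            # (shifted) linear kernel

K1, K2 = gram(k_gauss, x), gram(k_lin, x)
K_sum, K_prod = K1 + K2, K1 * K2              # sum and elementwise (Schur) product

for name, K in [("gauss", K1), ("linear", K2), ("sum", K_sum), ("product", K_prod)]:
    eigs = np.linalg.eigvalsh(K)
    print(name, "min eigenvalue:", eigs.min())  # all should be >= -1e-10 (numerically PSD)
```

The product case is the Schur product of the two Gram matrices, which is PSD whenever both factors are, consistent with products of kernels being kernels.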