A couple of days ago, I discovered a hidden secret within activation functions that empowers multilayer feedforward networks to reach the realm of universal approximation. In simpler terms, these activation functions possess an enchanting power: they allow neural networks to mimic and represent any continuous function imaginable with remarkable accuracy. Does it still sound otherworldly? Well, do not worry, I am here to guide you through it.
What is a universal approximator?
A universal approximator is a mathematical model that can approximate any arbitrary function to any desired level of accuracy. In our context, we are talking about a model (a multilayer feedforward network) that can approximate any continuous function with arbitrary precision.
How to picture multilayer feedforward networks?
Picture an intricate web of interconnected neurons, inspired by the human brain. Each neuron receives input, processes it, and passes it to the next layer. Due to multiple layers, we refer to this as a multilayer network. Here feedforward means that each layer transmits data only in one direction, from input to output.
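To make this picture concrete, here is a minimal sketch of such a forward pass in NumPy. The layer sizes, random weights, and the tanh non-linearity are illustrative choices of mine, not a prescription: data simply flows layer by layer from input to output.

```python
# A minimal sketch of a multilayer feedforward network in NumPy.
# Layer sizes, weights, and the tanh activation are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def forward(x, weights, biases, activation=np.tanh):
    """Pass input x through each layer in one direction: input -> output."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = activation(h @ W + b)        # linear step followed by a non-linearity
    return h @ weights[-1] + biases[-1]  # final linear layer

# Example: one input, two hidden layers of 8 units each, one output.
sizes = [1, 8, 8, 1]
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n) for n in sizes[1:]]

x = np.linspace(-1, 1, 5).reshape(-1, 1)
print(forward(x, weights, biases).shape)  # (5, 1)
```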
These multiple layers serve as the elixir of power, granting multilayer feedforward networks the ability to grasp and unravel the most complex patterns in the given data. It is important to note that these patterns in real-life data might not always be linearly separable. In many real-world scenarios, data patterns are highly non-linear, and linear models alone would not be sufficient to capture these complexities. That is why we need activation functions.
What are activation functions?
Activation functions introduce non-linearity into our network. To illustrate their importance, imagine a symphony played on just one monotonous note, missing out on life's diverse melodies. That sounds boring, right?! In the same way, without activation functions, the whole neural network would behave like a single linear transformation, limiting its ability to unravel complex patterns in the data. By applying activation functions to the outputs of individual neurons in each layer, the network can model intricate patterns that a linear model cannot express.
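Here is a small NumPy check of that claim, with matrix sizes chosen arbitrarily for illustration: two stacked linear layers with no activation in between collapse into a single linear layer.

```python
# Without an activation, stacking linear layers is equivalent to one linear layer.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 3))          # a batch of 4 inputs with 3 features
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)

two_layers = (x @ W1 + b1) @ W2 + b2       # two linear layers, no activation
W, b = W1 @ W2, b1 @ W2 + b2               # the single layer they collapse into
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))  # True
```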
Well, now I can share the fascinating result by Leshno, Moshe; Lin, Vladimir Ya.; Pinkus, Allan; and Schocken, Shimon, "Multilayer Feedforward Networks with a Non-Polynomial Activation Function Can Approximate Any Function" (March 1992), NYU Working Paper No. IS-92-13.
Theorem: A standard multilayer feedforward network with a locally bounded, piecewise continuous activation function can approximate any continuous function to any degree of accuracy if and only if the network's activation function is not a polynomial.
Note that by locally bounded, we mean that the activation function stays bounded on any finite interval, never blowing up to infinity. A piecewise continuous activation function is exactly what it sounds like: it is composed of continuous segments. Let us take a look at the graphs of the most commonly used activation functions, such as ReLU, sigmoid, and tanh: these are all locally bounded and piecewise continuous, and indeed none of them is a polynomial.
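To see the theorem at work, here is a quick sketch where a single hidden layer with the (non-polynomial) tanh activation is fitted to sin(x). The random hidden weights and the least-squares fit of the output layer are my own simplifications for brevity, not the construction used in the paper.

```python
# A one-hidden-layer network with a tanh activation fitted to sin(x).
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x).ravel()

hidden = 50
W, b = rng.normal(size=(1, hidden)), rng.normal(size=hidden)
H = np.tanh(x @ W + b)                        # hidden-layer features

coef, *_ = np.linalg.lstsq(H, y, rcond=None)  # fit only the output weights
approx = H @ coef

print(np.max(np.abs(approx - y)))             # max error is typically very small
```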
So, what is the issue with polynomial activation functions? The catch is that a polynomial has a fixed degree: if the activation is, say, a polynomial of degree d, then a network with one hidden layer can only ever compute polynomials of degree at most d in its input, no matter how many neurons we add. Such a rigid family struggles to approximate functions with sudden changes. If we increase the degree of the polynomial to introduce oscillations, the sudden changes can be incorporated, but it might result in a poorer overall approximation.
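A rough way to see this limitation, using the same random-feature setup as above (again my own simplification): with a degree-2 activation, every single-hidden-layer output is itself a polynomial of degree at most 2 in the input, so adding more hidden units stops helping, whereas the tanh network keeps improving.

```python
# Comparing a non-polynomial (tanh) and a polynomial (square) activation
# on the task of fitting sin(x) with a single hidden layer.
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x).ravel()

def max_error(activation, hidden):
    """Fit the output layer of a one-hidden-layer network by least squares."""
    W, b = rng.normal(size=(1, hidden)), rng.normal(size=hidden)
    H = activation(x @ W + b)                     # hidden-layer features
    coef, *_ = np.linalg.lstsq(H, y, rcond=None)
    return np.max(np.abs(H @ coef - y))

for hidden in (5, 50, 500):
    print(hidden,
          round(max_error(np.tanh, hidden), 4),     # typically shrinks with width
          round(max_error(np.square, hidden), 4))   # stuck near the best degree-2 fit
```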
In conclusion, embrace the magic of non-polynomial activation functions, for they are the keys that open doors to new realms of discovery and innovation in AI.
Any comments? Let's connect.
Author:
Sadiah