Neural networks are overrated

When people think of AI, neural networks are usually the first method that comes to mind. They perform impressively in a number of applications, but we want to argue that they are not necessarily a good general-purpose machine learning method.

Neural network basics

Neural networks are powerful computation devices. Their basic design is the following:

Basic design of a feed-forward neural network.

Each circle is a perceptron, and the perceptrons are organized in layers. What a perceptron does is:

  1. Take a number as input.
  2. Add a bias to it (a fixed number).
  3. Apply an activation function to the result (for instance tanh or sigmoid).
  4. Send the result either to the perceptrons of the next layer, or to the output if that was the last layer.

When a perceptron receives several numbers from the previous layer, each of them is first multiplied by a weight (usually between -1 and 1) that characterizes the strength of the connection between the two perceptrons. The weighted numbers are then added together, and the sum goes through steps 1-4 outlined above.

To sum up in a visual way, this is what a perceptron does:

The mathematical operations performed by a perceptron.
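
The operations described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names `perceptron` and `layer`, the choice of tanh, and the example weights and biases are made up here and do not come from any library.

```python
import math


def perceptron(inputs, weights, bias, activation=math.tanh):
    """Weighted sum of the inputs, plus a bias, passed through an activation."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return activation(total + bias)


def layer(inputs, weight_rows, biases, activation=math.tanh):
    """Apply one perceptron per (weight row, bias) pair to the same inputs."""
    return [perceptron(inputs, w, b, activation)
            for w, b in zip(weight_rows, biases)]


# Hypothetical example: 2 inputs -> hidden layer of 3 perceptrons -> 1 output.
x = [0.5, -0.2]
hidden = layer(x, [[0.1, 0.4], [-0.3, 0.8], [0.7, -0.6]], [0.0, 0.1, -0.1])
output = layer(hidden, [[0.5, -0.5, 0.25]], [0.2])
print(output)
```

Chaining calls to `layer` in this way is all that a feed-forward pass amounts to: the output of one layer becomes the input of the next.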

How many hidden layers?

A natural question when it comes to creating a neural network model is: how many hidden layers should be used, and with how many perceptrons each?

A common rule of thumb is that the number of perceptrons in a layer should be between the number in the previous layer and the number in the next one.

Regarding the number of hidden layers, author Jeff Heaton writes in his book Introduction to Neural Networks for Java:

Number of Hidden Layers | Result
0 | Only capable of representing linear separable functions or decisions.
1 | Can approximate any function that contains a continuous mapping from one finite space to another.
2 | Can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy.

According to this table, a single hidden layer is enough to approximate any continuous function on a bounded interval, and two hidden layers are enough to approximate any smooth map to any accuracy. So, at least in principle, it is never really necessary to have 3 or more hidden layers.

An embarrassing example

With everything that we have seen so far, neural networks seem like a very elegant method with promising computation capabilities. But how well do they perform?

To give a concrete example, consider the following function on the interval [0, 1]:

A nontrivial function in the interval [0,1].

This function was hand drawn and converted to numbers using WebPlotDigitizer, so it is a simple but nontrivial example.

What happens if we try to fit this function with a neural network regressor?

The following script trains a neural network with one hidden layer containing 100 perceptrons using the scikit-learn Python library:

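import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor

# Load the digitized points (columns 'x' and 'y').
df = pd.read_csv('data.csv')

# scikit-learn expects a 2D feature array of shape (n_samples, n_features).
X = np.array(df['x']).reshape(-1, 1)
y = df['y']

# One hidden layer with 100 perceptrons.
nn = MLPRegressor(random_state=1, max_iter=500, hidden_layer_sizes=(100, )).fit(X, y)

# Evaluate the fitted model on the training points.
prediction = nn.predict(X)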

And this is what the resulting model looks like:

Neural network model for our data. One hidden layer with 100 perceptrons.

Clearly, this is not a good fit.
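
One way to put a number on the quality of a fit, rather than judging it by eye, is the coefficient of determination R². The following is a minimal sketch in plain Python; the function name `r_squared` and the toy values are made up for illustration and are not the data used in this article.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only.
y_true = [0.0, 1.0, 2.0, 3.0]
perfect = [0.0, 1.0, 2.0, 3.0]  # reproduces every point exactly
flat = [1.5, 1.5, 1.5, 1.5]     # always predicts the mean of y_true

print(r_squared(y_true, perfect))  # 1.0
print(r_squared(y_true, flat))     # 0.0
```

For the model trained above, scikit-learn reports the same quantity through `nn.score(X, y)`.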

What if we add one more hidden layer? For instance, hidden_layer_sizes=(100, 50) instead of the single hidden layer (100,) that we used before:

nn = MLPRegressor(random_state=1, max_iter=500, hidden_layer_sizes=(100, 50)).fit(X, y)

This is the result:

Neural network model with two hidden layers: (100, 50).

Not much improvement. Bear in mind that the model visualized above has over 5,000 free parameters (weights and biases), but it still performed poorly.
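
The number of free parameters follows directly from the layer sizes: every connection between consecutive layers contributes a weight, and every perceptron outside the input layer contributes a bias. A minimal sketch, assuming a fully connected network with one input feature and one output as in the scripts above (the helper name `count_parameters` is made up for illustration):

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected feed-forward network.

    layer_sizes lists the number of units per layer, input and output included.
    """
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights + biases


# Assumption: one input feature (x) and one output (y).
print(count_parameters([1, 100, 1]))      # one hidden layer of 100
print(count_parameters([1, 100, 50, 1]))  # hidden layers of 100 and 50
```

On a fitted scikit-learn model, the same total can be obtained by summing the sizes of the arrays in `nn.coefs_` and `nn.intercepts_`.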

Alternatives to neural networks

Now you might think that we have just picked a particularly hard example that will not be properly represented by any typical machine learning method. To show that this is not the case, consider the following alternative model, obtained through symbolic regression using the desktop software TuringBot:

Symbolic regression model for our data.

This model consists of the following simple formula:

from math import log

def f(x):
    return log(0.0192917 + x) * 2.88451 * x * x + 0.797118

Despite being simple and containing only three numeric constants rather than thousands of parameters, this model managed to represent our nontrivial function with great accuracy.


Conclusion

Our goal in this article was to question the notion that neural networks are “the” machine learning method, and that they possess some kind of magical machine learning capability that allows them to find hidden patterns everywhere.

It may well be that, for most of the typical applications of machine learning, neural networks actually underperform simpler alternative methods.
