Neural networks are overrated

When it comes to AI, neural networks are the first method that comes to mind. Despite their impressive performance in a number of applications, we will argue that they are not necessarily a good general-purpose machine learning method.

Neural network basics

Neural networks are powerful computation devices. Their basic design is the following:

Basic design of a feed-forward neural network.

Each circle is a perceptron, and the perceptrons are organized in layers. What a perceptron does is:

  1. Take a number as input.
  2. Add a bias to it (a fixed number).
  3. Apply an activation function to the result (for instance tanh or sigmoid).
  4. Send the result either to the perceptrons of the next layer, or to the output if that was the last layer.

When a perceptron takes as input several numbers from the previous layer, each of those numbers is multiplied by a weight which characterizes the strength of the connection between the two perceptrons. The weighted values are then added together, and the sum goes through steps 2-4 outlined above.

To sum up in a visual way, this is what a perceptron does:

The mathematical operations performed by a perceptron.
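To make this concrete, here is a minimal sketch of those operations in Python, using tanh as the activation function (the input values, weights, and bias below are made up for illustration):

import math

def perceptron(inputs, weights, bias):
    # Weighted sum of the values coming from the previous layer
    total = sum(w * x for w, x in zip(weights, inputs))
    # Add the bias, then apply the activation function
    return math.tanh(total + bias)

# Example: a perceptron with three inputs
print(perceptron([0.5, -1.0, 2.0], [0.1, 0.4, -0.2], bias=0.3))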

How many hidden layers?

A natural question when it comes to creating a neural network model is: how many hidden layers should be used, and with how many perceptrons each?

A common rule of thumb is that the number of perceptrons in a layer should be between the number in the previous layer and the number in the next one. For instance, a hidden layer sitting between 10 inputs and 1 output would contain somewhere between 1 and 10 perceptrons.

Regarding the number of hidden layers, author Jeff Heaton writes in his book Introduction to Neural Networks for Java:

  • 0 hidden layers: Only capable of representing linearly separable functions or decisions.
  • 1 hidden layer: Can approximate any function that contains a continuous mapping from one finite space to another.
  • 2 hidden layers: Can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy.

It can be seen that with just a single hidden layer any continuous function on a bounded interval can be approximated, and that with two hidden layers essentially any mapping can be. So, in principle, it is never necessary to have 3 or more hidden layers.

An embarrassing example

With everything that we have seen so far, neural networks seem like a very elegant method with promising computation capabilities. But how well do they perform?

To give a concrete example, consider the following function on the interval [0, 1]:

A nontrivial function in the interval [0,1].

This function was hand drawn and converted to numbers using WebPlotDigitizer, so it is a simple but nontrivial example.

What happens if we try to fit this function with a neural network regressor?

The following script trains a neural network with one hidden layer containing 100 perceptrons using the scikit-learn Python library:

import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor

# Load the (x, y) points extracted from the hand-drawn curve
df = pd.read_csv('data.csv')

# scikit-learn expects a 2D feature matrix, hence the reshape
X = np.array(df['x']).reshape(-1, 1)
y = df['y']

# One hidden layer with 100 perceptrons
nn = MLPRegressor(random_state=1, max_iter=500, hidden_layer_sizes=(100,)).fit(X, y)

prediction = nn.predict(X)
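To visualize the fit, the prediction can be plotted on top of the data; a minimal sketch, assuming matplotlib is installed (it is not used in the original script):

import matplotlib.pyplot as plt

# Overlay the neural network prediction on the original data
plt.scatter(X, y, s=5, label='data')
plt.plot(X, prediction, color='red', label='neural network')
plt.legend()
plt.show()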

And this is what the resulting model looks like:

Neural network model for our data. One hidden layer with 100 perceptrons.

Clearly, this is not a good fit.

What if we add one more hidden layer? For instance, (100, 50) instead of the single (100,) hidden layer that we used before:

nn = MLPRegressor(random_state=1, max_iter=500, hidden_layer_sizes=(100, 50)).fit(X, y)

This is the result:

Neural network model with two hidden layers: (100, 50).

Not much improvement. Bear in mind that the model visualized above has thousands of free parameters (weights and biases), yet it still performed poorly.

Alternatives to neural networks

Now you might think that we have just picked a particularly hard example that will not be properly represented by any typical machine learning method. To show that this is not the case, consider the following alternative model, obtained through symbolic regression using the desktop software TuringBot:

Symbolic regression model for our data.

This model consists of the following simple formula:

from math import log

def f(x):
    return log(0.0192917 + x) * 2.88451 * x * x + 0.797118

Despite being simple and containing only a handful of parameters rather than thousands, this model managed to represent our nontrivial function with great accuracy.
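As a sanity check, the formula can be evaluated directly on the data; a minimal sketch, reusing the X and y arrays loaded in the first script:

import numpy as np

# Vectorized evaluation of the symbolic model on the original x values
ys = np.log(0.0192917 + X[:, 0]) * 2.88451 * X[:, 0]**2 + 0.797118

# Root-mean-square error against the data
print(np.sqrt(np.mean((ys - y)**2)))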

Conclusion

Our goal in this article was to question the notion that neural networks are “the” machine learning method, possessing some kind of magical capability that allows them to find hidden patterns everywhere.

It may well be that, for many typical machine learning applications, neural networks actually underperform simpler alternative methods.


Machine learning black box models: some alternatives

In this article, we will discuss a very basic question regarding machine learning: is every model a black box? Certainly most methods seem to be, but as we will see, there are very interesting exceptions to this.

What is a black box method?

A method is said to be a black box when it performs complicated computations under the hood that cannot be clearly explained and understood. Data is fed into the model, internal transformations are performed on this data and an output is given, but these transformations are such that basic questions cannot be answered in a straightforward way:

  • Which of the input variables contributed the most to generating the output?
  • Exactly what features did the model derive from the input data?
  • How does the output change as a function of one of the variables?

Not only are black box models hard to understand, they are also hard to port: since complicated data structures are necessary for the relevant computations, the models cannot readily be translated to different programming languages.

Can there be machine learning without black boxes?

The answer to that question is yes. In the simplest case, a machine learning model can be a linear regression and consist of a line defined by an explicit algebraic equation. This is not a black box method, since it is clear how the variables are being used to compute an output.
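For instance, a line fitted by least squares is fully described by two numbers that can simply be read off; a minimal sketch with made-up data:

import numpy as np

# Made-up data that roughly follows a line
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 7.1])

# Fit y = a*x + b; the entire model is the pair (a, b)
a, b = np.polyfit(x, y, 1)
print(f"y = {a:.3f}*x + {b:.3f}")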

But linear models are quite limited and cannot perform the same kinds of tasks that neural networks do, for example. So a more interesting question is: is there a machine learning method capable of finding nonlinear patterns in an explicit and understandable way?

It turns out that such a method exists: it is called symbolic regression.

Symbolic regression as an alternative

The idea of symbolic regression is to find explicit mathematical formulas that connect input variables to an output, while trying to keep those formulas as simple as possible. The resulting models end up being explicit equations that can be written on a sheet of paper, making it apparent how the input variables are being used despite the presence of nonlinear computations.

To give a clearer picture, consider some models found by TuringBot, a symbolic regression software for PC:

Symbolic models found by the TuringBot symbolic regression software.

In the “Solutions” box above, a typical result of a symbolic regression optimization can be seen. A set of formulas of increasing complexity was found, with more complex formulas only being shown if they perform better than all simpler alternatives. A nonlinearity in the input dataset was successfully recovered through the use of nonlinear base functions like cos(x), atan(x) and multiplication.

Symbolic regression is a very general technique. Although the most obvious use case is solving regression problems, it can also be used to solve classification problems, by representing categorical variables as different integer numbers and running the optimization with classification accuracy as the search metric instead of RMS error. Both of these options are available in TuringBot.
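To illustrate the classification idea, suppose two classes are encoded as the integers 0 and 1; a candidate formula can then be scored by rounding its output to the nearest class and measuring accuracy. Here is a toy sketch of that scoring step, with a made-up formula (not TuringBot's actual code):

import numpy as np

# Made-up input values and integer-encoded class labels
x = np.array([0.2, 0.8, 1.5, 2.3, 3.1])
labels = np.array([0, 0, 1, 1, 1])

# A hypothetical candidate formula produced during the search
output = np.arctan(x - 1.0) + 0.5

# Round to the nearest class and clamp to the valid range
predicted = np.clip(np.round(output), 0, 1)

# Classification accuracy, used as the search metric
print(np.mean(predicted == labels))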

Conclusion

In this article, we have seen that although most machine learning methods are indeed black boxes, not all of them are. A simple counterexample is linear models, which are explicit and hence not black boxes. More interestingly, we have seen how symbolic regression is capable of solving machine learning tasks in which nonlinear patterns are present, generating models that are mathematical equations that can be analyzed and interpreted.


Neural networks: what are the alternatives?

In this article, we will see some alternatives to neural networks that can be used to solve the same types of machine learning tasks that they do.

What are neural networks?

Neural networks are by far the most popular machine learning method. They are capable of automatically learning hidden features from input data prior to computing an output value, and established algorithms exist for finding the optimal internal parameters (weights and biases) based on a training dataset.

The basic architecture is the following: the building blocks are perceptrons, which take values as input, calculate a weighted sum of those values, and apply a nonlinear activation function to the result. The result is then either fed into the perceptrons of the next layer, or sent to the output if that was the last layer.

The basic architecture of a neural network. Blue circles are perceptrons.

This architecture is directly inspired by the workings of the human brain. Combined with a neural network’s ability to learn from data, this explains the strong association between this machine learning method and the notion of artificial intelligence.

Alternatives to neural networks

Despite being so popular, neural networks are not the only machine learning method available. Several alternatives exist, and in many contexts they may perform better than neural networks.

Some noteworthy alternatives are the following (a short scikit-learn sketch of the first three follows the list):

  • Random forests, which consist of an ensemble of decision trees, each trained with a random subset of the training dataset. This method corrects a decision tree’s tendency to overfit the input data.
  • Support vector machines, which attempt to map the input data into a space where it is linearly separable into different categories.
  • k-nearest neighbors algorithm (KNN), which looks for the values in the training dataset that are closest to a new input, and combines the target variables associated with those nearest neighbors into a new prediction.
  • Symbolic regression, a technique which tries to find explicit mathematical formulas that connect the input variables to the target variable.
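The first three alternatives are all available in scikit-learn, so they can be tried side by side with only a few lines; a minimal sketch with made-up data:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Made-up 1D regression data
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.1, size=200)

# Fit each model and print its R^2 score on the training data
for model in (RandomForestRegressor(), SVR(), KNeighborsRegressor()):
    model.fit(X, y)
    print(type(model).__name__, model.score(X, y))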

A noteworthy alternative

Among the alternatives above, all but symbolic regression involve implicit computations under the hood that cannot be easily interpreted. With symbolic regression, the model is an explicit mathematical formula that can be written on a sheet of paper, which makes this technique a particularly interesting alternative to neural networks.

Here is how it works: given a set of base functions, for instance sin(x), exp(x), addition, multiplication, etc., a training algorithm tries to find the combinations of those functions that best predict the output variable from the input variables. It is important that the formulas found are the simplest ones possible, so the algorithm automatically discards a formula if it finds a simpler one that performs just as well.
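A toy sketch of that simplicity criterion: among candidate formulas, keep only those that outperform every simpler candidate (this illustrates the idea, not TuringBot's actual algorithm):

# Candidate formulas as (complexity, error) pairs; in a real search these
# would come from evaluating generated expressions on the training data
candidates = [(3, 0.90), (5, 0.40), (7, 0.45), (9, 0.12)]

# Keep a formula only if it beats the error of every simpler formula
kept = []
best_error = float('inf')
for complexity, error in sorted(candidates):
    if error < best_error:
        kept.append((complexity, error))
        best_error = error

print(kept)  # [(3, 0.9), (5, 0.4), (9, 0.12)]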

Here is an example of the output of a symbolic regression optimization, in which a set of formulas of increasing complexity was found that describe the input dataset. The symbolic regression package used is called TuringBot, a desktop application that can be downloaded for free.

Formulas found with a symbolic regression optimization.

This method very much resembles a scientist looking for mathematical laws that explain data, like Kepler did with data on the positions of planets in the sky to find his laws of planetary motion.

Conclusion

In this article, we have seen some alternatives to neural networks based on completely different ideas, including symbolic regression, which generates models that are explicit and more explainable than a neural network. Exploring different methods is very valuable, because they may perform differently in different contexts.
