Symbolic regression is a machine learning technique that can produce results similar to those of neural networks while taking a completely different approach. Here we describe its basic characteristics and show how it can be used to solve deep learning problems.
What is deep learning?
The concept of deep learning has emerged in the context of artificial neural networks. A neural network which contains hidden layers is capable of pre-processing the input information and extracting non-trivial features prior to combining that input into an output value. The term “deep learning” comes from the presence of those multiple layers.
More recently, it has become common to call deep learning any machine learning technique that is capable of extracting non-trivial information from an input and using that to predict target variables in a way that is not possible for classical statistical methods.
How symbolic regression works
Despite being so common, neural networks are not the only way to extract non-trivial patterns from input data. An alternative technique, which is capable of solving the same tasks as neural networks, is called symbolic regression.
The idea of symbolic regression is to find explicit mathematical formulas that predict a target variable from a set of input variables. Because the space of all mathematical formulas is very large, sophisticated algorithms are needed to search it efficiently. The most common approach is to use genetic algorithms for this search, but TuringBot shows that simulated annealing optimization also gives excellent results.
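To make the search idea concrete, here is a minimal sketch of simulated annealing over formulas, written in Python. It is not TuringBot's implementation, whose internals are not described here. The operator set, the mutation rule, the cooling schedule, the size cap and every function name are assumptions chosen for illustration. A formula is stored as a small tree, a random subtree is replaced at each step, and worse formulas are sometimes accepted with a probability that shrinks as the temperature drops.

```python
import math
import random

# Hypothetical operator set for this sketch; real tools use many more functions.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def random_expr(rng, depth=2):
    """Build a random expression tree over the variable x and small constants."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else round(rng.uniform(-3, 3), 1)
    op = rng.choice(sorted(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))

def evaluate(expr, x):
    """Evaluate an expression tree at the point x."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if expr == "x" else expr  # variable or numeric constant

def size(expr):
    """Number of nodes in the tree, used here as a crude complexity measure."""
    if isinstance(expr, tuple):
        return 1 + size(expr[1]) + size(expr[2])
    return 1

def mutate(expr, rng):
    """Return a new tree in which one randomly chosen subtree is replaced."""
    if not isinstance(expr, tuple) or rng.random() < 0.3:
        return random_expr(rng)
    op, left, right = expr
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))

def mse(expr, data):
    """Mean squared error of the formula over (x, y) pairs."""
    return sum((evaluate(expr, x) - y) ** 2 for x, y in data) / len(data)

def to_string(expr):
    """Render the tree as a one-line formula."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return f"({to_string(left)} {op} {to_string(right)})"
    return str(expr)

def anneal(data, steps=5000, t_start=1.0, t_end=1e-3, max_size=15, seed=0):
    """Simulated annealing search; all default settings are assumptions."""
    rng = random.Random(seed)
    current = random_expr(rng)
    current_err = mse(current, data)
    best, best_err = current, current_err
    for i in range(steps):
        temp = t_start * (t_end / t_start) ** (i / steps)  # geometric cooling
        candidate = mutate(current, rng)
        if size(candidate) > max_size:
            continue  # skip formulas above the size cap
        cand_err = mse(candidate, data)
        delta = cand_err - current_err
        # Always accept improvements; accept worse formulas with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_err = candidate, cand_err
            if current_err < best_err:
                best, best_err = current, current_err
    return best, best_err

# Toy target y = x^2 + x, sampled at a few points (illustrative data only).
data = [(x / 2, (x / 2) ** 2 + x / 2) for x in range(-4, 5)]
best, err = anneal(data)
print(to_string(best), err)
```

The size cap plays the role of a complexity limit: without it, the search could keep growing formulas that fit the sample points better and better while becoming harder to read.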
The biggest difference between symbolic regression and neural networks is that the models that result from the former are explicit. Neural networks often require hundreds of weights to be represented, whereas a symbolic model might be a mathematical formula that fits on a single line. This way, symbolic regression can be said to be an alternative to neural networks that does not involve black boxes.
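As a rough illustration of the "hundreds of weights" figure, the sketch below counts the parameters of a small fully connected network. The layer sizes are hypothetical and are not taken from any model in this article; the helper name is also an assumption.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network with the given layer sizes."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per output unit
    return total

# Hypothetical network: 4 inputs, two hidden layers of 16 units, 3 outputs.
print(count_parameters([4, 16, 16, 3]))  # 403
```

A symbolic model for the same kind of problem would instead be a single expression containing a handful of constants.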
Deep learning with symbolic regression
So how does one solve a traditional deep learning task with symbolic regression? As an example, let’s use it to classify the famous Iris dataset, in which four features of flowers are given and the goal is to predict the species of each flower from them. You can find the raw dataset here: iris.txt.
After loading this dataset into the symbolic regression software TuringBot, selecting “classification accuracy” as the search metric, and setting a 50/50 train/test split, these were the formulas that it found, in ascending order of complexity:
The error shown is the out-of-sample error. The best formula turned out to be one of intermediate size: not so small that it cannot capture any pattern, but also not so large that it overfits the data. Its classification accuracy on the test set was 98%.
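The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows below are made-up numbers, not the Iris data. The three candidate formulas and their complexity values are hypothetical. Rounding the formula output to the nearest integer class label is an assumed convention, not a documented TuringBot behavior.

```python
import random

def split_half(rows, seed=0):
    """Shuffle the rows and split them 50/50 into train and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def accuracy(formula, rows):
    """Fraction of rows whose rounded formula output equals the integer label."""
    hits = sum(1 for *features, label in rows if round(formula(*features)) == label)
    return hits / len(rows)

def select_formula(candidates, test_rows):
    """Pick the (complexity, formula) pair with the best test accuracy; ties go to the simpler one."""
    return max(candidates, key=lambda c: (accuracy(c[1], test_rows), -c[0]))

# Made-up rows for illustration only: two features followed by an integer class label.
rows = [
    (1.0, 0.2, 0), (1.2, 0.3, 0), (1.1, 0.1, 0), (0.9, 0.2, 0),
    (4.0, 1.3, 1), (4.2, 1.5, 1), (3.9, 1.2, 1), (4.5, 1.4, 1),
    (5.8, 2.1, 2), (6.0, 2.3, 2), (5.5, 2.0, 2), (6.1, 2.4, 2),
]

# Hypothetical candidates as (complexity, formula); a real search would generate these.
candidates = [
    (1, lambda a, b: 1.0),
    (7, lambda a, b: 0.3 * a + 0.2 * b - 0.2),
    (9, lambda a, b: a * b / 3 + 0.1 * a),
]

train_rows, test_rows = split_half(rows)
for complexity, formula in candidates:
    print(complexity, accuracy(formula, train_rows), accuracy(formula, test_rows))

best_complexity, best_formula = select_formula(candidates, test_rows)
print("selected complexity:", best_complexity)
```

Scoring on the held-out half is what makes the comparison fair: a formula that only memorizes the training rows gains nothing on rows it has never seen.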
If you found this example interesting, you might want to download TuringBot for free and give it a try with your own data. It can be used to solve regression and classification problems in general.
In this article, we have seen how symbolic regression can be used to solve problems where a non-linear relationship between the input variables exists. Although neural networks are far more common, this alternative approach is capable of finding models that perform similarly, with the advantage of being simple and explainable.