## Neural networks are overrated

When it comes to AI, neural networks are the first method that comes to mind. Despite their impressive performance on a number of applications, we want to argue that they are not necessarily a good general-purpose machine learning method.

### Neural network basics

Neural networks are powerful computation devices. Their basic design is the following:

Each circle is a perceptron, and the perceptrons are organized in layers. What a perceptron does is:

1. Take a number as input.
2. Add a bias to it (a fixed number).
3. Apply an activation function to the result (for instance tanh or sigmoid).
4. Send the result either to the perceptrons of the next layer, or to the output if that was the last layer.

When a perceptron takes as input several numbers from a past layer, each of those numbers is multiplied by a weight (usually between -1 and 1) which characterizes the strength of the connection between those two perceptrons. The numbers are then added together and go through the steps 1-4 outlined above.
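The computation described above can be sketched in a few lines of Python. This is a minimal illustration that assumes tanh as the activation function; the function name `perceptron` and the example numbers are made up for this sketch.

```python
import math

def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs, plus a bias, passed through tanh."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return math.tanh(total + bias)

# Example values chosen only for illustration.
output = perceptron(inputs=[0.5, -1.0], weights=[0.8, 0.2], bias=0.1)
```

In a full network, `output` would in turn become one of the inputs of every perceptron in the next layer.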

To sum up in a visual way, this is what a perceptron does:

### How many hidden layers?

A natural question when it comes to creating a neural network model is: how many hidden layers should be used, and with how many perceptrons each?

A common rule of thumb is that the number of perceptrons in a layer should be between the number in the previous layer and the number in the next one.

Regarding the number of hidden layers, author Jeff Heaton writes in his book Introduction to Neural Networks for Java:

> It can be seen that, with just a single hidden layer, any continuous function on a bounded interval can be represented, and that with two hidden layers any map can be represented. So it is never really necessary to have 3 or more hidden layers.

### An embarrassing example

With everything that we have seen so far, neural networks seem like a very elegant method with promising computation capabilities. But how well do they perform?

To give a concrete example, consider the following function on the interval [0, 1]:

This function was hand drawn and converted to numbers using WebPlotDigitizer, so it is a simple but nontrivial example.

What happens if we try to fit this function with a neural network regressor?

The following script trains a neural network with one hidden layer containing 100 perceptrons using the scikit-learn Python library:

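```python
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor

# df is assumed to be a pandas DataFrame holding the digitized points
# in columns 'x' and 'y'.
X = np.array(df['x']).reshape(-1, 1)
y = df['y']

# One hidden layer with 100 perceptrons.
nn = MLPRegressor(random_state=1, max_iter=500, hidden_layer_sizes=(100, )).fit(X, y)

prediction = nn.predict(X)
```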

And this is what the resulting model looks like:

Clearly, this is not a good fit.

What if we add one more hidden layer? For instance, (100, 50) instead of just one (100,) hidden layer like we did before:

`nn = MLPRegressor(random_state=1, max_iter=500, hidden_layer_sizes=(100, 50)).fit(X, y)`

This is the result:

Not much improvement. Bear in mind that the model visualized above has thousands of free parameters (weights and biases), but it still performed poorly.
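For reference, the number of free parameters in such a network can be worked out from its layer sizes. The sketch below assumes a fully connected network with one input, one output, and a bias on every hidden and output unit; the helper name `count_parameters` is hypothetical.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units per layer, input and output included.
    """
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights + biases

one_hidden = count_parameters([1, 100, 1])       # hidden_layer_sizes=(100,)
two_hidden = count_parameters([1, 100, 50, 1])   # hidden_layer_sizes=(100, 50)
```

Under these assumptions, the two models above have 301 and 5,301 parameters, respectively.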

### Alternatives to neural networks

Now you might think that we have just picked a particularly hard example that will not be properly represented by any typical machine learning method. To show that this is not the case, consider the following alternative model, obtained through symbolic regression using the desktop software TuringBot:

This model consists of the following simple formula:

```python
from math import log

def f(x):
    return log(0.0192917+x)*2.88451*x*x+0.797118
```

Despite being simple and containing only a handful of numerical constants rather than thousands of parameters, this model represents our nontrivial function with great accuracy.

### Conclusion

Our goal in this article was to question the notion that neural networks are “the” machine learning method, and that they possess some kind of magical machine learning capability that allows them to find hidden patterns everywhere.

It might be the case that, for most of the typical applications of machine learning, neural networks actually underperform simpler alternative methods.

## Machine learning black box models: some alternatives

In this article, we will discuss a very basic question regarding machine learning: is every model a black box? Certainly most methods seem to be, but as we will see, there are very interesting exceptions to this.

### What is a black box method?

A method is said to be a black box when it performs complicated computations under the hood that cannot be clearly explained and understood. Data is fed into the model, internal transformations are performed on this data and an output is given, but these transformations are such that basic questions cannot be answered in a straightforward way:

• Which of the input variables contributed the most to generating the output?
• Exactly what features did the model derive from the input data?
• How does the output change as a function of one of the variables?

Not only are black box models hard to understand, they are also hard to move around: since complicated data structures are necessary for the relevant computations, they cannot be readily translated to different programming languages.

### Can there be machine learning without black boxes?

The answer to that question is yes. In the simplest case, a machine learning model can be a linear regression and consist of a line defined by an explicit algebraic equation. This is not a black box method, since it is clear how the variables are being used to compute an output.
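As a small illustration of how explicit such a model is, the following sketch fits a line by ordinary least squares and prints the resulting equation. The data points are made up, and `fit_line` is a name chosen for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data that lies exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.3f} * x + {intercept:.3f}")
```

The whole model is the pair `(slope, intercept)`, so the contribution of the input variable can be read directly from the equation.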

But linear models are quite limited and cannot perform the same kinds of tasks that neural networks do, for example. So a more interesting question is: is there a machine learning method capable of finding nonlinear patterns in an explicit and understandable way?

It turns out that such a method exists, and it is called symbolic regression.

### Symbolic regression as an alternative

The idea of symbolic regression is to find explicit mathematical formulas that connect input variables to an output, while trying to keep those formulas as simple as possible. The resulting models end up being explicit equations that can be written on a sheet of paper, making it apparent how the input variables are being used despite the presence of nonlinear computations.

To give a clearer picture, consider some models found by TuringBot, a symbolic regression software for PC:

In the “Solutions” box above, a typical result of a symbolic regression optimization can be seen. A set of formulas of increasing complexity was found, with more complex formulas only being shown if they perform better than all simpler alternatives. A nonlinearity in the input dataset was successfully recovered through the use of nonlinear base functions like cos(x), atan(x) and multiplication.

Symbolic regression is a very general technique: although the most obvious use case is to solve regression problems, it can also be used to solve classification problems by representing categorical variables as different integer numbers, and running the optimization with classification accuracy as the search metric instead of RMS error. Both of these options are available in TuringBot.
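To make the classification idea concrete, here is a minimal sketch of how a formula could be scored by classification accuracy. It assumes that the formula's output is rounded to the nearest integer to obtain a class label, which is an assumption of this example and not a description of TuringBot's internals; the formula and data are made up.

```python
def classification_accuracy(formula, rows, labels):
    """Fraction of rows whose rounded formula output equals the integer label."""
    hits = sum(1 for row, label in zip(rows, labels)
               if round(formula(*row)) == label)
    return hits / len(labels)

# Hypothetical one-variable formula and made-up data, for illustration only.
def formula(x):
    return 0.5 * x

rows = [(0.2,), (1.8,), (2.4,), (3.9,), (4.1,)]
labels = [0, 1, 1, 2, 1]
accuracy = classification_accuracy(formula, rows, labels)
```

A search guided by this metric would prefer formulas with a higher `accuracy` rather than a lower RMS error.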

### Conclusion

In this article, we have seen that although most machine learning methods are indeed black boxes, not all of them are. Linear models are a simple counterexample: they are explicit and hence not black boxes. More interestingly, we have seen how symbolic regression is capable of solving machine learning tasks where nonlinear patterns are present, generating models that are mathematical equations that can be analyzed and interpreted.

## Neural networks: what are the alternatives?

In this article, we will see some alternatives to neural networks that can be used to solve the same types of machine learning tasks that they do.

### What are neural networks

Neural networks are by far the most popular machine learning method. They are capable of automatically learning hidden features from input data prior to computing an output value, and established algorithms exist for finding the optimal internal parameters (weights and biases) based on a training dataset.

The basic architecture is the following. The building blocks are perceptrons, which take values as input, calculate a weighted sum of those values and apply a non-linear activation function to the result. The output is then either fed into perceptrons of a next layer, or it is sent to the output if that was the last layer.

This architecture is directly inspired by the workings of a human brain. Combined with a neural network’s ability to learn from data, a strong association between this machine learning method and the notion of artificial intelligence can be drawn.

### Alternatives to neural networks

Despite being so popular, neural networks are not the only machine learning method available. Several alternatives exist, and in many contexts these alternatives may outperform them.

Some noteworthy alternatives are the following:

• Random forests, which consist of an ensemble of decision trees, each trained with a random subset of the training dataset. This method corrects a decision tree’s tendency to overfit the input data.
• Support vector machines, which attempt to map the input data into a space where it is linearly separable into different categories.
• k-nearest neighbors algorithm (KNN), which looks for the values in the training dataset that are closest to a new input, and combines the target variables associated to those nearest neighbors into a new prediction.
• Symbolic regression, a technique which tries to find explicit mathematical formulas that connect the input variables to the target variable.
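Of the methods listed above, k-nearest neighbors is the simplest to write down. The following is a minimal sketch for a single input variable, assuming that the neighbors' targets are combined by a plain average (one common choice); the function name and data are made up for illustration.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to query."""
    by_distance = sorted(zip(train_x, train_y),
                         key=lambda pair: abs(pair[0] - query))
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Made-up one-dimensional training data.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.0, 1.0, 4.0, 9.0, 16.0]
prediction = knn_predict(train_x, train_y, query=2.1, k=3)
```

Note that the "model" here is the training dataset itself, so there is no compact formula to inspect.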

### A noteworthy alternative

Among the alternatives above, all but symbolic regression involve implicit computations under the hood that cannot be easily interpreted. With symbolic regression, the model is an explicit mathematical formula that can be written on a sheet of paper, making this technique an alternative to neural networks of particular interest.

Here is how it works: given a set of base functions, for instance sin(x), exp(x), addition, multiplication, etc., a training algorithm tries to find the combinations of those functions that best predict the output variable from the input variables. It is important that the formulas found be as simple as possible, so the algorithm will automatically discard a formula if it finds a simpler one that performs just as well.

Here is an example of output for a symbolic regression optimization, in which a set of formulas of increasing complexity was found that describes the input dataset. The symbolic regression package used is called TuringBot, a desktop application that can be downloaded for free.
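The idea can be illustrated with a toy brute-force search. This is only a sketch under strong simplifying assumptions: it enumerates every formula built from at most two base functions of a single variable and keeps the most accurate one, preferring smaller formulas on ties. It is not the algorithm used by any particular symbolic regression package, and all names and data are made up.

```python
import math
from itertools import product

# Base functions for this toy search; a real tool uses a richer set.
UNARY = {"x": lambda x: x, "sin(x)": math.sin,
         "cos(x)": math.cos, "exp(x)": math.exp}
BINARY = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def rms_error(f, xs, ys):
    return math.sqrt(sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def candidates():
    """Yield (size, text, function), single base functions first."""
    for name, u in UNARY.items():
        yield 1, name, u
    for (n1, u1), (op, b), (n2, u2) in product(
            UNARY.items(), BINARY.items(), UNARY.items()):
        yield 3, f"{n1} {op} {n2}", (
            lambda x, u1=u1, b=b, u2=u2: b(u1(x), u2(x)))

def search(xs, ys):
    """Return the most accurate candidate, preferring smaller ones on ties."""
    best = None
    for size, text, f in candidates():
        err = rms_error(f, xs, ys)
        if best is None or (err, size) < (best[0], best[1]):
            best = (err, size, text)
    return best

# Made-up target: y = x * cos(x), sampled on [0, 3].
xs = [i * 0.1 for i in range(31)]
ys = [x * math.cos(x) for x in xs]
error, size, formula = search(xs, ys)
print(formula, error)
```

Exhaustive enumeration stops being feasible as soon as formulas grow, which is why practical tools rely on a guided optimization instead.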

This method very much resembles a scientist looking for mathematical laws that explain data, like Kepler did with data on the positions of planets in the sky to find his laws of planetary motion.

### Conclusion

In this article, we have seen some alternatives to neural networks based on completely different ideas, including for instance symbolic regression which generates models that are explicit and more explainable than a neural network. Exploring different models is very valuable, because they may perform differently in different particular contexts.

## A free AI software for PC

If you are interested in solving AI problems and would like an easy-to-use desktop application that yields state-of-the-art results, you might like TuringBot. In this article, we will show you how it can be used to easily solve classification and regression problems, and explain the methodology that it uses, which is called symbolic regression.

### The software

TuringBot is a desktop application that runs on both Windows and Linux, and that can be downloaded for free from the official website. This is what its interface looks like:

The usage is simple: you load your data in CSV or TXT format through the interface, select which column should be predicted and which columns should be used as input, and start the search. The program will look for explicit mathematical formulas that predict this target variable, and show the results in the Solutions box.

### Symbolic regression

The name of this technique, which looks for explicit formulas that solve AI problems, is symbolic regression. It is capable of solving the same problems as neural networks, but in an explicit way that does not involve black box computations.

Think of what Kepler did when he extracted his laws of planetary motion from observations. He looked for algebraic equations that could explain this data, and found timeless patterns that are taught to this day in schools. What TuringBot does is something similar to that, but millions of times faster than a human could ever do.

An important point in symbolic regression is that it is not sufficient for a model to be accurate — it also has to be simple. This is why TuringBot’s algorithm tries to find the best formulas of all possible sizes simultaneously, discarding larger formulas that do not perform better than simpler alternatives.

### The problems that it can solve

Some examples of problems that can be solved by the program are the following:

• Regression problems, in which a continuous target variable should be predicted. See here a tutorial in which we use the program to recover a mathematical formula without previous knowledge of what that formula was.
• Classification problems, in which the goal is to classify inputs into two or more different categories. The rationale of solving this kind of problem using symbolic regression is to represent different categorical variables as different integer numbers, and run the optimization with “classification accuracy” as the search metric (this can easily be selected through the interface). In this article, we teach how to use the program to classify the Iris dataset.
• Classification of rare events, in which a classification task must be solved on highly imbalanced datasets. The logic is similar to that of a regular classification problem, but in this case a special metric called F1 score should be used (also available in TuringBot). In this article, we found a formula that successfully classified credit card frauds on a real-world dataset that is highly imbalanced.

### Getting TuringBot

If you liked the concept of TuringBot, you can download it for free from the official website. There you can also find the official documentation, with more information about the search metrics that are available, the input file formats and the various features that the program offers.

## How to find a formula for the nth term of a sequence

Given a sequence of numbers, finding an explicit mathematical formula that computes the nth term of the sequence can be challenging, except in very special cases like arithmetic and geometric sequences.

In the general case, this task involves searching over the space of all mathematical formulas for the most appropriate one. A special technique exists that does just that: symbolic regression. Here we will introduce how it works, and use it to find a formula for the nth term in the Fibonacci sequence (A000045 in the OEIS) as an example.

### What symbolic regression is

Regression is the task of establishing a relationship between an output variable and one or more input variables. Symbolic regression solves this task by searching over the space of all possible mathematical formulas for the ones with the greatest accuracy, while trying to keep those formulas as simple as possible.

The technique starts from a set of base functions — for instance, sin(x), exp(x), addition, multiplication, etc. Then it tries to combine those base functions in various ways using an optimization algorithm, keeping track of the most accurate ones found so far.

An important point in symbolic regression is simplicity. It is easy to find a polynomial that will fit any sequence of numbers with perfect accuracy, but that does not really tell you anything, since the number of free parameters in the model is the same as the number of data points. For this reason, a symbolic regression procedure will discard a larger formula if it finds a smaller one that performs just as well.
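The point about polynomials can be made concrete with a short sketch. It builds the Lagrange interpolating polynomial through the first 10 Fibonacci terms using exact fractions, then evaluates it at the 11th index. The helper name `lagrange_polynomial` is chosen for this example.

```python
from fractions import Fraction

def lagrange_polynomial(points):
    """Return a function evaluating the unique polynomial through the points."""
    def evaluate(x):
        total = Fraction(0)
        for i, (xi, yi) in enumerate(points):
            term = Fraction(yi)
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total
    return evaluate

# First 10 Fibonacci terms, indexed from 1.
points = list(enumerate([1, 1, 2, 3, 5, 8, 13, 21, 34, 55], start=1))
poly = lagrange_polynomial(points)

fits_training_data = all(poly(n) == value for n, value in points)
next_term = poly(11)  # the true 11th Fibonacci term is 89
print(fits_training_data, next_term)
```

The degree-9 polynomial reproduces all 10 terms exactly, yet its value at index 11 is 55 rather than 89: a perfect fit with as many parameters as data points says nothing about the rule behind the sequence.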

### Finding the nth Fibonacci term

Now let’s show how symbolic regression can be used in practice by trying to find a formula for the Fibonacci sequence using the desktop symbolic regression software TuringBot. The first two terms of the sequence are 1 and 1, and every next term is defined as the sum of the previous two terms. Its first terms are the following, where the first column is the index:

```
1 1
2 1
3 2
4 3
5 5
6 8
7 13
8 21
9 34
10 55
```

A list of the first 30 terms can be found in this file: fibonacci.txt.

TuringBot takes as input TXT or CSV files with one variable per column and efficiently finds formulas that connect those variables. This is what it looks like after we load fibonacci.txt and run the optimization:

The software finds not only a single formula, but the best formulas of all possible complexities. A larger formula is only shown if it performs better than all simpler alternatives. In this case, the last formula turned out to predict with perfect accuracy every single one of the first 30 Fibonacci terms. The formula is the following:

`f(x) = floor(cosh(-0.111572+0.481212*x))`

Clearly a very elegant solution. The same procedure can be used to find a formula for the nth term of any other sequence, provided that such a formula exists.
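As a quick sanity check, the formula can be evaluated in Python against the first 10 terms listed earlier. The constants are copied exactly as displayed above, so they are rounded; agreement at larger indices may depend on digits that are not shown.

```python
from math import cosh, floor

def f(x):
    # Formula as reported above, with its rounded constants.
    return floor(cosh(-0.111572 + 0.481212 * x))

first_terms = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
predicted = [f(n) for n in range(1, 11)]
matches = predicted == first_terms
print(predicted, matches)
```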

### Conclusion

In this tutorial, we have seen how the symbolic regression software TuringBot can be used to find a closed-form expression for the nth term in a sequence of numbers. We found a very short formula for the Fibonacci sequence by simply writing it into a text file with one term per row and loading this file into the software.

If you are interested in trying TuringBot on your own data, you can download it from the official website. It is available for both Windows and Linux.

## Symbolic regression tutorial with TuringBot

In this tutorial, we are going to show how you can find a formula from your data using the symbolic regression software TuringBot. It is a desktop software that runs on both Windows and Linux, and as you will see the usage is very simple.

### Preparing the data

TuringBot takes as input files in .txt or CSV format containing one variable per column. The first row may contain the names of the variables, otherwise they will be labelled col1, col2, col3, etc.

For instance, the following is a valid input file:

```
x y z w classification
5.20 2.70 3.90 1.40 1
6.50 2.80 4.60 1.50 1
7.70 2.80 6.70 2.00 2
5.90 3.20 4.80 1.80 1
5.00 3.50 1.60 0.60 0
5.10 3.50 1.40 0.20 0
4.60 3.10 1.50 0.20 0
6.90 3.20 5.70 2.30 2
```
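If you want to check a file of this kind before loading it, a few lines of Python are enough. The sketch below assumes whitespace-separated columns, as in the sample above, and falls back to the names col1, col2, ... when the first row is numeric. It is an independent illustration, not TuringBot's own parser, and `parse_table` is a made-up name.

```python
def parse_table(text):
    """Split whitespace-separated text into column names and numeric rows."""
    lines = [line.split() for line in text.strip().splitlines()]
    try:
        # If the first row is numeric, there is no header row.
        [float(v) for v in lines[0]]
        names = [f"col{i + 1}" for i in range(len(lines[0]))]
        body = lines
    except ValueError:
        names, body = lines[0], lines[1:]
    rows = [[float(v) for v in row] for row in body]
    return names, rows

sample = """x y z w classification
5.20 2.70 3.90 1.40 1
6.50 2.80 4.60 1.50 1
7.70 2.80 6.70 2.00 2"""
names, rows = parse_table(sample)
print(names, len(rows))
```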

This is what the program looks like when you open it:

By clicking on the “Input file” button on the upper left, you can select your input file and load it. Different search metrics are available, including for instance classification accuracy, and a handy cross validation feature can also be enabled in the “Search options” box — if enabled, it will automatically create a test/train split and allow you to see the out-of-sample error as the optimization goes on. But in this example we are going to keep things simple and just use the defaults.

### Finding the formulas

After loading the data, you can click on the play button at the top of the interface to start the optimization. The best formulas found so far will be shown in the “Solutions” box, in ascending order of complexity. A formula is only shown if its accuracy is greater than that of all simpler alternatives — in symbolic regression, the goal is not simply to find a formula, but to find the simplest ones possible.
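The display rule described above can be sketched as a simple filter: sort the candidate formulas by complexity and keep one only if its error is lower than that of every simpler candidate. The complexity and error values below are hypothetical, and the function name is chosen for this example.

```python
def simplest_improving(formulas):
    """Keep a formula only if its error beats every simpler formula.

    formulas is a list of (complexity, error, text) tuples.
    """
    shown = []
    best_error = float("inf")
    for complexity, error, text in sorted(formulas):
        if error < best_error:
            shown.append((complexity, error, text))
            best_error = error
    return shown

# Hypothetical search results, for illustration only.
found = [
    (1, 0.90, "x"),
    (3, 0.40, "x + 1.2"),
    (4, 0.45, "cos(x) + x"),
    (5, 0.10, "sqrt(x) + 0.1"),
    (7, 0.10, "sqrt(x) + 0.1 * cos(x)"),
]
shown = simplest_improving(found)
```

Here the formulas of complexity 4 and 7 are dropped, because a simpler formula already does at least as well.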

Here are the formulas it found for an example dataset:

The formulas are all written in a format that is compatible out of the box with Python and C. Indeed, the menu on the upper right allows you to export the solutions to these languages:

In this example, the true formula turned out to be sqrt(x), which was recovered in a few seconds. The methodology would be the same for a real-world dataset with many input variables and an unknown dependency between them.

### How to get TuringBot

If you have liked this tutorial, we encourage you to download TuringBot for free from the official website. As we have shown, it is very simple to use, and its powerful mathematical modelling capabilities allow you to find very subtle numerical patterns in your data, much like a scientist would do from empirical observations, but in an automatic way and millions of times faster.

## Machine learning with symbolic regression

Many machine learning methods are presently available, including for instance neural networks, random forests and support vector machines. In this article, we will talk about a relatively unexplored method called symbolic regression, and will show how it can be used to solve machine learning problems in a very transparent and explicit way.

### What is machine learning

Machine learning concerns algorithms capable of predicting numerical values (regression) and creating classifications, among other tasks. The real world is messy and randomness appears everywhere, so a major challenge that these algorithms face is being able to discern meaningful signals from the underlying noise contained in the training datasets.

What most machine learning methods have in common is that they are very implicit and resemble black boxes: numbers are fed into the model, and it spits out a result after performing a series of complex computations under the hood. This kind of processing of information is strongly connected to the notion of “artificial intelligence”, since the inner workings of the human brain are also very hard to describe, while it is capable of learning and recognizing patterns across a very wide range of domains.

### Symbolic regression

Symbolic regression is a technique that looks for mathematical formulas that predict some target variable taking as input one or more input variables. Thus, a symbolic model is nothing more than an algebraic formula that can be written on a piece of paper.

A simple case of a symbolic model is a polynomial. Any dataset can be represented with perfect accuracy by a polynomial, but that is not very interesting because polynomials quickly diverge outside the training domain, and because they contain as many free parameters as there are points in the training dataset. So they do not really compress information in any way.

More interesting models are found by combining a set of base functions and trying to find the simplest combinations that predict some target variable. Examples of base functions are trigonometric functions, exponentials, sum, multiplication, division, etc.

For instance, these are some of the base functions used by the symbolic regression software TuringBot:

After the base functions are defined, the task is then to combine them in such a way that a target variable is successfully predicted from the input variables. There is more than one way to carry out the optimization: one might be interested in maximizing the classification accuracy, or in recovering the overall shape of a curve without much regard for outliers, etc. For this reason, TuringBot allows many different search metrics to be used:

Some examples of problems that can be solved with symbolic regression include:

Clearly the method is very general, and can be creatively used to solve a variety of problems.

### Conclusion

In this article, we have seen how symbolic regression is an alternative machine learning method capable of generating explicit models and solving various classes of problems in an elegant way. If you are interested in generating symbolic models from your own data and seeing what patterns it can find, you can download the symbolic regression software TuringBot, which works on both Windows and Linux, for free.

## How to find formulas from values

Finding mathematical formulas from data is an extremely useful machine learning task. A formula is the most compressed representation of a table, allowing large amounts of data to be compressed into something simple, while also making explicit the relationship that exists between the different variables.

In this tutorial, we are going to generate a dataset and try to recover the original formula using the symbolic regression software TuringBot, without any previous knowledge of what that formula was.

### What symbolic regression is

Symbolic regression is a machine learning technique that tries to find explicit mathematical formulas that connect variables. The technique starts from a set of base functions to be used in the search, for instance, addition, multiplication, sin(x), exp(x), etc., and then tries to combine those functions in such a way that the target variable is accurately predicted.

Simplicity is as important as accuracy in a symbolic regression model. Every dataset can be represented with perfect accuracy by a polynomial, but that is uninformative since the number of free parameters in the model is the same as the number of training data points. For this reason, a symbolic regression optimization penalizes large formulas, favoring simpler ones that perform just as well.

### Generating an example dataset

Let’s give an explicit example of how symbolic regression can be used to find a formula from data. We will generate a dataset that consists of the formula x*cos(10*x) + 2, add noise to this data, and then see if we can recover this formula using symbolic regression.

The following Python script generates the input data:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 1, 100)
y = np.cos(10*x)*x + 2 + np.random.random(len(x))*0.1
```

And this is what the result looks like:

Now we are going to try to find a formula for this data and see what happens.

### Finding a formula using TuringBot

The usage of TuringBot is very simple. All we have to do is load the input data using its interface and start the search. First, we save the data to an input file:

```python
arr = np.column_stack((x, y))
np.savetxt('input.txt', arr, fmt='%f')
```

After loading input.txt into TuringBot, starting the search, and letting it work for a minute, these were the formulas that it found, ordered by complexity:

It can be seen that it has successfully found our original formula!

### Conclusion

Here we have seen how symbolic regression can be used to automatically find mathematical formulas from data values. The example that we have given was a simple one, but the procedure that we have used would also work for a real-world dataset in which the dependencies between the variables were not known beforehand, and in which more than one input variable was present.

If you are interested in trying to find formulas from your own dataset, you can download TuringBot for free from the official website.

## Deep learning with symbolic regression

Symbolic regression is an innovative machine learning technique that is capable of generating results similar to those of neural networks, but with a completely different approach. Here we will talk about its basic characteristics, and show how it can be used to solve deep learning problems.

### What is deep learning?

The concept of deep learning has emerged in the context of artificial neural networks. A neural network which contains hidden layers is capable of pre-processing the input information and extracting non-trivial features prior to combining that input into an output value. The term “deep learning” comes from the presence of those multiple layers.

More recently, it has become common to call deep learning any machine learning technique that is capable of extracting non-trivial information from an input and using that to predict target variables in a way that is not possible for classical statistical methods.

### How symbolic regression works

Despite being so common, neural networks are not the only way to extract non-trivial patterns from input data. An alternative technique, which is capable of solving the same tasks as neural networks, is called symbolic regression.

The idea of symbolic regression is to find explicit mathematical formulas that predict a target variable taking as input a set of input variables. Sophisticated algorithms have to be employed to efficiently search over the space of all mathematical formulas, which is very large. The most common approach is to use genetic algorithms for this search, but TuringBot shows that a simulated annealing optimization also gives excellent results.
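For readers unfamiliar with simulated annealing, here is a generic sketch of the technique applied to a toy problem: recovering a single constant by minimizing a cost function. It only illustrates the accept/reject rule and the cooling schedule; it is not TuringBot's implementation, and the step size, schedule and names are arbitrary choices made for this example.

```python
import math
import random

def simulated_annealing(cost, start, steps=5000, start_temp=1.0, seed=0):
    """Minimize cost(x) over real x with a basic simulated annealing loop."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    for step in range(steps):
        temp = start_temp * (1 - step / steps) + 1e-9  # linear cooling
        candidate = current + rng.gauss(0, 0.5)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost

# Toy problem: recover the constant c = 3 from the cost (c - 3)^2.
best, best_cost = simulated_annealing(lambda c: (c - 3.0) ** 2, start=0.0)
```

In a symbolic regression setting, the candidate move would be a random change to a formula instead of a random step on a number, and the cost would be the chosen search metric.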

The biggest difference between symbolic regression and neural networks is that the models that result from the former are explicit. Neural networks often require hundreds of weights to be represented, whereas a symbolic model might be a mathematical formula that fits on a single line. This way, symbolic regression can be said to be an alternative to neural networks that does not involve black boxes.

### Deep learning with symbolic regression

So how does it work to solve a traditional deep learning task with symbolic regression? To give an example, let’s try to use it to classify the famous Iris dataset, in which four features of flowers are given and the goal is to classify the species of those flowers using this data. You can find the raw dataset here: iris.txt.

After loading this dataset in the symbolic regression software TuringBot, selecting “classification accuracy” as the search metric and setting a 50/50 test/train split for the training, these were the formulas that it ended up finding, ordered by complexity in ascending order:

The error shown is the out-of-sample error. It can be seen that the best formula turned out to be one of intermediate size, not so small that it cannot find any pattern, but also not so large that it overfits the data. Its classification accuracy in the test domain was 98%.

If you found this example interesting, you might want to download TuringBot for free and give it a try with your own data. It can be used to solve regression and classification problems in general.

### Conclusion

In this article, we have seen how symbolic regression can be used to solve problems where a non-linear relationship between the input variables exists. Despite neural networks being so common, this alternative approach is capable of finding models that perform similarly, but with the advantage of being simple and explainable.

## Using Symbolic Regression to predict rare events

### Rare events classification

Predicting rare events is a machine learning problem of great practical importance, and also a very difficult one. Models of this kind need to be trained on highly imbalanced datasets, and are used, among other things, for spotting fraudulent online transactions and detecting anomalies in medical images.

In this article, we show how such problems can be modeled using Symbolic Regression, a technique which attempts to find mathematical formulas that predict a desired variable from a set of input variables. Symbolic models, contrary to more mainstream ones like neural networks and random forests, are not black boxes, since they clearly show which variables are being used and how. They are also very fast and easy to implement, since no complex data structures are involved in the calculations.

In order to provide a real-world example, we will try to model the credit card fraud dataset available on Kaggle using our Symbolic Regression software TuringBot. The dataset consists of a CSV file containing 284,807 transactions, one per row, out of which 492 are frauds. The input columns include 28 anonymized features, and the last column contains “0” for legitimate transactions and “1” for fraudulent ones.

Prior to the regression, we remove all quotation mark characters from the file, so that those two categories are recognized as numbers by the software.
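This cleaning step can be done with a short Python function. The sketch below assumes that the quotation marks in question are double quotes, and the file names in the example call are hypothetical.

```python
def strip_quotes(src_path, dst_path):
    """Copy a text file, dropping every double-quote character."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line.replace('"', ""))

# Example call with hypothetical file names:
# strip_quotes("creditcard.csv", "creditcard_clean.csv")
```

Processing the file line by line keeps memory usage low, which is convenient for a file with hundreds of thousands of rows.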

### Symbolic regression

Generating symbolic models using TuringBot is a straightforward process, which requires no data science skills. The first step is to open the program and load the input file by clicking on the “Input” button, shown below. After loading, the program will automatically define the column “Class” as the target variable and all other ones as input variables, which is what we want.

Then, we select the error metric for the search as “F1 score”, which is the appropriate one for binary classification problems on highly imbalanced datasets like this one. This metric corresponds to the harmonic mean of the precision and the recall of the model. A very illustrative image that explains what precision and recall are can be found on the Wikipedia page for F1 score.

That’s it! After those two steps, the search is ready to start. Just click on the “play” button at the top of the interface. The best solutions that the program has encountered so far will be shown in the “Solutions” box in real time.

Bear in mind that this is a relatively large dataset, and that it may seem like not much is going on in the first minutes of the optimization. Ideally, you should leave the program running until at least a few million formulas have been tested (you can see the number so far in the Log tab). On a modest i7-3770 CPU with 8 threads, this took us about 6 hours. A more powerful CPU would take less time.

### The resulting formula

The models that were encountered by the program after this time were the following:

The error for the best one is 0.17, meaning its F1 score is 1 - 0.17 = 0.83. This suggests that the recall and the precision of the model are both in the neighborhood of 83%. In a verification using Python, we found that they are 80% and 87%, respectively.
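The relationship between these numbers can be checked directly, since the F1 score is the harmonic mean of precision and recall. The function below is a minimal sketch written for this check; its name is chosen for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported above.
f1 = f1_score(precision=0.87, recall=0.80)
print(round(f1, 2))
```

The result rounds to 0.83, consistent with the error of 0.17 shown by the program.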

So what does this mean? That the following mathematical formula found by our program is capable of detecting 80% of all frauds in the dataset, and that it is right 87% of the time when it claims that a fraud is taking place! This is a result consistent with the best machine learning methods available.

### Conclusion

In this article, we have demonstrated that our Symbolic Regression software TuringBot is able to generate models that classify credit card frauds in a real world dataset with high precision and high recall. We believe that this kind of modeling capability, combined with the transparency and efficiency of the generated models, is very useful for those interested in developing machine learning models for the classification and prediction of rare events.