Eureqa vs TuringBot for symbolic regression

Introduced in 2009, the Eureqa software gained great popularity with the promise that it could be used to derive new physical laws from empirical data automatically. The reasoning behind this idea is detailed in the original paper, Distilling Free-Form Natural Laws from Experimental Data.

In 2017, Nutonian, the company behind Eureqa, was acquired by DataRobot, and the software left the market. The promise of revolutionizing physics was never quite fulfilled, but the project had a major impact in raising awareness of symbolic regression.

Here we compare Eureqa to a more recent symbolic regression program called TuringBot.

About TuringBot

Like Eureqa, TuringBot is a symbolic regression program. Its simple graphical interface lets the user load a dataset and then search for formulas that predict a target column, using the remaining columns as inputs:

The TuringBot interface.

This software was introduced in 2020. Unlike Eureqa, it does not use a genetic algorithm to search for formulas. Instead, it uses a novel algorithm based on simulated annealing. Most references to symbolic regression in the literature involve genetic algorithms, but our finding was that simulated annealing yields results much faster when implemented the right way.

Simulated annealing is inspired by a metallurgical process in which a metal is heated to a high temperature and then slowly cooled to attain better physical properties. The algorithm starts out very “hot” and accepts worse solutions very often. Over time it cools down and becomes stricter about which solutions it accepts. This allows it to escape local maxima and, stochastically, make its way toward the global maximum.
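To make the cooling idea concrete, here is a minimal, generic simulated annealing loop. It is not TuringBot's algorithm, whose details are not described here. The geometric cooling schedule, the parameter names and the toy objective are illustrative assumptions. The loop minimizes an error rather than maximizing an accuracy, which is equivalent.

```python
import math
import random


def simulated_annealing(error, initial, neighbor, t_start=1.0, t_end=1e-3,
                        steps=10_000, seed=0):
    """Generic sketch: minimize `error` starting from `initial`.

    `neighbor(state, rng)` must return a random small modification of `state`.
    """
    rng = random.Random(seed)
    current, current_err = initial, error(initial)
    best, best_err = current, current_err
    for k in range(steps):
        # Assumed geometric cooling: temperature decays from t_start to t_end.
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        candidate = neighbor(current, rng)
        cand_err = error(candidate)
        delta = cand_err - current_err
        # Improvements are always accepted. Worse moves are accepted with
        # probability exp(-delta / t), which is high while the system is "hot"
        # and vanishes as it cools down.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_err = candidate, cand_err
            if current_err < best_err:
                best, best_err = current, current_err
    return best, best_err


# Usage: a bumpy 1D function with many local minima and its global minimum at x = 0.
def bumpy(x):
    return x * x + 10 * (1 - math.cos(2 * math.pi * x))


x_best, err_best = simulated_annealing(
    bumpy, initial=4.5, neighbor=lambda x, rng: x + rng.gauss(0, 0.5))
```

In symbolic regression the "state" would be a formula and `neighbor` a random edit to it. The numeric example only shows how the acceptance rule lets the search climb out of local minima early on.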

Pareto optimization

Both TuringBot and Eureqa implement the idea of searching for the best formula of each possible size, rather than a single optimal formula. This is the essence of Pareto optimization, and it results in a list of formulas of increasing complexity and accuracy to choose from.

A list of formulas of increasing complexity discovered by TuringBot.
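The Pareto filtering described above can be sketched in a few lines. The function below is illustrative and not taken from either program. It keeps the best formula of each size, and then keeps a larger formula only if it is more accurate than every smaller one. The candidate formulas and error values in the usage example are made up.

```python
import math


def pareto_front(candidates):
    """Return the candidates that are more accurate than every simpler one.

    `candidates` is an iterable of (complexity, error, formula) tuples.
    """
    best_per_size = {}
    for complexity, error, formula in candidates:
        # Keep only the lowest-error formula found for each size.
        if complexity not in best_per_size or error < best_per_size[complexity][1]:
            best_per_size[complexity] = (complexity, error, formula)

    front = []
    best_error = math.inf
    for complexity in sorted(best_per_size):
        entry = best_per_size[complexity]
        # A larger formula survives only if it beats all smaller ones.
        if entry[1] < best_error:
            front.append(entry)
            best_error = entry[1]
    return front


# Usage with hypothetical candidates found during a search.
candidates = [
    (1, 2.0, "c"),
    (3, 1.5, "a*x"),
    (3, 1.7, "x+b"),
    (5, 1.6, "a*x+b"),        # worse than the size-3 formula a*x, so dropped
    (7, 0.9, "sin(a*x)+b"),
]
front = pareto_front(candidates)
```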

A handy feature of TuringBot is the option to create a train/test split for the optimization and to watch, in real time, the test error of the solutions discovered so far. This makes overfit solutions very easy to spot.
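The following sketch shows why a held-out test set exposes overfitting. The data are synthetic (y = 2x plus noise) and the "memorizing" model is deliberately contrived. Neither reflects how TuringBot builds its split. The memorizing model reproduces the training points exactly but does poorly on unseen points, while the true formula has similar errors on both sets.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle `rows` and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def rmse(model, rows):
    """Root-mean-square error of model(x) against y over (x, y) rows."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in rows) / len(rows))


# Synthetic data for illustration only: y = 2x + Gaussian noise.
rng = random.Random(1)
data = [(x, 2 * x + rng.gauss(0, 0.1)) for x in (i / 10 for i in range(100))]
train, test = train_test_split(data)

# An overfit "model": looks up training points, otherwise returns the mean.
memory = dict(train)
mean_y = sum(y for _, y in train) / len(train)


def memorizer(x):
    return memory.get(x, mean_y)


def true_formula(x):
    return 2 * x


print("memorizer", rmse(memorizer, train), rmse(memorizer, test))
print("2*x      ", rmse(true_formula, train), rmse(true_formula, test))
```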


TuringBot is available for both Windows and Linux. It can be downloaded for free, and there is also a paid plan with more features.

The software is already used by many researchers and engineers around the world to study topics such as turbine design, materials science and zoology. Business owners also use it to build pricing models, among other applications.

You might also like our article on Symbolic Regression featured on Towards Data Science: Symbolic Regression: The Forgotten Machine Learning Method.

How to create an equation for data points?

To find an equation from a list of values, a technique called symbolic regression can be used. The idea is to search the space of all possible mathematical formulas for the ones with the greatest accuracy, while keeping those formulas as simple as possible.

In this tutorial, we are going to show how to find formulas using the desktop symbolic regression software TuringBot, which is very easy to use.

How symbolic regression works

Symbolic regression starts from a set of base functions to be used in the search, such as addition, multiplication, sin(x) and exp(x). It then tries to combine those functions in many different ways, with the goal of finding a model that predicts a target variable as accurately as possible. Some examples of base functions used by TuringBot are the following:

Some base functions that TuringBot uses for symbolic regression.
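A common way to represent the combinations of base functions is as an expression tree. The sketch below uses a small, hypothetical set of base functions and nested tuples for the trees. It also uses node count as a stand-in complexity measure, which is an assumption: TuringBot's own complexity scores are computed in its own way. A search algorithm, such as simulated annealing or a genetic algorithm, explores formulas by randomly editing such trees.

```python
import math

# Hypothetical base functions: name -> (arity, implementation).
BASE_FUNCTIONS = {
    "add": (2, lambda a, b: a + b),
    "mul": (2, lambda a, b: a * b),
    "sin": (1, math.sin),
    "exp": (1, math.exp),
}


def evaluate(expr, x):
    """Evaluate an expression tree at x.

    A tree is the string "x", a numeric constant, or a tuple (name, *children).
    """
    if expr == "x":
        return x
    if isinstance(expr, (int, float)):
        return float(expr)
    name, *children = expr
    arity, func = BASE_FUNCTIONS[name]
    if len(children) != arity:
        raise ValueError(f"{name} expects {arity} arguments, got {len(children)}")
    return func(*(evaluate(child, x) for child in children))


def complexity(expr):
    """Count the nodes in the tree, one simple way to measure formula size."""
    if isinstance(expr, tuple):
        return 1 + sum(complexity(child) for child in expr[1:])
    return 1


# Usage: sin(2*x) + 1 as a tree.
formula = ("add", ("sin", ("mul", 2.0, "x")), 1.0)
value = evaluate(formula, 0.0)
size = complexity(formula)
```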

As important as the accuracy of a formula is its simplicity. A huge formula can fit the data points perfectly, but if the model has as many free parameters as there are data points, it is not really informative. For this reason, a symbolic regression optimization will discard a larger formula if it finds a smaller one that performs just as well.
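A classic illustration of this point is polynomial interpolation. Through any n points with distinct x values there is a polynomial of degree n - 1, with n coefficients, that passes through every point exactly. Its zero error says nothing about the underlying law, because the coefficients merely restate the data. The sketch below uses Lagrange interpolation on a few made-up, noisy samples of y = x.

```python
def lagrange_interpolant(points):
    """Return the degree n-1 polynomial through n points (x values distinct)."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


# Hypothetical noisy samples of the simple law y = x.
points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (4.0, 4.1)]
poly = lagrange_interpolant(points)

# Zero training error, but five parameters for five points.
poly_residuals = [abs(poly(x) - y) for x, y in points]
# The one-parameter-free law y = x has small but nonzero residuals.
simple_residuals = [abs(x - y) for x, y in points]
```

Comparing `poly(6.0)` with `6.0` shows how the interpolant drifts away from the true law outside the sampled range.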

Finding a formula with TuringBot

Finding equations from data points with TuringBot is a simple process. The first step is selecting the input file with the data through the interface. This input file should be in TXT or CSV format. After it has been loaded, the target variable can be selected (by default it will be the last column in the file), and the search can be started. This is what the interface looks like:

The interface of the TuringBot symbolic regression software.

Several options are available in the menus on the left. These include a test/train split for detecting overfit solutions, the set of base functions to use, and the search metric. The search metric defaults to root-mean-square error, but it can also be set to classification accuracy, mean relative error and others. For this example, we are going to keep it simple and just use the defaults.
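For reference, here are textbook definitions of the three metrics mentioned above. TuringBot's exact formulas are not documented here and may differ. Two choices in particular are assumptions: rounding continuous predictions to the nearest class for accuracy, and dividing by the absolute true value for the relative error.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mean_relative_error(y_true, y_pred):
    """Mean of |pred - true| / |true|; undefined if any true value is zero."""
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def classification_accuracy(y_true, y_pred):
    """Fraction of predictions that equal the label after rounding.

    Note: Python's round() rounds halves to even (round(0.5) == 0).
    """
    return sum(round(p) == t for t, p in zip(y_true, y_pred)) / len(y_true)


# Usage with small made-up vectors.
err = rmse([1, 2, 3], [1, 2, 5])
rel = mean_relative_error([1, 2, 4], [1.1, 2, 3])
acc = classification_accuracy([0, 1, 2], [0.2, 1.4, 1.2])
```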

The optimization is started by clicking on the play button at the top of the interface. The best formulas found so far will be shown in the solutions box, ordered by complexity:

The formulas found by TuringBot for an example dataset.

From the menu, the solutions can be exported to common programming languages or simply as plain text. Here are the formulas from the example above, exported in text format:

Complexity   Error      Function
1            1.91399    -0.0967549
3            1.46283    0.384409*x
4            1.362      atan(x)
5            1.18186    0.546317*x-1.00748
6            1.11019    asinh(x)-0.881587
9            1.0365     ceil(asinh(x))-1.4131
13           0.985787   round(tan(floor(0.277692*x)))
15           0.319857   cos(x)*(1.96036-x)*tan(x)
19           0.311375   cos(x)*(1.98862-1.02261*x)*tan(1.00118*x)
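Because every function name in this export also exists in Python (`atan`, `asinh`, `ceil`, `floor`, `tan` and `cos` in the `math` module, plus the built-in `round`), the text table can be parsed and evaluated directly. The sketch below does this with a restricted `eval` namespace. The parsing assumes the three-column layout shown above. One semantic detail may differ from TuringBot: Python's `round` rounds halves to even.

```python
import math

# The text export shown above.
EXPORTED = """\
Complexity   Error      Function
1            1.91399    -0.0967549
3            1.46283    0.384409*x
4            1.362      atan(x)
5            1.18186    0.546317*x-1.00748
6            1.11019    asinh(x)-0.881587
9            1.0365     ceil(asinh(x))-1.4131
13           0.985787   round(tan(floor(0.277692*x)))
15           0.319857   cos(x)*(1.96036-x)*tan(x)
19           0.311375   cos(x)*(1.98862-1.02261*x)*tan(1.00118*x)
"""

# Only these names are visible inside the formulas.
SAFE_NAMES = {
    "atan": math.atan, "asinh": math.asinh, "ceil": math.ceil,
    "floor": math.floor, "round": round, "tan": math.tan, "cos": math.cos,
}


def parse_table(text):
    """Return a list of (complexity, error, formula) from the text export."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        complexity, error, formula = line.split(maxsplit=2)
        rows.append((int(complexity), float(error), formula))
    return rows


def evaluate(formula, x):
    """Evaluate a formula string for a given x, with builtins disabled."""
    return eval(formula, {"__builtins__": {}}, {**SAFE_NAMES, "x": x})


rows = parse_table(EXPORTED)
for complexity, error, formula in rows:
    print(complexity, formula, evaluate(formula, 2.0))
```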


In this tutorial, we have seen how symbolic regression can be used to find formulas from values. Symbolic regression is very different from regular curve-fitting methods, since no assumption is made in advance about the shape of the formula. This allows patterns to be found in datasets with an arbitrary number of dimensions, making symbolic regression a general-purpose machine learning technique.

Symbolic regression tutorial with TuringBot

In this tutorial, we are going to show how you can find a formula from your data using the symbolic regression software TuringBot. It is a desktop program that runs on both Windows and Linux, and, as you will see, it is very simple to use.

Preparing the data

TuringBot takes as input .txt or CSV files containing one variable per column. The first row may contain the names of the variables; otherwise, the columns will be labelled col1, col2, col3, etc.

For instance, the following is a valid input file:

x y z w classification
5.20 2.70 3.90 1.40 1
6.50 2.80 4.60 1.50 1
7.70 2.80 6.70 2.00 2
5.90 3.20 4.80 1.80 1
5.00 3.50 1.60 0.60 0
5.10 3.50 1.40 0.20 0
4.60 3.10 1.50 0.20 0
6.90 3.20 5.70 2.30 2
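To make the format rules concrete, here is a hypothetical reader that follows the conventions described above. It handles one variable per column and an optional header row, naming the columns col1, col2, ... when there is no header. It uses the last column as the default target. This is not TuringBot's parser, and the comma handling is deliberately simplistic (no quoted fields).

```python
EXAMPLE = """\
x y z w classification
5.20 2.70 3.90 1.40 1
6.50 2.80 4.60 1.50 1
7.70 2.80 6.70 2.00 2
5.90 3.20 4.80 1.80 1
5.00 3.50 1.60 0.60 0
5.10 3.50 1.40 0.20 0
4.60 3.10 1.50 0.20 0
6.90 3.20 5.70 2.30 2
"""


def _is_numeric(fields):
    """True if every field parses as a float."""
    try:
        for field in fields:
            float(field)
    except ValueError:
        return False
    return True


def read_input(text):
    """Parse whitespace- or comma-separated columns into (names, rows)."""
    lines = [line.replace(",", " ").split() for line in text.strip().splitlines()]
    if _is_numeric(lines[0]):
        # No header row: fall back to col1, col2, ...
        names, data_lines = [f"col{i + 1}" for i in range(len(lines[0]))], lines
    else:
        names, data_lines = lines[0], lines[1:]
    rows = [[float(field) for field in fields] for fields in data_lines]
    return names, rows


names, rows = read_input(EXAMPLE)
# By default, the last column is the target and the others are inputs.
inputs = [row[:-1] for row in rows]
target = [row[-1] for row in rows]
```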

Loading the data into TuringBot

This is what the program looks like when you open it:

The TuringBot interface.

By clicking on the “Input file” button on the upper left, you can select your input file and load it. Different search metrics are available, including classification accuracy. A handy cross-validation feature can also be enabled in the “Search options” box: it automatically creates a test/train split and lets you see the out-of-sample error as the optimization goes on. In this example, though, we are going to keep things simple and just use the defaults.

Finding the formulas

After loading the data, you can click on the play button at the top of the interface to start the optimization. The best formulas found so far will be shown in the “Solutions” box, in ascending order of complexity. A formula is only shown if it is more accurate than all simpler alternatives. In symbolic regression, the goal is not simply to find a formula, but to find the simplest ones possible.

Here are the formulas it found for an example dataset:

Finding formulas with TuringBot.

All formulas are written in a format that works out of the box in Python and C, and the menu on the upper right lets you export the solutions to these languages:

Exporting solutions to different languages.

In this example, the true formula turned out to be sqrt(x), which was recovered in a few seconds. The methodology would be the same for a real-world dataset with many input variables and an unknown dependency between them.

How to get TuringBot

If you liked this tutorial, we encourage you to download TuringBot for free from the official website. As we have shown, it is very simple to use. Its powerful mathematical modelling capabilities allow you to find very subtle numerical patterns in your data, much like a scientist would from empirical observations, but automatically and millions of times faster.

An alternative to the Eureqa software

Eureqa is a symbolic regression program based on genetic programming. Here we will talk about an alternative to it called TuringBot.

About Eureqa

Eureqa was developed by a company called Nutonian. In 2017, Nutonian was acquired by DataRobot, and Eureqa was removed from the market after that.

The program gained popularity due to its ease of use. Finding mathematical formulas from data using its graphical interface was very convenient and required no coding.

The alternative: TuringBot

One alternative to Eureqa is TuringBot. It uses a completely different approach to solving symbolic regression problems, based on a simulated annealing algorithm. It can be downloaded for free from the official website.

Here is what its interface looks like:

The interface of the TuringBot symbolic regression software.

It offers a variety of search metrics, which allow many different kinds of machine learning problems to be solved. These include the basic RMS and mean error regression metrics, as well as classification accuracy, F1 score (for rare-event classification) and the correlation coefficient.
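To show why F1 score suits rare events and what the correlation coefficient measures, here are the standard definitions in plain Python. Whether TuringBot computes them exactly this way is an assumption. The usage example shows how, on a dataset where only 1 in 10 samples is positive, a trivial "always negative" predictor reaches 90% accuracy but an F1 score of zero.

```python
import math


def f1_score(y_true, y_pred, positive=1):
    """F1 score for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Rare events: 1 positive out of 10, and a predictor that always says 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1 = f1_score(y_true, y_pred)
```

Pearson correlation ignores scale and offset: a formula that is perfectly correlated with the target may still need a linear rescaling, a*f(x) + b, before its values match the target.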

TuringBot makes it easy to rule out overfit solutions with its convenient cross-validation feature. A test/train split can be enabled through the interface, and the out-of-sample error shown in the solutions box can then be used to select the formula with the best trade-off between size and accuracy.
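One possible way to make that trade-off explicit is sketched below. The rule picks the simplest formula whose test error is within a relative tolerance of the best test error. This is a heuristic assumed for illustration, not a TuringBot feature, since the interface leaves the final choice to the user. The solution list and the 5% tolerance are hypothetical.

```python
def select_formula(solutions, tolerance=0.05):
    """Pick the simplest solution whose test error is within `tolerance`
    (relative) of the best test error.

    `solutions` is a list of (complexity, train_error, test_error, formula).
    """
    best_test = min(s[2] for s in solutions)
    acceptable = [s for s in solutions if s[2] <= best_test * (1 + tolerance)]
    return min(acceptable, key=lambda s: s[0])


# Hypothetical solutions box with train and test errors.
solutions = [
    (1, 2.00, 2.10, "f1"),
    (5, 1.20, 1.25, "f5"),
    (9, 1.00, 1.02, "f9"),
    (15, 0.50, 1.00, "f15"),
    (25, 0.10, 1.90, "f25"),   # overfit: low train error, high test error
]
chosen = select_formula(solutions)
```

Here the size-25 formula has the lowest training error but is rejected. Among the formulas whose test errors are close to the best, the smaller one is preferred.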

A recent paper compared the performance of Eureqa and TuringBot (arXiv:2010.11328). It found that TuringBot discovers formulas more efficiently than Eureqa and can solve problems for which Eureqa could not find a solution at all. The problems used in this evaluation were physics-inspired and taken from the famous book The Feynman Lectures on Physics.

More information

If the concept of TuringBot sounds interesting to you, you can learn more about it from the official website, and also from the posts on this blog. We suggest the following to get started:

For an introduction to Symbolic Regression, you can also check out the Wikipedia article on the topic.