
A regression model example and how to generate it
In this example, we use symbolic regression to predict house prices as a function of their characteristics.
Discover mathematical formulas from values with TuringBot, a desktop software for Symbolic Regression.
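Say you want to predict a numerical value from a set of input variables. In 2021, most people would go about it in one of two ways: fit a line or a simple curve to the data (linear regression, polynomial regression, and so on), or train a black-box machine learning model.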
The first option is very limited. It barely scratches the space of all possible mathematical relationships that could be relevant.
The second option yields models that are highly susceptible to overfitting and that do not offer much insight into the data.
This is where TuringBot comes in: it solves the problem by finding explicit mathematical formulas that connect the variables. This way, it generalizes curve-fitting methods (including linear and polynomial regression), while generating models that are simple and explainable.
Whether you are an engineer, a researcher, a data scientist, or a quantitative analyst, TuringBot will give you a HUGE edge.
TuringBot implements a technique called symbolic regression. It tries to combine a set of base functions into simple formulas that accurately predict the desired variable. Examples of base functions are addition, sin(x), exp(x), etc.
What is optimized is the formula itself, and not just the numerical constants of some assumed model.
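To make the idea of optimizing the formula itself concrete, here is a minimal sketch in Python. It is not TuringBot's algorithm: it simply enumerates every formula of the form f(x) op g(x) built from a tiny, illustrative set of base functions and keeps the one with the lowest RMS error. The names `UNARY`, `BINARY`, `candidates`, and `best_formula` are made up for this example, and the data is synthetic.

```python
import itertools
import math

# Illustrative base functions; an assumption, not TuringBot's actual set.
UNARY = {"sin": math.sin, "exp": math.exp, "id": lambda v: v}
BINARY = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def candidates():
    """Yield (text, callable) pairs for every formula f(x) op g(x)."""
    for (fn, f), (op, b), (gn, g) in itertools.product(
        UNARY.items(), BINARY.items(), UNARY.items()
    ):
        # Default arguments bind the current f, b, g to this lambda.
        yield f"{fn}(x) {op} {gn}(x)", (lambda x, f=f, b=b, g=g: b(f(x), g(x)))


def rms_error(formula, xs, ys):
    """Root-mean-square error of a candidate formula on the data."""
    return math.sqrt(sum((formula(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def best_formula(xs, ys):
    """Return (text, error) for the candidate with the lowest RMS error."""
    text, formula = min(candidates(), key=lambda c: rms_error(c[1], xs, ys))
    return text, rms_error(formula, xs, ys)


# Synthetic data generated from y = x * sin(x).
xs = [0.1 * i for i in range(1, 50)]
ys = [x * math.sin(x) for x in xs]

text, err = best_formula(xs, ys)
print(text, err)
```

Exhaustive enumeration only works for a search space this small. Real symbolic regression tools need a smarter search strategy, because the number of possible formulas grows explosively with their size.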
The program uses TXT or CSV files as input, which may contain an arbitrary number of columns. It can be run either interactively through its powerful graphical interface or in an automated way from the command line.
Here is an example of an input file that you can use: input.txt.
If your problem involves predicting a number as a function of other numbers, then you can apply TuringBot to it. Just save the data in TXT or CSV format, load it in the program, and start the search.
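As a rough illustration of preparing such a file, the sketch below writes a small CSV with Python's standard `csv` module. The header row, the column names `x1`, `x2`, `y`, and the sample values are assumptions made for this example; the text above only states that the input is a TXT or CSV file with an arbitrary number of columns.

```python
import csv
import math
import tempfile
from pathlib import Path


def write_input_file(path, rows, header):
    """Write a header row followed by one data row per sample."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)


# Made-up sample data: two input columns and one target column.
rows = [
    (x1, x2, round(x1 * math.sin(x2), 6))
    for x1 in range(1, 5)
    for x2 in range(1, 5)
]

# Written to the system temp directory here; use any path you like.
path = Path(tempfile.gettempdir()) / "input.csv"
write_input_file(path, rows, header=("x1", "x2", "y"))
print(f"wrote {len(rows)} rows to {path}")
```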
To give a few concrete examples:
Note that the last two examples are classification problems. This is not an issue: just find formulas that output 0 or 1 depending on the category.
A decision boundary found with symbolic regression. Tutorial
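The idea of a formula that outputs 0 or 1 can be sketched as follows. The formula, the sample points, and the labels are all made up for illustration and are not taken from any TuringBot result; the point is only that a comparison inside the formula turns a numeric expression into a class label, which can then be scored by accuracy.

```python
def formula(x1, x2):
    """Hypothetical classifier formula: 1 outside the unit circle, else 0."""
    return int(x1 * x1 + x2 * x2 > 1.0)


def accuracy(predict, samples):
    """Fraction of (x1, x2, label) samples that the formula labels correctly."""
    hits = sum(predict(x1, x2) == label for x1, x2, label in samples)
    return hits / len(samples)


# Made-up labeled points (x1, x2, label); the last one is deliberately
# labeled against the formula to show an imperfect fit.
samples = [
    (0.1, 0.2, 0),
    (0.9, 0.9, 1),
    (-1.2, 0.0, 1),
    (0.5, -0.5, 0),
    (0.8, 0.7, 0),
]

print(accuracy(formula, samples))
```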
What makes TuringBot so general is that many different search metrics are included, allowing models with different goals to be generated. Those include:
Both TuringBot and Eureqa are implementations of symbolic regression, but the algorithms used by each are completely different. Eureqa is based on genetic programming, while TuringBot is based on simulated annealing.
Eureqa was acquired by a consulting company called DataRobot and is no longer commercially available.
A recent paper has shown that TuringBot performs noticeably better than Eureqa on a variety of physics-inspired problems (arXiv:2010.11328). In that paper, TuringBot even solved problems for which Eureqa could not find a solution at all.
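For readers unfamiliar with simulated annealing, here is a toy sketch of how it can be applied to a formula search. This is not TuringBot's implementation, whose internals are not described here. The formula encoding (a triple `(f, op, g)` meaning f(x) op g(x)), the base functions, the mutation rule, and the cooling schedule are all assumptions chosen to keep the example short.

```python
import math
import random

# Illustrative base functions; an assumption, not TuringBot's actual set.
UNARY = {"sin": math.sin, "cos": math.cos, "exp": math.exp, "id": lambda v: v}
BINARY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def evaluate(state, x):
    """Evaluate the formula f(x) op g(x) encoded as (f_name, op_name, g_name)."""
    f, op, g = state
    return BINARY[op](UNARY[f](x), UNARY[g](x))


def rms_error(state, xs, ys):
    """Root-mean-square error of the encoded formula on the data."""
    return math.sqrt(sum((evaluate(state, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def mutate(state, rng):
    """Replace one randomly chosen component of the formula."""
    parts = list(state)
    slot = rng.randrange(3)
    pool = BINARY if slot == 1 else UNARY
    parts[slot] = rng.choice(list(pool))
    return tuple(parts)


def anneal(xs, ys, start, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Return the best (formula, error) seen during an annealing run."""
    rng = random.Random(seed)
    state, err = start, rms_error(start, xs, ys)
    best, best_err = state, err
    temp = t0
    for _ in range(steps):
        cand = mutate(state, rng)
        cand_err = rms_error(cand, xs, ys)
        # Metropolis rule: always accept improvements, and accept worse
        # candidates with a probability that shrinks as the temperature drops.
        if cand_err <= err or rng.random() < math.exp((err - cand_err) / temp):
            state, err = cand, cand_err
            if err < best_err:
                best, best_err = state, err
        temp *= cooling
    return best, best_err


# Synthetic data generated from y = x * sin(x).
xs = [0.1 * i for i in range(1, 31)]
ys = [x * math.sin(x) for x in xs]

best, best_err = anneal(xs, ys, start=("exp", "+", "cos"))
print(best, best_err)
```

Accepting occasional worse candidates early on is what lets the search escape formulas that are locally good but globally poor. A genetic programming approach would instead evolve a whole population of formulas through crossover and mutation.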
Many free symbolic regression packages have been developed in the past, most notably gplearn, along with many smaller repositories that can be found on GitHub.
If you try any of these packages and compare them to TuringBot, you will quickly notice that their performance is vastly inferior. There are two main reasons for that:
In practice, this means that the formula you are looking for may never be discovered with a slow package unless it turns out to be relatively trivial.
TuringBot can be downloaded and used for free for as long as you want, but it also has a paid version that unlocks more features. You can find more details on the Pricing page.
Start finding formulas for your data today.
Want to see TuringBot in action?
Check out the official blog.
Here we use TuringBot to develop a classification algorithm that predicts stock market price changes.
Learn how to run TuringBot in a fully automated and customizable way from Python.
See also: Symbolic Regression: The Forgotten Machine Learning Method (Towards Data Science).