A Guide to Symbolic Regression Machine Learning

Symbolic Regression is a technique that discovers explicit mathematical formulas connecting the variables in a dataset. This allows many machine learning problems to be solved in an elegant and robust way. Here we will talk about this method and its advantages.


Symbolic Regression is a numerical technique that combines machine learning and statistics. It builds formulas out of simple base functions, such as sin(x), exp(x), addition and multiplication, to predict one variable as a function of the others.
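To make the idea of combining base functions concrete, here is a minimal Python sketch of how a formula can be represented as a tree and evaluated on one row of data. The classes `Var`, `Const` and `Op`, the `BASE_FUNCTIONS` table and the helper names are hypothetical, chosen for illustration; they are not TuringBot's internal representation.

```python
import math
from dataclasses import dataclass
from typing import Dict, Union

# A formula is a tree: leaves are variables or constants,
# internal nodes apply a base function to their children.


@dataclass
class Var:
    name: str


@dataclass
class Const:
    value: float


@dataclass
class Op:
    name: str    # key into BASE_FUNCTIONS, e.g. "add" or "sin"
    args: tuple  # child expressions


Expr = Union[Var, Const, Op]

# Hypothetical set of base functions; a real engine would offer more.
BASE_FUNCTIONS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "sin": math.sin,
    "exp": math.exp,
}


def evaluate(expr: Expr, row: Dict[str, float]) -> float:
    """Recursively evaluate a formula tree on one row of data."""
    if isinstance(expr, Var):
        return row[expr.name]
    if isinstance(expr, Const):
        return expr.value
    values = [evaluate(child, row) for child in expr.args]
    return BASE_FUNCTIONS[expr.name](*values)


def to_string(expr: Expr) -> str:
    """Render a formula tree in prefix notation."""
    if isinstance(expr, Var):
        return expr.name
    if isinstance(expr, Const):
        return repr(expr.value)
    return f"{expr.name}({', '.join(to_string(c) for c in expr.args)})"


# y = x1 * sin(x2) + 2.0
formula = Op("add", (Op("mul", (Var("x1"), Op("sin", (Var("x2"),)))), Const(2.0)))
print(to_string(formula))                          # add(mul(x1, sin(x2)), 2.0)
print(evaluate(formula, {"x1": 3.0, "x2": 0.0}))   # 2.0
```

Because every formula is just a tree of this kind, the search for a good model becomes a search over trees.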

So why is this interesting? Well, we can apply this to real-world problems like classifying stock price changes, calculating losses on insurance claims, or calculating house prices as a function of their characteristics.

The set of mathematical formulas to choose from is larger than you might imagine, because the number of candidates grows combinatorially with formula size, and this is where a good symbolic regression algorithm helps. By navigating this space in a clever way, it can find meaningful formulas in a reasonable amount of time.
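To get a feel for how quickly this space grows, the sketch below counts the syntactically distinct expression trees with exactly n nodes that can be built from a small, hypothetical set of building blocks: three terminals, two unary functions and two binary operators. The count includes many mathematically equivalent formulas (x1 + x2 and x2 + x1 are counted separately) and treats the constant as a single symbol, so it only illustrates the combinatorial growth rather than measuring any particular engine's search space.

```python
from functools import lru_cache

# Hypothetical building blocks: 3 terminals (x1, x2, a constant c),
# 2 unary functions (sin, exp) and 2 binary operators (+, *).
N_TERMINALS = 3
N_UNARY = 2
N_BINARY = 2


@lru_cache(maxsize=None)
def count_trees(n: int) -> int:
    """Number of distinct expression trees with exactly n nodes."""
    if n < 1:
        return 0
    if n == 1:
        return N_TERMINALS  # a single leaf
    # Root is a unary function: its child uses the remaining n - 1 nodes.
    total = N_UNARY * count_trees(n - 1)
    # Root is a binary operator: split the remaining n - 1 nodes between
    # a left subtree of size k and a right subtree of size n - 1 - k.
    for k in range(1, n - 1):
        total += N_BINARY * count_trees(k) * count_trees(n - 1 - k)
    return total


for size in (1, 3, 5, 10, 15, 20):
    print(f"{size:2d} nodes: {count_trees(size):,} formulas")
```

With these settings there are 30 trees of size 3 and 696 of size 5, and the count keeps multiplying with every extra node, which is why exhaustive enumeration quickly becomes impractical.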

Symbolic Regression Machine Learning

Symbolic regression is typically used for tasks that have a set of observed variables and a target that should be predicted from them. For example, given a large set of income data for individuals, one might want to predict the optimal mortgage loan for each of them.

When the dataset has many variables, traditional machine learning methods can overfit and end up being inaccurate on new data. Since symbolic regression models are simple and use as few variables as possible, they are typically more robust and may be less likely to overfit the data.
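One common way to favour simple formulas is to score each candidate on both complexity and error and keep only the Pareto front: the formulas for which no other candidate is at least as simple and at least as accurate. The sketch below assumes complexity is the number of nodes in the formula tree and error is measured on held-out data; the `Candidate` type and the candidate list are made up for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    formula: str
    complexity: int  # assumed: number of nodes in the formula tree
    error: float     # assumed: error on held-out data, lower is better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep formulas that no other candidate beats on both complexity and error."""
    front: List[Candidate] = []
    best_error = float("inf")
    # Visit from simplest to most complex, lowest error first within a size.
    for cand in sorted(candidates, key=lambda c: (c.complexity, c.error)):
        # A more complex formula earns its place only if it is strictly
        # more accurate than every simpler formula already kept.
        if cand.error < best_error:
            front.append(cand)
            best_error = cand.error
    return front


# Made-up candidates, for illustration only.
candidates = [
    Candidate("x1", 1, 4.0),
    Candidate("x1 + x2", 3, 1.5),
    Candidate("x1 * x2", 3, 2.5),
    Candidate("x1 + sin(x2)", 4, 1.6),
    Candidate("x1 + 2.1*x2", 5, 0.9),
    Candidate("x1 + 2.1*x2 + 0.01*exp(x1)", 10, 0.89),
]

for c in pareto_front(candidates):
    print(f"{c.complexity:3d}  {c.error:5.2f}  {c.formula}")
```

Here `x1 * x2` and `x1 + sin(x2)` are dropped because `x1 + x2` is at least as simple and more accurate. Whether the tiny gain of the longest formula justifies its extra terms is then a judgment call, and preferring the shorter one is exactly the bias against overfitting described above.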

In a way, Symbolic Regression is a machine learning technique that “looks under the hood” to determine the variables that matter for predicting the target variable.
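Once a formula has been found, checking which inputs it actually uses is straightforward. The sketch below assumes the formula is available as a Python-style expression string (a hypothetical format chosen for illustration) and uses the standard `ast` module to collect variable names, treating any name that is called as a function, such as `sin`, as a base function rather than an input.

```python
import ast
from typing import List, Set


def variables_used(formula: str) -> Set[str]:
    """Return the input variable names that appear in a formula string."""
    tree = ast.parse(formula, mode="eval")
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    # Names used as call targets (sin, exp, ...) are base functions, not inputs.
    functions = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    return names - functions


def unused_inputs(formula: str, inputs: List[str]) -> List[str]:
    """List the dataset columns that the formula does not depend on."""
    used = variables_used(formula)
    return [name for name in inputs if name not in used]


# Hypothetical formula found for a dataset with four inputs.
found = "3.2 * x1 + sin(x3)"
print(sorted(variables_used(found)))                   # ['x1', 'x3']
print(unused_inputs(found, ["x1", "x2", "x3", "x4"]))  # ['x2', 'x4']
```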

Symbolic Regression Example

Say we have a dataset with input variables x1, x2, …, xn and a target variable y. Given a table of this data, a symbolic regression engine starts randomly trying different combinations of the input variables and the base functions to predict the target variable, while keeping track of the best formulas found so far.

You can see an example of this process in the figure below; the target and input variables are shown in the upper left of the interface.

Example of Symbolic Regression optimization using TuringBot.
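The search loop described in the example above can be sketched in its crudest form as pure random search: generate random formula trees, measure their mean squared error on the data, and remember the best formula found for each size. This is only a baseline under simplifying assumptions (a small hypothetical set of base functions, constants drawn at random and never fitted, no mutation or recombination of good formulas); it is not how TuringBot works internally, and practical engines navigate the space far more cleverly.

```python
import math
import random

# Hypothetical base functions; a real engine would offer many more.
UNARY = {"sin": math.sin, "cos": math.cos}
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def random_expr(rng, n_vars, depth):
    """Build a random formula tree as nested tuples."""
    if depth == 0 or rng.random() < 0.3:
        # Leaf: an input variable (index into the row) or a random constant.
        if rng.random() < 0.7:
            return ("var", rng.randrange(n_vars))
        return ("const", round(rng.uniform(-3, 3), 2))
    if rng.random() < 0.3:
        return (rng.choice(list(UNARY)), random_expr(rng, n_vars, depth - 1))
    op = rng.choice(list(BINARY))
    return (op, random_expr(rng, n_vars, depth - 1), random_expr(rng, n_vars, depth - 1))


def evaluate(expr, row):
    """Evaluate a formula tree on one row of input values."""
    kind = expr[0]
    if kind == "var":
        return row[expr[1]]
    if kind == "const":
        return expr[1]
    if kind in UNARY:
        return UNARY[kind](evaluate(expr[1], row))
    return BINARY[kind](evaluate(expr[1], row), evaluate(expr[2], row))


def size(expr):
    """Complexity measure: number of nodes in the tree."""
    if expr[0] in ("var", "const"):
        return 1
    return 1 + sum(size(child) for child in expr[1:])


def to_string(expr):
    """Render a formula tree in infix notation, with x1 as the first column."""
    kind = expr[0]
    if kind == "var":
        return f"x{expr[1] + 1}"
    if kind == "const":
        return str(expr[1])
    if kind in UNARY:
        return f"{kind}({to_string(expr[1])})"
    return f"({to_string(expr[1])} {kind} {to_string(expr[2])})"


def mse(expr, X, y):
    """Mean squared error of a formula over the whole dataset."""
    return sum((evaluate(expr, row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def random_search(X, y, n_iter=3000, max_depth=4, seed=0):
    """Try random formulas, keeping the lowest-error formula seen for each size."""
    rng = random.Random(seed)
    best = {}  # size -> (error, formula tree)
    for _ in range(n_iter):
        expr = random_expr(rng, len(X[0]), max_depth)
        err = mse(expr, X, y)
        s = size(expr)
        if s not in best or err < best[s][0]:
            best[s] = (err, expr)
    return best


if __name__ == "__main__":
    # Synthetic data from y = x1 * x2 + sin(x1), used only to exercise the search.
    data_rng = random.Random(1)
    X = [(data_rng.uniform(-2, 2), data_rng.uniform(-2, 2)) for _ in range(50)]
    y = [a * b + math.sin(a) for a, b in X]
    best = random_search(X, y)
    for s in sorted(best):
        err, expr = best[s]
        print(f"size {s:2d}  mse {err:.4f}  {to_string(expr)}")
```

The synthetic data comes from a known formula only so that the printed candidates can be compared against the truth; with a real dataset, `y` would be the observed target column. Keeping one best formula per size mirrors the idea of tracking the best formulas found so far at several levels of complexity.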


Machine learning is one of the most exciting fields in the world. It is about optimizing models that learn from large amounts of data, from computer vision algorithms for image recognition to general-purpose models such as support vector machines and neural networks. Symbolic regression is an alternative to these methods: it finds explicit formulas that connect the variables, uncovering hidden nonlinear patterns in a form that can be read and inspected.

About TuringBot

TuringBot is desktop software for Symbolic Regression. By loading your data in .TXT or .CSV format into the program, you can immediately start searching for mathematical formulas that connect the variables. To learn more about what TuringBot can offer you, please visit our homepage.