A Guide to Symbolic Regression Machine Learning

Symbolic Regression is a technique that discovers explicit mathematical formulas that connect variables on a dataset. This allows machine-learning problems to be solved in a very elegant and robust way. Here we will talk about this method and its advantages.

Introduction

Symbolic Regression is a numerical technique that combines machine learning and statistics. It tries to combine simple base functions like sin(x), exp(x), addition, multiplication, etc into formulas that predict a variable as a function of other variables.

So why is this interesting? Well, we can apply this to real-world problems like the classification of stock price changes, calculating losses for insurance claims, or calculating house prices as a function of their characteristics.

The set of mathematical formulas to choose from is larger than you might imagine, but this is where a good symbolic regression algorithm will help. By cleverly navigating this space, meaningful formulas can be obtained in a reasonable amount of time.

Symbolic Regression Machine Learning

Symbolic regression is typically used for tasks that have a set of observed variables and a prediction that should be made. For example, a large set of income data for individuals and an optimal mortgage loan prediction.

When the dataset is large, traditional machine learning methods can overfit and end up being inaccurate. Since symbolic regression models are simple and use the least possible amount of variables, they are typically more robust and may have lower chances of overfitting the data.

In a way, Symbolic Regression is a machine learning technique that “looks under the hood” to determine the variables that matter for predicting the target variable.

Symbolic Regression Example

Say we have a dataset with variables x1, x2, …, xn, and a target feature y. By providing a table of this data to a symbolic regression engine, it will start randomly trying different combinations of the input features and the base functions to try to predict the target variable while keeping track of the best formulas found so far.

You can see an example of this process in the figure below, where the target and input variables can be seen on the upper left of the interface:

Conclusion

Machine learning is one of the most exciting fields in the world. It is about optimizing models that are capable of learning from huge amounts of data. Examples are computer vision algorithms for image recognition and general-purpose models like support vector machines and neural networks. Symbolic regression is an alternative to these methods that works by finding explicit formulas that connect the variables, allowing hidden nonlinear patterns to be uncovered.