Introduced in 2009, the Eureqa software became very popular on the promise that it could automatically derive new physical laws from empirical data. The reasoning behind this claim is laid out in the original paper, "Distilling Free-Form Natural Laws from Experimental Data".
In 2017 this software was acquired by DataRobot, an enterprise machine learning company, and left the market. The promise of revolutionizing physics was never quite fulfilled, but the project had a major impact in raising awareness about symbolic regression.
Here we compare Eureqa to a more recent symbolic regression program called TuringBot.
Like Eureqa, TuringBot searches for formulas that fit a dataset. It has a simple graphical interface that lets the user load a dataset and then look for formulas that predict a target column, taking the remaining columns as input.
This software was introduced in 2020. Unlike Eureqa, it does not use a genetic algorithm to search for formulas, but instead a novel algorithm based on simulated annealing. While most references to symbolic regression in the literature involve genetic algorithms, our finding was that a well-implemented simulated annealing search yields results much faster.
Simulated annealing is inspired by a metallurgical process in which a metal is heated to a high temperature and then slowly cooled to attain better physical properties. The algorithm starts out very "hot", accepting worse solutions very often, and as it cools down it becomes stricter about which solutions it accepts. This allows the algorithm to escape local maxima and search for the global maximum in a stochastic way.
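To make the acceptance rule concrete, here is a minimal sketch of simulated annealing on a toy one-dimensional problem. It is not TuringBot's actual algorithm: the function names, the geometric cooling schedule, the Gaussian step size and the toy objective `bumpy` are all assumptions chosen for illustration.

```python
import math
import random


def simulated_annealing(score, start, steps=20000, t_start=2.0, t_end=0.01, seed=0):
    """Maximize `score` over the real numbers with a basic annealing loop."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for i in range(steps):
        # Assumed schedule: geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        # Propose a random neighbor of the current solution.
        candidate = current + rng.gauss(0.0, 0.5)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Improvements are always accepted; worse candidates are accepted
        # with probability exp(delta / t), which shrinks as t decreases.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


def bumpy(x):
    # Toy objective with many local maxima; the global maximum is bumpy(0) = 1.
    return math.cos(3.0 * x) - 0.1 * x * x


# Start far from the global maximum, on the slope of a different peak.
best_x, best_score = simulated_annealing(bumpy, start=4.0)
print(best_x, best_score)
```

In symbolic regression the candidates would be formulas rather than numbers, and a neighbor would be a small random change to a formula, but the accept-or-reject step would follow the same pattern.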
Both TuringBot and Eureqa search for the best formula of each possible size, and not just a single optimal formula. This is the essence of Pareto optimization, and it results in a list of formulas of increasing complexity and accuracy to choose from.
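The idea of keeping the best formula of each size can be sketched in a few lines. The candidate formulas, sizes and errors below are made up for illustration, and `pareto_front` is a hypothetical helper, not a function from either program.

```python
def pareto_front(candidates):
    """Keep each formula that has a lower error than every formula of equal or smaller size.

    `candidates` is a list of (size, error, formula) tuples.
    """
    front = []
    best_error = float("inf")
    # Visit formulas from simplest to most complex; within one size,
    # the most accurate formula comes first.
    for size, error, formula in sorted(candidates, key=lambda c: (c[0], c[1])):
        if error < best_error:
            front.append((size, error, formula))
            best_error = error
    return front


# Hypothetical search results: (size, error, formula).
candidates = [
    (1, 4.10, "x"),
    (3, 1.50, "2*x + 1"),
    (3, 2.75, "x + 3"),
    (5, 1.80, "x*x - x"),
    (7, 0.20, "2*x + sin(x)"),
]

front = pareto_front(candidates)
for size, error, formula in front:
    print(size, error, formula)
```

Here "x + 3" is dropped because another formula of the same size is more accurate, and "x*x - x" is dropped because a simpler formula already has a lower error. What remains is a list in which each extra bit of complexity buys extra accuracy.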
A handy feature offered by TuringBot is the ability to create a train/test split for the optimization and see in real time the test error of the solutions discovered so far. This makes overfit solutions very easy to spot.
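The following sketch shows why a held-out test set exposes overfitting. It does not use any TuringBot interface: the helpers `train_test_split` and `rmse`, the synthetic data and the two models are assumptions for illustration, and the "overfit" model is simply a lookup table that memorizes the training rows.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def rmse(model, rows):
    """Root mean squared error of `model` over (x, y) rows."""
    return (sum((model(x) - y) ** 2 for x, y in rows) / len(rows)) ** 0.5


# Synthetic dataset following y = 2x + 1 exactly.
rows = [(float(x), 2.0 * x + 1.0) for x in range(20)]
train, test = train_test_split(rows)


def simple(x):
    # A formula that captures the real relationship.
    return 2.0 * x + 1.0


memorized = dict(train)


def overfit(x):
    # Stand-in for an overfit solution: perfect on rows it has seen,
    # useless (returns 0.0) on anything else.
    return memorized.get(x, 0.0)


for name, model in [("simple", simple), ("overfit", overfit)]:
    print(name, "train:", rmse(model, train), "test:", rmse(model, test))
```

Both models have zero error on the training rows, so the training error alone cannot tell them apart. Only the test error reveals that the second one has learned nothing that generalizes.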
TuringBot is available for both Windows and Linux. It can be downloaded for free, and a paid plan offers additional features.
The software is already being used by many researchers and engineers around the world to study topics including turbine design, materials science and zoology, and also by business owners to come up with pricing models and other applications.
You might also like our article on Symbolic Regression featured on Towards Data Science: Symbolic Regression: The Forgotten Machine Learning Method.