### A regression model example and how to generate it

In this example, we use symbolic regression to predict house prices as a function of their characteristics.

Say you want to predict a numerical value from a set of input variables. In 2023, most people would go about it in one of two ways:

- Fit a line or a polynomial to the data.
- Use some horribly complicated black-box method (neural networks, random forests, etc.).

The first option is very limited. It barely scratches the surface of the space of all possible mathematical relationships that could be relevant.

The second option yields models that are highly susceptible to overfitting and that do not offer much insight into the data.

This is where TuringBot comes in: it solves the problem by finding explicit mathematical formulas that connect the variables. This way, it generalizes curve-fitting methods (including linear and polynomial regression), while generating models that are simple and explainable.

TuringBot implements a technique called Symbolic Regression. It tries to combine a set of base functions into simple formulas that accurately predict the desired variable. The base functions offered by the program are the following:

- **Arithmetic:** addition, multiplication, division
- **Trigonometric:** sin, cos, tan, asin, acos, atan
- **Exponential:** exp, log, log2, sqrt, pow
- **Hyperbolic:** sinh, cosh, tanh, asinh, acosh, atanh
- **Logical:** smaller, greater, equal, different, logical_or, logical_and
- **History:** delay, moving_average
- **Other:** abs, floor, ceil, round, sign, mod, gamma, erf

What is optimized is the formula itself, and not just the numerical constants of some assumed model.
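To make the distinction concrete, here is a minimal sketch in Python of what it means to search over formula structures rather than over the constants of one fixed model. It is not TuringBot's actual algorithm: the data, the hand-written candidate list, and the helper `rms_error` are all assumptions made for illustration. A real symbolic regression search would generate and modify candidates automatically instead of taking them from a fixed dictionary.

```python
import math

# Toy data generated from y = x*x + 1 (illustrative only, not real measurements)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x * x + 1.0 for x in xs]


def rms_error(formula, xs, ys):
    """Root-mean-square error of a candidate formula on the data."""
    squared = [(formula(x) - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical candidate structures built from a few base functions.
# Each one is a different formula, not the same model with different constants.
candidates = {
    "2*x + 1": lambda x: 2 * x + 1,
    "x*x + 1": lambda x: x * x + 1,
    "exp(x)": lambda x: math.exp(x),
    "abs(x) + sqrt(x)": lambda x: abs(x) + math.sqrt(x),
}

# Pick the structure with the lowest error on the data
best_name = min(candidates, key=lambda name: rms_error(candidates[name], xs, ys))
print(best_name, rms_error(candidates[best_name], xs, ys))
```

A linear fit could only ever adjust the slope and intercept of the first candidate, whereas comparing structures lets the quadratic form be recovered directly.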

The program uses TXT or CSV files as input, which may contain an arbitrary number of columns. It can be run either interactively through its graphical interface or in an automated way from the command line.

Here is an example of an input file that you can use: input.txt.

If your problem involves predicting a number as a function of other numbers, then you can apply TuringBot to it. Just save the data in TXT or CSV format, load it in the program, and start the search.
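As a rough illustration of the data preparation step, the following Python sketch writes a small table to a CSV file using the standard `csv` module. The file name `house_prices.csv`, the column names, and the values are made up for the example, and the layout (a header row, with the variable to predict in the last column) is an assumption rather than a documented requirement of the program.

```python
import csv

# Made-up example rows: three input variables and the value to predict
rows = [
    {"area": 120, "bedrooms": 3, "age": 10, "price": 250000},
    {"area": 80, "bedrooms": 2, "age": 25, "price": 150000},
    {"area": 200, "bedrooms": 4, "age": 5, "price": 420000},
]


def write_input_file(path, rows):
    """Write the rows to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical output path; the target column ("price") is placed last
write_input_file("house_prices.csv", rows)
```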

To give a few concrete examples:

- Predict the price of a house as a function of its characteristics (area, number of bedrooms, age, etc.): A regression model example and how to generate it.
- Detect fraudulent credit card transactions based on anonymized features: Using Symbolic Regression to predict rare events.
- Predict whether a stock will rise or fall in the next day: How to create an AI trading system.

Note that the last two examples are classification problems. This is not an issue: just find formulas that output 0 or 1 depending on the category.
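The idea of a formula that outputs 0 or 1 can be sketched in Python as follows. The helper `greater` imitates a comparison base function under the assumption that it returns 1 when its first argument is larger and 0 otherwise. The formula `is_fraud`, its constants, and the labeled samples are hypothetical and chosen only to show how such a formula is scored by classification accuracy.

```python
def greater(a, b):
    """Return 1.0 if a > b, else 0.0 (assumed semantics of a comparison function)."""
    return 1.0 if a > b else 0.0


def is_fraud(amount, hour):
    """Hypothetical classification formula: outputs 1.0 (fraud) or 0.0 (not fraud)."""
    return greater(amount - 50 * hour, 200)


# Made-up labeled samples: (amount, hour of day, true label)
samples = [
    (1000, 3, 1.0),
    (50, 14, 0.0),
    (900, 2, 1.0),
    (300, 20, 0.0),
    (400, 1, 0.0),  # the formula misclassifies this one
]

# Classification accuracy: fraction of samples where the formula matches the label
correct = sum(1 for amount, hour, label in samples if is_fraud(amount, hour) == label)
accuracy = correct / len(samples)
print(accuracy)
```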

Figure: A decision boundary found with symbolic regression.

What makes TuringBot so general is that many different search metrics are included, allowing models with different goals to be generated. Those include:

- RMS error
- Classification accuracy
- Correlation coefficient
- Maximum error
- Mean error
- Mean relative error
- F1 score
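For reference, several of these metrics can be written in a few lines of Python. The definitions below are the common textbook ones and are an assumption here: the program's exact definitions may differ. In particular, "mean error" is taken to be the mean absolute error, and the mean relative error assumes that no true value is zero.

```python
import math


def rms_error(pred, true):
    """Root-mean-square error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def mean_error(pred, true):
    """Mean absolute error (assumed meaning of 'mean error')."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def max_error(pred, true):
    """Largest absolute deviation over all samples."""
    return max(abs(p - t) for p, t in zip(pred, true))


def mean_relative_error(pred, true):
    """Mean of |pred - true| / |true|; assumes no true value is zero."""
    return sum(abs(p - t) / abs(t) for p, t in zip(pred, true)) / len(true)


def f1_score(pred, true):
    """F1 score for 0/1 labels: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for p, t in zip(pred, true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, true) if p == 0 and t == 1)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Made-up predictions and true values for illustration
print(rms_error([1.0, 2.0, 4.0], [1.0, 2.0, 2.0]))
print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))
```

Which metric is appropriate depends on the goal: an error metric such as RMS suits regression, while accuracy or F1 suits formulas that output a class label, with F1 being the usual choice when one class is rare.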

Both TuringBot and Eureqa are implementations of Symbolic Regression, but the algorithms used by each are completely different. Eureqa is based on genetic programming, while TuringBot is based on Simulated Annealing.

Eureqa was acquired by a consulting company called DataRobot and is no longer commercially available.

A recent paper has shown that TuringBot performs noticeably better than Eureqa on a variety of Physics-inspired problems (arXiv:2010.11328). In this paper, TuringBot even managed to solve problems for which Eureqa could not find a solution at all.

Many free symbolic regression packages have been developed in the past, including notably gplearn but also many small repositories that can be found on GitHub.

When you try any of these packages and compare the performance to TuringBot, you will likely observe that the results are not as good. This is due to two main reasons:

- It is relatively easy to write basic symbolic regression software, but extremely difficult to create an efficient implementation. This results in many projects that are started with enthusiasm, maintained for a few months, and eventually abandoned due to limited practical use.
- Most of these packages are written in scripting languages like Python, which do not offer the same level of performance as programs written from scratch in C++, like TuringBot, even when specialized libraries like NumPy and Cython are used.

Using a slow package may make it difficult or impossible to find the formula you are looking for, unless it is relatively straightforward.

TuringBot can be downloaded and used for free for as long as you want, but it also has a paid version that unlocks more features. You can find more details on the Pricing page.

TuringBot's development began in 2019, and version 1.0 of the program was launched to the public in February 2020. Over the last three years, the program has been continually updated in response to user feedback, introducing new features, optimizations, bug fixes, and quality of life improvements.

Some papers that use TuringBot are:

- Ashok D, Scott J, Wetzel S, Panju M and Ganesh V (2020), *"Logic Guided Genetic Algorithms"*. [URL]
- d'Eon E (2021), *"An analytic BRDF for materials with spherical Lambertian scatterers"*. [URL]
- Cornelio C, Dash S, Austel V, Josephson T, Goncalves J, Clarkson K, Megiddo N, Khadir BE and Horesh L (2021), *"AI Descartes: Combining Data and Theory for Derivable Scientific Discovery"*. [URL]
- Li Z, Ji J and Zhang Y (2021), *"From Kepler to Newton: Explainable AI for Science Discovery"*. [URL]
- Simensen J (2021), *"Study of air exchange and temperature efficiency in rooms--based on parameter variations at supply air valve for use with heated supply air"* (in Norwegian). Thesis at: OsloMet-storbyuniversitetet. [URL]
- Al Maruf M, Singh A, Azim A and Auluck N (2021), *"Faster fog computing based over-the-air vehicular updates: a transfer learning approach"*, IEEE Transactions on Services Computing. IEEE. [URL]
- Blackledge J and Lamphiere M (2021), *"A Review of the Fractal Market Hypothesis for Trading and Market Price Prediction"*, Mathematics. Vol. 10(1), pp. 117. MDPI. [URL]
- Knabben FT, Ronzoni AF and Hermes CJ (2021), *"Effect of the refrigerant charge, expansion restriction, and compressor speed interactions on the energy performance of household refrigerators"*, International Journal of Refrigeration. Vol. 130, pp. 347-355. Elsevier. [URL]
- Katinić M, Turk D, Konjatić P and Kozak D (2021), *"Estimation of C\* Integral for Mismatched Welded Compact Tension Specimen"*, Materials. Vol. 14(24), pp. 7491. MDPI. [URL]
- Konjatić P, Katinić M, Kozak D and Gubeljak N (2021), *"Yield Load Solutions for SE (B) Fracture Toughness Specimen with I-Shaped Heterogeneous Weld"*, Materials. Vol. 15(1), pp. 214. MDPI. [URL]
- Barbosa FO, Santucci RM, Rossi S, Limberg G, Pérez-Villegas A and Perottoni HD (2022), *"The SDSS-Gaia View of the Color-Magnitude Relation for Blue Horizontal-Branch Stars"*. [URL]
- Alenezi AM and Mohareb M (2022), *"Elastic compressive buckling resistance for back-to-back double angle assemblies"*, Engineering Structures. Vol. 258, pp. 114120. Elsevier. [URL]
- Syed Ahmed Kabir IF, Gajendran MK, Ng E, Mehdizadeh A and Berrouk AS (2022), *"Novel Machine-Learning-Based Stall Delay Correction Model for Improving Blade Element Momentum Analysis in Wind Turbine Performance Prediction"*, Wind. Vol. 2(4), pp. 636-658. MDPI. [URL]
- Eisuke Takeuchi, Yu Tanaka, Hiroe Yoshida, Kazuki Saito, Keisuke Katsura and Tatsuhiko Shiraiwa (2022), *"Development of a Simple Method for Predicting Rice Harvest Biomass Based on Accumulated Biomass Data"* (in Japanese), In The 254th Lecture Meeting of the Japan Crop Society, pp. 50-50. [URL]
- Mukhtar MF, Abas ZA, Rasib AHA, Anuar SHH, Zaki NHM, Rahman AFNA, Abidin ZZ and Shibghatullah AS (2022), *"Identifying Influential Nodes with Centrality Indices Combinations using Symbolic Regressions"*, International Journal of Advanced Computer Science and Applications. Vol. 13(5). Science and Information (SAI) Organization Limited. [URL]
- Moscato P, Haque MN and Moscato A (2022), *"Continued fractions and the Thomson Problem"*. [URL]
- Carreres-Prieto D, García JT, Castillo LG, Carrillo JM and Vigueras-Rodriguez A (2022), *"Multivariable linear regression versus symbolic regression from genetic programming. Application to the spectroscopic characterization of urban wastewater"* (in Spanish), Ingeniería del Agua. Vol. 26(4), pp. 261-277. [URL]
- Nicoluzzi MF and others (2022), *"Experimental investigation of piston-cylinder gap leakage of reciprocating refrigeration compressors"* (in Portuguese). [URL]

This list is constantly growing and is probably incomplete. If your paper is not shown, please email it to us and we will add it to the list.

We have several resources to help you quickly get up and running with the program.

- YouTube channel: Our YouTube channel is a great place to start. You can find tutorials and other helpful information to assist you in making the most out of the program.
- TuringBot Forum: If you have any questions or need help with a specific task, our forum is available to provide assistance. We welcome any questions or issues you may have, and we look forward to building a friendly and helpful community.

Here we use TuringBot to develop a classification algorithm that predicts stock market price changes.

Learn how to run TuringBot in a fully automated and customizable way from Python.

See also: Symbolic Regression: The Forgotten Machine Learning Method (Towards Data Science).