### A regression model example and how to generate it

In this example, we use symbolic regression to predict house prices as a function of their characteristics.

When it comes to predicting numerical values from a set of input variables, there are two common approaches:

- Fitting a linear or polynomial model to the data.
- Using complex machine learning algorithms, such as neural networks.

While these methods have their strengths, they also have limitations. Linear and polynomial models can only capture simple relationships, while complex algorithms are susceptible to overfitting and do not offer much insight into the data.

This is where TuringBot comes in: it solves the problem by finding explicit mathematical formulas that connect the variables. This way, it generalizes curve-fitting methods (including linear and polynomial regression), while generating models that are simple and interpretable.

TuringBot implements a technique called Symbolic Regression. It tries to combine a set of base functions into simple formulas that accurately predict the desired variable. The base functions offered by the program are the following:

- **Arithmetic:** addition, multiplication, division
- **Trigonometric:** sin, cos, tan, asin, acos, atan
- **Exponential:** exp, log, log2, sqrt, pow
- **Hyperbolic:** sinh, cosh, tanh, asinh, acosh, atanh
- **Logical:** smaller, greater, equal, different, logical_or, logical_and
- **History:** delay, moving_average
- **Other:** abs, floor, ceil, round, sign, mod, gamma, erf

What is optimized is the formula itself, and not just the numerical constants of some assumed model.
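To make the distinction between optimizing a formula and optimizing constants concrete, here is a minimal Python sketch that represents formulas as nested tuples and evaluates them. The tuple encoding, the `evaluate` and `size` helpers, and the reduced set of base functions are illustrative assumptions, not TuringBot's internal representation.

```python
import math

# Hypothetical mini-library: a small subset of the base functions listed above.
UNARY = {"sin": math.sin, "cos": math.cos, "exp": math.exp, "abs": abs}
BINARY = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
}

def evaluate(node, row):
    """Evaluate a formula tree on one data row (a dict of variable values)."""
    if isinstance(node, (int, float)):
        return float(node)        # numerical constant
    if isinstance(node, str):
        return float(row[node])   # input variable, looked up by name
    op, *args = node
    values = [evaluate(arg, row) for arg in args]
    if op in UNARY:
        return UNARY[op](*values)
    return BINARY[op](*values)

def size(node):
    """Count the nodes of a formula, a rough measure of its complexity."""
    if isinstance(node, tuple):
        return 1 + sum(size(arg) for arg in node[1:])
    return 1

# Two candidates that differ in structure, not only in their constants.
linear = ("add", ("mul", 2.0, "x"), 1.0)          # 2*x + 1
wavy = ("add", ("mul", 2.0, ("sin", "x")), 1.0)   # 2*sin(x) + 1

row = {"x": 2.0}
print(evaluate(linear, row), evaluate(wavy, row))
print(size(linear), size(wavy))
```

A curve-fitting routine would keep one of these trees fixed and tune only the numbers inside it. A symbolic regression search is also free to swap `"x"` for `("sin", "x")`, which changes the shape of the model itself.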

The program uses TXT or CSV files as input, which may contain an arbitrary number of columns. It can be executed either interactively through its graphical interface or in an automated way from the command line.

Here is an example of an input file that you can use: input.txt.

If your problem involves predicting a number as a function of other numbers, then you can apply TuringBot to it. Just save the data in TXT or CSV format, load it in the program, and start the search.
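As a rough illustration of preparing such a file, the sketch below builds CSV text for a small house-price table using only the Python standard library. The column names, the numbers, the header row, and the choice to put the target in the last column are all assumptions made for the example. Check the program's documentation for the exact layout it expects.

```python
import csv
import io

def to_csv_text(header, rows):
    """Serialize a header and data rows into comma-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

# Made-up data: three input columns followed by the value to predict.
header = ["area", "bedrooms", "age", "price"]
rows = [
    (120, 3, 15, 310000),
    (85, 2, 40, 195000),
    (200, 5, 5, 540000),
]

text = to_csv_text(header, rows)
print(text)

# To save the table to disk (the file name is a placeholder):
# with open("houses.csv", "w", newline="") as handle:
#     handle.write(text)
```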

To give a few concrete examples:

- Predict the price of a house as a function of its characteristics (area, number of bedrooms, age, etc.): A regression model example and how to generate it.
- Detect fraudulent credit card transactions based on anonymized features: Using Symbolic Regression to predict rare events.
- Predict whether a stock will rise or fall the next day: How to create an AI trading system.

Note that the last two examples are classification problems. This is not an issue: just find formulas that output 0 or 1 depending on the category.

Figure: a decision boundary found with symbolic regression (from the linked tutorial).
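A formula can act as a classifier when it ends in a logical base function. The sketch below assumes that `greater(a, b)` returns 1 when `a > b` and 0 otherwise, and uses a made-up formula whose decision boundary is the unit circle. Both the semantics of `greater` and the formula itself are assumptions for illustration.

```python
def greater(a, b):
    """Logical base function (assumed semantics): 1.0 if a > b, else 0.0."""
    return 1.0 if a > b else 0.0

def classify(x, y):
    """Hypothetical formula: class 1 outside the unit circle, class 0 inside."""
    return greater(x * x + y * y, 1.0)

points = [(0.1, 0.2), (1.5, 0.0), (-0.5, 0.5), (0.0, -2.0)]
labels = [classify(x, y) for x, y in points]
print(labels)  # [0.0, 1.0, 0.0, 1.0]
```

The search then scores such formulas with a classification metric instead of a regression error.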

What makes TuringBot so general is that many different search metrics are included, allowing models with different goals to be generated. Those include:

- RMS error
- Classification accuracy
- Correlation coefficient
- Maximum error
- Mean error
- Mean relative error
- F1 score
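For reference, the metrics above can be written out in plain Python using their textbook definitions. This is a sketch: the function names are illustrative, "mean error" is assumed to mean the mean absolute error, and the program's own definitions may differ in detail.

```python
import math

def rms_error(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mean_error(y_true, y_pred):
    """Mean absolute error (assumed meaning of 'mean error')."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def maximum_error(y_true, y_pred):
    """Largest absolute error over all rows."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))

def mean_relative_error(y_true, y_pred):
    """Mean of |error| / |target|; assumes no target value is zero."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

def correlation(y_true, y_pred):
    """Pearson correlation coefficient; assumes neither series is constant."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def accuracy(labels, predictions):
    """Fraction of rows where the predicted class equals the true class."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)

def f1_score(labels, predictions):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for l, p in zip(labels, predictions) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, predictions) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, predictions) if l == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true, y_pred = [1.0, 2.0, 4.0], [1.0, 3.0, 2.0]
print(rms_error(y_true, y_pred), mean_error(y_true, y_pred), maximum_error(y_true, y_pred))
print(accuracy([1, 0, 1, 1], [1, 1, 0, 1]), f1_score([1, 0, 1, 1], [1, 1, 0, 1]))
```

The choice of metric changes which formula wins. For rare events, for example, accuracy can look high for a formula that always outputs 0, while the F1 score for that formula is 0.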

- **High-performance**: TuringBot is written in C++ from scratch, providing a significant speed advantage over packages written in scripting languages like Python. This enables you to find the formula you're looking for with maximum efficiency.
- **Easy setup**: Unlike many symbolic regression packages, TuringBot is a simple executable that you can install in a few minutes. No Python dependencies, no Docker, no Conda, no virtual environments. This makes it easy to get started and focus on your work.
- **Proven track record**: Our algorithm has been successfully employed in academic publications across a wide range of fields (see below).
- **Active development**: TuringBot's development began in 2019, and version 1.0 of the program was launched to the public in February 2020. Over the last four years, the program has been continually updated in response to user feedback, introducing new features, optimizations, bug fixes, and quality-of-life improvements.

Both TuringBot and Eureqa are implementations of Symbolic Regression, but the algorithms used by each are completely different. Eureqa is based on genetic programming, while TuringBot is based on Simulated Annealing.
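To give a feel for how simulated annealing can drive a formula search, here is a deliberately tiny Python sketch. It is not TuringBot's algorithm. The search space (formulas of the form `a * f(x) + b`), the move probabilities, the cooling schedule, and all names are assumptions chosen to keep the example short.

```python
import math
import random

# Hypothetical search space: a * f(x) + b, with f drawn from this table.
FUNCS = {
    "identity": lambda x: x,
    "sin": math.sin,
    "exp": math.exp,
    "square": lambda x: x * x,
}

def predict(candidate, x):
    name, a, b = candidate
    return a * FUNCS[name](x) + b

def rms_error(candidate, data):
    return math.sqrt(sum((predict(candidate, x) - y) ** 2 for x, y in data) / len(data))

def neighbor(candidate, rng):
    """Propose a small random change to the structure or to one constant."""
    name, a, b = candidate
    move = rng.random()
    if move < 0.2:
        name = rng.choice(list(FUNCS))   # structural move: swap the base function
    elif move < 0.6:
        a += rng.gauss(0, 0.3)           # perturb the multiplicative constant
    else:
        b += rng.gauss(0, 0.3)           # perturb the additive constant
    return (name, a, b)

def anneal(data, steps=5000, t_start=1.0, t_end=1e-3, seed=0):
    rng = random.Random(seed)
    current = ("identity", 1.0, 0.0)
    current_err = rms_error(current, data)
    best, best_err = current, current_err
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temperature = t_start * (t_end / t_start) ** (step / steps)
        candidate = neighbor(current, rng)
        err = rms_error(candidate, data)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if err < current_err or rng.random() < math.exp((current_err - err) / temperature):
            current, current_err = candidate, err
            if err < best_err:
                best, best_err = candidate, err
    return best, best_err

# Synthetic data generated from y = 3*x^2 + 1.
data = [(x / 10, 3 * (x / 10) ** 2 + 1) for x in range(-20, 21)]
best, best_err = anneal(data)
print(best, best_err)
```

Occasionally accepting a worse candidate while the temperature is high lets the search leave a poor structure, such as a straight line fitted to curved data. A genetic programming approach would instead evolve a population of formulas through crossover and mutation.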

Eureqa was acquired by a consulting company called DataRobot and is no longer commercially available.

A 2020 paper showed that TuringBot performs noticeably better than Eureqa on a variety of physics-inspired problems (arXiv:2010.11328). In that paper, TuringBot even solved problems for which Eureqa could not find a solution at all.

TuringBot can be downloaded and used for free for as long as you want, but it also has a paid version that unlocks more functionalities. You can find more details on the Pricing page.

Some publications that use TuringBot are:

- Cornelio, C., Dash, S., Austel, V., Josephson, T., Goncalves, J., Clarkson, K., ... & Horesh, L. (2021). *AI Descartes: Combining data and theory for derivable scientific discovery*. arXiv preprint arXiv:2109.01634. [URL]
- Li, Z., Ji, J., & Zhang, Y. (2021). *From Kepler to newton: explainable AI for science*. arXiv preprint arXiv:2111.12210. [URL]
- Simensen, J. (2021). Study of air exchange and temperature efficiency in a room – based on parameter variations at the supply air vent for use with heated supply air (Master's thesis, OsloMet-storbyuniversitetet). [URL, in Norwegian]
- Ashok, D., Scott, J., Wetzel, S. J., Panju, M., & Ganesh, V. (2021, May). *Logic guided genetic algorithms* (student abstract). In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 18, pp. 15753-15754). [URL]
- d'Eon, E. (2021, July). *An analytic BRDF for materials with spherical Lambertian scatterers*. In Computer Graphics Forum (Vol. 40, No. 4, pp. 153-161). [URL]
- Katinić, M., Turk, D., Konjatić, P., & Kozak, D. (2021). *Estimation of C\* Integral for mismatched welded compact tension specimen*. Materials, 14(24), 7491. [URL]
- Konjatić, P., Katinić, M., Kozak, D., & Gubeljak, N. (2021). *Yield Load Solutions for SE (B) Fracture Toughness Specimen with I-Shaped Heterogeneous Weld*. Materials, 15(1), 214. [URL]
- Blackledge, J., & Lamphiere, M. (2021). *A review of the fractal market hypothesis for trading and market price prediction*. Mathematics, 10(1), 117. [URL]
- Knabben, F. T., Ronzoni, A. F., & Hermes, C. J. (2021). *Effect of the refrigerant charge, expansion restriction, and compressor speed interactions on the energy performance of household refrigerators*. International Journal of Refrigeration, 130, 347-355. [URL]
- Zhu, J., Zhao, S., Wang, L., Xu, Y., & Yan, L. Q. (2022). *Practical level-of-detail aggregation of fur appearance*. ACM Transactions on Graphics (TOG), 41(4), 1-17. [URL]
- Barbosa, F. O., Santucci, R. M., Rossi, S., Limberg, G., Pérez-Villegas, A., & Perottoni, H. D. (2022). *The SDSS-Gaia View of the Color–Magnitude Relation for Blue Horizontal-branch Stars*. The Astrophysical Journal, 940(1), 30. [URL]
- Costa, L. A., & de Sousa, J. R. M. (2022). *Semi-empirical equation for determination of stress concentration factors (SCF) in tubular joints of fixed offshore platforms subjected to axial forces*. In XLIII Ibero-Latin American Congress on Computational Methods in Engineering (Vol. 4, No. 04). [URL]
- Alenezi, A. M. M. (2022). *Buckling Resistance of Single and Double Angle Compression Members* (Doctoral dissertation, Université d'Ottawa/University of Ottawa). [URL]
- Al-Subhi, A. (2022). *Dynamic Economic Load Dispatch Using Linear Programming and Mathematical-Based Models*. Mathematical Modelling of Engineering Problems, 9(3). [URL]
- Mukhtar, M. F., Abas, Z. A., Rasib, A. H. A., Anuar, S. H. H., Zaki, N. H. M., Rahman, A. F. N. A., ... & Shibghatullah, A. S. (2022). *Identifying influential nodes with centrality indices combinations using symbolic regressions*. International Journal of Advanced Computer Science and Applications, 13(5). [URL]
- Takeuchi Eisuke, Tanaka Yu, Yoshida Hiroe, Saito Kazuki, Katsura Keisuke, & Shiraiwa Tachihiko. (2022, September). *Development of a Simple Method for Predicting Rice Biomass at Harvest Based on Biomass Accumulation Data*. In Proceedings of the 254th Japanese Society of Crop Science Conference (pp. 50-50). Japanese Society of Crop Science. [URL, in Japanese]
- Agboka, K. M., Tonnang, H. E., Abdel-Rahman, E. M., Odindi, J., Mutanga, O., & Niassy, S. (2022). *Data-driven artificial intelligence (AI) algorithms for modelling potential maize yield under maize–legume farming systems in East Africa*. Agronomy, 12(12), 3085. [URL]
- Syed Ahmed Kabir, I. F., Gajendran, M. K., Ng, E. Y. K., Mehdizadeh, A., & Berrouk, A. S. (2022). *Novel machine-learning-based stall delay correction model for improving blade element momentum analysis in wind turbine performance prediction*. Wind, 2(4), 636-658. [URL]
- Lai, D., Demartino, C., & Xiao, Y. (2022). *High-strain rate compressive behavior of fiber-reinforced rubberized concrete*. Construction and Building Materials, 319, 125739. [URL]
- Moscato, P., & Grebogi, R. (2023, July). *Approximating the Boundaries of Unstable Nuclei Using Analytic Continued Fractions*. In Proceedings of the Companion Conference on Genetic and Evolutionary Computation (pp. 751-754). [URL]
- Carvalho, A., Oliveira, D. M., Krone-Martins, A., & Da Silva, A. (2023, October). *Symbolic Regression Applied to Cosmology: An Approximate Expression for the Density Perturbation Variance*. In 2023 IEEE 19th International Conference on e-Science (e-Science) (pp. 1-2). IEEE. [URL]
- Lakshmi, J. R., & Kumar, J. V. V. (2023, September). *A Regression-Based Approach for Assessing the Buckling Coefficient of Stiffened and Unstiffened Elements*. In IOP Conference Series: Earth and Environmental Science (Vol. 1237, No. 1, p. 012010). IOP Publishing. [URL]
- Gajendran, M. K. (2023). *Machine learning based predictive modeling of stochastic systems*. University of Missouri-Kansas City. [URL]
- Howard, D. A., Jørgensen, B. N., & Ma, Z. (2023). *Multi-Method Simulation and Multi-Objective Optimization for Energy-Flexibility-Potential Assessment of Food-Production Process Cooling*. Energies, 16(3), 1514. [URL]
- Karakašić, M., Konjatić, P., Glavaš, H., & Grgić, I. (2023). *Influence of Open Differential Design on the Mass Reduction Function*. Applied Sciences, 13(24), 13300. [URL]
- Gajendran, M. K., Kabir, I. F. S. A., Vadivelu, S., & Ng, E. Y. K. (2023). *Machine learning-based approach to wind turbine wake prediction under yawed conditions*. Journal of Marine Science and Engineering, 11(11), 2111. [URL]
- Moscato, P., Haque, M. N., & Moscato, A. (2023). *Continued fractions and the Thomson problem*. Scientific Reports, 13(1), 7272. [URL]
- Barbosa, F. O. (2023). *Galactic Archaeology through the Blue Stars of the Horizontal Branch* (Doctoral dissertation, Universidade de São Paulo). [URL, in Portuguese]
- Pak, A., & Trinh, K. (2023). *Forecasting Emergency Department Waiting Times Using Deep Neural Networks*. Value in Health, 26(12), S10. [URL]
- Liljendahl, M., Torpet, M., Lyngsie, P. J., Rudolfsen, J. H., Pedersen, M., & Ibler, K. S. (2023). *Machine Learning to Identify Atopic Dermatitis Prevalence Using Healthcare Utilisation Patterns of Both Diagnosed and Non-Diagnosed AD Patients Based on Danish Register Data*. Value in Health, 26(12), S10. [URL]
- Shaikh, S. A., Taufique, M. F. N., Balusu, K., Kulkarni, S. S., Hale, F., Oleson, J., ... & Soulami, A. (2024). *Finite Element Analysis and Machine Learning Guided Design of Carbon Fiber Organosheet-Based Battery Enclosures for Crashworthiness*. Applied Composite Materials, 1-19. [URL]
- Gude, P., Geldermann, N., Gustedt, F., Grobe, C., Weber, T. P., & Georgevici, A. I. (2024). *New postoperative pain instrument for toddlers—Secondary analysis of prospectively collected assessments after tonsil surgery*. Pediatric Anesthesia, 34(4), 347-353. [URL]
- Denton, R. E., Tengdin, P. M., Hartley, D. P., Goldstein, J., Lee, J., & Takahashi, K. (2024). *The electron density at the midpoint of the plasmapause*. Frontiers in Astronomy and Space Sciences, 11, 1376073. [URL]
- Moscato, P., & Haque, M. N. (2024). *New alternatives to the Lennard-Jones potential*. Scientific Reports, 14(1), 11169. [URL]
- Moscato, P., & Grebogi, R. (2024). *Approximating the nuclear binding energy using analytic continued fractions*. Scientific Reports, 14(1), 11559. [URL]
- Seyam, S., Dincer, I., & Agelin-Chaab, M. (2024). *Optimization and Comparative Evaluation of Novel Marine Engines Integrated with Fuel Cells Using Sustainable Fuel Choices*. Energy, 131629. [URL]

This list is constantly growing and is probably incomplete. If your paper is not shown, please email it to us and we will add it to the list.

We have several resources to help you quickly get up and running with the program.

- YouTube channel: Our YouTube channel is a great place to start. You can find tutorials and other helpful information to assist you in making the most out of the program.
- TuringBot Forum: If you have any questions or need help with a specific task, our forum is available to provide assistance. We welcome any questions or issues you may have, and we look forward to building a friendly and helpful community.

Here we use TuringBot to develop a classification algorithm that predicts stock market price changes.

Learn how to run TuringBot in a fully automated and customizable way from Python.

See also: Symbolic Regression: The Forgotten Machine Learning Method (Towards Data Science).