Symbolic regression example with Python visualization

Symbolic regression is a machine learning technique capable of generating models that are explicit and easy to understand. In this tutorial, we are going to generate our first symbolic regression model using TuringBot, then visualize the results using Python and Matplotlib.

To make things more interesting, we are going to try to find a mathematical formula for the N-th prime number (A000040 in the OEIS).

Symbolic regression setup

The symbolic regression software that we are going to use is called TuringBot. It is a desktop application that runs on Windows, macOS, and Linux. The usage is straightforward: you load your input file in .txt or .csv format, select which column should be predicted and which columns should be used as input, and then start the search.
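The article does not show the input file itself. Below is a minimal sketch for producing one. It assumes a plain text file with two whitespace-separated columns (N and the N-th prime) and no header row, which is the layout the np.loadtxt call in the plotting script further down expects. The helper name first_primes is hypothetical. It uses simple trial division, which is plenty fast for 20 values:

```python
def first_primes(count):
    """Return the first `count` prime numbers using trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        # candidate is prime if no known prime p with p*p <= candidate divides it
        if all(candidate % p != 0 for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


# Write N and the N-th prime as two columns, truncated to the first 20 rows
with open("primes.txt", "w") as f:
    for n, p in enumerate(first_primes(20), start=1):
        f.write(f"{n} {p}\n")
```

If your TuringBot setup prefers .csv, the same loop can write commas instead of spaces. In that case, the plotting script's np.loadtxt call then needs delimiter=','.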

Several search metrics are available, including RMS error, mean error, correlation coefficient, and others. Since we are interested in predicting the exact values of the prime numbers, we are going to use the "classification accuracy" metric.
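The source does not spell out how TuringBot computes this metric. The numbers reported later pair an error of 0.20 with an accuracy of 80%. A reasonable reading is therefore that accuracy is the fraction of rows where the prediction matches the target exactly, and the error is one minus that. Here is a sketch under that assumption, with hypothetical function names:

```python
def classification_accuracy(predicted, actual):
    """Fraction of rows where the prediction equals the target exactly."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


def classification_error(predicted, actual):
    """Error reported alongside accuracy, assumed to be 1 - accuracy."""
    return 1.0 - classification_accuracy(predicted, actual)


# One wrong prediction out of four gives 75% accuracy
print(classification_accuracy([2, 3, 6, 7], [2, 3, 5, 7]))  # 0.75
```

Unlike RMS error, this metric gives no partial credit. Predicting 6 instead of 5 counts as just as wrong as predicting 600, which is what we want when the targets are exact integers.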

This is what the interface looks like after loading the input file containing prime numbers as a function of N, which we have truncated to the first 20 rows:

The TuringBot interface.

With the input file loaded and the search metric selected, the search can be started by clicking on the play button at the top of the interface.

The formulas that were found

After letting TuringBot work for a few minutes, these were the formulas that it ended up finding:

The results of our symbolic regression optimization.

The best one has an error of 0.20, that is, a classification accuracy of 80%, which is quite impressive considering how short the formula is. Of course, we could have obtained 100% accuracy with a huge polynomial (for 20 points, a degree-19 polynomial passes through every one of them). However, that would not compress the data in any way, since the resulting model would have as many free parameters as there are data points.
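To make the polynomial argument concrete: through any 20 points with distinct N, there is exactly one polynomial of degree at most 19, and it reproduces every point. The sketch below evaluates that polynomial in Lagrange form. It uses exact rational arithmetic (fractions.Fraction) so that rounding cannot blur the comparison. The function name lagrange_eval is hypothetical. The polynomial scores 100% on the training rows, yet it carries 20 coefficients for 20 data points.

```python
from fractions import Fraction


def lagrange_eval(xs, ys, x):
    """Exactly evaluate the unique polynomial of degree < len(xs) through (xs, ys) at x."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Basis polynomial L_i(x): equals 1 at xi and 0 at every other xj
        basis = Fraction(1)
        for j, xj in enumerate(xs):
            if j != i:
                basis *= Fraction(x - xj, xi - xj)
        total += yi * basis
    return total


ns = list(range(1, 21))
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

# Every training row is reproduced exactly: 100% classification accuracy ...
exact = all(lagrange_eval(ns, primes, n) == p for n, p in zip(ns, primes))
print(exact)  # True
# ... but the "model" is just the 20 data points written in another form.
```

Evaluating lagrange_eval(ns, primes, 21) shows what such a model does outside the training range. Nothing forces it to land anywhere near the 21st prime.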

Visualizing with Python

Now we can finally visualize the symbolic model using Python. Luckily, the formula works out of the box once we import the functions it uses from Python's math module, because TuringBot uses the same function names (floor, ceil, cosh, log2). This is what the script looks like:

import numpy as np
import matplotlib.pyplot as plt
from math import floor, ceil, cosh, log2

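# Best formula found by TuringBot, transcribed verbatim into Python
def prime(x):
    return floor(1.98623 * ceil(0.0987775 + cosh(log2(x) - 0.049869)) - (1 / x))

# Load data from 'primes.txt': column 0 is N, column 1 is the N-th prime
# (for a comma-separated file, pass delimiter=',')
data = np.loadtxt('primes.txt')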

# Scatter plot of the data
plt.scatter(data[:, 0], data[:, 1], label='Data')

# Plot the model based on the prime function
plt.plot(data[:, 0], [prime(x) for x in data[:, 0]], label='Model')

# Add labels and title
plt.xlabel('N')
plt.ylabel('Prime Values')
plt.title('Prime Numbers')
plt.legend()

# Show the plot
plt.show()

And this is the resulting plot:

Plot of our model vs the original data.
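Besides eyeballing the plot, we can check the 80% figure directly. The sketch below reuses the prime function from the script. To keep it self-contained, it hard-codes the first 20 primes (OEIS A000040) instead of reading primes.txt, and it lists the rows where the formula misses:

```python
from math import floor, ceil, cosh, log2


def prime(x):
    # Formula found by TuringBot, identical to the one in the plotting script
    return floor(1.98623 * ceil(0.0987775 + cosh(log2(x) - 0.049869)) - (1 / x))


# First 20 primes, i.e. the truncated training data
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

# Collect (N, model value, actual prime) for every row the formula gets wrong
misses = [(n, prime(n), p) for n, p in enumerate(PRIMES, start=1) if prime(n) != p]
accuracy = (len(PRIMES) - len(misses)) / len(PRIMES)

print(f"accuracy = {accuracy:.2f}")
for n, got, want in misses:
    print(f"N={n}: model {got}, actual {want}")
```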

Conclusion

In this tutorial, we have seen how to generate a symbolic regression model and visualize it with Python. The example given was a very simple one, with a single input variable and only 20 data points, but the same workflow applies to larger real-world datasets with multiple input variables.

A key advantage of using TuringBot for symbolic regression is that the formulas it generates run in Python without any conversion: import the matching functions from the math module and you're ready to go. This makes it easy to integrate the discovered formulas into your existing Python workflows, whether for data analysis, scientific computing, or machine learning pipelines.
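One caveat when dropping the formula into a data pipeline: the functions in Python's math module accept scalars only. That is why the plotting script calls prime inside a list comprehension. NumPy provides ufuncs with the same names (np.floor, np.ceil, np.cosh, np.log2), so a vectorized variant is mostly a matter of swapping the prefix. Here is a sketch, with the hypothetical name prime_vectorized:

```python
import numpy as np


def prime_vectorized(x):
    """Same formula as prime(), evaluated element-wise over an array of N values."""
    x = np.asarray(x, dtype=float)
    y = np.floor(1.98623 * np.ceil(0.0987775 + np.cosh(np.log2(x) - 0.049869)) - 1 / x)
    # math.floor returns Python ints; cast so results compare like the scalar version
    return y.astype(int)


n = np.arange(1, 21)
print(prime_vectorized(n))
```

With this version, the plotting line can pass the whole column at once, e.g. prime_vectorized(data[:, 0]), instead of looping in Python.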

About TuringBot

TuringBot finds mathematical formulas from data using symbolic regression. Load a CSV, select your target variable, and get interpretable equations rather than black-box models.

Free version available for Windows, macOS, and Linux.