Python: Symbolic Regression in 3 Easy Steps

Key Takeaways:
  • TuringBot provides Python bindings with zero pip dependencies—the core engine is compiled C++
  • Call TuringBot from Python scripts while getting native C++ performance
  • No Julia runtime required (unlike PySR), no scikit-learn dependency chain (unlike gplearn)
  • Results stream in real-time; access partial solutions while search continues

Why TuringBot Instead of Pure Python Libraries?

AspectTuringBot + PythonPySRgplearn
SetupSimple installerpip + Julia installpip + sklearn
EngineCompiled C++Julia JITPython/NumPy
GUI optionYesNoNo
Error metrics15 built-inMSE, MAE + customLimited

Step #1: Download TuringBot

Download TuringBot for Windows or Linux. It's a standalone binary—no pip install, no dependency conflicts.

TuringBot’s graphical interface for Symbolic Regression.

Step #2: Import TuringBot

Once you have the program installed, import it in Python with the following syntax, making sure to replace “user” with your local username:

import sys
sys.path.insert(1, r'C:\Program Files (x86)\TuringBot\resources')

import turingbot as tb

If you are in Linux, you can equivalently use:

import sys
sys.path.insert(1, r'/usr/lib/turingbot/resources')

import turingbot as tb

After that, TuringBot will be imported and ready to go.

Step #3: Start the Symbolic Regression search

The optimization is started like this:

sim = tb.simulation() 
sim.start_process(path, input_file, threads=4, config=config_file) 

The 4 parameters that you see are:

  • path: the path to the TuringBot executable.
  • input_file: the path to your input file, which must contain one variable per column.
  • threads (optional): the number of threads that the program should use.
  • config (optional): the path to the configuration file.

For instance, if you are on Windows, the paths would look something like this:

path = r'C:\Program Files (x86)\TuringBot\TuringBot.exe'
input_file = r'C:\Users\user\Desktop\input.txt' 
config_file = r'C:\Users\user\Desktop\settings.cfg' 

And on Linux:

path = r'/usr/lib/turingbot/TuringBot'
input_file = r'/home/user/input.txt' 
config_file = r'/home/user/settings.cfg' 

Once you run the start_process() method, the optimization will start in the background. You can refresh the current functions in real-time with sim.refresh_functions():

sim.refresh_functions() 
print(*sim.functions, sep='\n') 
print(sim.info) 

Output format: [complexity, error, formula]

[1, 177813.0, '186276']                    # Constant baseline
[3, 7890.39, '11.7503*x']                  # Linear: 95.6% error reduction
[5, 6895.25, '11.9394*(-472.889+x)']       # Affine
[7, 1769.0, '(10.4154+3.9908e-05*x)*x']    # Quadratic: 99.0% error reduction
[11, 1666.42, '(9.10666+3.26179e-05*x)*(1.156*(-93.3986+x))']
[21, 1224.31, '-1624.3+((9.18774*sign(x-10.1264)+3.13847e-05*x)*(1.1586*(-158.606+x)))']

This Pareto front shows the accuracy/complexity tradeoff—pick the formula that fits your needs.

Tip: Customizing your search

By default, the last column of your input file will be the target variable, and all other columns will be used as input variables.

But you can change that as well as several other options by providing the program with a configuration file, that looks like this:

search_metric = 4 # Search metric. 1: Mean relative error, 2: Classification accuracy, 3: Mean error, 4: RMS error, 5: F-score, 6: Correlation coefficient, 7: Hybrid (CC+RMS), 8: Maximum error, 9: Maximum relative error, 10: Nash-Sutcliffe efficiency, 11: Binary cross-entropy, 12: Matthews correlation coefficient (MCC), 13: Residual sum of squares (RSS), 14: Root mean squared log error (RMSLE), 15: Percentile error
train_test_split = -1 # Train/test split. -1: No cross-validation. Valid options are: 50, 60, 70, 75, 80, 100, 1000, 10000, 100000
test_sample = 1 # Test sample. 1: Chosen randomly, 2: The last points
train_test_seed = -1 # Random seed for train/test split generation when the test sample is chosen randomly.
integer_constants = 0 # Integer constants only. 0: Disabled, 1: Enabled
bound_search_mode = 0 # Bound search mode. 0: Deactivated, 1: Lower bound search, 2: Upper bound search
maximum_formula_complexity = 60 # Maximum formula complexity.
history_size = 20 # History size.
fscore_beta = 1 # F-score beta parameter.
normalize_dataset = 0 # Normalize the dataset before starting the optimization? 0: No, 1: Yes
allow_target_delay = 0 # Allow the target variable in the lag functions? 0: No, 1: Yes
force_all_variables = 0 # Force solutions to include all input variables? 0: No, 1: Yes
custom_formula =  # Custom formula for the search. If empty, the program will try to find the last column as a function of the remaining ones.
allowed_functions = + * / pow fmod sin cos tan asin acos atan exp log log2 log10 sqrt sinh cosh tanh asinh acosh atanh abs floor ceil round sign tgamma lgamma erf # Allowed functions.

The definitions of those settings can be consulted on the Official Documentation.

About TuringBot

TuringBot finds mathematical formulas from data using symbolic regression. Load a CSV, select your target variable, and get interpretable equations—not black-box models.

Free version available for Windows, macOS, and Linux.