Python: Symbolic Regression in 3 Easy Steps

Looking for a Symbolic Regression library for Python that turns your data into mathematical formulas? TuringBot is designed to be easy to use, and this tutorial shows how to run it from Python in three steps.

Step #1: Download TuringBot

Unlike most Python libraries, which are distributed through PyPI, TuringBot is distributed as a standalone application. Download it from the website; versions are available for both Windows and Linux.

The program also has a graphical user interface, but here we will use only the Python library that ships with it.

[Figure: TuringBot’s graphical interface for Symbolic Regression.]

Step #2: Import TuringBot

Once the program is installed, add its installation folder to Python's module search path and import the library, making sure to replace “user” with your local username:

import sys 
sys.path.insert(1, r'C:\Users\user\AppData\Local\Programs\TuringBot') 

import turingbot as tb 

If you are on Linux, use the equivalent:

import sys
sys.path.insert(1, '/usr/share/turingbot')

import turingbot as tb

After that, TuringBot will be imported and ready to go.
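
If the same script has to run on both systems, the two variants above can be folded into one helper. The following is only a sketch: the function names turingbot_library_dir and import_turingbot are hypothetical, and the two folders are simply the default locations quoted in this article, so adjust them if your installation differs.

```python
import sys

# Install locations quoted in this article; adjust them if yours differ.
WINDOWS_DIR = r'C:\Users\user\AppData\Local\Programs\TuringBot'
LINUX_DIR = '/usr/share/turingbot'


def turingbot_library_dir(platform=sys.platform):
    """Return the folder expected to hold the TuringBot Python library."""
    if platform.startswith('win'):
        return WINDOWS_DIR
    if platform.startswith('linux'):
        return LINUX_DIR
    raise RuntimeError('No TuringBot install location known for ' + platform)


def import_turingbot():
    """Add the install folder to sys.path, then import the library."""
    sys.path.insert(1, turingbot_library_dir())
    import turingbot  # raises ImportError if the folder is wrong
    return turingbot
```

With this in place, tb = import_turingbot() would replace the platform-specific lines above.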

Step #3: Start the Symbolic Regression search

The optimization is started like this:

sim = tb.simulation() 
sim.start_process(path, input_file, threads=4, config=config_file) 

The four parameters are:

  • path: the path to the TuringBot executable.

  • input_file: the path to your input file, which must contain one variable per column.

  • threads (optional): the number of threads that the program should use.

  • config (optional): the path to the configuration file.

For instance, if you are on Windows, the paths would look something like this:

path = r'C:\Users\user\AppData\Local\Programs\TuringBot\TuringBot.exe' 
input_file = r'C:\Users\user\Desktop\input.txt' 
config_file = r'C:\Users\user\Desktop\settings.cfg' 

And on Linux:

path = r'/usr/bin/turingbot' 
input_file = r'/home/user/input.txt' 
config_file = r'/home/user/settings.cfg' 
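
The input file itself only needs one variable per column. As a rough sketch, it could be generated with Python's standard csv module. The comma delimiter, the header row, the column names x and y, and the sample data below are all illustrative assumptions, and write_input_file is a hypothetical helper, so check the official documentation for the exact formats your version accepts.

```python
import csv


def write_input_file(filename, header, rows):
    """Write one variable per column and one observation per row."""
    with open(filename, 'w', newline='') as handle:
        writer = csv.writer(handle)
        writer.writerow(header)  # assumption: a header row with column names is accepted
        writer.writerows(rows)


# Made-up sample data following y = 2*x + 1, with the target y in the last column.
header = ['x', 'y']
rows = [(x, 2 * x + 1) for x in range(10)]
```

Calling write_input_file('input.csv', header, rows) would then produce a file whose path can be passed as input_file.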

Calling start_process() launches the optimization in the background. To see the formulas found so far, call sim.refresh_functions(), which updates sim.functions, and then print them:

sim.refresh_functions() 
print(*sim.functions, sep='\n') 
print(sim.info) 

The output will look something like this, with the size of the solution in the first column, the error in the second one, and finally the solution itself:

[1, 177813.0, '186276']
[3, 7890.39, '11.7503*x']
[5, 6895.25, '11.9394*(-472.889+x)']
[7, 1769.0, '(10.4154+3.9908e-05*x)*x']
[11, 1666.42, '(9.10666+3.26179e-05*x)*(1.156*(-93.3986+x))']
[21, 1224.31, '-1624.3+((9.18774*sign(x-10.1264)+3.13847e-05*x)*(1.1586*(-158.606+x)))']
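
Because the search keeps running in the background, a script will usually refresh the results a few times and then pick one formula. The sketch below shows one way to do that. It assumes that sim.functions holds [size, error, formula] entries exactly as printed above; the helper names watch and best_within_size and the parameters rounds, delay and max_size are hypothetical and not part of TuringBot.

```python
import time


def watch(sim, rounds=5, delay=10.0):
    """Refresh and print the current formulas `rounds` times, `delay` seconds apart."""
    for i in range(rounds):
        sim.refresh_functions()
        print(*sim.functions, sep='\n')
        print(sim.info)
        if i < rounds - 1:
            time.sleep(delay)  # give the background search time to improve
    return sim.functions


def best_within_size(functions, max_size):
    """Return the lowest-error entry whose size is at most `max_size`, or None."""
    candidates = [f for f in functions if f[0] <= max_size]
    return min(candidates, key=lambda f: f[1]) if candidates else None


# The sample output shown above, used here as stand-in data.
results = [
    [1, 177813.0, '186276'],
    [3, 7890.39, '11.7503*x'],
    [5, 6895.25, '11.9394*(-472.889+x)'],
    [7, 1769.0, '(10.4154+3.9908e-05*x)*x'],
    [11, 1666.42, '(9.10666+3.26179e-05*x)*(1.156*(-93.3986+x))'],
]

size, error, formula = best_within_size(results, max_size=7)
print(formula)
```

Capping the size is a simple way to trade a little accuracy for a formula that is easier to read: in the sample above, going from size 7 to size 11 lowers the error only from 1769.0 to 1666.42.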

Tip: Customizing your search

By default, the last column of your input file will be the target variable, and all other columns will be used as input variables.

You can change that, along with several other options, by providing the program with a configuration file that looks like this:

search_metric = 4 # Search metric. 1: Mean relative error, 2: Classification accuracy, 3: Mean error, 4: RMS error, 5: F1 score, 6: Correlation coefficient, 7: Hybrid (CC+RMS), 8: Maximum error, 9: Maximum relative error, 10: Nash-Sutcliffe efficiency
train_test_split = -1 # Train/test split. -1: No cross validation. Valid options are: 50, 60, 70, 75, 80 
test_sample = 1 # Test sample. 1: Chosen randomly, 2: The last points 
integer_constants = 0 # Integer constants only. 0: Disabled, 1: Enabled 
bound_search_mode = 0 # Bound search mode. 0: Deactivated, 1: Lower bound search, 2: Upper bound search 
maximum_formula_complexity = 60 # Maximum formula complexity. 
history_size = 20 # History size. 
allow_target_delay = 1 # Allow the target variable in the history functions? 0: No, 1: Yes 
custom_formula =  # Custom formula for the search. If empty, the program will try to find the last column as a function of the remaining ones. 
allowed_functions = + * / pow fmod sin cos tan asin acos atan exp log log2 sqrt sinh cosh tanh asinh acosh atanh abs floor ceil round tgamma lgamma erf # Allowed functions.
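
If you want to vary a few settings from Python, one option is to keep the full template above as text and rewrite only the values you care about. The following is a minimal sketch of that idea. The helper override_settings is hypothetical, the two-line template is a shortened excerpt of the file above, and the approach assumes every setting sits on its own line in the key = value # comment form shown here.

```python
def override_settings(template, overrides):
    """Return `template` with the values of the keys in `overrides` replaced.

    Comments after '#' and all other lines are kept unchanged.
    """
    lines = []
    for line in template.splitlines():
        key, sep, rest = line.partition('=')
        name = key.strip()
        if sep and name in overrides:
            _old_value, hash_mark, comment = rest.partition('#')
            line = f'{name} = {overrides[name]} {hash_mark}{comment}'.rstrip()
        lines.append(line)
    return '\n'.join(lines) + '\n'


# Shortened excerpt of the configuration file shown above.
template = (
    'search_metric = 4 # Search metric.\n'
    'train_test_split = -1 # Train/test split.\n'
)

# Switch to an 80% train/test split, one of the valid options listed above.
print(override_settings(template, {'train_test_split': 80}))
```

The resulting text can be written to a file such as settings.cfg and passed to start_process() through the config parameter.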

The definitions of these settings can be found in the official documentation.

About TuringBot

TuringBot is desktop software for Symbolic Regression. By feeding your data in .TXT or .CSV format into the program, you can immediately start searching for mathematical formulas that connect the variables. If you want to learn more about what TuringBot can offer you, please visit our homepage.