Symbolic Regression in Python with TuringBot

In this tutorial, we are going to show a very easy way to do symbolic regression in Python.

For that, we are going to use the symbolic regression software TuringBot. This program runs on both Windows and Linux, and it comes with a handy Python library. You can download it for free from the official website.

Importing TuringBot

The first step in running our symbolic regression optimization in Python is importing TuringBot. For that, all you have to do is add its installation directory to your Python PATH and import it, as so:

Windows
import sys
sys.path.insert(1, r'C:\Program Files (x86)\TuringBot\resources')

import turingbot as tb
Linux
import sys
sys.path.insert(1, r'/usr/lib/turingbot/resources')

import turingbot as tb

Running the optimization

The turingbot library implements a simulation object that can be used to start, stop and get the current status of a symbolic regression optimization.

This is how it works:

Windows
path = r'C:\Program Files (x86)\TuringBot\TuringBot.exe'
input_file = r'C:\Users\user\Desktop\input.txt' 
config_file = r'C:\Users\user\Desktop\settings.cfg' 

sim = tb.simulation() 
sim.start_process(path, input_file, threads=4, config=config_file) 
Linux
path = r'/usr/lib/turingbot/TuringBot'
input_file = r'/home/user/input.txt' 
config_file = r'/home/user/settings.cfg' 

sim = tb.simulation() 
sim.start_process(path, input_file, threads=4, config=config_file) 

The start_process method starts the optimization in the background. It takes as input the paths to the TuringBot executable and to your input file. Optionally, you can also set the number of threads that the program should use and the path to the configuration file (more on that below).

After running the commands above, nothing will happen because the optimization will start in the background. To retrieve and print the current best formulas, you should use:

sim.refresh_functions() 
print(*sim.functions, sep='\n') 
print(sim.info) 

To stop the optimization and kill the TuringBot process, you should use the terminate_process method:

sim.terminate_process()

Using a configuration file

We have seen above that the start_process method may take the path to a configuration file as an optional input parameter. This is what the file should look like:

search_metric = 4 # Search metric. 1: Mean relative error, 2: Classification accuracy, 3: Mean error, 4: RMS error, 5:, F-score, 6: Correlation coefficient, 7: Hybrid (CC+RMS), 8: Maximum error, 9: Maximum relative error, 10: Nash-Sutcliffe efficiency, 11: Binary cross-entropy, 12: Matthews correlation coefficient (MCC)
train_test_split = -1 # Train/test split. -1: No cross-validation. Valid options are: 50, 60, 70, 75, 80, 100, 1000, 10000, 100000
test_sample = 1 # Test sample. 1: Chosen randomly, 2: The last points
train_test_seed = -1 # Random seed for train/test split generation when the test sample is chosen randomly.
integer_constants = 0 # Integer constants only. 0: Disabled, 1: Enabled
bound_search_mode = 0 # Bound search mode. 0: Deactivated, 1: Lower bound search, 2: Upper bound search
maximum_formula_complexity = 60 # Maximum formula complexity.
history_size = 20 # History size.
fscore_beta = 1 # F-score beta parameter.
normalize_dataset = 0 # Normalize the dataset before starting the optimization? 0: No, 1: Yes
allow_target_delay = 0 # Allow the target variable in the lag functions? 0: No, 1: Yes
force_all_variables = 0 # Force solutions to include all input variables? 0: No, 1: Yes
custom_formula =  # Custom formula for the search. If empty, the program will try to find the last column as a function of the remaining ones.
allowed_functions = + * / pow fmod sin cos tan asin acos atan exp log log2 log10 sqrt sinh cosh tanh asinh acosh atanh abs floor ceil round sign tgamma lgamma erf # Allowed functions.

The comments after the # characters are for your convenience and are ignored. To change the search settings, all you have to do is change the numbers in each line. To change the base functions for the search, just add or delete their names from the last line.

Save the contents of the file above to a settings.cfg file and add the path of this file to the start_process method before calling it if you want to customize your search.

Full example

Here are the full source codes of the examples that we have provided above. Note that you have to replace user in the paths to your local username and that you have to create an input file (txt or csv format, one number per column) to use with the program.

Windows
import sys
sys.path.insert(1, r'C:\Program Files (x86)\TuringBot\resources')

import time
import turingbot as tb

path = r'C:\Program Files (x86)\TuringBot\TuringBot.exe'
input_file = r'/home/user/input.txt'
config_file = r'/home/user/settings.cfg'

sim = tb.simulation()
sim.start_process(path, input_file, threads=4, config=config_file)

time.sleep(10)

sim.refresh_functions()
print(*sim.functions, sep='\n')
print(sim.info)

sim.terminate_process()
Linux
import sys
sys.path.insert(1, r'/usr/lib/turingbot/resources')

import time
import turingbot as tb

path = r'/usr/lib/turingbot/TuringBot'
input_file = r'/home/user/input.txt'
config_file = r'/home/user/settings.cfg'

sim = tb.simulation()
sim.start_process(path, input_file, threads=4, config=config_file)

time.sleep(10)

sim.refresh_functions()
print(*sim.functions, sep='\n')
print(sim.info)

sim.terminate_process()

About TuringBot

TuringBot is a desktop software for Symbolic Regression. By feeding your data in .TXT or .CSV format into the program, you can immediately start searching for mathematical formulas that connect the variables. If you want to learn more about what TuringBot can offer you, please visit our homepage.