Symbolic Regression in Python with TuringBot

In this tutorial, we show an easy way to run symbolic regression from Python.

For that, we are going to use the symbolic regression software TuringBot. This program runs on both Windows and Linux, and it comes with a handy Python library. You can download it for free from the official website.

Importing TuringBot

The first step in running our symbolic regression optimization in Python is importing TuringBot. To do that, add its installation directory to Python's module search path (sys.path) and import it, like so:

Windows
import sys 
sys.path.insert(1, r'C:\Program Files (x86)\TuringBot') 

import turingbot as tb 
Linux
import sys
sys.path.insert(1, '/usr/share/turingbot')

import turingbot as tb
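If the same script has to run on both systems, the directory can be chosen at runtime. The following is only a sketch: the helper names turingbot_dir and add_turingbot_to_path are made up for this example, and the two directories are simply the ones used in the snippets above, so adjust them if your installation lives elsewhere.

```python
import sys

# Installation directories copied from the snippets above (assumed defaults).
WINDOWS_DIR = r'C:\Program Files (x86)\TuringBot'
LINUX_DIR = '/usr/share/turingbot'


def turingbot_dir(platform=sys.platform):
    """Return the assumed TuringBot directory for a platform string.

    Any platform that does not start with 'win' is treated as Linux.
    """
    return WINDOWS_DIR if platform.startswith('win') else LINUX_DIR


def add_turingbot_to_path(platform=sys.platform):
    """Insert the TuringBot directory into sys.path once and return it."""
    directory = turingbot_dir(platform)
    if directory not in sys.path:
        sys.path.insert(1, directory)
    return directory
```

After calling add_turingbot_to_path(), the usual import turingbot as tb should work on either system, provided the directory is correct.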

Running the optimization

The turingbot library implements a simulation object that can be used to start, stop and get the current status of a symbolic regression optimization.

This is how it works:

Windows
path = r'C:\Program Files (x86)\TuringBot\TuringBot.exe' 
input_file = r'C:\Users\user\Desktop\input.txt' 
config_file = r'C:\Users\user\Desktop\settings.cfg' 

sim = tb.simulation() 
sim.start_process(path, input_file, threads=4, config=config_file) 
Linux
path = r'/usr/bin/turingbot' 
input_file = r'/home/user/input.txt' 
config_file = r'/home/user/settings.cfg' 

sim = tb.simulation() 
sim.start_process(path, input_file, threads=4, config=config_file) 

The start_process method starts the optimization in the background. It takes as input the paths to the TuringBot executable and to your input file. Optionally, you can also set the number of threads that the program should use and the path to the configuration file (more on that below).

After running the commands above, nothing visible will happen, because the optimization runs in the background. To retrieve and print the current best formulas, use:

sim.refresh_functions() 
print(*sim.functions, sep='\n') 
print(sim.info) 
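Because the search keeps improving over time, it can be convenient to refresh the results at regular intervals instead of only once. The helper below is a hypothetical sketch, not part of the turingbot library: poll_formulas is a name invented here, and it only relies on the refresh_functions method and the functions attribute shown above.

```python
import time


def poll_formulas(sim, interval=2.0, rounds=5, sleep=time.sleep):
    """Refresh `sim` every `interval` seconds and return the last snapshot.

    `sim` is any object exposing refresh_functions() and an iterable
    `functions` attribute, like the simulation object used above.
    """
    snapshot = []
    for _ in range(rounds):
        sleep(interval)
        sim.refresh_functions()
        snapshot = list(sim.functions)
        print(f'{len(snapshot)} formulas found so far')
    return snapshot
```

Calling poll_formulas(sim, interval=5, rounds=6) would check on the search for about half a minute and return the most recent list of formulas.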

To stop the optimization and kill the TuringBot process, you should use the terminate_process method:

sim.terminate_process()
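If your script raises an exception before reaching terminate_process, the TuringBot process may keep running in the background. One way to guard against that is a small context manager. This is a sketch under the assumption that start_process and terminate_process behave as described above; the name running is invented for this example.

```python
from contextlib import contextmanager


@contextmanager
def running(sim, *args, **kwargs):
    """Start the optimization and always terminate it when the block exits."""
    sim.start_process(*args, **kwargs)
    try:
        yield sim
    finally:
        # Runs on normal exit and when the block raises.
        sim.terminate_process()


# Intended usage, with the variables defined earlier in this tutorial:
#
# with running(tb.simulation(), path, input_file,
#              threads=4, config=config_file) as sim:
#     sim.refresh_functions()
#     print(*sim.functions, sep='\n')
```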

Using a configuration file

We have seen above that the start_process method may take the path to a configuration file as an optional input parameter. This is what the file should look like:

search_metric = 4 # Search metric. 1: Mean relative error, 2: Classification accuracy, 3: Mean error, 4: RMS error, 5: F1 score, 6: Correlation coefficient, 7: Hybrid (CC+RMS), 8: Maximum error, 9: Maximum relative error, 10: Nash-Sutcliffe efficiency
train_test_split = -1 # Train/test split. -1: No cross-validation. Valid options are: 50, 60, 70, 75, 80 
test_sample = 1 # Test sample. 1: Chosen randomly, 2: The last points 
integer_constants = 0 # Integer constants only. 0: Disabled, 1: Enabled 
bound_search_mode = 0 # Bound search mode. 0: Deactivated, 1: Lower bound search, 2: Upper bound search 
maximum_formula_complexity = 60 # Maximum formula complexity. 
history_size = 20 # History size. 
allow_target_delay = 1 # Allow the target variable in the history functions? 0: No, 1: Yes 
custom_formula =  # Custom formula for the search. If empty, the program will try to find the last column as a function of the remaining ones. 
allowed_functions = + * / pow fmod sin cos tan asin acos atan exp log log2 sqrt sinh cosh tanh asinh acosh atanh abs floor ceil round tgamma lgamma erf # Allowed functions.

The comments after the # characters are for your convenience and are ignored. To change a search setting, change the value to the left of the # on the corresponding line. To change the base functions for the search, add or delete their names on the last line.

To customize your search, save the contents of the file above to a settings.cfg file and pass its path to the start_process method through the config parameter.
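If you want to change a setting from Python rather than by hand, the file can be edited as plain text. The function below is a minimal sketch that assumes every setting sits on its own line in the key = value # comment layout shown above; set_option is a name made up for this example and is not part of the turingbot library.

```python
def set_option(config_text, key, value):
    """Return config_text with the value of `key` replaced.

    The trailing '# ...' comment on the line, if any, is preserved.
    Raises KeyError when no line defines `key`.
    """
    out = []
    found = False
    for line in config_text.splitlines():
        name, sep, rest = line.partition('=')
        if sep and name.strip() == key:
            _, hash_mark, comment = rest.partition('#')
            line = f'{key} = {value}'
            if hash_mark:
                line += f' #{comment}'
            found = True
        out.append(line)
    if not found:
        raise KeyError(key)
    return '\n'.join(out) + '\n'
```

For instance, reading settings.cfg, calling set_option(text, 'search_metric', 6), and writing the result back would switch the search metric to the correlation coefficient while leaving every other line untouched.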

Full example

Here is the full source code of the examples above. Note that you have to replace user in the paths with your local username, adjust the installation directory if TuringBot is installed elsewhere on your system, and create an input file (txt or csv format, one variable per column) to use with the program.

Windows
import sys 
sys.path.insert(1, r'C:\Users\user\AppData\Local\Programs\TuringBot') 

import turingbot as tb 
import time

path = r'C:\Users\user\AppData\Local\Programs\TuringBot\TuringBot.exe' 
input_file = r'C:\Users\user\Desktop\input.txt' 
config_file = r'C:\Users\user\Desktop\settings.cfg' 

sim = tb.simulation() 
sim.start_process(path, input_file, threads=4, config=config_file) 

time.sleep(10)  # give the search some time before reading the results

sim.refresh_functions()
print(*sim.functions, sep='\n')
print(sim.info)

sim.terminate_process()
Linux
import sys 

sys.path.insert(1, '/usr/share/turingbot') 
import turingbot as tb 

import time 

path = r'/usr/bin/turingbot' 
input_file = r'/home/user/input.txt' 
config_file = r'/home/user/settings.cfg' 

sim = tb.simulation() 
sim.start_process(path, input_file, threads=4, config=config_file) 

time.sleep(10)  # give the search some time before reading the results

sim.refresh_functions() 
print(*sim.functions, sep='\n') 
print(sim.info) 

sim.terminate_process()

About TuringBot

TuringBot is desktop software for symbolic regression. By feeding your data in .TXT or .CSV format into the program, you can immediately start searching for mathematical formulas that connect the variables. If you want to learn more about what TuringBot can offer you, please visit our homepage.