Classification using symbolic regression

Classification problems are widespread and of great practical importance. In this article, we will show how symbolic regression can be used to solve this kind of problem.

Classification problems

As of 2021, the most commonly used machine learning method for classification are likely neural networks and their variants, including for instance convolutional neural networks (CNN) for image classification.

Numerically, the problem is quite simple. The goal is to have a mathematical operator that returns a different integer number based on the category:

f(input variables) = 0 or 1 or 2

Symbolic regression: how to use it for classification?

Since the output of classification model needs to be an integer number, a custom symbolic regression search must be employed.

One way to ensure that the formula will return integer numbers only is to search for f(x) such that y = round(f(x)):

Custom symbolic regression search in TuringBot.

This way, the symbolic regression search will not have to waste time on functions that return floating-point values.

Which error metric to use?

For meaningful optimization in the context of a classification problem, two main search metrics should be used: Classification accuracy or F1 score.

Error metrics available in TuringBot.

If the categories are more or less equally likely, classification accuracy is the best choice. But if the category is 0 on nearly all cases and 1 only on for a few exceptions, F1 score is a more appropriate choice.

What if I have several categories?

This is not a problem. Just represent the different categories as different integer numbers: 1 for the first category, 2 for the second one, and so on.

In practice, you are going to have to prepare an input file that will look like this:

input1 input2 input3 classification
0.10456 0.86774 0.64066 1
0.04537 0.23525 0.60929 3
0.01393 0.74641 0.28997 1
0.26839 0.20967 0.04693 2
0.02682 0.90024 0.65051 3

About TuringBot

TuringBot is a desktop software for Symbolic Regression. By feeding your data in .TXT or .CSV format into the program, you can immediately start searching for mathematical formulas that connect the variables. If you want to learn more about what TuringBot can offer you, please visit our homepage.