Decision boundary discovery with symbolic regression

An interesting classification problem is trying to find a decision boundary that separates two categories of points. For instance, consider the following cloud of points:

Clearly, we could hand draw a line that separates the two colors. But can this problem be solved in an automatic way?

Several machine learning methods could be used for this, including for instance a Support Vector Machine or AdaBoost. What all of these methods have in common is that they perform complex calculations under the hood and spill out some number, that is, they are black boxes. An interesting comparison of several of these methods can be found here.

A simpler and more elegant alternative is to try to find an explicit mathematical formula that separates the two categories. Not only would this be easier to compute, but it would also offer some insight into the data. This is where symbolic regression comes in.

Symbolic regression

The way to solve this problem with symbolic regression is to look for a formula that returns 0 for points of one category and 1 for points of another. That is, a formula for classification = f(x, y).

We can look for that formula by generating a CSV file with our points and loading it into TuringBot. Then we can run the optimization with classification accuracy as the search metric.

If we do that, the program ends up finding a simple formula with an accuracy of 100%:

classification = ceil(-1*tanh(round(x*y-cos((-2)*(y-x)))))

To visualize the decision boundary associated with this formula, we can generate some random points and keep track of the ones classified as orange. Then we can find the alpha shape that encompasses those points, which will be the decision boundary:

import alphashape
from descartes import PolygonPatch
import numpy as np
from math import *

def f(x, y):
    return ceil(-1*tanh(round(x*y-cos((-2)*(y-x)))))

pts = []
for i in range(10000):
    x = np.random.random()*2-1
    y = np.random.random()*2-1
    if f(x, y) == 1:
        pts.append([x, y])
pts = np.array(pts)

alpha_shape = alphashape.alphashape(pred, 2.)

fig, ax = plt.subplots()
ax.add_patch(PolygonPatch(alpha_shape, alpha=0.2, fc='#ddd', zorder=100))

And this is the result:

It is worth noting that even though this was a 2D problem, the same procedure could have been carried out for a classification problem in any number of dimensions.

Leave a Reply

Your email address will not be published. Required fields are marked *

54 + = 61