Using R to visualize a Symbolic Regression model

In this article, we are going to show how a symbolic regression model can be visualized using the R programming language. The model will be generated using the TuringBot symbolic regression software, and we are going to use the ggplot2 library [1] for the visualization.

The dataset that we are going to use consists of the closing prices for the S&P 500 index in the last year, downloaded from Yahoo Finance [2]. The CSV file, which also contains additional columns like open, high, low, and volume, can be found here: spx.csv

Symbolic regression modeling

After opening TuringBot and selecting this file from the menu on the upper left of the interface, we select “Row number” as the input variable and “Close” as the target variable. This way, our model will find the closing price as a function of the index of the trading day (1, 2, 3, etc.). We will also use a randomly selected 50:50 train/test split to make our model more robust, and “mean relative error” as the optimization metric because we are more interested in the shape of the model than in specific values.
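To make these two settings concrete, here is a minimal base R sketch of a random 50:50 split and of a mean relative error. TuringBot's internal implementation is not shown in this article, so the definition used below (the average of |predicted - actual| / |actual|) and the names `mean_relative_error`, `train_rows`, and `test_rows` are assumptions made for illustration, and the prices are made-up numbers rather than the real S&P 500 data.

```r
# Illustrative values only; these are not taken from spx.csv
close <- c(2900, 2950, 3000, 3050, 3100, 3150)
n <- length(close)

# Randomly assign half of the rows to training and the rest to testing
set.seed(42)  # fixed seed so the split is reproducible
train_rows <- sort(sample(n, size = n / 2))
test_rows <- setdiff(seq_len(n), train_rows)

# Assumed definition: mean of |predicted - actual| / |actual|
mean_relative_error <- function(actual, predicted) {
  mean(abs(predicted - actual) / abs(actual))
}

# A deliberately crude "model" that always predicts the training mean
predicted <- rep(mean(close[train_rows]), length(test_rows))
mean_relative_error(close[test_rows], predicted)
```

Because each error is divided by the actual value, a 1% miss counts the same whether the index is at 2,900 or at 3,150, which is why this kind of metric favors getting the overall shape right.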

This is what the interface will look like:

Clicking the play button at the top starts the optimization, which uses all the CPU cores in the computer for greater performance. The models encountered so far are shown in the “Solutions” box.

Selecting the best formula

After letting the optimization run for a few minutes, we can click on the “Show cross-validation error” box on the upper right of the interface to see the out-of-sample performance of each model, and use this information to select the best one, which in this case turned out to be a combination of cosines and multiplications:

Visualizing with R and ggplot2

Now that we have the model, we are going to visualize it using ggplot2. The following script loads the input CSV file and plots it along with the model that we just selected:

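library(ggplot2)

# Load the S&P 500 data and add the trading-day index used as the model input
data <- read.csv("spx.csv")
data$idx <- seq_len(nrow(data))
head(data)

# Formula found by TuringBot; "row" is the row number (trading-day index)
eq <- function(row) {
  2966.96 + (2.98602 * (-55.4604 + row) * cos(0.0397268 * (row + 8.34129 * cos(-0.0819996 * row)) - 1.16301 * cos(-0.0358919 * row)))
}

# Scatter plot of the closing prices with the model overlaid in blue
p <- ggplot(data, aes(x = idx, y = Close)) +
  geom_point() +
  stat_function(fun = eq, color = "blue")
print(p)

# To save the plot to a file instead of only displaying it:
# ggsave("test.png", p)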

And this is the final result:

Symbolic regression R model.

This demonstrates the power and simplicity of symbolic regression models: we have readily implemented and visualized in R a machine learning model generated using TuringBot, something that would be much harder if the model were a black box like a neural network or a random forest.
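As a small illustration of this point, the formula from the script above can be used like any other vectorized R function, without ggplot2 or TuringBot. The sketch below only reuses the `eq` function from this article; the names `days` and `predictions` are introduced here for illustration.

```r
# Formula found by TuringBot, copied from the plotting script
eq <- function(row) {
  2966.96 + (2.98602 * (-55.4604 + row) * cos(0.0397268 * (row + 8.34129 * cos(-0.0819996 * row)) - 1.16301 * cos(-0.0358919 * row)))
}

# Evaluate the model for the first five trading days in a single call
days <- 1:5
predictions <- eq(days)
print(data.frame(day = days, predicted_close = predictions))

# The factor (-55.4604 + row) vanishes at row = 55.4604, so at that point
# the model returns exactly its constant term
eq(55.4604) == 2966.96
```

Reading the formula this way also shows its structure directly: a constant level of 2966.96 plus an oscillating term whose amplitude grows linearly with the distance from row 55.4604.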

References

[1] ggplot2: https://ggplot2.tidyverse.org/

[2] Yahoo Finance quotes for the S&P 500: https://finance.yahoo.com/quote/%5EGSPC?p=^GSPC&.tsrc=fin-srch

About TuringBot

TuringBot is desktop software for symbolic regression. By feeding your data in .TXT or .CSV format into the program, you can immediately start searching for mathematical formulas that connect the variables. If you want to learn more about what TuringBot can offer you, please visit our homepage.