Discovering Mathematical Formulas from Sensor Data with Symbolic Regression

Sensors generate massive amounts of time series data daily. While traditional machine learning approaches can predict outcomes, they often don't explain the underlying relationships.

Symbolic regression addresses this by discovering explicit mathematical formulas that connect the sensor variables. Unlike black-box models, it searches a space of candidate mathematical expressions for the ones that best predict the output variables. It is particularly effective on problems with a small number of input variables, which makes it well suited to sensor applications.

Here we'll demonstrate how TuringBot, a desktop application for symbolic regression, makes it easy to find hidden mathematical relationships in sensor measurements.

Case Study: Temperature Sensor Calibration and Drift Compensation

Let's tackle a common scenario: compensating for drift across multiple temperature sensors.

Our dataset contains readings from three temperature sensors (T1, T2, T3) showing different values due to calibration differences and drift, plus reference measurements (Tref) from a high-precision thermometer.

Step 1: Data Import

After launching TuringBot, we pasted our sensor dataset into the built-in spreadsheet. The dataset is a time series in which each row holds the readings taken at one point in time:

Figure: TuringBot's data import interface showing temperature sensor data with T1, T2, T3 readings and Tref reference values for calibration.
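To make the row and column layout concrete, here is a minimal Python sketch that parses a table of this shape. It is not part of TuringBot. The column names follow the article, but the numeric values are hypothetical placeholders rather than the dataset used here, and the helper name load_readings is made up for illustration.

```python
import csv
import io

# Hypothetical readings for illustration only; NOT the article's dataset.
# One row per time step, one column per variable.
RAW = """T1,T2,T3,Tref
25.10,24.80,25.40,24.89
25.60,25.25,25.95,25.36
26.05,25.70,26.30,25.84
"""


def load_readings(text):
    """Parse CSV text into a list of dicts of floats, one dict per time step."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: float(value) for name, value in row.items()} for row in reader]


readings = load_readings(RAW)
for row in readings:
    print(row)
```

Keeping one variable per column and one time step per row mirrors the spreadsheet layout described above, so the same table can be pasted or saved without restructuring.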

Step 2: Configuration Setup

We configured TuringBot to search for a formula with T1, T2, and T3 as inputs and Tref as output, using RMSE as the error metric. All available base functions were allowed for the search, such as basic arithmetic, exp(x), sin(x), and sqrt(x):

Figure: TuringBot's configuration interface for symbolic regression showing input variables T1, T2, T3 and target variable Tref with RMSE scoring.
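For reference, RMSE (root mean squared error) is the square root of the average squared difference between predicted and measured values. The sketch below shows the standard definition in Python. It is a generic illustration, not TuringBot's internal code, and the function name rmse is our own.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the errors are squared before averaging, a few large deviations raise the RMSE more than many small ones. The result has the same unit as the target variable, here degrees Celsius.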

Step 3: Running the Search and Results

TuringBot's algorithm searched through possible mathematical expressions, displaying candidate formulas of varying complexity in real time. In less than 5 minutes, it discovered this elegant formula:

Tref = (-0.348865)*(T3-T2)+T1

With an RMSE of 0.011°C, this simple equation fits the reference measurements closely. The other formulas discovered during the search are shown alongside it:

Figure: TuringBot's solution showing the discovered formula (-0.348865)*(T3-T2)+T1 with error metrics and visualization of actual vs predicted values.
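Once exported, the discovered formula is a one-liner in any language. The Python sketch below applies it to a single set of readings. The function name compensate and the sample readings are hypothetical, and only the coefficient comes from the result above.

```python
# Coefficient taken from the discovered formula: Tref = (-0.348865)*(T3-T2)+T1
COEFF = -0.348865


def compensate(t1, t2, t3):
    """Estimate the reference temperature from the three sensor readings."""
    return COEFF * (t3 - t2) + t1


# Hypothetical readings in degrees Celsius, for illustration only.
estimate = compensate(25.10, 24.80, 25.40)
print(f"Estimated Tref: {estimate:.3f} °C")
```

Applying the function to every row of a held-out dataset and comparing the results with Tref is a simple way to check the reported error on new measurements.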

Insights from the Formula

This formula reveals several key insights about our sensors:

  • T1 serves as the baseline measurement, suggesting it's the most reliable sensor overall
  • The difference between T3 and T2 provides a correction factor
  • The negative coefficient (-0.348865) indicates that when T3 reads higher than T2, a downward correction is needed

Most interestingly, the formula structure reveals that the differential between sensors T3 and T2 contains valuable information about measurement error, something that wouldn't be obvious from simple inspection of the data.
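The structure becomes easier to see when the formula is rearranged. The two lines below are plain algebra on the discovered equation and add no new assumptions. The symbol c is introduced here only as shorthand for the coefficient's magnitude.

```latex
\begin{align}
T_{\mathrm{ref}} - T_1 &= -c\,(T_3 - T_2), \\
T_{\mathrm{ref}} &= T_1 + c\,T_2 - c\,T_3, \qquad c = 0.348865.
\end{align}
```

The first line says that the error of T1 relative to the reference is proportional to the difference between T3 and T2. The second shows that the three weights sum to one (1 + c - c), so whenever all three sensors agree, the formula returns that common reading unchanged.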

Additional Sensor Applications

While our example used temperature sensors, symbolic regression is equally applicable to other sensing domains:

  • Vibration sensors can use mathematical patterns to detect early bearing failures, with formulas linking frequency components to specific mechanical issues.
  • Pressure transducers can compensate for non-linear response curves through multi-term equations.
  • Electrochemical sensors can correct for cross-sensitivity to interfering gases by encoding these relationships in explicit mathematical terms.

Technical Advantages

The formula discovered by TuringBot offers specific technical benefits:

  1. Computational efficiency: At runtime, computing (-0.348865)*(T3-T2)+T1 requires just three arithmetic operations, making it deployable on even the most resource-constrained microcontrollers.
  2. Generalization: Unlike a neural network, which can overfit its training data, this equation has a simple structure that can be inspected directly. It should hold across a wide temperature range, though this is worth confirming with measurements outside the calibration data.
  3. Memory footprint: The entire calibration algorithm requires storing just one coefficient (-0.348865) versus hundreds of weights in a comparable neural network.
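To illustrate how little the formula demands at runtime, here is a sketch of an integer-only version of the kind that could be ported to a microcontroller without floating-point hardware. It rests on assumptions that are not part of the article: readings arrive as integer millidegrees Celsius, and the coefficient is stored as a Q16 fixed-point integer. The function name compensate_fixed is hypothetical. Python is used here only to show the arithmetic.

```python
# Assumptions (not from the article): readings are integer millidegrees Celsius,
# and the coefficient is stored as a Q16 fixed-point integer (scaled by 2**16).
Q = 16
COEFF_Q16 = round(-0.348865 * (1 << Q))  # -22863


def compensate_fixed(t1_mdeg, t2_mdeg, t3_mdeg):
    """Integer-only version of Tref = (-0.348865)*(T3-T2)+T1, in millidegrees."""
    diff = t3_mdeg - t2_mdeg
    # Multiply, add half of the scale factor so the shift rounds to nearest,
    # then shift right to undo the Q16 scaling.
    correction = (COEFF_Q16 * diff + (1 << (Q - 1))) >> Q
    return t1_mdeg + correction
```

The rounding step and the shift add two cheap integer operations to the three in the floating-point formula. Note that Python's right shift on negative integers rounds toward negative infinity. A port to C would need to confirm that the target compiler's shift behaves the same way.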

Conclusion

Symbolic regression with TuringBot transforms raw sensor data into practical insights. By extracting the explicit mathematical relationship Tref = (-0.348865)*(T3-T2)+T1, we've solved a real calibration problem with a formula that's immediately deployable and physically meaningful. This concrete example demonstrates how explicit mathematical formulas can be competitive with traditional predictive approaches in sensor applications.

About TuringBot

TuringBot is a powerful desktop tool for Symbolic Regression. Input your data and discover mathematical formulas that link your variables.

Ready to see what TuringBot can do? Download it for free and start exploring today. Available for Windows, macOS, and Linux.