Sensors generate massive amounts of time series data daily. While traditional machine learning approaches can predict outcomes, they often don't explain the underlying relationships.
Symbolic regression addresses this by discovering explicit mathematical formulas that connect the sensor variables. Unlike black-box models, it searches over a space of candidate mathematical formulas to find the ones that best predict the output variables. It is particularly effective on problems with a small number of dimensions, which makes it well suited to sensor applications.
Here we'll demonstrate how TuringBot, a desktop application for symbolic regression, makes it easy to find hidden mathematical relationships in sensor measurements.
Case Study: Temperature Sensor Calibration and Drift Compensation
Let's tackle a common scenario: compensating for drift across multiple temperature sensors.
Our dataset contains readings from three temperature sensors (T1, T2, T3) showing different values due to calibration differences and drift, plus reference measurements (Tref) from a high-precision thermometer.
Step 1: Data Import
After launching TuringBot, we pasted our sensor dataset into the built-in spreadsheet. The dataset is a time series in which each row holds the sensor readings taken at one point in time.
Step 2: Configuration Setup
We configured TuringBot to search for a formula with T1, T2, and T3 as inputs and Tref as output, using RMSE as the error metric. All available base functions, including basic arithmetic, exp(x), sin(x), and sqrt(x), were allowed in the search.
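For reference, the RMSE metric selected here is presumably the standard root-mean-square error. Written out for this problem, where f is a candidate formula and N is the number of rows (both symbols are notation introduced for this sketch, not taken from TuringBot):

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(T_{\mathrm{ref},i} - f(T_{1,i}, T_{2,i}, T_{3,i})\right)^{2}}
```

Because the error is expressed in the same unit as Tref, an RMSE value can be read directly as a typical deviation in °C.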
Step 3: Running the Search and Results
TuringBot's algorithm searched through possible mathematical expressions, displaying candidate formulas of varying complexity in real time. In less than 5 minutes, it discovered this elegant formula:
Tref = (-0.348865)*(T3-T2)+T1
With an RMSE of 0.011°C, this simple equation captured the relationship remarkably well. The search also returned other candidate formulas of varying complexity.
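As a quick check outside TuringBot, the formula Tref = (-0.348865)*(T3-T2)+T1 can be evaluated against your own readings. The following is a minimal Python sketch, not part of TuringBot. The function names and the sample rows are hypothetical and only illustrate the calculation; they are not the dataset used in this article.

```python
import math

COEFF = -0.348865  # coefficient reported for the discovered formula


def compensate(t1, t2, t3):
    """Estimate the reference temperature from the three sensor readings."""
    return COEFF * (t3 - t2) + t1


def rmse(rows):
    """Root-mean-square error of the formula over (t1, t2, t3, tref) rows."""
    squared = [(tref - compensate(t1, t2, t3)) ** 2 for t1, t2, t3, tref in rows]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical readings in degrees Celsius, for illustration only.
sample_rows = [
    (25.0, 25.1, 25.4, 24.9),
    (30.0, 30.2, 30.6, 29.85),
]

print(f"RMSE on sample rows: {rmse(sample_rows):.4f} degrees C")
```

Replacing sample_rows with real (T1, T2, T3, Tref) measurements gives a figure that can be compared with the RMSE reported by the search.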
Insights from the Formula
This formula reveals several key insights about our sensors:
- T1 serves as the baseline measurement, suggesting it's the most reliable sensor overall
- The difference between T3 and T2 provides a correction factor
- The negative coefficient (-0.348865) indicates that when T3 reads higher than T2, a downward correction is needed
Most interestingly, the formula structure reveals that the differential between sensors T3 and T2 contains valuable information about measurement error, something that wouldn't be obvious from simple inspection of the data.
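One way to see this is to move T1 to the left-hand side of the discovered formula, which is a purely algebraic rearrangement:

```latex
T_{\mathrm{ref}} - T_1 = -0.348865\,(T_3 - T_2)
```

Read this way, the error of T1 relative to the reference is modeled as proportional to the gap between T3 and T2. For example, when T3 reads 1°C above T2, the formula subtracts about 0.35°C from the T1 reading.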
Additional Sensor Applications
While our example used temperature sensors, symbolic regression is equally applicable to other sensing domains:
- Vibration sensors: formulas linking frequency components to specific mechanical issues can help detect early bearing failures.
- Pressure transducers can compensate for non-linear response curves through multi-term equations.
- Electrochemical sensors can correct for cross-sensitivity to interfering gases by encoding these relationships in explicit mathematical terms.
Technical Advantages
The formula discovered by TuringBot offers specific technical benefits:
- Computational efficiency: At runtime, computing (-0.348865)*(T3-T2)+T1 requires just three arithmetic operations, making it deployable on even the most resource-constrained microcontrollers.
- Generalization: Unlike neural networks that can overfit their training domain, this equation captures a fundamental relationship that should hold across a wide temperature range.
- Memory footprint: The entire calibration algorithm requires storing just one coefficient (-0.348865) versus hundreds of weights in a comparable neural network.
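To illustrate how small the deployed computation can be, here is a hedged sketch of an integer-only variant for targets without floating-point hardware. It assumes temperatures are stored as integers in millidegrees Celsius and uses a Q16 fixed-point coefficient. Both choices, and the function name, are assumptions made for this example rather than anything prescribed by TuringBot. It is written in Python for readability.

```python
Q = 16  # number of fractional bits (assumed Q16 fixed-point format)
COEFF_Q16 = round(-0.348865 * (1 << Q))  # coefficient scaled to an integer


def compensate_fixed(t1_mc, t2_mc, t3_mc):
    """Integer-only version of Tref = (-0.348865)*(T3-T2)+T1.

    Inputs and output are temperatures in millidegrees Celsius (integers).
    """
    product = COEFF_Q16 * (t3_mc - t2_mc)
    # Add half of the scale factor before shifting so the correction is
    # rounded to the nearest integer instead of always rounded down.
    correction = (product + (1 << (Q - 1))) >> Q
    return t1_mc + correction


# Hypothetical readings: 25.000, 25.100 and 25.400 degrees C.
print(compensate_fixed(25000, 25100, 25400))
```

The routine needs one stored integer constant, one subtraction, one multiplication, one shift, and two additions. When porting to C, note that right-shifting a negative value is implementation-defined there, so the rounding step may need adjusting.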
Conclusion
Symbolic regression with TuringBot transforms raw sensor data into practical insights. By extracting the exact mathematical relationship Tref = (-0.348865)*(T3-T2)+T1, we've solved a real calibration problem with a formula that's immediately deployable and physically meaningful. This concrete example demonstrates how explicit mathematical formulas can be competitive with traditional predictive approaches in sensor applications.