Solved – Modeling Time Series Sensor Data with Machine Learning Techniques

machine learning, modeling, time series

I work on air quality sensors, several of which are electro-chemical gas sensors. By way of background, these sensors are stimulated by a potentiostat circuit which applies a bias voltage and then measures the current which flows through the sensor (typically on the order of nanoamps). The amount of current that flows through the sensor is related to the concentration of a target gas to which the sensor is exposed. The current is also related to the pressure, relative humidity, temperature, and exposure to cross-sensitive gases, wherein resides the bane of my existence.

We've traditionally used a data modeling approach, based on the sensor manufacturer's recommendations, to interpret the current measured from the sensor as a concentration of the target gas. We do so by measuring the response to clean air over a range of temperatures, and then using that characterization to interpret deflection from the characterized baseline response as attributable to target gas exposure.
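To make that concrete, here is a minimal sketch of such a temperature-characterized baseline. It assumes the clean-air baseline is fit as a low-order polynomial in temperature and that the manufacturer supplies a sensitivity figure; the data values and the name sensitivity_nA_per_ppm are purely illustrative, not the actual manufacturer procedure.

```python
import numpy as np

# Clean-air characterization: sensor current (nA) recorded at several temperatures (deg C).
# These numbers are made up for illustration.
temps_C = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0])
baseline_nA = np.array([12.1, 12.8, 13.9, 15.4, 17.3, 19.8, 22.9])

# Fit a low-order polynomial baseline(T); this plays the role of the characterized baseline response.
baseline_fit = np.polynomial.Polynomial.fit(temps_C, baseline_nA, deg=2)

def concentration_ppm(current_nA, temp_C, sensitivity_nA_per_ppm=0.5):
    """Interpret deflection from the temperature-characterized baseline as target gas.

    sensitivity_nA_per_ppm is a placeholder for the manufacturer's sensitivity figure.
    """
    deflection = current_nA - baseline_fit(temp_C)
    return deflection / sensitivity_nA_per_ppm

# Example: a reading of 18.0 nA at 20 deg C.
print(concentration_ppm(18.0, 20.0))
```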

We don't have the means to properly evaluate the quality of that model, because we have neither a reference instrument nor a way to expose the sensor to controlled concentrations of gas; we are, however, able to expose the sensors to the target gas to confirm that they respond to it appreciably.

The challenge I'm experiencing is that, over longer periods of time (e.g., a week) in clean air with naturally occurring variation in temperature, relative humidity, and pressure, the data model parameterized by the aforementioned characterization yields an unreasonably large span of variation in interpreted concentration. It's not noisy so much as it drifts, which leads me to believe that the data model is sorely lacking.

That has led me to think that an algorithmic (machine learning) approach might yield better results. Given that I have one-minute-resolution data for temperature, relative humidity, pressure, and sensor current (all real-valued) under clean-air conditions, what tools would be best suited to modeling sensor current as a function of temperature, relative humidity, and pressure? The thing I'm most concerned about is that we can't practically create conditions that represent a reasonable cross-section of the input space.

I would then use the traditional data model to interpret the deflection from the predicted baseline to estimate the gas concentration.
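A minimal sketch of that workflow, with a gradient-boosted regressor from scikit-learn standing in for whatever baseline model ends up working best (the file name, column names, and choice of model are assumptions for illustration, not a recommendation from the answer below):

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Clean-air training data at one-minute resolution (file and column names are assumed).
df = pd.read_csv("clean_air_log.csv")  # columns: temp_C, rh_pct, pressure_hPa, current_nA
X = df[["temp_C", "rh_pct", "pressure_hPa"]]
y = df["current_nA"]

# Hold out the last day as a test set; a random split would hide the drift behaviour.
split = len(df) - 24 * 60
baseline_model = GradientBoostingRegressor().fit(X[:split], y[:split])

# Deflection from the predicted clean-air baseline is the signal of interest.
deflection_nA = y[split:] - baseline_model.predict(X[split:])
print(deflection_nA.describe())
```

One caveat relevant to the input-space concern: tree-based models like this one cannot extrapolate outside the range of temperature, humidity, and pressure seen during training, so a Gaussian process (which at least reports growing uncertainty) or a physically motivated parametric fit may be the safer choice when conditions wander outside the characterized range.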

One side note is that temperature and relative humidity are physically correlated, though I could mathematically back out absolute humidity from temperature, relative humidity, and pressure, which I think would de-correlate them.
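For reference, a sketch of that conversion using the commonly used Magnus approximation for saturation vapour pressure; absolute humidity in g/m³ actually needs only temperature and relative humidity, while pressure enters if you want the mixing ratio instead:

```python
import math

def absolute_humidity_g_m3(temp_C, rh_pct):
    """Water vapour mass per volume of air, via the Magnus approximation."""
    svp_hPa = 6.112 * math.exp(17.62 * temp_C / (243.12 + temp_C))  # saturation vapour pressure
    vp_hPa = rh_pct / 100.0 * svp_hPa                               # actual vapour pressure
    return 216.7 * vp_hPa / (temp_C + 273.15)                       # g per cubic metre

def mixing_ratio_g_kg(temp_C, rh_pct, pressure_hPa):
    """Grams of water vapour per kilogram of dry air; this is where pressure comes in."""
    svp_hPa = 6.112 * math.exp(17.62 * temp_C / (243.12 + temp_C))
    vp_hPa = rh_pct / 100.0 * svp_hPa
    return 622.0 * vp_hPa / (pressure_hPa - vp_hPa)

print(absolute_humidity_g_m3(20.0, 50.0))        # roughly 8.6 g/m^3
print(mixing_ratio_g_kg(20.0, 50.0, 1013.25))    # roughly 7.2 g/kg
```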

Update / Clarification

In case this wasn't clear from the above, the goal is to be able to estimate the baseline voltage produced by a sensor in a clean air environment under varying pressure, humidity, and temperature conditions, as a means of using the deflection from that predicted baseline as the signal of interest in calculating the concentration of the target species gas. So basically I'm investigating alternative approaches to what is usually called zero-calibration in the instrumentation domain.

If I had truth data about the target species, it seems to me that I might be able to skip the business about deflection from a predicted baseline and estimate concentration directly from the voltage, temperature, humidity, and pressure time vectors.
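Sketched out, that direct approach might look like the following; the reference column ref_ppm is hypothetical (it only exists if a co-located reference instrument is available), and the random forest is just one plausible choice of regressor:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Hypothetical log that includes a co-located reference reading, ref_ppm.
df = pd.read_csv("sensor_with_reference.csv")  # columns: current_nA, temp_C, rh_pct, pressure_hPa, ref_ppm
X = df[["current_nA", "temp_C", "rh_pct", "pressure_hPa"]]
y = df["ref_ppm"]

# Time-aware cross-validation, since adjacent one-minute samples are strongly autocorrelated.
scores = cross_val_score(RandomForestRegressor(), X, y,
                         cv=TimeSeriesSplit(n_splits=5),
                         scoring="neg_mean_absolute_error")
print("MAE (ppm):", -scores.mean())
```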

Best Answer

Edit and TL;DR version: this could be treated as a mediation/moderation analysis problem, but that would still require an independent measurement to calibrate the device.

This sounds like a mediation/moderation analysis problem, not machine learning.

Let M1 be a model of the voltage under clean air conditions as a function of p, t and humidity. The deviance from M1 per se would not give you a concentration estimate. It would give you a probability that the gas is present and interfering with the sensor. A given deviance (residual value) will not indicate the same concentration of the target gas for every combination of p, t and humidity, because the way the gas affects the voltage varies with the other parameters. Similarly, going from, say, 2 mV to 4 mV of deviance does not necessarily imply that the concentration doubled: the scale might be non-linear, and that scale itself might be influenced by your other variables. In other words, it's a good idea to look at the difference between the measured value and the value predicted by M1, but converting the residuals into gas concentration is not a 1:1 thing.
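Written out (my notation, not the answer's), the point is that the residual depends on the conditions as well as on the concentration:

$$\Delta V = V_{\text{measured}} - \hat{V}_{M1}(p, t, h) = g(C;\, p, t, h), \qquad \hat{C} = g^{-1}(\Delta V;\, p, t, h) \neq k\,\Delta V,$$

so recovering the concentration means inverting $g$ at the prevailing conditions rather than applying a single scale factor $k$.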

Another way to look at it, which is more akin to the actual situation, is to see the concentration as the independent variable, the sensor voltage as the dependent variable, and p, t and hum as moderator variables. You'd need to induce different concentrations of gas and take measurements at various t, p and hum values for that to work, though.
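A sketch of what such a moderation model could look like with statsmodels, assuming a calibration dataset with induced concentrations and a hypothetical column layout; the conc_ppm interaction terms are what let the effect of concentration on voltage vary with temperature, humidity and pressure:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical calibration dataset: induced concentrations measured at varied conditions.
df = pd.read_csv("calibration_runs.csv")  # columns: voltage_mV, conc_ppm, temp_C, rh_pct, pressure_hPa

# Concentration is the independent variable, voltage the dependent variable;
# the conc_ppm:* interaction terms model the moderation by temperature, humidity and pressure.
model = smf.ols("voltage_mV ~ conc_ppm * (temp_C + rh_pct + pressure_hPa)", data=df).fit()
print(model.summary())
```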

Here are some resources:

This makes for a fun, almost philosophical problem to look at during the xmas vacation btw, so if you have a real or simulated dataset that you'd like to add to your question I'll take a look at it.

Epilogue

I showed this post and the data to a measurement specialist and an engineer who is also a specialist in measurement theory, and both said "get the suitcase with the calibration equipment". There's just no way around it.
