[Math] Coronavirus growth rate and its (possibly spurious) resemblance to the vapor pressure model

curves, mathematical modeling, numerical methods, physics, statistics

The objective is to model the growth rate of the coronavirus using available data. As opposed to standard epidemiological models such as SIR and SEIR, I tried to model a direct relation between the number of infected people (or deaths) and time, so as to capture the trends of the early days. I collected the latest data on the coronavirus from Johns Hopkins University, as shown, and fitted different curves to this data to model the number $P$ of confirmed patients who are or were infected as a function of time $T$, taking 20-Jan-20 as day 1.

[Images not available]

The curve-fitting software I used has well-known models from different branches of science, and it also lets you build your own custom models, which I did, as shown in the image below. The score next to the name of a model indicates how well that model fits the data: the higher the score, the better the fit, and the maximum possible score is 1000. Although we have data for only 18 days (as of 7-Feb, 2 AM GMT), one model kept appearing at the top as the best fit: the vapor pressure model. After checking various boundary conditions, I rejected many of the models, but I could not find any immediate reason to reject the vapor pressure model. Similarly, when I modeled the number of reported deaths against time, the vapor pressure model again gave the best fit, and again I could not find any obvious reason to reject it.

[Image not available]
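The post does not say how the software computes its 0 to 1000 score, so the sketch below does not attempt to reproduce it. As a hedged illustration, it shows two standard goodness-of-fit measures side by side: $R^2$, which rewards closeness to the data only, and the Akaike information criterion (AIC, plus its small-sample correction AICc), which also charges for every free parameter. The function names are my own, and the AIC formula assumes least-squares fits with Gaussian errors.

```python
import math


def residual_sum_of_squares(observed, predicted):
    return sum((y - f) ** 2 for y, f in zip(observed, predicted))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - RSS/TSS; 1.0 means a perfect fit."""
    mean = sum(observed) / len(observed)
    tss = sum((y - mean) ** 2 for y in observed)
    return 1.0 - residual_sum_of_squares(observed, predicted) / tss


def aic(observed, predicted, n_params):
    """Akaike information criterion for a least-squares fit with Gaussian errors:
    n*ln(RSS/n) + 2k. Lower is better; every free parameter costs 2 units."""
    n = len(observed)
    return n * math.log(residual_sum_of_squares(observed, predicted) / n) + 2 * n_params


def aicc(observed, predicted, n_params):
    """Small-sample corrected AIC; the extra penalty matters when n is only ~18."""
    n = len(observed)
    if n <= n_params + 1:
        raise ValueError("AICc needs more data points than parameters + 1")
    return aic(observed, predicted, n_params) + 2 * n_params * (n_params + 1) / (n - n_params - 1)
```

With $n = 18$, a 3-parameter model is charged $6 + 24/14 \approx 7.7$ units by AICc, versus $4 + 12/15 = 4.8$ for a 2-parameter model, so the extra parameter must buy a clearly better fit before the more flexible model wins.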

So I did some research on the vapor pressure model.

Basic concept of vapor pressure: Because the molecules of a liquid $L$ are in constant motion and possess kinetic energy, at any moment some fraction of them has enough energy to escape from the surface of the liquid and enter the gas phase. This process, called evaporation, generates a vapor pressure $P_L$ above the liquid. Molecules in the gas phase can collide with the liquid surface and reenter the liquid via condensation. Eventually, a steady state is reached in which the numbers of molecules evaporating and condensing per unit time are the same, and the system is in a state of dynamic equilibrium. Under these conditions, a liquid exhibits a characteristic equilibrium vapor pressure that depends only on the temperature $T_L$. Volatile liquids have high vapor pressures and tend to evaporate readily from an open container; nonvolatile liquids have low vapor pressures. When the vapor pressure equals the external pressure, bubbles of vapor form within the liquid, and it boils. With an Antoine-type equation, the nonlinear relationship between vapor pressure and temperature can be written so that $\log P_L$ is almost linear in $1/T_L$:

$$
P_L = \exp\Big(a + \frac{b}{T_L} + c\log T_L\Big)
$$
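Taking logarithms gives $\log P = a + b\,(1/T) + c\,\log T$, which is linear in the unknowns $a, b, c$, so the model can be fitted by ordinary linear least squares on $\log P$. The sketch below does this in plain Python. It is an assumption that this resembles what the fitting software does; the software may instead minimize squared errors in $P$ itself, which gives somewhat different parameters. Here $T$ is the day number, not a temperature, and all function names are hypothetical.

```python
import math


def solve3(m, v):
    """Solve the 3x3 linear system m x = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= factor * a[col][k]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x


def fit_vapor_pressure(times, values):
    """Fit P = exp(a + b/T + c*ln T) by least squares on ln P (all values must be > 0)."""
    rows = [(1.0, 1.0 / t, math.log(t)) for t in times]
    logs = [math.log(p) for p in values]
    # Normal equations (X^T X) beta = X^T y for the regressors 1, 1/T, ln T.
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(rows, logs)) for i in range(3)]
    return solve3(xtx, xty)


def vapor_pressure(t, a, b, c):
    """The vapor pressure model evaluated at day t."""
    return math.exp(a + b / t + c * math.log(t))
```

Feeding `fit_vapor_pressure` exact values generated by `vapor_pressure` recovers the parameters, which is a quick check that the linearization is set up correctly.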

Next, I looked into what is known about how the coronavirus spreads and whether it is related to liquids. Here is what I found.

How coronavirus spreads: When an infected person coughs or sneezes, they shed droplets of saliva, mucus, or other bodily fluids. If any of those droplets fall on you, or if you touch them and then touch your face, you can become infected as well. Hospitals for communicable diseases define exposure as being within six feet of an infected person for 10 minutes or longer. Time and distance matter.

The coronavirus spreads when it escapes from an infected person in microscopic droplets of liquid that carry the virus through the air. I wonder whether this has anything to do with why the vapor pressure model keeps coming out on top as the best fit, even though there is no obvious pressure or temperature here as in the vapor pressure equation, and I cannot see how they could be related. Maybe all this is just a coincidence.

As a bad scientist but a concerned human, I thought I must report this observation in case there is anything of value in it.

Question: My main objective was to report this, given the seriousness of the situation. But since the community rules require a question, I will ask one: given this limited data, what can we infer about the coronavirus, and how can we reject the vapor pressure model as a mere coincidence?

Note 1: I will be posting this in the Physics community, where it is more relevant. But with 638 deaths in the last two weeks, I have also posted it on MSE in case someone else finds it useful.

Note 2: I am well aware of spurious correlation. But with only 3 weeks of data, we may not be able to detect a different trend this early. Hence I am reporting the best fit regardless.
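The point of Note 2 can be made concrete: early in an outbreak, curves with very different long-run behaviour are almost indistinguishable. The sketch compares pure exponential growth with logistic growth that has the same starting value $P_0$ and initial rate $r$ but levels off at a carrying capacity $K$. All parameter values are hypothetical and are not fitted to the Johns Hopkins data.

```python
import math


def exponential(t, p0, r):
    """Unbounded exponential growth from p0 at rate r per day."""
    return p0 * math.exp(r * t)


def logistic(t, p0, r, k):
    """Logistic growth from p0 at initial rate r, levelling off at carrying capacity k."""
    return k / (1.0 + (k / p0 - 1.0) * math.exp(-r * t))


def max_relative_gap(days, p0, r, k):
    """Largest relative difference between the two curves over the given days."""
    return max(abs(exponential(t, p0, r) - logistic(t, p0, r, k)) / logistic(t, p0, r, k)
               for t in days)


# Hypothetical parameters: 100 initial cases, 20% growth per day, capacity of one million.
print(max_relative_gap(range(19), p0=100, r=0.2, k=1e6))  # first 18 days
print(max_relative_gap(range(61), p0=100, r=0.2, k=1e6))  # first 60 days
```

For these two curves the relative gap works out to $(P_0/K)(e^{rt}-1)$: about 0.4% after 18 days, but about 16 after 60 days (the exponential curve is then roughly 17 times larger). Eighteen days of data therefore cannot tell the two apart, even though their forecasts diverge enormously.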

Best Answer

What you have here is a severe case of overfitting. You have only 18 data points, and you test a large variety of different models, each of which has several free parameters. One of these models, with optimized parameters, will fit your data very well, regardless of what the data looks like.
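The overfitting argument can be illustrated with its most extreme case: a model with as many free parameters as data points. A polynomial of degree 17 has 18 coefficients, so it passes exactly through any 18 observations. The sketch uses hypothetical data (smooth exponential growth plus a deterministic ±1% error), not the Johns Hopkins figures.

```python
import math


def lagrange_eval(xs, ys, x):
    """Evaluate the unique polynomial of degree len(xs) - 1 passing through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


days = list(range(1, 19))  # 18 daily observations, as in the question
# Hypothetical smooth growth curve standing in for the true counts.
true_counts = [100.0 * math.exp(0.2 * d) for d in days]
# Hypothetical +/-1% measurement error, alternating so the example is reproducible.
observed = [y * (1.01 if d % 2 == 0 else 0.99) for d, y in zip(days, true_counts)]

# In sample: the 18-coefficient polynomial reproduces every observation exactly.
worst_in_sample = max(abs(lagrange_eval(days, observed, d) - y) for d, y in zip(days, observed))
# Out of sample: one day ahead, the same polynomial is wildly wrong.
pred_day19 = lagrange_eval(days, observed, 19)
true_day19 = 100.0 * math.exp(0.2 * 19)
print(worst_in_sample, pred_day19, true_day19)
```

The in-sample error is zero, yet the day-19 prediction is off by a factor of several hundred, because the interpolating polynomial amplifies the 1% errors. Models with fewer parameters soften this effect without removing it, so a top fit score on 18 points cannot by itself single out one model.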

The number of deaths is growing, and there are various medical models describing what the spread of a typical infectious disease looks like. Trying to do better than these with the little data available does not lead to useful new insights. In spite of the excellent fit to the data observed so far, there is no reason to believe your model is better at predicting the future than any of the models public health researchers usually use in this kind of situation.
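One of the standard models alluded to here is the SIR model mentioned in the question. As a hedged sketch (forward Euler integration; every parameter value is hypothetical and is not an estimate for the coronavirus), it shows the rise, peak, and decline that a three-week window cannot reveal.

```python
def simulate_sir(beta, gamma, population, infected0, days, steps_per_day=10):
    """Integrate the SIR model with forward Euler steps of 1/steps_per_day days:
        dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I.
    Returns one (day, S, I, R) tuple per day, starting at day 0."""
    h = 1.0 / steps_per_day
    s, i, r = float(population - infected0), float(infected0), 0.0
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * h
            recoveries = gamma * i * h
            s -= new_infections
            i += new_infections - recoveries
            r += recoveries
        history.append((day, s, i, r))
    return history


# Hypothetical parameters: R0 = beta/gamma = 3, one million people, 10 initial cases.
history = simulate_sir(beta=0.3, gamma=0.1, population=1_000_000, infected0=10, days=365)
peak_day, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(peak_day, round(peak_infected))
```

While almost everyone is still susceptible, $I$ grows roughly like $e^{(\beta-\gamma)t}$, so early data mainly pin down that single growth rate and say little about when or how high the peak will be.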
