[Math] Sigmoid Function Question

exponential function, functions

I've been trying for well over a week to understand how a simple sigmoid
or logistic function works.

Specifically, I'm trying to understand how to build proper polynomial parameters for the function
to work properly.

I've literally gone through dozens of web pages and downloadable PDF files looking for good descriptions
and examples of how to use this.

My main problem so far has been that literally every single online resource I've found always appears
to leave out critically important descriptions of parts of the function, assuming that the reader will
just know everything that is being talked about, so I'm left with a large collection of incomplete
descriptions. Very frustrating!

I've found some examples that appear to be better than others. For example, in one description I see
this sigmoid function:

$$w = \frac{w_{\text{max}}}{1 + e^{-k (t - t_m)}}$$

(from the online file "a flexible sigmoid function of determinate growth")

Here, I understand that $-k$ represents a value indicating how steep the slope of growth is,
that $t = \text{time}$, and that $t_m$ represents the max ceiling of time, so I'm guessing that here
we'd subtract a current time from a max time??? That seems a bit confusing also.

Then I found a better example at this website: http://www.cs.xu.edu/math/math120/01f/

The file here is called "logistic.pdf"

In this example they show the function:

$$y = \frac{C}{1 + Ae ^{-Bx}}$$

$A =$ number of times the initial population must grow to reach $C$.
$A$ also describes the relation between the initial and limiting output values.

$B =$ Here the definition is left ambiguous throughout the document; all it really says is that this value increases / decreases
when the function is positive / negative.

The document says $B$ can be derived from the inflection point coordinates,
but it doesn't really say exactly why or how. It gives only a very vague,
generic description on page 6, where it says $\frac{\ln 12.8}{0.0266} = 95.8$

But on page 3 it says
"The parameter $B$ is much harder to interpret exactly. We will be content to
simply mention that"

On page 4 it says "It turns out that $A = 12.8, B = 0.0266, C = 11.5$ are parameter values that yield a logistic function with a good fit to this data:"

This seems completely confusing. The value for $B$ seems arbitrarily assigned, and they simply say that this value "turns out" to be a good parameter fit… I totally don't understand how or why this value of $0.0266$
has been chosen, and I don't understand what it means that this value
"turns out" to be a good fit??!!

$x =$ some input

$C =$ A ceiling or limit, representing the long-run behaviour of the function

So, if possible, I would really honestly appreciate some basic answers to these questions:

What IS ($B$) actually representing here?

And how, why, and where did they assign it the value of $0.0266$ ???

And lastly, how can we determine what parameters to use for a sigmoid function?

I can understand that the first value passed to $e$, like $-k$, can represent the growth slope angle:

$$w = \frac{w_{\text{max}}}{1 + e^{-k (t - t_m)}}$$

But how do we determine what other parameters can be used and is there any online resource
or simple method available that can describe how we choose the other parameters we want
and then how do we determine the proper syntax of these parameters for the function?

Thanks for any meaningful feedback you can provide!

🙂

Best Answer

It's a shame nobody ever answered this question years ago when asked. As an exercise for myself I will attempt it.

There's a useful answer with some Python here: https://stackoverflow.com/a/43213692/1335793 It shows some varied parameterisations to shift or scale the curve.

In your examples, these two are equivalent:

$$w = \frac{w_{\text{max}}}{1 + e^{-k (t - t_m)}}$$

$$y = \frac{C}{1 + Ae^{-Bx}}$$

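In the first one, the sigmoid is applied to growth over time, so the x axis is assumed to be time. $t - t_m$ is the time measured relative to $t_m$, the moment at which the curve reaches half of $w_{\text{max}}$ and grows fastest, so it is really just a shifted x coordinate. Expanding the exponent gives $e^{-k(t - t_m)} = e^{k t_m} e^{-kt}$, so the first formula is the second one with $x = t$, $C = w_{\text{max}}$, $B = k$ and $A = e^{k t_m}$.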
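As a quick numeric check of that equivalence, here is a minimal Python sketch. The function names and the sample values of `w_max`, `k` and `t_m` are made up for illustration and come from neither source. The only fact it relies on is that $e^{-k(t - t_m)} = e^{k t_m} e^{-kt}$, so choosing $A = e^{k t_m}$, $B = k$, $C = w_{\text{max}}$ should make both forms agree.

```python
import math

def logistic_growth(t, w_max, k, t_m):
    """First form: w = w_max / (1 + e^(-k (t - t_m)))."""
    return w_max / (1.0 + math.exp(-k * (t - t_m)))

def logistic_abc(x, A, B, C):
    """Second form: y = C / (1 + A e^(-B x))."""
    return C / (1.0 + A * math.exp(-B * x))

# Illustrative values only (assumed, not taken from either document).
w_max, k, t_m = 10.0, 0.5, 8.0

# Parameters of the second form that correspond to the first form.
A, B, C = math.exp(k * t_m), k, w_max

for t in (0.0, 4.0, 8.0, 12.0, 20.0):
    w = logistic_growth(t, w_max, k, t_m)
    y = logistic_abc(t, A, B, C)
    print(f"t={t:5.1f}  first form={w:.6f}  second form={y:.6f}")
```

The two columns should match at every `t`, and at `t = t_m` both give half of `w_max`.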

What IS (B) actually representing here?

$B$ is the growth rate, and it can be positive or negative. If you make it negative, the curve is mirrored, i.e. it starts high and gets lower over time. If it is positive, the curve starts low and grows higher over time. In the paper you linked, http://www.cs.xu.edu/math/math120/01f/logistic.pdf, the first page shows a positive growth curve and the second page a negative one. If you modify the growth rate, it changes the steepness of the slope in the middle of the curve. There is also a relationship between the growth rate and the inflection point: the curve reaches half its ceiling, $y = C/2$, where $Ae^{-Bx} = 1$, i.e. at $x = \ln(A)/B$. That is the calculation $\frac{\ln 12.8}{0.0266} = 95.8$ you quoted from page 6, and it is why the paper says that if you know the inflection point you can back-calculate the growth rate.
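To make the role of $B$ concrete, here is a small Python sketch using the parameter values quoted in the question. The helper names `logistic` and `slope` are just illustrative choices. It checks the midpoint of the curve (where $Ae^{-Bx} = 1$, so $x = \ln(A)/B$ and $y = C/2$), flips the sign of $B$, and compares a numerically estimated midpoint slope against $BC/4$, which is what differentiating the formula gives at that point.

```python
import math

def logistic(x, A, B, C):
    """y = C / (1 + A * e^(-B * x))"""
    return C / (1.0 + A * math.exp(-B * x))

A, B, C = 12.8, 0.0266, 11.5  # values quoted in the question (page 4)

# Midpoint / inflection point: A * e^(-B x) = 1  =>  x = ln(A) / B
x_mid = math.log(A) / B
print(round(x_mid, 1))            # 95.8, the page 6 figure
print(logistic(x_mid, A, B, C))   # approximately C / 2 = 5.75

# Positive B: the curve rises. Negative B: the mirrored, falling curve.
print(logistic(0, A, B, C) < logistic(200, A, B, C))    # True
print(logistic(0, A, -B, C) > logistic(200, A, -B, C))  # True

def slope(x, A, B, C, h=1e-4):
    """Central-difference estimate of dy/dx."""
    return (logistic(x + h, A, B, C) - logistic(x - h, A, B, C)) / (2 * h)

# Steepness in the middle of the curve is proportional to B.
print(slope(x_mid, A, B, C), B * C / 4)
```

So a larger $B$ means a steeper middle section, and for fixed $A$ it also moves the midpoint $\ln(A)/B$ to the left.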

And how, why, and where did they assign it the value of 0.0266 ???

This is a bit frustrating to learn about, as it seems arbitrary and in fact it is to some extent. They started with the data points, recognized that they roughly followed a logistic curve, and then attempted to "fit" the function to the data by adjusting its parameters. This is usually done with computational methods such as least squares, often minimized iteratively with something like gradient descent, which is akin to starting with a guess and then changing it slightly until the curve is as close to the data points as is possible within the constraints of the chosen function. It could just as well be done by trial and error, mucking around with the parameters until it looks right. If that doesn't feel very elegant then you would be right; it is a practical brute-force approach to applied mathematics.
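As one deliberately simplified illustration of fitting, the sketch below recovers $A$ and $B$ from data with ordinary least squares. It rests on two loud assumptions: the data is synthetic, generated here from the quoted parameter values (it is not the paper's data set), and the ceiling $C$ is treated as already known. With $C$ fixed, $\ln(C/y - 1) = \ln A - Bx$ is a straight line in $x$, so a plain line fit yields both parameters. This is not necessarily how the paper's authors obtained their numbers.

```python
import math

def logistic(x, A, B, C):
    """y = C / (1 + A * e^(-B * x))"""
    return C / (1.0 + A * math.exp(-B * x))

# Synthetic, noise-free data from known parameters (NOT the paper's data).
true_A, true_B, C = 12.8, 0.0266, 11.5
xs = [0, 25, 50, 75, 100, 125, 150, 175, 200]
ys = [logistic(x, true_A, true_B, C) for x in xs]

# Linearise: ln(C / y - 1) = ln(A) - B * x   (C assumed known)
zs = [math.log(C / y - 1.0) for y in ys]

# Ordinary least squares for the line z = intercept + slope * x
n = len(xs)
mean_x = sum(xs) / n
mean_z = sum(zs) / n
sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxz / sxx
intercept = mean_z - slope * mean_x

fit_B = -slope                # slope of the line is -B
fit_A = math.exp(intercept)   # intercept of the line is ln(A)
print(f"A ~ {fit_A:.3f}, B ~ {fit_B:.4f}")
```

With real, noisy data you would usually estimate all three parameters at once with a nonlinear least-squares routine (for example `scipy.optimize.curve_fit`), starting from an initial guess.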

And lastly, how can we determine what parameters to use for a sigmoid function?

The convention for functions that describe curves in two dimensions is to use some form of

$y = f(x)$

where the $y$ coordinate is calculated by applying some function to the $x$ coordinate. This assumes you know what all the $x$ coordinates are, but that is not a problem for idealized functions like the logistic (sigmoid) curve. In the growth-of-population example, you know that time is continuous and the unknown is "what will the population be at any point in time?". Using the known (time) value you can then try to predict the unknown (population) value.

Choosing the syntax to use doesn't really matter that much, although it does help to use conventions, or for specific examples some variables that have meaning in the context. In the generic formula

$$y = \frac{C}{1 + Ae^{-Bx}}$$

$y$ will be the population value, $C$ will be the highest possible population value, and $e$ is Euler's number.
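To see those roles in numbers, here is a short sketch that evaluates the generic formula at a few $x$ values, using the parameter values quoted in the question. The sample $x$ values are arbitrary. At $x = 0$ the formula reduces to $y = C/(1 + A)$, and as $x$ grows the $Ae^{-Bx}$ term shrinks toward zero so $y$ approaches $C$.

```python
import math

def logistic(x, A, B, C):
    """y = C / (1 + A * e^(-B * x))"""
    return C / (1.0 + A * math.exp(-B * x))

A, B, C = 12.8, 0.0266, 11.5  # values quoted in the question

for x in (0, 50, 100, 200, 400):
    print(x, round(logistic(x, A, B, C), 3))

# Starting value is C / (1 + A); the ceiling C is approached but not exceeded.
print(math.isclose(logistic(0, A, B, C), C / (1 + A)))  # True
print(logistic(400, A, B, C) < C)                        # True
```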

The generic logistic model is symmetrical around 0 but can be shifted left-right (x axis offset), up-down (y-axis offset), scaled (amplitude) or "stretched", changing the slope of the growth. You need to add additional parameters to the model to control those aspects, which makes the model describe your observed data. The approach of starting with a generic model and customizing it to data needs to balance goodness of fit with complexity (not too many parameters). The process of model selection within the field of statistical modelling is dedicated to doing just that.
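A minimal sketch of what those extra parameters can look like in Python follows. The parameter names `x0`, `k`, `amplitude` and `y_offset` are hypothetical choices for illustration, not a standard naming; with their defaults the function is the plain logistic $1/(1 + e^{-x})$.

```python
import math

def general_logistic(x, x0=0.0, k=1.0, amplitude=1.0, y_offset=0.0):
    """Logistic curve with illustrative shift/scale parameters.

    x0        : left-right shift (x position of the midpoint)
    k         : steepness of the slope at the midpoint
    amplitude : vertical distance between the lower and upper plateau
    y_offset  : up-down shift (height of the lower plateau)
    """
    return y_offset + amplitude / (1.0 + math.exp(-k * (x - x0)))

# Defaults: the plain logistic, which passes through 0.5 at x = 0.
print(general_logistic(0.0))  # 0.5

# Shifted, stretched and scaled: the midpoint moves to x = 3 and the
# curve runs from 1 up to 11, so the midpoint height is 1 + 10 / 2.
print(general_logistic(3.0, x0=3.0, k=2.0, amplitude=10.0, y_offset=1.0))  # 6.0
```

Each added parameter gives the curve more freedom to match data, which is exactly the goodness-of-fit versus complexity trade-off described above.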
