Solved – Intuitive explanation of logloss

interpretation, intuition, loss-functions

In several Kaggle competitions the scoring was based on "logloss". This relates to classification error.

Here is a technical answer but I am looking for an intuitive answer. I really liked the answers to this question about Mahalanobis distance, but PCA is not logloss.

I can use the value that my classification software puts out, but I don't really understand it. Why do we use it instead of true/false positive/negative rates? Can you help me so that I can explain this to my grandmother or a newbie in the field?

I also like and agree with the quote:

you do not really understand something unless you can explain it to
your grandmother
— Albert Einstein

I tried answering this on my own before posting here.

Links that I did not find intuitive or really helpful include:

These are informative and accurate, but they are meant for a technical audience. They do not draw a simple picture or give simple, accessible examples. They are not written for my grandmother.

Best Answer

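Logloss is, up to a sign and a division by the number of predictions, the logarithm of the product of the probabilities that were assigned to what actually happened. Suppose Alice predicted: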

with probability 0.2, John will kill Jack
with probability 0.001, Mary will marry John
with probability 0.01, Bill is a murderer.

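It turned out that Mary did not marry John, Bill is not a murderer, but John killed Jack. For an event that did not happen, the probability Alice assigned to the actual outcome is one minus her stated probability (1 - 0.001 = 0.999 and 1 - 0.01 = 0.99). The product of the probabilities, according to Alice, is 0.2*0.999*0.99=0.197802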

Bob predicted:

with probability 0.5, John will kill Jack
with probability 0.5, Mary will marry John
with probability 0.5, Bill is a murderer.

The product is 0.5*0.5*0.5=0.125.

Alice is a better predictor than Bob: her product is larger (0.197802 versus 0.125), so her logloss is lower.
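To make the arithmetic concrete, here is a minimal sketch of the comparison. It assumes the common convention that logloss is the negative mean of the natural log of the probability given to the actual outcome; the function name `log_loss` and the variable names are illustrative, not taken from any particular library.

```python
import math

def log_loss(probs, outcomes):
    """Negative mean log of the probability assigned to what actually happened.

    probs[i] is the predicted probability that event i happens;
    outcomes[i] is True if it did happen.
    """
    total = 0.0
    for p, happened in zip(probs, outcomes):
        p_actual = p if happened else 1.0 - p  # probability given to the real outcome
        total += math.log(p_actual)
    return -total / len(probs)

# John killed Jack; Mary did not marry John; Bill is not a murderer.
outcomes = [True, False, False]
alice = [0.2, 0.001, 0.01]
bob = [0.5, 0.5, 0.5]

alice_loss = log_loss(alice, outcomes)
bob_loss = log_loss(bob, outcomes)
print(f"Alice: {alice_loss:.3f}  Bob: {bob_loss:.3f}")  # lower is better
```

Because the logarithm is increasing, the ranking by logloss is just the ranking by product reversed: Alice's larger product gives her the smaller loss (roughly 0.54 against roughly 0.69 for Bob).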

Related Solutions

Solved – Intuitive explanation of stationarity

First of all, it is important to note that stationarity is a property of a process, not of a time series. You consider the ensemble of all time series generated by a process. If the statistical properties¹ of this ensemble (mean, variance, …) are constant over time, the process is called stationary. Strictly speaking, it is impossible to say whether a given time series was generated by a stationary process (however, with some assumptions, we can take a good guess).

More intuitively, stationarity means that there are no distinguished points in time for your process (influencing the statistical properties of your observation). Whether this applies to a given process depends crucially on what you consider as fixed or variable for your process, i.e., what is contained in your ensemble.

A typical cause of non-stationarity is time-dependent parameters, which allow time points to be distinguished by the values of those parameters. Another cause is fixed initial conditions.

Consider the following examples:

The noise reaching my house from a single car passing at a given time is not a stationary process. E.g., the average amplitude² is highest when the car is directly next to my house.
The noise reaching my house from street traffic in general is a stationary process, if we ignore the time dependency of the traffic intensity (e.g., less traffic at night or on weekends). There are no distinguished points in time anymore. While there may be strong fluctuations of individual time series, these vanish when I consider the ensemble of all realisations of the process.
If we include known impacts on traffic intensity, e.g., that there is less traffic at night, the process is non-stationary again: The average amplitude² varies with a daily rhythm. Every point in time is distinguished by the time of the day.
The position of a single peppercorn in a pot of boiling water is a stationary process (ignoring the loss of water due to evaporation). There are no distinguished points in time.
The position of a single peppercorn in a pot of boiling water dropped in the exact middle at $t=0$ is not a stationary process, as $t=0$ is a distinguished point in time. The average position of the peppercorn is always in the middle (assuming a symmetric pot without distinguished directions), but at $t=ε$ (with $ε$ small), we can be sure that the peppercorn is somewhere near the middle for every realisation of the process, while at a later time, it can also be closer to the border of the pot.

So, the distribution of positions changes over time. To give a specific example, the standard deviation grows. The distribution quickly converges to the respective distributions of the previous example and if we only take a look at this process for $t>T$ with a sufficiently high $T$, we can neglect the non-stationarity and approximate it as a stationary process for all purposes – the impact of the initial condition has faded away.
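The fading of the initial condition can be sketched numerically. The code below is only a stand-in for the peppercorn: it assumes a one-dimensional mean-reverting process `x_t = phi * x_{t-1} + noise` started at exactly 0 for every realisation, and tracks the standard deviation across the ensemble. The function name and all parameter values are hypothetical choices for illustration.

```python
import random
import statistics

def ensemble_std(n_paths=2000, n_steps=60, phi=0.9, seed=0):
    """Spread across an ensemble of realisations, all started at position 0."""
    rng = random.Random(seed)
    positions = [0.0] * n_paths  # fixed initial condition: t=0 is distinguished
    stds = []
    for _ in range(n_steps):
        # every realisation takes one step: pull toward the middle plus a random kick
        positions = [phi * x + rng.gauss(0.0, 1.0) for x in positions]
        stds.append(statistics.pstdev(positions))
    return stds

stds = ensemble_std()
print(f"step 1: {stds[0]:.2f}, step 10: {stds[9]:.2f}, step 60: {stds[-1]:.2f}")
```

Under these assumptions the ensemble spread starts near 1, grows, and then levels off near 1/sqrt(1 - phi**2), about 2.3 for `phi = 0.9`. Once it has levelled off, the statistics no longer reveal how long ago the process was started.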

¹ For practical purposes, this is sometimes reduced to the mean and the variance (weak stationarity), but I do not consider this helpful to understand the concept. Just ignore weak stationarity until you have understood stationarity.

² Which is the mean of the volume, but the standard deviation of the actual sound signal (do not worry too much about this here).

Unit Root – An Intuitive Explanation and Its Relevance

He had just come to the bridge; and not looking where he was going, he tripped over something, and the fir-cone jerked out of his paw into the river.

"Bother," said Pooh, as it floated slowly under the bridge, and he went back to get another fir-cone which had a rhyme to it. But then he thought that he would just look at the river instead, because it was a peaceful sort of day, so he lay down and looked at it, and it slipped slowly away beneath him . . . and suddenly, there was his fir-cone slipping away too.

"That's funny," said Pooh. "I dropped it on the other side," said Pooh, "and it came out on this side! I wonder if it would do it again?"

A.A. Milne, The House at Pooh Corner (Chapter VI. In which Pooh invents a new game and Eeyore joins in.)

Here is a picture of the flow along the surface of the water:

[Figure: Pooh sticks 1]

The arrows show the direction of flow and are connected by streamlines. A fir cone will tend to follow the streamline in which it falls. But it doesn't always do it the same way each time, even when it's dropped in the same place in the stream: random variations along its path, caused by turbulence in the water, wind, and other whims of nature kick it onto neighboring stream lines.

[Figure: Pooh sticks 2]

Here, the fir cone was dropped near the upper right corner. It more or less followed the stream lines--which converge and flow away down and to the left--but it took little detours along the way.

An "autoregressive process" (AR process) is a sequence of numbers thought to behave like certain flows. The two-dimensional illustration corresponds to a process in which each number is determined by its two preceding values--plus a random "detour." The analogy is made by interpreting each successive pair in the sequence as coordinates of a point in the stream. Instant by instant, the stream's flow changes the fir cone's coordinates in the same mathematical way given by the AR process.

We can recover the original process from the flow-based picture by writing the coordinates of each point occupied by the fir cone and then erasing all but the last number in each set of coordinates.
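The correspondence between a sequence and a path in the stream can be sketched as follows. This assumes a two-coefficient AR process `x_t = a1 * x_{t-1} + a2 * x_{t-2} + noise` with made-up coefficients; the function name `simulate_ar2` is hypothetical.

```python
import random

def simulate_ar2(a1, a2, n, seed=0, start=(0.0, 0.0)):
    """Generate start values followed by n steps of an AR(2) sequence."""
    rng = random.Random(seed)
    xs = list(start)
    for _ in range(n):
        # each number depends on its two predecessors plus a random "detour"
        xs.append(a1 * xs[-1] + a2 * xs[-2] + rng.gauss(0.0, 1.0))
    return xs

series = simulate_ar2(0.5, 0.3, 20)

# Each successive pair of numbers is one position of the fir cone in the plane.
points = list(zip(series[:-1], series[1:]))

# Keep only the last coordinate of each point to get the sequence back
# (everything except its very first value).
recovered = [last for _, last in points]
print(recovered == series[1:])
```

Consecutive points overlap in one coordinate, which is why discarding all but the last number of each point loses nothing beyond the first starting value.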

Nature--and streams in particular--is richer and more varied than the flows corresponding to AR processes. Because each number in the sequence is assumed to depend in the same fixed way on its predecessors--apart from the random detour part--the flows that illustrate AR processes exhibit limited patterns. They can indeed seem to flow like a stream, as seen here. They can also look like the swirling around a drain. The flows can occur in reverse, seeming to gush outwards from a drain. And they can look like mouths of two streams crashing together: two sources of water flow directly at one another and then split away to the sides. But that's about it. You can't have, say, a flowing stream with eddies off to the sides. AR processes are too simple for that.

[Figure: Pooh sticks 3]

In this flow, the fir cone was dropped at the lower right corner and quickly carried into the eddy in the upper right, despite the slight random changes in position it underwent. But it will never quite stop moving, due to those same random movements which rescue it from oblivion. The fir cone's coordinates move around a bit--indeed, they are seen to oscillate, on the whole, around the coordinates of the center of the eddy. In the first stream flow, the coordinates progressed inevitably along the center of the stream, which quickly captured the cone and carried it away faster than its random detours could slow it down: they trend in time. By contrast, circling around an eddy exemplifies a stationary process in which the fir cone is captured; flowing away down the stream, in which the cone flows out of sight--trending--is non-stationary.

Incidentally, when the flow for an AR process moves away downstream, it also accelerates. It gets faster and faster as the cone moves along it.

The nature of an AR flow is determined by a few special, "characteristic," directions, which are usually evident in the stream diagram: streamlines seem to converge towards or come from these directions. One can always find as many characteristic directions as there are coefficients in the AR process: two in these illustrations. Associated with each characteristic direction is a number, its "root" or "eigenvalue." When the size of the number is less than unity, the flow in that characteristic direction is towards a central location. When the size of the root is greater than unity, the flow accelerates away from a central location. Movement along a characteristic direction with a unit root--one whose size is $1$--is dominated by the random forces affecting the cone. It is a "random walk." The cone can wander away slowly but without accelerating.

(Some of the figures display the values of both roots in their titles.)

Even Pooh--a bear of very little brain--would recognize that the stream will capture his fir cone only when all the flow is toward one eddy or whirlpool; otherwise, on one of those random detours the cone will eventually find itself under the influence of that part of the flow with a root greater than $1$ in magnitude, whence it will wander off downstream and be lost forever. Consequently, an AR process can be stationary if and only if all characteristic values are less than unity in size.
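The stationarity criterion can be checked directly for a two-coefficient process. This is a sketch under the assumption that the "roots" meant here are the solutions of `z**2 - a1*z - a2 = 0` for `x_t = a1 * x_{t-1} + a2 * x_{t-2} + noise`, which matches the convention in the text that sizes below unity pull toward the center. The function names and example coefficients are illustrative.

```python
import cmath

def ar2_roots(a1, a2):
    """Characteristic values of x_t = a1*x_{t-1} + a2*x_{t-2} + noise."""
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)  # complex sqrt handles swirling flows
    return (a1 + disc) / 2.0, (a1 - disc) / 2.0

def is_stationary(a1, a2):
    """True only when every characteristic value is smaller than 1 in size."""
    return all(abs(r) < 1.0 for r in ar2_roots(a1, a2))

examples = {
    "captured by an eddy": (0.5, 0.3),
    "swirling into a drain": (1.0, -0.5),  # complex roots of size about 0.71
    "unit root": (0.5, 0.5),               # one root is exactly 1
    "accelerating away": (1.2, 0.0),       # one root is 1.2
}
for label, (a1, a2) in examples.items():
    sizes = [round(abs(r), 3) for r in ar2_roots(a1, a2)]
    print(f"{label}: sizes {sizes}, stationary = {is_stationary(a1, a2)}")
```

Only the first two examples are stationary. The unit-root case sits exactly on the boundary, which is why testing for it from noisy data is a delicate matter.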

Economists are perhaps the greatest analysts of time series and employers of the AR process technology. Their series of data typically do not accelerate out of sight. They are concerned, therefore, only whether there is a characteristic direction whose value may be as large as $1$ in size: a "unit root." Knowing whether the data are consistent with such a flow can tell the economist much about the potential fate of his pooh stick: that is, about what will happen in the future. That's why it can be important to test for a unit root. A fine Wikipedia article explains some of the implications.

Pooh and his friends found an empirical test of stationarity:

Now one day Pooh and Piglet and Rabbit and Roo were all playing Poohsticks together. They had dropped their sticks in when Rabbit said "Go!" and then they had hurried across to the other side of the bridge, and now they were all leaning over the edge, waiting to see whose stick would come out first. But it was a long time coming, because the river was very lazy that day, and hardly seemed to mind if it didn't ever get there at all.

"I can see mine!" cried Roo. "No, I can't, it's something else. Can you see yours, Piglet? I thought I could see mine, but I couldn't. There it is! No, it isn't. Can you see yours, Pooh?"

"No," said Pooh.

"I expect my stick's stuck," said Roo. "Rabbit, my stick's stuck. Is your stick stuck, Piglet?"

"They always take longer than you think," said Rabbit.

This passage, from 1928, could be construed as the very first "Unit Roo test."
