[Math] How does the answer to Feynman’s Restaurant Problem change if $M$ is not restricted to a single value

probability

First, the background: Feynman's restaurant problem asks how we can maximise the total rating of the meals we eat at a restaurant with $N$ items on the menu, given that we know up-front that we are going to eat $M$ meals there.

I have transcribed the problem and answer:

Assume that a restaurant has $N$ dishes on its menu that are rated
from worst to best, $1$ to $N$ (according to your personal
preferences). You, however, don't know the ratings of the dishes, and
when you try a new dish, all you learn is whether it is the best
(highest rated) dish you have tried so far, or not. Each time you eat
a meal at the restaurant you either order a new dish or you order the
best dish you have tried so far. Your goal is to maximize the average
total ratings of the dishes you eat in $M$ meals (where $M \leq N$).

The average total ratings in a sequence of meals that includes $n$
"new" dishes and $b$ "best so far" dishes can be no higher than the
average total ratings in the sequence having all $n$ "new" dishes
followed by all $b$ "best so far" dishes. Thus a successful strategy
requires you to order some number of new dishes and thereafter only
order the best dish so far. The problem then reduces to the following:

Given $N$ (dishes on the menu) and $M\leq N$ (meals to be eaten at
the restaurant), how many new dishes $D$ should you try before
switching to ordering the best of them for all the remaining $(M-D)$
meals, in order to maximize the average total ratings of the dishes
consumed?

Answer: $D = \sqrt{2(M+1)} - 1$

So if we are visiting a restaurant with $20$ dishes $7$ times, we should pick different dishes for the first $3$ trips, then have the best of those the next $4$ times.
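The closed form is easy to sanity-check by brute force. Under the problem's model, the $D$ new dishes have expected total rating $D(N+1)/2$, and the best of $D$ dishes drawn without replacement from ranks $1..N$ has expected rating $D(N+1)/(D+1)$ (a standard order-statistics fact). A minimal sketch comparing the exhaustive optimum with the formula:

```python
from math import sqrt

def expected_total(N, M, D):
    """Expected sum of ratings: D new dishes, then the best of them M - D times.

    A random new dish averages (N + 1)/2; the best of D random dishes
    averages D * (N + 1) / (D + 1).
    """
    return D * (N + 1) / 2 + (M - D) * D * (N + 1) / (D + 1)

N, M = 20, 7
best_D = max(range(1, M + 1), key=lambda D: expected_total(N, M, D))
formula_D = sqrt(2 * (M + 1)) - 1
print(best_D, formula_D)   # both give D = 3 for M = 7
```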

However, as far as practical application goes, this answers a question that doesn't match reality: rarely do we know exactly how many times we're going to eat at a restaurant!

Suppose instead of having $M$ equal some fixed value, we instead have $M$ distributed according to some probability distribution $P$. For example, suppose we know we are going to visit a restaurant with $20$ dishes on some number of occasions uniformly distributed over $5..10$. What now should be our value for $D$? What about if $P$ is a less simple distribution?
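For a concrete feel, the distribution-averaged expected total can be maximized directly. The sketch below reuses the expected-rating model from the fixed-$M$ case, and assumes every candidate $D$ is no larger than the smallest possible $m$ (true here, since $m \geq 5$), so the "new dishes first" phase always fits:

```python
from fractions import Fraction

def expected_total(N, m, D):
    # Expected sum: D new dishes, then the best of them repeated m - D times.
    return Fraction(D * (N + 1), 2) + (m - D) * Fraction(D * (N + 1), D + 1)

N = 20
P = {m: Fraction(1, 6) for m in range(5, 11)}   # M uniform on 5..10

def avg_total(D):
    # Average over the distribution of M for a fixed switching point D.
    return sum(p * expected_total(N, m, D) for m, p in P.items())

best_D = max(range(1, min(P) + 1), key=avg_total)
print(best_D)   # direct maximization gives D = 3 for this example
```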

* edit to add *

Is it as simple as $\sum_m{P(m) D(m)}$ (which I think is $E(D)$?), where $D(m) = \sqrt{2(m+1)} - 1$ as above? This assumes that $P$ can't change over the course of the exercise, which, as mjqxxxx points out, would be quite possible in reality…

Best Answer

I made this problem up (and its solution), so I guess I should answer the question, "Is it as simple as...". Yes, it's that simple, because (when $n$ and $m$ are unequal) eating exactly $n$ meals and eating exactly $m$ meals are mutually exclusive events, so the expectation splits as the sum $\sum_m P(m) D(m)$.
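As an illustration only, that recipe applied to the uniform example from the question ($N = 20$, $M$ uniform on $5..10$):

```python
from math import sqrt

def D(m):
    # Optimal number of new dishes to try for a known meal count m.
    return sqrt(2 * (m + 1)) - 1

P = {m: 1 / 6 for m in range(5, 11)}          # M uniform on 5..10
expected_D = sum(p * D(m) for m, p in P.items())
print(expected_D)   # about 3.10, so try 3 new dishes before switching
```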

Please note, however, that this problem was based on a description of Feynman's original that turned out to be inaccurate. In Feynman's version, meals are rated on a continuum in the range $[0,1]$ rather than being ranked $1$ through $N$. The information available to the diner each night includes the rating $P$ of the best meal tried so far, the number of nights $n$ remaining to dine, and a threshold value $P_n$ (that depends only on $n$): if $P > P_n$, the diner repeats the meal rated $P$ ("the best so far"); otherwise s/he tries something new. Feynman found that to maximize the sum of the diner's meal ratings one should choose $P_n = \sqrt{n}/(1+\sqrt{n})$. For more detailed information see this page.
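Feynman's threshold rule is straightforward to simulate. The Monte Carlo sketch below assumes $n$ counts the nights still to come including the current one (so on the last night $P_1 = 1/2$, the expected rating of an untried dish), and compares the rule against always ordering something new:

```python
import random
from math import sqrt

def threshold_policy_total(M, rng):
    """Total rating of M meals under the rule P_n = sqrt(n) / (1 + sqrt(n))."""
    best, total = None, 0.0
    for night in range(M):
        n = M - night                      # nights remaining, counting tonight
        p_n = sqrt(n) / (1 + sqrt(n))
        if best is not None and best > p_n:
            total += best                  # repeat the best meal so far
        else:
            new = rng.random()             # try a new dish, rated Uniform[0, 1]
            total += new
            best = new if best is None else max(best, new)
    return total

rng = random.Random(0)
M, trials = 10, 20000
strategy = sum(threshold_policy_total(M, rng) for _ in range(trials)) / trials
always_new = M / 2                         # each new dish averages 1/2
print(strategy, always_new)
```

Because the thresholds decrease over time, once the best dish so far clears the current threshold it stays above all later thresholds, so the diner locks in that dish for the rest of the visit; the simulated average comfortably beats the always-try-something-new baseline.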

Michael A. Gottlieb

Editor, The Feynman Lectures on Physics New Millennium Edition

Related Question