[Math] Baseball Roster Optimization

combinatoricsoptimizationpermutations

I'm trying to programmatically optimize a Fantasy Baseball Roster that requires a fixed number of players at position (2 Catchers, 5 Outfielders, etc.) and has a salary constraint (total draft price cannot exceed n dollars). The success of the roster is based on the combined team production in batting (or pitching) categories.

I have projections of each player's estimated statistical contributions – from which, I'm able to determine (using Standard Scoring) their approximate value. For this exercise, we can assume that I have priced each player appropriately.

For this particular league, there are six batting categories (HR, RBI, Stolen Bases, etc)… and the projections I have allow me determine how much each player will contribute within each statistical category.

Given 154 draftable players, and requiring 14 batters per team – gives an astounding 26,327,386,978,706,200,000 number of different combinations. Obviously, I don't want to attempt a brute-force method of testing each possible combination of players to determine an optimal roster. (I'd have a lot of rosters that are too expensive, and a lot of rosters worth $14.00 (or less)).

Clearly, I need to be smarter about this and I'm looking for some degree of direction in order to get started.

What I've tried:

My first attempt at optimizing the roster was to select the BEST x players (where x = the number of players required at a position). Once I had those 14 players, I was MASSIVELY over the maximum salary – so I determined which category I was strongest in and reduced the best player in that category with the next-best player in the pool (replacing say Miguel Cabrera with Evan Longoria). I then re-calculate the roster value (still way over) and again, figured out the 'strongest' category and sought to replace the best player in that category with the next-best player in the un-dafted pool.

The process repeats until the sum total of the roster is just under the salary threshold. I'm MILDLY happy with the results… but wonder if there's not a better way to work through potential roster combinations in a way that:

Maximizes each constituent category (there is little value in adding Home Runs for example, if you've already got enough to win – and you're deficient in Stolen Bases). So 'leveling' out the constituent categories is important.
Keeps you as close to possible (without going over) the total roster cost.
Again – I'm looking for direction – someone more proficient in math to say "this is clearly a knapsack problem and here's how you should be thinking it through…."

I'm a programmer… not a mathematician and any assistance this group could provide would be greatly appreciated.

Regards

[Note: Moved from mathoverflow.net per community direction]

Best Answer

Offhand this sounds like an integer programming problem, but you haven't really specified what it is you're looking to optimize. That is, if we list the available players as $i=1,2,3,\ldots,N(=154)$, let $p_i$ be player $i$'s draft price, and let $x_i=1$ or $0$ indicate whether you draft player $i$ or not, then two constraints are

$$\sum_i x_i=14\quad\text{and}\quad\sum_ip_ix_i\le C$$

where $C$ is the salary cap. There will be additional constraints corresponding to the requirements on the maximum or minimum number of catchers, outfielders, etc. But the question is, how do you describe mathematically the players' combined "contributions"?

For example, suppose we limit things to two categories, say home runs and stolen bases, and suppose player $i$'s value in each category is $h_i$ and $s_i$, respectively. We might define

$$H(x_1,\ldots,x_N)=\sum_i h_ix_i\quad\text{and}\quad S(x_1,\ldots,x_N)=\sum_is_ix_i$$

But we would still need to specify an objective function $W(H,S)$ to optimize, and it sounds like that function, whatever it is, is going to be nonlinear (since you indicated there are diminishing returns for each category). In short, you have not yet posed your question here in a way that can be answered mathematically.

Related Question