Your suspicions are correct. Let's show that a permutation is lucky iff it avoids the pattern 312.
For an injection $W$ from $\{1,\ldots,k\}$ to $\{n-k+1,\ldots,n\}$, let $N(W)$ denote the result of removing $W(1)$ and increasing all elements below $W(1)$ by $1$. For example, $N(32514) = N(3524)$.
Lemma 1.
If $W$ avoids $312$ then so does $N(W)$.
Proof.
Clear since the relative order of elements in $N(W)$ is the same as the corresponding elements in $W$.
Lemma 2.
Suppose $W$ avoids $312$. After running one round of the algorithm, $S(1)$ contains the index of the minimal element in $W$, and $W \circ S = N(W)$.
Proof.
The lemma is clear if $W(1)$ is the minimal element. Otherwise, since $W$ avoids $312$, all elements below $W(1)$ form a decreasing sequence $W(1) = W(i_1) > \cdots > W(i_k)$. The algorithm puts the minimal one $W(i_k)$ at $S(1)$, and puts $W(i_t)$ at $W(i_{t+1})$.
Theorem 1.
If $W$ avoids $312$ then $W$ is lucky.
Proof.
Apply Lemma 2 repeatedly. Lemma 1 ensures that the injection always avoids $312$.
For the other direction, we need to be slightly more careful.
Lemma 3.
If $W$ contains a pattern $312$ in which $3$ doesn't correspond to $W(1)$ then $N(W)$ contains a pattern $312$.
Proof.
The pattern survives in $N(W)$ since all relative positions are maintained.
Lemma 4.
If $W$ doesn't contain a pattern $312$ in which $3$ corresponds to $W(1)$ and $1$ corresponds to the minimum of $W$ then after running one round of the algorithm, $S(1)$ contains the index of the minimal element, and $W \circ S = N(W)$.
Proof.
Follows directly from the proof of Lemma 2.
Thus we should expect trouble if there are $i<j$ such that $W(1) > W(j) > W(i)$. However, if $W(i)$ is not the minimal element, the trouble won't be immediate.
List the elements which are smaller than $W(1)$ as $W(t_1),\ldots,W(t_k)$, and suppose that $W(t_r) < W(t_{r+1}) > \cdots > W(t_k)$. One round of the algorithm puts $t_r$ at the place of $t_{r+1}$. The following rounds maintain the relative order of the elements in positions $t_{r+1},\ldots,t_k$, and so in the final result, the position which should have contained $t_{r+1}$ will contain $t_r$.
Example: $W = 632541$. The final result is $S = 652134$, which corresponds to the permutation $143625$. We can see that $S(1)$ is correct since $W$ satisfies the conditions of Lemma 4. We have $t_r = 3$ and $W(t_r) = 2, W(t_{r+1}) = 5$. We see that indeed $W(S(5)) = 2$ instead of $5$.
Theorem 2.
If $W$ contains $312$ then $W$ is unlucky.
Proof.
Along the lines of the discussion above.
Generating $k$-tuples from a multiset
The Question expresses concern a recursive generation of these combinatorial objects must suffer from "space requirements [that] grow very rapidly" with the size $|A|$ of the underlying set $A$ in which multiset $M$ takes its repetitions. Let's first outline a method for generating these with reasonable space complexity.
Apart from some fixed amount of bookkeeping (pointers, etc.), the important structures are a working copy of the multiset (from which items may be removed and replaced) and a working copy of a $k$-tuple (modified successively to produce all $k$-permutations in ascending lexicographic order).
If the maximum repetition $\max \mu(A)$ is less than $2^b$, then the multiset $M$ can be represented by an unsigned integer array $R=[r_1,\ldots,r_a]$ of length $a=|A|$, whose entries (of bitsize $b$) correspond to repetition counts of the elements of $A$. We may WLOG take $A=\{1,\ldots,a\}$, so the index of the (1-based) array matches the underlying element of $A$.
The $k$-permutations will be represented by successively populating a $k$-tuple $T = (i_1,\ldots,i_k)$ whose entries are indexes denoting elements of $A$.
As items from the multiset are used to form a $k$-tuple, we will "remove" them from $M$ by decrementing the respective entry of $R$. Similarly if an item in the $k$-tuple is "replaced", it goes back in the pool and the entry is incremented. Thus an updated status of the available pool of items can be maintained in array $R$.
The algorithm can be described as a recursion-with-backtracking or as an equivalent iterative procedure. The recursive description is likely more concise.
To populate the $k$-tuple, choose the leftmost entry of $T$ successively from distinct least to greatest possible from the available pool $R$ and remove that item from the pool. Populate the rest of the entries of $T$ by generating the possible $k-1$-tuples from the remaining items in the pool. On backtracking (to the next larger choice of leftmost entry), update the pool with replacement of the previously chosen item and the removal of the new choice.
Example: For the case $k=3$ and $M = \{1,1,2,3\}$, we can visualize the search tree as follows:
1st entry 2nd entry 3rd entry k-tuple
1 ---------+-------- 1 ---------+-------- 2 (1,1,2)
| |
| +-------- 3 (1,1,3)
|
+-------- 2 ---------+-------- 1 (1,2,1)
| |
| +-------- 3 (1,2,3)
|
+-------- 3 ---------+-------- 1 (1,3,1)
|
+-------- 2 (1,3,2)
2 ---------+-------- 1 ---------+-------- 1 (2,1,1)
| |
| +-------- 3 (2,1,3)
|
+-------- 3 ---------+-------- 1 (2,3,1)
3 ---------+-------- 1 ---------+-------- 1 (3,1,1)
| |
| +-------- 2 (3,1,2)
|
+-------- 2 ---------+-------- 1 (3,2,1)
Thus all twelve $3$-permutations of multiset $\{1,1,2,3\}$ are found in ascending lexicographical order.
Prolog Implementation
Apart from the length/2
predicate, almost universally provided either as a built-in or library predicate, the following is a self-contained implementation of the recursion-with-backtracking algorithm described above:
/*
kPermute(++K,++Multiples,?Ktuple)
Takes instantiated integer K > 0
and list of multiplicities Multiples
of length n denoting repeated items
in the underlying set A = {1,..,n}.
Generates by backtracking lists Ktuple
of length K whose entries lie in A.
*/
kPermute(K,Multiples,Ktuple) :-
length(Ktuple,K),
kTuple(Multiples,Ktuple).
/* kTuple(++Multiples,+Ktuple) */
kTuple(_,[ ]).
kTuple(M,[K|Tuple]) :-
removeItem(M,K,R,1),
kTuple(R,Tuple).
/* removeItem(++List,-Item,-Remnant,++Index) */
removeItem([H|T],I,[G|T],I) :-
H > 0,
G is H-1.
removeItem([H|T],I,[H|R],J) :-
K is J+1,
removeItem(T,I,R,K).
Best Answer
Consider the Cayley graph generated by adjacent swaps $T_i = (i, i+1)$. These swaps generate the symmetric group and their Cayley graph is layered by the number in inversions. To walk over the first $k$ layers you could use breadth-first search, but that uses a lot of memory. For practical implementation you probably want depth-first search, and to avoid duplicates you want to limit each vertex to one incoming edge in any deterministic way. (E.g. you could look at the lowest $i$ such that $T_i$ gives an in-edge).