Y. Bartal has studied a related problem of embedding metric spaces to hierarchically separated trees. With $1 < \mu$ being a fixed real number, a $\mu$-HST is equivalent to the set of corners of a rectangle whose edges are of length $c, c\mu^{-1}, c\mu^{-2}, \dots, c\mu^{1-D}$ with the $l_\infty$-metric. That is, if you think of the space as the set of bit sequences of length $D$, the distance of two sequences is $c\mu^{-j}$ if they first differ in bit $j$.
Now in your question you didn't ask for $\infty$-metric, but for this set of points, it doesn't really matter which metric you take because the distortion between this and either the $l_1$ or $l_2$ metric is bounded by a constant (if you fix $\mu$ but $D$ can vary).
(This metric can be considered a graph metric on a special tree, that is, one where the points are some (but not necessarily all) vertexes of a tree graph with weighted edges, and the distance is the shortest path. This is where "tree" in the name comes from.)
Now Bartal's result in [1] basically says that you can embed any metric space randomly to a $\mu$-HST with distortion at most $\mu(2\ln n+2)(1+\log_\mu n)$ where $n$ is the number of points. (Also, this embedding can be computed with a randomized polynomial algorithm.)
For this, you need to know what a distortion $\alpha$ random embedding $f$ means. It means that for any two points $d(x,y) < d(f(x),f(y))$ is always true and that the expected value of $d(f(x),f(y))$ is at most $\alpha d(x,y)$. For many applications, this is just as good as a deterministic embedding with low distortion. In fact, you can make a deterministic embedding with low distortion from it by imagining the metric $d^* $ on the original space where $d^*(x,y) = E(d(f(x), f(y))$, but this notion isn't too useful because the resulting metric does not have nice properties anymore (it's not HST). Indeed, I believe the randomness is essential here as I seem to remember reading somewhere that you can't embed a cycle graph (with equal edge weights) to a tree graph with low distortion.
Anyway, this might not really answer your question. Firstly, $D$ (the number of dimensions of the rectangle) is not determined in advance, but that's not a real problem because if you have $D$ significantly different distances in the input metric then you need at least that large a $D$ for any embedding; and with this embedding you don't need a $D$ larger than $\log_\mu (\Delta/\delta)$ where $\Delta$ and $\delta$ are the largest and smallest distances in the input. The real problem is that you seem to want to know a deterministic embedding, and the highest possible distortion necessary in that case, which this really doesn't tell. For example, a cycle graph with an even number $n$ of vertexes can of course be embedded isometrically to a cube of dimension $n/2$.
Survey [2] has some more references.
[1]: Yair Bartal, On Approximating Arbitrary Metrics by Tree Metrics. Annual ACM Symposium on Foundations of Computer Science, 37 (1996), 184–193.
[2]: Piotr Indyk, Jiří Matoušek, Low-distortion embeddings of finite metric spaces. Chapter 8 in Handbook of Discrete and Computational Geometry, ed. Jacob E. Goodman and Joseph O'Rourke, CRC Press, 2004.
Best Answer
Answer to Q1: All of the 261.
I looked at this question because of a video of Matt Parker and wrote an algorithm to find solutions. See here for an example of how a solution would look like. I dumped all solutions on github. For some cases I list multiple solutions. The files start with a number followed by an underscore and the number is between 0 and 260 corresponding to one of the unfoldings in this list.
(the last rows are then the coordinates of the copies of the unfolding, whatever sticks out of the box fits in another one..)
Here is an example of one of the two one that fit in a 4x4x2 box ( in an exploded view, you can shift them together and they neatly fit into a 4x4x2 box, which can then be stacked to obtain a tiling.:
Answer to Q2: One way is to use integer programming.
The basic idea is to take a box with volume divisible by 8 as a fundamental domain and to cover it as much as possible with non-overlapping copies of an unfolding and make sure that the stuff that sticks out of the bottom actually fits into gaps at the top and so on.
Here's a two-dimensional example: The tile:
A 5x6 box filled with copies of that tile:
A tiling produced from this:
This can be formulated a as an integer program and it turns out that those 261 unfoldings all have feasible solution for some smallish box (for most 4x4x4 is enough).
Setting up the integer program is just a few lines of sage:
Here we optimize for having as much voxels filled in the box as possible, but only so that the solutions look nicer, it would give valid tilings even without setting that objective.
A similar integer programming approach could be used to prove that a certain polyomino is not a space filler: We can check for larger and larger boxes that if it is not possible to cover them completely with non-overlapping copies of the original polyomino, but this wasn't necessary.
Update: I added 3d-plots of all solutions to 3d-renderings of all 261 unfoldings page. When you click on the number of each one you get a 3d-rending of copies of that unfolding mostly in a box (drawn in grey), which then can be used to tile the plane.
Note this is a different order than the one above (and it starts indexing with $1$.
#213 is the Dalí-unfolding, needs 8 pieces in the prototile.
#3 is the only one that only needs 3 pieces (in a 4x3x2 box) (what a coincidence..)
#72 and #159 are the ones that perfectly fit in a 4x4x2 box.
#139 needs a long 8x2x2 box.