Split interval into random regions

algorithmsrandom

I'm having troubles comming up with algorithm, that would split interval of integers (lets say from 0 to 1000000) with these parameters :

  1. There are N kinds of regions. Neighboring regions cannot have same kind, otherwise, they would be considered same region. N is going to be relatively small compared to whole interval, in this of interval 0-1000000 it won't be higher than 10.
  2. Each kind has "ratio", where ratios of all kinds add up to 1. This If I were to average total size of each region of single kind across multiple execution, I would expect them to have this ratio.
  3. "cohesion" for either whole interval or specific kinds (not sure which is possible). If the "cohesion" is high, then regions would be big, continuous blocks. If the "cohesion" is low, then regions would be small pieces strewn all around.

The total number of regions is not set beforehand. It is emergent based on kind number and cohesion.

I know how to split interval into regions, but I don't know how to apply above parameters to control it's output.

Example:

The ratio parameter would cause, if there were 3 kinds of regions with ratios of 50% A, 40% B and 10% C, then I would expect, for interval from 0 to 10, something like BBBBCAAAAA or ABBACAABAB depending on cohesion. That means if I were to add up all A, B and C, I would get the above ratios, on average.

Lets say we have interval from 0 to 10 and 3 kinds (A, B, C) and low cohesion. I would expect something like : ABACCBBABC . If instead cohesion was high, I would expect BBBBBAACCC.

In above example, characters of same kind would form single region.

Another way to look at it. Imagine the algorithm would randomly generate what kind each number in the region is, based on the defined ratios. Then intervals with same kind would be aggregated into regions. But that would result in low cohesion, as each region would be 1-2 values wide. Instead, I want regions to be big areas. Eg. have high cohesion.

Image to better illustrate what I mean by "cohesion". N is 3 for Red, Green, Blue and interval is 0-50:

Cohesion example

Low cohesion, produces lots of small, scattered regions. High cohesion creates few, big regions.

Best Answer

You could try a modified Poisson process. I will assume you are dividing up the continuous interval $[0,1]$ for simplicity, this can be discretized later.

There will be a series of borders, at positions $0=X_0<X_1<\dots<X_n<1$ to be randomly chosen. To the right of each border $X_i$ is a region whose kind is $n(i)$.

Associated to each kind $n$ is a parameter $\lambda_n$. These should be chosen so that

  1. ${\lambda_n^{-1}}/({\sum_{m=1}^N \lambda_m^{-1}})$ is equal to the desired "ratio" for the $n^\text{th}$ kind.

  2. $\frac1N\sum_{m=1}^N\lambda_m$ is equal to the desired expected number of borders. More borders $\implies$ less cohesion.

Here is the process: for each $i=0,1,2,\dots$, $n(i)$ is uniformly randomly chosen from the set of $N-1$ kinds, all except for $n(i-1)$ (to prevent adjacent repeats). Then, $X_i$ is set to be $X_i + Y_i$, where $Y_i$ is exponentially distributed with parameter $\lambda_{n(i)}$. This continues until $X_{n+1}>1$ for some border $X_{n+1}$.

Related Question