Algorithms – Algorithm to Find Maximum Sum


Given two unsorted arrays A and B, each of size n, we can select either A[i] or B[i] at each index i. Find the maximum sum we can get if we may select at most k consecutive numbers from the same array.

Now, I can solve this question using dynamic programming of order O(n*k), but the question demands an algorithm of order O(n). Here $n, k \le 100000$.

Can someone help? Thanks!

Edit: As asked in the comments, I will include an example with n=4 and k=2.

$$A = [5,2,4,5]$$
$$B = [4,5,2,1]$$

The maximum would be $5 + 5 + 4 + 5 = 19$ (taking $A[1]$, $B[2]$, $A[3]$, $A[4]$).
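
For concreteness, here is one way the O(n*k) dynamic programming mentioned in the question could look. This is only a sketch: the function name and the state layout (index, which array the last pick came from, current run length) are my own choices, not something specified in the question.

```python
def max_sum_dp(A, B, k):
    """O(n*k) DP over (index, source array of the last pick, current run length)."""
    n = len(A)
    NEG = float("-inf")
    vals = (A, B)
    # dp[a][r]: best total so far with the current element taken from array a
    # (0 = A, 1 = B) and the last r picks all coming from that array (1 <= r <= k).
    dp = [[NEG] * (k + 1) for _ in range(2)]
    dp[0][1] = A[0]
    dp[1][1] = B[0]
    for i in range(1, n):
        new = [[NEG] * (k + 1) for _ in range(2)]
        best = [max(dp[0]), max(dp[1])]  # best totals ending in A / ending in B
        for a in range(2):
            v = vals[a][i]
            # start a fresh run from array a after a run from the other array
            new[a][1] = best[1 - a] + v
            # or extend the current run from array a, up to length k
            for r in range(2, k + 1):
                if dp[a][r - 1] != NEG:
                    new[a][r] = dp[a][r - 1] + v
        dp = new
    return max(max(dp[0]), max(dp[1]))


print(max_sum_dp([5, 2, 4, 5], [4, 5, 2, 1], 2))  # prints 19 for the example above
```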

Best Answer

Edit 8-8-22

I (and the OP) probably should have checked whether this question had already been posted somewhere else.

I found this answer where a $\mathcal{O}(n \log(k))$ solution is given. It seems similar to my answer, but instead looks at the complete arrays in one go, without looking at the "ideal solution" first.


Introduction

I thought quite a lot about this problem and came up with some ideas that might be interesting, but unfortunately I don't have a complete solution (yet). I hope this post might eventually lead to one.

The "solution" I came up with is an $\mathcal{O}(n \log(k))$ algorithm, but as I will explain in the end of this post, it is incorrect in its current form.

I would be very surprised (but also very interested) if an $\mathcal{O}(n)$ algorithm is possible here. It seems to me that if $k$ is large, then $\mathcal{O}(n)$ is too much to ask for.

An $\mathcal{O}(nk)$ algorithm

Now for the algorithm: Start by computing the list $L$ in $\mathcal{O}(n)$, where $L[i] = \begin{cases} 0 & A[i] = B[i]\\ 1 & A[i] > B[i]\\ 2 & A[i] < B[i] \end{cases}$.

The "ideal solution" would be to choose for every $i$ the largest of the two elements $A[i]$ and $B[i]$. However, this is not possible if there is a consecutive subsequence of indexes in $L$ of size larger than $k$ (except for a string of zeros, we will ignore from now on, since they give no problems). Suppose we have such a consecutive subsequence $S$ from index $i$ to $j$, where $j-i > k$.

The idea is to change the choice at some indexes of the "ideal solution" $L$ in such a way that we meet the requirement of having no run longer than $k$ with the same choice, while minimizing our losses in the process.

For the consecutive subsequence $S$, compute the list $C$ in $\mathcal{O}(n)$, where $C[l] = \left\lvert A[l]-B[l] \right\rvert$ (I'm assuming that all entries of $A$ and $B$ are nonnegative).

Now define the list $D$, where $D[l]$ is the minimal loss within $S$ if index $l$ is the last one at which we switch to the smaller element. Then $D[l] = C[l]$ if $l-k \leq i$, and for $l-k > i$: \begin{equation} D[l] = \left( \min_{m \in [l-k-1,l-1]} D[m] \right) + C[l]. \end{equation}

And for $l = j+1$: \begin{equation} D[j+1] = \min_{m \in [j-k,j]} D[m]. \end{equation}

We wish to compute $D[j+1]$, because then we know (if we also store the indexes used to compute the minima) which members of $L$ we have to change so that we stay as close as possible to the "ideal solution" while also meeting the requirement given by $k$.

At first sight it seems that we can only compute $D[j+1]$ in $\mathcal{O}((j-i)k)$ by using dynamic programming. If we do this for every consecutive subsequence $S$, we get an algorithm with worst-case complexity $\mathcal{O}(nk)$. (This algorithm also has the problem I describe at the end.)
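
A direct implementation of this per-run DP might look as follows. It is a sketch with 0-based indexes; storing a predecessor for each $D[l]$ is my way of "storing the indexes used to compute the minima" above, so that the switch positions can be recovered.

```python
def min_loss_for_run(C, i, j, k):
    """O((j-i)*k) DP for one run S = [i, j] (0-based, inclusive, length > k).
    D[l] = minimal loss inside S if l is the last index switched to its
    smaller element; C[l] = |A[l] - B[l]| is the cost of switching index l.
    Returns the minimal total loss and one optimal set of switch indexes."""
    D, prev = {}, {}
    for l in range(i, j + 1):
        if l - i <= k:
            # l can be the very first switch: at most k unswitched indexes before it
            D[l], prev[l] = C[l], None
        else:
            # the previous switch must lie in the window [l-k-1, l-1]
            m = min(range(l - k - 1, l), key=lambda t: D[t])
            D[l], prev[l] = D[m] + C[l], m
    # the last switch must lie in the final k+1 positions of the run
    last = min(range(j - k, j + 1), key=lambda t: D[t])
    loss, switches = D[last], []
    while last is not None:
        switches.append(last)
        last = prev[last]
    return loss, sorted(switches)
```

On the run from the counterexample at the end of this post (0-based indexes $3$ to $6$, $k = 3$) this gives a loss of $3$ with a single switch at 0-based index $3$, matching the value of $D[8]$ computed there.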

Towards an $\mathcal{O}(n \log(k))$ solution

However, an optimization is possible, because the minima we compute for $D[l]$ and for $D[l+1]$ are taken over nearly the same window.

We can do the following: first define $T$ as the list of the $k+1$ base-case values $D[i]$ up to $D[i+k]$, and sort it in $\mathcal{O}(k \log(k))$.

Now $D[i+k+1]$ is easy to compute: just take the smallest element of $T$ and add $C[i+k+1]$ to it, in $\mathcal{O}(1)$.

Now we move on to $D[i+k+2]$. We first have to find and delete $D[i]$ from our sorted list $T$, because it is no longer in the window $[i+1,i+k+1]$; this takes $\mathcal{O}(\log(k))$. Then we insert $D[i+k+1]$ into $T$ such that $T$ remains sorted, again in $\mathcal{O}(\log(k))$. Now $D[i+k+2]$ can be computed the same way as $D[i+k+1]$.

This gives an $\mathcal{O}((j-i)\log(k))$ algorithm for the consecutive subsequence $S$, and therefore an $\mathcal{O}(n\log(k))$ algorithm for the complete problem in the worst case.
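
The following variant implements this sliding-window minimum. Instead of the sorted list with explicit deletes, it uses Python's heapq with lazy deletion, which is a stand-in I chose to stay within the standard library: the heap can temporarily hold more than $k+1$ entries, so strictly the logarithmic factor is in the run length rather than in $k$; a balanced sorted container with eager deletes, as described above, keeps it at $\log(k)$.

```python
import heapq


def min_loss_for_run_fast(C, i, j, k):
    """Sliding-window-minimum version of the per-run DP (0-based, inclusive run).
    Returns only the minimal loss D[j+1]; for the switch positions one would
    additionally store predecessors as in the previous sketch."""
    D, heap = {}, []              # heap holds pairs (D[l], l)
    for l in range(i, j + 1):
        if l - i <= k:
            D[l] = C[l]           # base case: l can be the first switch
        else:
            # drop entries that left the window [l-k-1, l-1]
            while heap and heap[0][1] < l - k - 1:
                heapq.heappop(heap)
            D[l] = heap[0][0] + C[l]
        heapq.heappush(heap, (D[l], l))
    # D[j+1]: the last switch must lie in [j-k, j]
    while heap and heap[0][1] < j - k:
        heapq.heappop(heap)
    return heap[0][0]
```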

The problem

The first problem might be that $\mathcal{O}(n\log(k))$ is too slow. But as I said before, I don't know an $\mathcal{O}(n)$ solution, and for the ranges given for $n$ and $k$, my algorithm probably takes only a few seconds in a good implementation.

But the more important problem is that by running the algorithm for one consecutive subsequence $S$, you might interfere with other consecutive subsequences. Namely, if you switch the first or the last entry of a consecutive subsequence while altering the "ideal solution", then you might run into problems.

Here is an example where it goes wrong: Take $k = 3$ and \begin{equation} A = [5,5,5,0,0,0,0]\\ B = [0,4,0,3,5,5,5]. \end{equation}

Then we have $L = [1,1,1,2,2,2,2]$. If we start with the consecutive subsequence from indexes $1$ to $3$, we don't have to alter $L$, since its length is exactly $k = 3$.

However, if we then look at the consecutive subsequence from indexes $4$ to $7$, we compute $D[8] = \min(3,5,5,5) = 3$, and the index where this minimum occurs is $4$, so we would change $L[4]$ into $1$. However, this is not possible because of the previous consecutive subsequence: indexes $1$ to $4$ would then all be taken from $A$, a run of length $4 > k$. Hence, we have to change one of $L[5],L[6],L[7]$ instead. This leads to a total sum of $28$, which is not optimal.

It would be better to make the changes $L[2] = 2$ and $L[4] = 1$ instead, because then the total sum is $29$. But I don't see a way yet to incorporate that into the algorithm given above: you have to know somehow when it is worth it to lengthen another consecutive subsequence and when it is not.
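
For small instances like this one, a brute force over all $2^n$ choice patterns is handy for double-checking such hand computations (the constraint check, "no more than $k$ consecutive picks from the same array", reflects my reading of the problem statement):

```python
from itertools import product


def brute_force(A, B, k):
    """Try all 2^n choice patterns and keep the best one whose longest
    same-array stretch is at most k. Only feasible for tiny n."""
    n, best = len(A), float("-inf")
    for pattern in product((0, 1), repeat=n):
        run, longest = 1, 1
        for t in range(1, n):
            run = run + 1 if pattern[t] == pattern[t - 1] else 1
            longest = max(longest, run)
        if longest <= k:
            total = sum(A[t] if c == 0 else B[t] for t, c in enumerate(pattern))
            best = max(best, total)
    return best


print(brute_force([5, 5, 5, 0, 0, 0, 0], [0, 4, 0, 3, 5, 5, 5], 3))  # 29 for this example
```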

Perhaps there is a way to glue the solutions of the separate consecutive subsequences together, or to handle everything in one go.
