The following assumes that $y_{ik} \geq 0$.
First, let's introduce a new set of variables $z_{ijk\ell}$:
\begin{gathered}
\sum_{i,j,k} z_{ijkl} \leq I_\ell \qquad \forall \ell \\
y_{ik} \cdot x_{ijk\ell} = z_{ijk\ell} \qquad \forall i,j,k,\ell
\end{gathered}
I hope you don't mind that for brevity I've dropped the index sets. This is equivalent (in $x$ and $y$) to
\begin{gathered}
\sum_{i,j,k} z_{ijkl} \leq I_\ell \qquad \forall \ell \\
y_{ik} \cdot x_{ijk\ell} \leq z_{ijk\ell} \qquad \forall i,j,k,\ell
\end{gathered}
This works because when the original inequality is feasible, we can always choose to have equality hold: but if there is slack in that inequality, we could distribute that slack across the values of $z_{ijk\ell}$ without changing equivalence.
Now that we've done that, we can use this equivalence:
$$y_{ik} \geq 0, ~~ y_{ik} \cdot x_{ijk\ell} \leq z_{ijk\ell} \quad\Longleftrightarrow\quad
0 \leq z_{ijk\ell}, ~~ y_{ik} - v_i ( 1 - x_{ijk\ell} ) \leq z_{ijk\ell}$$
If $x_{ijk\ell}=0$, then $z_{ijk\ell}\geq 0$ will be active. If $x_{ijk\ell}=1$, then the second constraint will be active, and it will be equivalent to $y_{ik} \leq z_{ijk\ell}$.
So the final conversion is
\begin{gathered}
\sum_{i,j,k} z_{ijkl} \leq I_\ell \qquad \forall \ell \\
y_{ik} - v_i (1- x_{ijk\ell}) \leq z_{ijk\ell} \qquad \forall i,j,k,\ell \\
0 \leq z_{ijk\ell} \qquad \forall i,j,k,\ell
\end{gathered}
Internally, a solver is probably going to treat each of your SOS1 constraints a lot like a binary variable -- basically, branch on $w_i$ and, in the $w_i = 1$ branch, force $v_i = 0$. The good news is that you avoid $M$; the bad news is that I don't think the solver has much information with which to assess which SOS1 constraint to branch on at a given node.
If the variables have natural (and not horribly large) upper bounds, the big $M$ approach may not to be too bad. If you don't know those bounds, you might relax integrality conditions and maximize each variable in turn to get bounds.
Another possibility is combinatorial Benders decomposition (Codato and Fischetti, Operations Research 54 (2006), 756-766). Rather than use big $M$ constraints, you have a master problem containing the $w_i$ and one continuous variable (a surrogate for the objective contribution of the $v_i$), and a subproblem that optimizes the original objective function over the $v_i$ subject to the original constraints. For a given solution to the master problem, you simply omit $v_i$ from the subproblem if $w_i = 0$. If the subproblem is infeasible, you add a "feasibility cut" to the master. If the subproblem has an optimal solution but its objective value is worse than the value of the surrogate variable in the master solution, you add a constraint ("optimality cut") to the master problem tying the value of the surrogate to the values of the $w_i$.
I would only use combinatorial Benders if the $M$ formulation was either seriously slow or had numerical stability problems (meaning, in either case, that the "big $M$" coefficients really were big). Also, I'd be hesitant to do it if some or all of the $v_i$ are integer valued, since (a) you'd be repeatedly solving a MIP subproblem and (b) the optimality and feasibility cuts you got might be weaker than usual.
Best Answer
If $X \le 2$, $Y\le 3$ is equivalent to $X \gt 2$ or $Y \le 3$ which can be modelled with the following 2 constraints:
$X - 2 + M_1 \delta \gt 0$
$Y - 3 \le M_2 (1- \delta )$
$X \le 1000$
where $\delta$ is a binary variable, $M_1 = 2$ and $M_2 = 1003$.
The values of $M_1$ and $M_2$ are chosen to allow the full posible range of values for X and Y. The constraints become
$X - 2 + 3 \delta \gt 0$
$Y - 3 \le 1003 (1- \delta )$
$X \le 1000$
When $\delta = 0, X \gt 2$ so the only requirement on Y is that it be $\le$ 1000, precisely what the second constraint becomes when $\delta = 0$ is substituted in.
When $\delta = 1$ the first constraint becomes $X \gt 0$, thus allowing $X \le 2$ and so we need Y$\le 3$, which is what the Y constraint becomes when $\delta = 1$ is substituted in.