Now with the edit, this is solvable. Choose the BW tagged box. All boxes are wrongly tagged by definition.
Pick a ball, if ball is black, then, this box is not WW for sure, hence it is BB.
Now if the BB-tagged box were originally BW box, then the WW-tagged box would be rightly tagged, which would be a contradiction, hence, BB-tagged box is actually a WW box, and WW tagged box is actually a BW box.
Similarly, you can retag boxes if a white ball came out of BW box.
I did not check the solution for the classic $12$ balls version here. But if it works, it trivially leads to a $4$ weighing solution for the $18$ balls case.
Really, given the classic, there is very little extra work to do!
First you weigh $3A$ vs $3B$. If they are unbalanced, say $3A > 3B$, you can find out with $3A$ vs $3C$ (all $3C$ are good) whether the bad ball is heavier or lighter. Then surely you can find the culprit among a group of $3$ with just one more weighing. Total $3$ weighings.
And if $3A = 3B$, then you are reduced to the classic $12$-ball problem which can be solved with $3$ additional weighings, for a total of $4$.
Further thoughts: In fact, $4$ weighings can solve $30$ balls, not just $18$.
In the above, the $3A \neq 3B$ branch always leads to $3$ total weighings, which is wasteful. Imagine you have $9+9+12 = 30$ balls. The first weighing can be $9A$ vs $9B$. If they are unbalanced, again a second $9A$ vs $9C$ (all good) will tell you if the bad one is heavy or light, and then you can use $2$ more weighings to find the culprit among $9$ (tri-nary search), for a total of $4$ weighings.
Even further, years ago I solved a case (an extension to the classic) where $13$ balls (unknown heavy/light) can be solved with $3$ weighings, provided you have access to extra balls known to be good -- IIRC you need $2$ such good extras. This means $9+9+13 = 31$ can be solved with $4$ weighings, coz in the $9A=9B$ case you are indeed left with $13$ suspects but many extra balls known to be good.
I suspect even $31$ is not the limit (for $4$ weighings). When you weigh $9A$ vs $9C$, only two outcomes can happen (since $9A > 9B$). This is very inefficient and further exploitation might be possible...
You probably know the classic bound that with $n$ weighings there are only $3^n$ possible results, so with $n=4, 3^n = 81$, you cannot solve $\ge 41$ balls ($\ge 82$ outcomes). I'm not saying $40$ is achievable, but there is a wide gap between $31$ and $40$...
Best Answer
You should make the measurement in the following way. "[x,...,y] true/false" means radioactivity was found / not found when measuring the balls with number x,...,y. The last two lines of each paragraphs are the tuples wich are possible after the last measurement is true / false. It is easy to process the remaining possibilities
Edit:
How can one check this? First of all you should check that these paragraphs are the node of a tree. Then you can check each paragraph by comparing each of the 105 pairs of balls [[1,2],...,[14,15]] with the measurements. Take the first pair [1,2] and the following paragraph
Measurement 3 must be true but this is wrong for this pair therefore it can be skipped. Take the pair [1,7]. Measurement 1,3,4,5 is true and measurement 2 is false, therefore it is in the last but one line. For the pair [1,9] again measurement 2 is false and measurement 1,3,4 is true, but measurement 5 is false, therefore it is in the last line. so all node of the tree can be checked. You can do these checks with a simple program (I used one written in maxima).
Edit: Maxima-Program
Edit:
How did I derive the result? With trial and error. The following Lemma guides our trials
Lemma 1 (without proof): In every step of the algorithm the following is true: Let $S$ be the list of the remaining possible solution and $M$ be the list of balls to be measured and $n$ be the number of allowed measurements. Let $ T(S,M)$ be the list of all pairs of $S$ where at least one element of the pair is in $M$ and let $F(S,M)$ the list of all pairs of $S$ where no element of the pair is in $M$. Then the following is a necessary condition for an algorithm to solve all instances of the problem. $$\begin{align*} |S|&\leq 2^n\\ |T(S,M)|&\leq 2^{n-1}\\ |F(S,M)|&\leq 2^{n-1} \end{align*}$$
After this step the new S is $T(S,M)$ or $F(S,M)$ depending of the result of the measurement $M$. The new $n$ is $n-1$.
Finding two balls with at most k measuremants from n balls we call an (n,k)-problem. We want to find an algorithm for the (15,7)-problem.
The (15,7)-problem therefore $\binom{15}{2}=105$ possible solution pairs and therefore an algorithm to find such a pair must make at least 7 measurements in
the worstcase because $2^6<105<2^7$. We want to investigate the first measurement. Let's arange the list of all possible ball combination in the following
triangle manner
if the first measurement is made with the list $[1,\cdots,m]$ then $14+13+(15-m)<=2^6$ and $(15-(m+1))+...+2+1<=2^6$ must hold. $2^6=64$ therefore $m$ can
be $4$ or $5$. if $m=3$ and therefore $M=[1,2,3]$ then $|F(S,M)|=1+2+\cdots + 11=66>2^6=64$, if $k=6$ then $M=[1,2,3,4,5,6]$ and $|T(S,M)|=14+13+\cdots + 9 = 69 > 2^6=64$.
if for the first measurement $m=4$ and therefore $M=[15,14,13,12]$ and the result of the measurement is false then one must find with at most $6$
measurement the radioactive pair in the remaining $11$ balls. so we must save the (11,6)-problem.
Lemma 2: There is no algorithm for the (6,4)-problem.
Proof: If $M=[1]$ then $|F(S,M)|=4+3+2+1=10>8=2^3$. If $M=[1,2]$ then $|T(S,M)|=5+4=9>8=2^3$.
Lemmas 3: There is no algorithm for the (8,5) problem.
Proof: for the first measurement: $M=[1]$ or $M=[1,2,3]$ are not possible using the same arguments as in the (6,4) case. Therefore $M$ must be $[1,2]$. If the result of the measurement is false, the algorithm must solve the (6,4)-problem which is impossible by Lemma 2.
Lemma 4: There is no algorithm for the (11,6)-problem.
Proof: if $M=[1,2]$ then $|F(S,M)|=36>32=2^5$, if $M=[1,2,3,4]$ then $T(S,M)=34>32=2^5$. therefore $M$ must be $[1,2,3]$. Suppose that the first measurement is false 8 balls remains and our algorith must solve the (8,5)-problem wich is impossible by lemma 3.
From Lemma 4 an the reasoning before llemma 2 the following follows:
Lemma 5: If there is an algorithm for the (15,7)-problem its first measurement must measure a list of 5 balls.
We can assume that the first measurement is on the list [1,2,3,4,5]. The second measurement is on a list [x,x+1,...,y] that has zero, one or more elements
in common with [1,2,3,4,5]. First I tried [6,7,8,9,10,11] but had problems to proceed. Therefore I used my program to check each of the 105 N=[x,...,y] if $T(S,N)<=2^5=32$ and $F(S,N) <32$.
the following lists were found by the program [4,5,6]
[5,6,7,8,9]
[6,7,8,9,10,11]
the program also found the lists [7,...,12]
[8,...,13]
[9,...,14] [10,...,15] but these lists are basically the same as [6,...,11] after renaming the balls.
I continued with the list [4,5,6]. The remaining measurements I found by trial an error using my function and lemma 1. If one searches the third measuring after [1,2,3,4,5] and [4,5,6] one can use the fact that the numbers [1,2,3], [4,5], [7,8,9,10,11,12,13,14,15] are equivalent relative to [1,2,3,4,5] and [4,5,6]. this narrows down the cases to investigate.