Lights Out was a popular puzzle in which all lights in some device had to be turned off by pressing on them, which turned off all neighbouring lights as well.
We consider instead the same variant introduced in this other question: pressing a light will flip the state of all lights in the same column and the same row. For instance (borrowing the same example from that question) by pressing on X we get:
1 0 0 0 1 0 1 0
0 0 X 0 --> 1 1 1 1
0 1 0 0 0 1 1 0
0 0 1 0 0 0 0 0
The answers to that question tell us when it is possible to solve this puzzle, but not necessarily in an optimal way.
Given some initial configuration, how can we find the minimal number of moves needed to turn off all the lights?
EDIT: After a little research I found out that this variant was apparently first defined in Araújo, Paulo Ventura. "How to Turn All Lights Out." Elemente der Mathematik 55.4 (2000): 135-141. An investigation into minimal solutions of similar problems is carried out in this paper, which might contain the answer to my question.
EDIT 2: Unfortunately, it doesn't: the minimization results of Section 4 are about another variant. Instead, these lecture notes are about the classical "Lights Out" problem, but investigate the issue of minimization (Section 24.2.4).
EDIT 3: I put on hold the attack outlined in EDIT 2 and investigated using a BFS, as suggested by this homework assignment. Unfortunately the results weren't encouraging: despite the heroic speedup achieved by the answers to this question, that algorithm is too slow for $n > 6$.
EDIT 4: The implementation given in this question works correctly for even $n$, but is too slow for odd $n > 11$.
Best Answer
What you're trying to do is find the vector in the solution space with the minimum Hamming weight. Unfortunately, this problem is equivalent to the minimum distance problem in coding theory, which has been proven to be NP-hard (http://people.math.umass.edu/~hajir/m499c/vardy-complexity.pdf). That said, the implementation you link to can be sped up greatly by using ints as bit arrays and using bitwise operations, effectively parallelizing many of the operations.
Here's the quick-and-dirty Java solution I wrote for the Google Foobar challenge, which runs reasonably quickly and works for n <= 15.