MATLAB: Protein sequence training

protein sequence

Hi all 🙂
I have protein sequence data that I want to train with a backpropagation neural network, but I have a problem with the training data: the number of input units and output units differs from one sample to the next. How can I make the number of input units and output units the same for every sample? For example:
input: HKJMMLLKNJHVHVHCVGVJKLHCVGCGCH output: JMMLL
input: fdsfnhldglsfhldsfhidjgfodjgijidshfjzhnjfndjngvfngidjgxbvfgd output: idjg
Some people have suggested using a sliding window, but I am confused about how to implement it.
Thanks in advance for your suggestions 🙂

Best Answer

In general, the number of elements in an input vector, I, and the number of elements in the corresponding output vector, O, do not have to be equal. What matters is that I is the same for every input and O is the same for every output.
Decide what the output sequence should be for each input sequence. Then encode each input as a column vector of length I and each output as a column vector of length O.
Finally, form input and output matrices from the N input/output pairs so that they have dimensions
[ I N ] = size(inputmatrix)
[ O N ] = size(outputmatrix)
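Since the question mentions a sliding window, here is a minimal sketch of one way the encoding could be done. It is only an illustration under several assumptions that are not in the original post: the symbol set is taken to be the 26 letters A-Z, the window width W is 5, each window is one-hot encoded, and the target for a window is 1 when its centre position lies inside the desired output subsequence and 0 otherwise. With these choices every column has the same length, I = W * 26 and O = 1, no matter how long the sequences are.

```matlab
% Example pair taken from the question.
inSeq  = 'HKJMMLLKNJHVHVHCVGVJKLHCVGCGCH';
outSeq = 'JMMLL';

alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';   % assumed symbol set, adjust to your data
A    = numel(alphabet);
W    = 5;                                  % assumed window width (odd)
half = (W - 1) / 2;

% Mark which positions of the input belong to the target subsequence.
startIdx = strfind(inSeq, outSeq);         % start position(s) of outSeq in inSeq
label = zeros(1, numel(inSeq));
for s = startIdx
    label(s : s + numel(outSeq) - 1) = 1;
end

% Pad both ends so that every position can sit at the centre of a window.
% The padding symbol '-' is not in the alphabet, so it encodes as all zeros.
padded = [repmat('-', 1, half), upper(inSeq), repmat('-', 1, half)];

N = numel(inSeq);                          % one sample per sequence position
inputmatrix  = zeros(W * A, N);            % I = W * A rows
outputmatrix = zeros(1, N);                % O = 1 row

for n = 1:N
    window = padded(n : n + W - 1);        % W symbols centred on position n
    onehot = zeros(A, W);
    for k = 1:W
        onehot(:, k) = (alphabet == window(k)).';   % one-hot column for symbol k
    end
    inputmatrix(:, n) = onehot(:);         % stack the W one-hot columns
    outputmatrix(n)   = label(n);
end

[I, N] = size(inputmatrix);
[O, N] = size(outputmatrix);
fprintf('I = %d, O = %d, N = %d\n', I, O, N);
```

To use more than one sequence, repeat the same steps for each input/output pair and concatenate the resulting columns horizontally. Matrices laid out this way, with one sample per column, should be in the form the shallow-network training functions expect, but the window width and the target definition are design choices you will need to tune for your own data.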
Hope this helps.
Greg