Solved – Can a Machine Learning algorithm output an array of values as solution

machine learning

I'm working on a small project which tries to predict what are the interesting words (for me) in a sentence.

In order to achieve this goal, I use a Part-Of-Speech Tagger on a set of training sentences, then I manually create a training dataset using a User-Interface I built for this purpose. This UI, display one sentence at a time with the corresponding POStags for each word and then I tell it what are the words that are important to me. And all this information is saved in a database.

My goal is to feed a machine learning algorithm with numerical values corresponding to the POStags of a sentence and get predicted the positions of the important words in the sentence…

I had in mind to set-up a Neural Network to discover hidden patterns and correctly predict the words that matters to me using the position and type of POStags (not the words themselves), but I just realised that I can't get a NN to output such a prediction! (well, if this is possible, I don't know how.)


Let's take an example with the following training sentence:
"Thousands march through London against Brexit".

In this case, my POSTagger script will return the following values:
"NNS / VBP / IN / NP / IN / NP"
which I translate to the following input:
[13,31,6,14,6,14]

And the words that interest me in this particular sentence are "march / against / Brexit", that is the words at positions 2, 5, 6 (starting at 1).

Finally, I expect the algorithm to return the following array: [2,5,6].


And here is my problem, I don't know which algorithm to use in order to output an array of values rather than a single value.

Do you know any ML algorithm that could fit my problem?

I'm still a novice in Machine Learning (I may be ignorant of obvious solutions…) and this project is mainly an exercise I set-up in order to improve myself and test some ideas.

Thank you.


edit: I found some solutions to test. I will give them a try and eventually come back here to share if it works. (But feel free to share your solutions if you have some).

Best Answer

I think such a problem would usually be framed as trying to (1) estimate the importance of every word or (2) classify each word as important or unimportant. Once you've trained and applied such a model, you can then get a list of words that are important in a given sentence by simply taking all the words predicted to be important (under option 2) or taking the $n$ most important words or all the words with an importance score past a given threshold (under option 1).

Related Question