Solved – Interpreting crfsuite output model for numerical features

conditional-random-field, machine learning

I am using crfsuite-python to implement a linear-chain CRF, and I would like to use numerical features rather than the string features used in standard CRF applications such as part-of-speech tagging. The crfsuite documentation says: "Formally, the amount of the influence of a feature is determined by a scaling value of the corresponding attribute multiplied by the feature weight."
My sample input looks like this:

Walking no_of_pauses:4  lat:32.91469737  lon:-117.18923483       snrUsed:235  
Walking no_of_pauses:4  lat:32.91469737  lon:-117.18923483       snrUsed:235  
Walking no_of_pauses:4  lat:32.91469737  lon:-117.18923483       snrUsed:235

And the output CRF model looks like this:

   LABELS = {
          0: Walking
    }
    ATTRIBUTES = {
          0: no_of_pauses
          1: lat
          2:  snrUsed
    }
    TRANSITIONS = {
    }
    STATE_FEATURES = {
      (0) no_of_pauses --> Walking: -0.000000
      (0) lat --> Walking: 0.000000
      (0)  snrUsed --> Walking: 0.000000
      (0) 
     --> Walking: -0.000000
    }

If no_of_pauses is a numerical attribute, what does "no_of_pauses --> Walking: 0.000000" imply here?
Also, the model ignores the "lon" attribute given in the input. As far as I can tell, this is because of the crfsuite trainer option "feature.minfreq", which is 0 by default and therefore drops attributes with negative values. What does the word minfreq mean here if it simply refers to the scaling value?

Note: I have used a very small sample set, so there is only one label and the computed probabilities do not make much sense.

Best Answer

If no_of_pauses is a numerical attribute, what does "no_of_pauses --> Walking: 0.000000" imply here?

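It means that the "no_of_pauses" value at position i is multiplied by 0.000000 when computing the score for label == Walking at position i. With a weight of zero, the attribute contributes nothing to that score, whatever its value.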
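To make the "value times weight" reading concrete, here is a minimal sketch in plain Python. It is not crfsuite code: `state_score` and the nonzero weights are made up for illustration, and transition scores are left out.

```python
def state_score(item, weights, label):
    """Sum of (attribute value * feature weight) over the item's attributes.

    `weights` maps (attribute, label) pairs to learned weights; pairs that
    are missing from the model contribute nothing.
    """
    return sum(value * weights.get((name, label), 0.0)
               for name, value in item.items())


item = {"no_of_pauses": 4.0, "snrUsed": 235.0}

# Weights as in the dumped model: everything is zero, so the score is zero.
zero_weights = {("no_of_pauses", "Walking"): 0.0, ("snrUsed", "Walking"): 0.0}
zero_score = state_score(item, zero_weights, "Walking")

# Hypothetical nonzero weights, only to show how the values scale the score.
toy_weights = {("no_of_pauses", "Walking"): 0.5, ("snrUsed", "Walking"): -0.01}
score = state_score(item, toy_weights, "Walking")  # 4 * 0.5 + 235 * (-0.01)
```

With the toy weights, the large raw value of snrUsed dominates the score even though its weight is small, which is one reason to rescale numerical attributes before training.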

What does the word minfreq mean here if it is simply referring to scaling value?

It is common for CRFs to use binary features; old crfsuite versions didn't support float values. If all your features take only the values 1 or 0 (1 == feature is present, 0 == feature is absent), then minfreq means "minimum frequency of a feature in the training set". It seems the CRFsuite author decided to keep using the attribute values as the counts for minfreq when he added support for arbitrary float values.
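A rough sketch of that interpretation, in plain Python rather than crfsuite itself. The assumptions here are mine: that a feature's "frequency" is the sum of its attribute values over the training set, and that a feature is kept when that sum is at least minfreq. `surviving_attributes` is a hypothetical helper, and because the sample has a single label, attributes stand in for (attribute, label) features.

```python
from collections import defaultdict


def surviving_attributes(items, minfreq=0.0):
    """Sum each attribute's values over all items and keep those >= minfreq.

    With 0/1 values the sum is an ordinary occurrence count; with float
    values it is just the total of the values (assumed behaviour).
    """
    totals = defaultdict(float)
    for item in items:
        for name, value in item.items():
            totals[name] += value
    return {name: total for name, total in totals.items() if total >= minfreq}


# The three identical items from the question's sample input.
items = [{"no_of_pauses": 4.0, "lat": 32.91469737,
          "lon": -117.18923483, "snrUsed": 235.0}] * 3

kept = surviving_attributes(items, minfreq=0.0)
```

Under these assumptions "lon" sums to a negative number and falls below the default threshold of 0, which would match the model dump in the question, where lon is the only attribute missing.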

--

By the way, are you sure you want a linear-chain CRF for your data? What are you trying to model?

It could also make sense to transform your features. I seriously doubt that lat:32.91 means the same as "lat:16.455, but 2x more", yet that is the kind of relationship a CRF is capable of learning from raw values. Maybe you should use, for example, the difference between nearby coordinates instead of the coordinates themselves, or create a "grid" of coordinates and use binary features of the form "point is in this region". Or maybe a completely different model is needed (the data looks 2D, not 1D).
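Both transformations could look roughly like the sketch below. The function name, the attribute names ("dlat", "dlon", "cell=...") and the 0.01 degree cell size are arbitrary choices for illustration, not anything crfsuite prescribes.

```python
import math


def transform_track(points, cell_size=0.01):
    """Turn a sequence of (lat, lon) points into per-item feature dicts.

    Each item gets the difference to the previous point (when there is
    one) and a binary "point is in this grid cell" feature.
    """
    features = []
    prev = None
    for lat, lon in points:
        item = {}
        if prev is not None:
            item["dlat"] = lat - prev[0]
            item["dlon"] = lon - prev[1]
        # Grid cell indices; floor keeps negative longitudes consistent.
        row = math.floor(lat / cell_size)
        col = math.floor(lon / cell_size)
        item[f"cell={row}_{col}"] = 1.0
        features.append(item)
        prev = (lat, lon)
    return features


# Two made-up points near the coordinates in the question.
track = [(32.9146, -117.1892), (32.9150, -117.1890)]
feats = transform_track(track)
```

The resulting dicts map attribute names to float values, so the deltas act as scaled features while the grid cells behave like ordinary binary ones.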