Solved – How to use predicted features in prediction

Tags: algorithms, machine learning, predictive-models, time series

I have ratings data.

My ratings data contains some technical features like the Originator (channel), the exact day and hour of the broadcast, the duration of the program, etc., and, obviously, the label, which is the rating.
So the data looks like this:

+---------+------------+-----------------+----------------+----------------------------------+---------------+
| Program | Originator |      date       | Duration (min) | some other technical features…   | Actual rating |
+---------+------------+-----------------+----------------+----------------------------------+---------------+
| Empire  | FOX        | 24/5/2016 21:00 |             58 | …                                | 4.6%          |
| Gotham  | FOX        | 24/5/2016 21:58 |             32 | …                                | 3.1%          |
+---------+------------+-----------------+----------------+----------------------------------+---------------+

Based on the historical ratings data, I need to predict future ratings, where all the features used in training are given, except the label of course.

My problem is:

A very strong feature for rating prediction is the carry-over, i.e. the rating of the program that aired immediately before.

I want to train my model with the carry-over feature, but I'm not sure how I should add it.
Should I train the model with the real carry-over (the actual rating of the previous program)? In the test set the carry-over would only be an approximation of the real carry-over: it would be a prediction rather than the actual rating, because, unlike in the training data, I can't know in advance what the rating of the previous program will be. So the correlation of the carry-over with the real ratings would be weaker in the test set than in the training set.
How should I tackle this problem?

Best Answer

There seem to be (at least) three types of feature engineering to do here. The first one involves the carry-over (which is the focus of your question). The second one involves transformations of other data to forms more usable by ML algorithms. The third one involves competing programs.

I'll begin with the second one, as it's necessary for the first and third ones.


Leaving the carry-over aside for the moment, it looks like some of the features can be transformed for better use (you might have done this already, but it's not indicated in the question).

  • The date column - television viewing probably has strong daily seasonal components, probably weekly seasonal components, and possibly yearly seasonal components (see, for example, Prediction Of TV Ratings With Dynamic Models). It's a priori unlikely that a show airing at 3:30 AM before a workday will have the same rating as a show airing in the early evening of a weekend. It also might be the case that people watch differently in winter and summer, during vacations, and so on.

    Because of this, you might want to transform the date column into a number of features: the hour of the day, the day of the week, an indicator of whether it's a weekend, the month, possibly an indicator of the season, and possibly an indicator of a vacation period.

  • The Program column - Say your test data will be $n$ days into the future, and the particular program is already on the air. The ratings already known for this program are probably a useful indicator of its future rating. Consequently, you might want to add (at least) two columns: the past rating for this program from $n$ days ago or earlier (using an average, for example), and a column for the number of measurements used for that past rating. (If a show was not on the air $n$ days earlier, you could encode these as -1 and 0, respectively.) You could go further and analyze trends for the program, or ratings for similar shows, but perhaps you should start with this.

  • The Originator column - you might want something like one-hot encoding here.

(If you haven't done this already,) these transformations might increase the overall prediction accuracy and decrease the relative importance of the carry-over. Some of these features can also be used as proxies for the carry-over (a sketch of these transformations follows below).
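For concreteness, here is a minimal pandas sketch of these transformations. The column names follow the table above; the seven-day horizon, the helper names, and the assumption that the rating has already been parsed from strings like "4.6%" into plain floats are illustrative choices, not part of the original answer.

```python
import pandas as pd

def add_date_features(df):
    """Decompose the broadcast timestamp into seasonal features."""
    out = df.copy()
    ts = pd.to_datetime(out["date"], dayfirst=True)
    out["date"] = ts
    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek
    out["is_weekend"] = (ts.dt.dayofweek >= 5).astype(int)
    out["month"] = ts.dt.month
    return out

def add_past_program_rating(df, horizon_days=7):
    """For each row, the mean rating of the same program at least
    `horizon_days` earlier, plus the number of past broadcasts used
    (-1 and 0 when there is no such history)."""
    out = df.copy()
    past_mean, past_count = [], []
    for _, row in out.iterrows():
        cutoff = row["date"] - pd.Timedelta(days=horizon_days)
        hist = out[(out["Program"] == row["Program"]) & (out["date"] <= cutoff)]
        if len(hist):
            past_mean.append(hist["Actual rating"].mean())
            past_count.append(len(hist))
        else:
            past_mean.append(-1.0)
            past_count.append(0)
    out["program_past_rating"] = past_mean
    out["program_past_count"] = past_count
    return out

def add_originator_dummies(df):
    """One-hot encode the Originator (channel) column."""
    return pd.get_dummies(df, columns=["Originator"], prefix="channel")

# Tiny illustrative frame; ratings assumed already parsed to floats.
raw = pd.DataFrame({
    "Program": ["Empire", "Gotham"],
    "Originator": ["FOX", "FOX"],
    "date": ["24/5/2016 21:00", "24/5/2016 21:58"],
    "Duration (min)": [58, 32],
    "Actual rating": [4.6, 3.1],
})
features = add_originator_dummies(add_past_program_rating(add_date_features(raw)))
print(features)
```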


I want to train my model with the carry-over feature, but I'm not sure how I should add it. Should I train the model with the real carry-over (the actual rating of the previous program)? ... I can't know in advance what the rating of the previous program will be.

In general, it's best to avoid training on one thing and testing on something that isn't exactly the same. So, as your question implies, it's problematic to train using the actual carry-over and then predict using a predicted carry-over.

Instead of feeding the model a predicted rating for the immediately-preceding program, let's think about what information we would use to make that prediction, and encode that information directly.

  • The popularity of the previous program is possibly affected by its time of day, day of week, and so forth, but that's already "encoded" in the features for the current show, so there is no need to repeat it.

  • Similarly, the popularity of any of the immediately-preceding shows might be determined by their channel, but that adds nothing (we know that FOX will be airing something just before the show we're predicting, for example). This is already implicitly encoded in the other features.

  • The one thing that doesn't seem to be already encoded in the other columns is the past ratings of the shows immediately preceding this show. For example, when Game Of Thrones airs, it's probable that the show immediately following it will enjoy a large carry-over, but we know this because the past ratings of GoT were high. I think it's best to encode this data as a feature and let the predictor learn how to use it.

    One straightforward feature to add, therefore, would be the past rating of the most popular show airing immediately before this one (using the same two-column encoding as before), as sketched below.
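A minimal sketch of one reading of this feature: for each broadcast, copy over the past-rating columns of the broadcast that aired immediately before it on the same channel. It assumes the `program_past_rating` / `program_past_count` columns from the earlier sketch already exist; the function and column names are illustrative.

```python
import pandas as pd

def add_preceding_show_rating(df):
    """Attach, to each broadcast, the past-rating columns of the broadcast
    that aired immediately before it on the same channel (-1 / 0 when there
    is no preceding broadcast). Assumes `date` is already a datetime."""
    out = df.sort_values(["Originator", "date"]).copy()
    grouped = out.groupby("Originator")
    out["preceding_past_rating"] = grouped["program_past_rating"].shift(1).fillna(-1.0)
    out["preceding_past_count"] = grouped["program_past_count"].shift(1).fillna(0).astype(int)
    return out

# Tiny illustrative frame; values are made up.
df = pd.DataFrame({
    "Originator": ["FOX", "FOX", "HBO"],
    "date": pd.to_datetime(["2016-05-24 21:00", "2016-05-24 21:58", "2016-05-24 21:00"]),
    "program_past_rating": [4.4, 3.0, 8.1],
    "program_past_count": [12, 9, 20],
})
print(add_preceding_show_rating(df))
```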


Finally, you might want to add as a feature the rating of the most popular show competing with this one, again based on its past ratings.
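A rough sketch of such a competitor feature, under the assumption that "competing" means overlapping in broadcast time on a different channel; the overlap rule, the -1 fallback, and the column names are illustrative choices, and the past-rating column is the one built in the first sketch.

```python
import pandas as pd

def add_top_competitor_rating(df):
    """Attach, to each broadcast, the highest past rating among overlapping
    broadcasts on other channels (-1 when no overlapping competitor exists).
    Assumes `date` is a datetime and `Duration (min)` is numeric."""
    out = df.copy()
    start = out["date"]
    end = out["date"] + pd.to_timedelta(out["Duration (min)"], unit="m")
    top = []
    for i in out.index:
        # Two broadcasts overlap if each starts before the other ends.
        overlap = (out["Originator"] != out.loc[i, "Originator"]) & \
                  (start < end[i]) & (end > start[i])
        competitors = out.loc[overlap, "program_past_rating"]
        top.append(competitors.max() if len(competitors) else -1.0)
    out["top_competitor_past_rating"] = top
    return out

# Tiny illustrative frame; values are made up.
df = pd.DataFrame({
    "Originator": ["FOX", "HBO"],
    "date": pd.to_datetime(["2016-05-24 21:00", "2016-05-24 21:30"]),
    "Duration (min)": [58, 60],
    "program_past_rating": [4.4, 8.1],
})
print(add_top_competitor_rating(df))
```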