It seems to me that a first step would be to try to create some models of how TCP header data might relate to your categories. That is, do you have any theories?
If you do, it might turn out that you need to preprocess your packet info: for example, using the window size of the previous packet rather than the current one, or using the day of the week instead of the day of the month.
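As a rough sketch of that kind of preprocessing, assuming each packet is reduced to a timestamp and a window size (the records, field names, and the `add_features` helper are all made up for illustration):

```python
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Hypothetical packet records: (timestamp, TCP window size).
packets = [
    (datetime(2024, 1, 5, 9, 0, 0), 8192),   # a Friday
    (datetime(2024, 1, 6, 9, 0, 1), 4096),   # a Saturday
    (datetime(2024, 1, 8, 9, 0, 2), 65535),  # a Monday
]

def add_features(packets):
    """Return one feature dict per packet, including derived columns."""
    rows = []
    prev_window = None  # the first packet has no predecessor
    for ts, window in packets:
        rows.append({
            "window": window,
            "prev_window": prev_window,          # lagged feature
            "day_of_week": DAYS[ts.weekday()],   # instead of day of month
        })
        prev_window = window
    return rows

features = add_features(packets)
```

Whether the lagged window size or the weekday is actually informative depends on your theory of the data; the point is only that such derived columns are cheap to build before fitting anything.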
Then you need to look carefully at your inputs and outputs. Are they categorical ("car", "truck"), ordered categorical ("small", "medium", "large"), or something else? Your linear regression is probably treating your categories as if they were continuous (1..N), and your plot shows there's no such linear relationship -- and there's probably no reason to expect there should be.
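To make the distinction concrete, here is a small sketch of how the two kinds of variable might be encoded. The `one_hot` helper and the `SIZE_ORDER` mapping are illustrative names, not part of any library:

```python
def one_hot(values):
    """Encode unordered categories as 0/1 indicator columns."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Unordered categories: no category is "bigger" than another,
# so each one gets its own indicator column.
labels = ["car", "truck", "bus", "car"]
categories, encoded = one_hot(labels)

# Ordered categories: integer codes are defensible here because
# the order small < medium < large is meaningful.
SIZE_ORDER = {"small": 0, "medium": 1, "large": 2}
sizes = ["medium", "small", "large"]
size_codes = [SIZE_ORDER[s] for s in sizes]
```

Coding "car", "truck", "bus" as 1, 2, 3 and feeding that to a linear regression would claim that a bus is "three times a car", which is the kind of accidental assumption to avoid.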
Once you have an idea of models that might make sense, have meaningful variables, and know the types of these variables, methods will naturally fall into place. (For example, continuous variables in and binary category out naturally suggests logistic regression.)
EDIT: In terms of logistic regression, it can be used with multiple outcomes. Look for multinomial logistic regression.
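For a feel of what multinomial logistic regression does, here is a minimal pure-Python sketch (softmax over one linear score per class, fitted by plain stochastic gradient descent). The toy data, learning rate, and epoch count are arbitrary assumptions; in practice you would use a statistics package rather than this loop.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(W, xi):
    xb = xi + [1.0]  # append a constant 1 for the bias term
    return [sum(w * v for w, v in zip(Wk, xb)) for Wk in W]

def fit_softmax(X, y, n_classes, lr=0.5, epochs=500):
    """Fit one weight vector per class by stochastic gradient descent."""
    n_features = len(X[0])
    W = [[0.0] * (n_features + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            xb = xi + [1.0]
            probs = softmax(class_scores(W, xi))
            for k in range(n_classes):
                # gradient of the log-loss w.r.t. the score of class k
                err = probs[k] - (1.0 if k == yi else 0.0)
                for j in range(n_features + 1):
                    W[k][j] -= lr * err * xb[j]
    return W

def predict(W, xi):
    scores = class_scores(W, xi)
    return max(range(len(W)), key=lambda k: scores[k])

# Toy data: three well-separated classes with two continuous inputs.
X = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
y = [0, 0, 1, 1, 2, 2]
W = fit_softmax(X, y, n_classes=3)
predictions = [predict(W, xi) for xi in X]
```

The output of `softmax(class_scores(W, x))` is a probability for each category, which is usually more useful than a single hard label.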
In terms of validation, you train your model on your training set, then predict on the validation data and see how accurate you are. Obviously, if you look at your accuracy on your training data, it'll tend to overestimate your accuracy, since that's the data you tuned your model to. A better test of how you'll do in the real world is to use data that your tuning (training) process never used.
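The gap between training and validation accuracy can be seen in a tiny sketch. A 1-nearest-neighbour classifier is used here only because it memorises its training data, which makes the effect obvious; it is not a suggestion for your problem, and the points and labels are made up.

```python
def nearest_label(train, x):
    """Predict the label of the closest training point (1-NN)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(train, key=lambda row: sq_dist(row[0], x))
    return label

def accuracy(train, data):
    """Fraction of rows in `data` whose label is predicted correctly."""
    hits = sum(1 for x, label in data if nearest_label(train, x) == label)
    return hits / len(data)

# Made-up data with two overlapping classes.
train = [([1.0, 1.2], "a"), ([1.5, 0.8], "a"), ([2.2, 2.1], "a"),
         ([3.0, 3.1], "b"), ([3.5, 2.9], "b"), ([2.0, 2.4], "b")]
validation = [([1.2, 1.0], "a"), ([2.1, 2.2], "b"),
              ([3.2, 3.0], "b"), ([2.4, 2.0], "a")]

train_acc = accuracy(train, train)      # 1.0: every point is its own neighbour
val_acc = accuracy(train, validation)   # 0.75 on data the model never saw
```

The training accuracy of 1.0 says nothing about the model beyond the fact that it memorised its inputs; the validation figure is the one to report.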
Best Answer
The problem with your data is not that it is extremely detailed. You have no values at weekends, which is why it is plotted with gaps. There are two ways to deal with it:
Fill in the weekend values using some smoothing or interpolation method (smooth.spline, loess, etc.). Code of simple interpolation is below. But in this case you will introduce something "unnatural" and artificial to the data. That's why I prefer the second option.
Hope it will help.
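A minimal sketch of what simple linear interpolation over a weekend gap could look like, written in Python rather than with smooth.spline or loess. The `fill_gaps_linear` helper, the dates, and the values are assumptions made up for illustration.

```python
from datetime import date, timedelta

def fill_gaps_linear(series):
    """Fill missing days by straight-line interpolation.

    `series` is a list of (date, value) pairs sorted by date.
    Returns one (date, value) pair per calendar day.
    """
    filled = []
    for (d0, v0), (d1, v1) in zip(series, series[1:]):
        span = (d1 - d0).days
        for step in range(span):
            day = d0 + timedelta(days=step)
            filled.append((day, v0 + (v1 - v0) * step / span))
    filled.append(series[-1])
    return filled

# Hypothetical weekday-only observations.
weekdays = [
    (date(2024, 1, 4), 10.0),  # Thursday
    (date(2024, 1, 5), 12.0),  # Friday
    (date(2024, 1, 8), 18.0),  # Monday
    (date(2024, 1, 9), 17.0),  # Tuesday
]
daily = fill_gaps_linear(weekdays)
# Saturday and Sunday now carry invented values (14.0 and 16.0).
```

The weekend values produced this way were never observed, which is exactly the artificiality mentioned above.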