It seems to me that a first step would be to try to create some models of how tcp header data might relate to your categories. That is, do you have any theories?
If you do, it might turn out that you need to preprocess your packet info: for example, using the window size of the previous packet rather than the current one, or using the day of the week instead of the day of the month.
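As a minimal sketch of that kind of preprocessing (the column names here are hypothetical, just stand-ins for whatever your packet capture actually contains):

```python
import pandas as pd

# Hypothetical packet log; column names are made up for illustration.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-01-01 10:00", "2024-01-02 10:05", "2024-01-03 10:10"]
    ),
    "window_size": [65535, 29200, 14600],
})

# Use the previous packet's window size as a feature for the current packet.
df["prev_window_size"] = df["window_size"].shift(1)

# Day of week (0 = Monday) instead of day of month.
df["day_of_week"] = df["timestamp"].dt.dayofweek
```

The point is that the raw header fields may not be the right features; lagged or re-encoded versions of them might be.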
Then you need to look carefully at your inputs and outputs. Are they categorical ("car", "truck"), ordered categorical ("small", "medium", "large"), or something else? Your linear regression is probably treating your categories as if they were continuous (1..N), and your plot shows there's no such linear relationship -- and there's probably no reason to expect one.
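To make the distinction concrete, here is one common way to encode each kind (the example values are invented): one-hot encoding for unordered categories, and an order-preserving integer code for ordered ones.

```python
import pandas as pd

df = pd.DataFrame({"vehicle": ["car", "truck", "car"],
                   "size": ["small", "large", "medium"]})

# Unordered categories: one-hot encode so no artificial ordering is implied.
dummies = pd.get_dummies(df["vehicle"], prefix="vehicle")

# Ordered categories: map to integers that respect the order.
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_code"] = df["size"].map(size_order)
```

Feeding "car"=1, "truck"=2 straight into a linear model implicitly claims truck is "twice" car, which is exactly the spurious continuity mentioned above.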
Once you have an idea of models that might make sense, have meaningful variables, and know the types of these variables, methods will naturally fall into place. (For example, continuous variables in and binary category out naturally suggests logistic regression.)
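A toy sketch of that last case, continuous inputs and a binary outcome, using synthetic data (the data-generating rule here is invented purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))              # continuous inputs
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # binary outcome

model = LogisticRegression().fit(X, y)
probs = model.predict_proba(X)[:, 1]       # estimated P(y = 1 | x)
```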
EDIT: In terms of logistic regression, it can be used with multiple outcomes. Look for multinomial logistic regression.
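For instance, scikit-learn's `LogisticRegression` fits a multinomial model when the target has more than two classes (the iris dataset here is just a convenient stand-in for a multi-class problem):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)                  # three classes
clf = LogisticRegression(max_iter=1000).fit(X, y)  # multinomial under the hood
```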
In terms of validation, you train your model with your training set then predict on the validation data and see how accurate you are. Obviously, if you look at your accuracy on your training data, it'll tend to overestimate your accuracy since it's what you tuned your model to. A better test of how you'll do in the real world is to use data that your tuning (training) process never used.
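That train-then-validate workflow looks like this in sketch form (the dataset is an arbitrary built-in one, used only to make the example runnable):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)

train_acc = clf.score(X_train, y_train)  # optimistic: the model was tuned to this
val_acc = clf.score(X_val, y_val)        # more honest estimate of real-world accuracy
```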
A threshold isn't trained with the model because logistic regression isn't a classifier (cf., Why isn't Logistic Regression called Logistic Classification?). It is a model to estimate the parameter, $p$, that governs the behavior of the Bernoulli distribution. That is, you are assuming that the response distribution, conditional on the covariates, is Bernoulli, and so you want to estimate how the parameter that controls that variable changes as a function of the covariates. It is a direct probability model only. Of course, it can be used as a classifier subsequently, and sometimes is in certain contexts, but it is still a probability model.
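This separation is visible in code: the fitted model outputs probabilities, and any threshold is a downstream decision you layer on top (the synthetic data below is just to make the sketch runnable).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)

model = LogisticRegression().fit(X, y)
p = model.predict_proba(X)[:, 1]   # the model itself estimates P(y = 1 | x)

# Thresholding is a separate choice, made after (and outside of) the fit:
labels_default = (p >= 0.5).astype(int)
labels_conservative = (p >= 0.9).astype(int)
```

Different thresholds trade off different error costs; nothing in the likelihood being maximized refers to a threshold at all.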
You should take such a result with a pinch of salt. For instance, due to a particular split, you can get a good test result despite a bad training result. It's therefore better to take a cross-validation-like approach: define multiple train-test scenarios and check whether your results are consistent across them.
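A quick sketch of that consistency check, using k-fold cross-validation (again on an arbitrary built-in dataset purely for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Five different train-test splits; look at the spread, not just the mean.
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5)
```

If the five scores are close together, one lucky (or unlucky) split is less likely to be driving your conclusion.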
PS: if your output consists of classes, you may want to use classification, not regression.