Solved – Defining fixed effect and random effect in a model

definition, fixed-effects-model, generalized linear model, mixed model, random-effects-model

I'm not sure whether my understanding of fixed effects and random effects is correct:

Fixed effect = a variable used to make inferences about its specific observed levels.

Random effect = a variable used to make inferences about, and generalise to, a wider population.

The aim of my model is to suggest what kind of videos to create on YouTube so that they become popular and get a large number of views.

I have 6 variables in my glm model:

  1. Channel – YouTube account the video was uploaded from (all account names, e.g. Netflix, Star Wars, etc.)
  2. Views – Number of times the video was viewed (observed over unequal time intervals)
  3. Comments_disabled – Whether the channel disabled other users from commenting on the video (no = comments enabled, yes = comments disabled)
  4. Theme – Category of the video (e.g. ‘Drama’, ‘Family’, etc.)
  5. Weeks – Number of weeks available on YouTube to date
  6. Tags – Number of tags (keywords assigned to the video that users can search for within YouTube)

I defined them as:

Fixed effect: 2, 4, 5

Random effect: 1, 3, 6

I have categorized Tags as a random effect, but I am not very certain about it.

Also, what is the main difference between a fixed effects model, a random effects model and a mixed model? My understanding is: fixed effects model = all variables are fixed effects; random effects model = all variables are random effects; mixed model = the model contains both fixed and random effects. Is that right?

Finally, is it possible to fit a glm that only includes fixed effects?

I used the code below to fit the glm:

 glm( views ~ weeks, data = "youtube" , family = "poisson", link = "log") 

and it keeps returning:

Error in eval(predvars, data, env) :
invalid 'envir' argument of type 'character'

I'm not sure where I went wrong here. Any help would be appreciated.

Edit: I have figured out my code. It should be glm(views ~ weeks, data = youtube, family = poisson(link = "log"))

Best Answer

When a model includes both fixed and random effects, we call it a mixed effects model, or often just a mixed model.

From your description, it appears that only Channel should be a random effect (random intercept).

Comments_disabled is a binary variable and doesn't meet any reasonable criteria for fitting it as random. It should be a fixed effect.

Tags seems like a numeric variable and should be a fixed effect, though you might want to consider also fitting random slopes for it.

It's important to note that there are two types of random effects: random intercepts and random slopes. Random intercepts are for grouping variables, typically identifiers, and observations will be clustered within these. Within each cluster, you can allow the effects of fixed-effect variables to vary by specifying random slopes for those variables. It rarely makes sense to specify a random slope for a variable without also fitting random intercepts for the grouping variable. So your mixed model could be something like:

views ~ Comments_disabled + Theme + Weeks + Tags + (1 | Channel), data = mydata

If you think that the effect of, for example, Weeks and Tags should vary by Channel, then you can specify random slopes for them like this:

views ~ Comments_disabled + Theme + Weeks + Tags + (1 + Weeks + Tags | Channel), data = mydata
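To make the random-intercept formula concrete, here is a minimal sketch of how it might be fitted. It assumes the lme4 package (the `(1 | Channel)` syntax matches lme4, but the answer does not name a package) and a Poisson family. The data are simulated, so every value and effect size below is hypothetical.

```r
library(lme4)  # assumption: lme4 is installed; it is not named in the answer

# Simulated stand-in for the real data: 30 channels with 20 videos each.
set.seed(7)
n_channels <- 30
per_channel <- 20
n <- n_channels * per_channel
mydata <- data.frame(
  Channel = factor(rep(seq_len(n_channels), each = per_channel)),
  Weeks = sample(1:52, n, replace = TRUE),
  Tags = sample(0:30, n, replace = TRUE),
  Comments_disabled = factor(sample(c("no", "yes"), n, replace = TRUE)),
  Theme = factor(sample(c("Drama", "Family", "Comedy"), n, replace = TRUE))
)

# Each channel gets its own baseline shift on the log scale (the random intercept).
channel_effect <- rnorm(n_channels, sd = 0.5)
eta <- 2 + 0.02 * mydata$Weeks + 0.01 * mydata$Tags +
  channel_effect[as.integer(mydata$Channel)]
mydata$views <- rpois(n, lambda = exp(eta))

# Random-intercept Poisson mixed model, using the first formula above.
m_int <- glmer(
  views ~ Comments_disabled + Theme + Weeks + Tags + (1 | Channel),
  data = mydata, family = poisson(link = "log")
)

print(fixef(m_int))    # fixed-effect estimates
print(VarCorr(m_int))  # estimated between-channel variance
```

Swapping the random part for `(1 + Weeks + Tags | Channel)` would fit the random-slopes version. With unscaled predictors, lme4 may warn about convergence, in which case centring and scaling Weeks and Tags is a common first step.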

As for the question about glm: a glm contains only fixed effects, but ignoring the clustering of observations within Channel would lead to biased estimates, so you need to account for it, for example by fitting random intercepts for Channel in a mixed model. Also, although Views is a count (and potentially Poisson-distributed), if the counts are high then a normal distribution might be a better choice. With count data, under- and over-dispersion can also be a problem.
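One common rough check for overdispersion in a Poisson glm is the Pearson chi-square statistic divided by the residual degrees of freedom; values well above 1 suggest overdispersion. The sketch below uses simulated data with hypothetical variable names and parameters, and deliberately draws the counts from a negative binomial so the Poisson fit is overdispersed.

```r
# Simulated counts whose variance exceeds the mean (negative binomial).
set.seed(42)
n <- 200
weeks <- sample(1:52, n, replace = TRUE)
views <- rnbinom(n, size = 2, mu = exp(3 + 0.02 * weeks))

# Fixed-effects-only Poisson glm.
fit <- glm(views ~ weeks, family = poisson(link = "log"))

# Dispersion estimate: Pearson chi-square / residual degrees of freedom.
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(dispersion)

# One possible remedy: a quasi-Poisson fit, which estimates the dispersion
# and scales the standard errors accordingly.
fit_q <- glm(views ~ weeks, family = quasipoisson(link = "log"))
print(summary(fit_q)$dispersion)
```

A negative binomial model is another option people often reach for. Which is appropriate depends on the data.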

As for the error, this is because of data = "youtube" (why do you have youtube in double quotes?). Generally you want something like data = mydata, where mydata is a data frame in R. You should probably have data = youtube. However, please note that this is not a site for programming questions.
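A small sketch of the difference, using simulated data in place of the real youtube data frame (the values are hypothetical). The quoted name is a character string rather than a data frame, which is what triggers the error.

```r
# Simulated stand-in for the asker's data frame.
set.seed(1)
youtube <- data.frame(weeks = 1:50)
youtube$views <- rpois(50, lambda = exp(1 + 0.05 * youtube$weeks))

# Passing the name as a string reproduces the error; capture its message.
bad <- tryCatch(
  glm(views ~ weeks, data = "youtube", family = poisson(link = "log")),
  error = function(e) conditionMessage(e)
)
print(bad)

# Passing the data frame object itself works. The link is an argument of
# the family function, not of glm().
fit <- glm(views ~ weeks, data = youtube, family = poisson(link = "log"))
print(coef(fit))
```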
