I'm not confident that my understanding of fixed effects and random effects is correct:
Fixed effect = a variable for which we make inferences about its specific levels.
Random effect = a variable for which we make inferences about, and generalise to, a wider population.
The aim of my model is to suggest the kind of videos to create on YouTube so that they become popular and get a large number of views.
I have 6 variables in my glm model:
1. Channel – YouTube account the video was uploaded from (all account names, e.g. Netflix, Star Wars, etc.)
2. Views – Number of times the video was viewed (observed over unequal time intervals)
3. Comments_disabled – Whether the channel disabled other users from commenting on the video (no = comments enabled, yes = comments disabled)
4. Theme – Category of the video (e.g. ‘Drama’, ‘Family’, etc.)
5. Weeks – Number of weeks the video has been available on YouTube to date
6. Tags – Number of tags (keywords assigned to the video that users can search for within YouTube)
I defined them as:
Fixed effect: 2, 4, 5 (Views, Theme, Weeks)
Random effect: 1, 3, 6 (Channel, Comments_disabled, Tags)
I have categorized Tags as a random effect, but I am not very certain about it.
Also, what is the main difference between a fixed effect model, a random effect model and a mixed model? From my understanding of these three models: fixed effect model = all variables are fixed effects, random effect model = all variables are random effects, and mixed model = both fixed effect and random effect variables are in the model?
Also, is it possible to get a glm model that only includes fixed effects?
I used the code below in glm
glm( views ~ weeks, data = "youtube" , family = "poisson", link = "log")
and it keeps saying:
Error in eval(predvars, data, env) : invalid 'envir' argument of type 'character'
I'm not sure where I went wrong here. Any help would be appreciated.
edit: I have figured out my code; it should be glm(views ~ weeks, data = youtube, family = poisson(link = "log"))
Best Answer
When we include both fixed and random effects, we call it a mixed effects model, or often just a mixed model.
From your description, it appears that only Channel should be a random effect (random intercept).
Comments_disabled is a binary variable and doesn't meet any reasonable criteria for fitting it as random. It should be a fixed effect.
Tags seems like a numeric variable and should be a fixed effect, though you might also want to consider fitting random slopes for it.
It's important to note that there are two types of random effects: random intercepts and random slopes. Random intercepts are for grouping variables, typically identifiers, and observations will be clustered within these. Within each cluster, you can allow fixed effects to vary by specifying random slopes for those variables. It rarely makes sense to have a variable specified as a random slope without it also being a random intercept. So your mixed model could have Comments_disabled, Theme, Weeks and Tags as fixed effects, with random intercepts for Channel.
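As a rough sketch of the two kinds of random effects described above, the models could be written in lme4-style formula syntax as below. The lme4 package, the simulated data frame and the coefficient values are assumptions made purely for illustration; only the column names come from the question.

```r
# Sketch only: simulated stand-in data (hypothetical values);
# column names follow the variable list in the question.
set.seed(1)
n_channels <- 20
n_videos <- n_channels * 15
youtube <- data.frame(
  Channel = factor(rep(seq_len(n_channels), each = 15)),
  Weeks = runif(n_videos, 1, 52),
  Tags = rpois(n_videos, 10),
  Theme = factor(sample(c("Drama", "Family", "Comedy"), n_videos, replace = TRUE)),
  Comments_disabled = factor(sample(c("no", "yes"), n_videos, replace = TRUE))
)
# Each channel gets its own baseline level of views (the clustering).
channel_effect <- rnorm(n_channels, sd = 0.5)
youtube$Views <- rpois(
  n_videos,
  exp(3 + 0.02 * youtube$Weeks + channel_effect[as.integer(youtube$Channel)])
)

# Random intercepts for Channel, everything else as fixed effects.
f_intercept <- Views ~ Comments_disabled + Theme + Weeks + Tags + (1 | Channel)

# Random intercepts plus random slopes for Weeks and Tags within Channel.
f_slopes <- Views ~ Comments_disabled + Theme + Weeks + Tags + (Weeks + Tags | Channel)

# Fit the random-intercept model only if lme4 happens to be installed.
if (requireNamespace("lme4", quietly = TRUE)) {
  m_intercept <- lme4::glmer(f_intercept, data = youtube,
                             family = poisson(link = "log"))
  print(summary(m_intercept))
}
```

The random-slopes formula is left unfitted here because such models often need more data and rescaled predictors to converge.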
If you think that the effect of, for example, Weeks and Tags should vary by Channel, then you can also specify random slopes for them, grouped by Channel.
As for the question about glm: ignoring the clustering of observations in Channel would lead to biased estimates, so you need to adjust for it by, for example, fitting random intercepts for Channel with a mixed model. Also, although Views is a count (and potentially Poisson-distributed), if the counts are high then a normal distribution might be better. With count data, under- and over-dispersion can also be a problem.
As for the error, this is because of data = "youtube" (why do you have youtube in double quotes?). Generally you want something like data = mydata, where mydata is a data frame in R. Probably you should have data = youtube. However, please note that this is not a site for programming questions.
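For reference, a minimal sketch of the quoting problem and a fixed-effects-only Poisson glm, followed by a rough dispersion check. The data frame here is simulated (hypothetical values), so only the shape of the calls carries over to the real data.

```r
# Sketch only: simulated stand-in for the real data frame.
set.seed(42)
youtube <- data.frame(weeks = 1:50)
youtube$views <- rpois(50, exp(2 + 0.05 * youtube$weeks))

# Passing the data frame's name as a string fails, because glm()
# receives a character vector instead of a data frame.
bad <- tryCatch(
  glm(views ~ weeks, data = "youtube", family = poisson(link = "log")),
  error = function(e) conditionMessage(e)
)
print(bad)

# Passing the object itself works; the link is an argument of the
# family function, not of glm().
fit <- glm(views ~ weeks, data = youtube, family = poisson(link = "log"))
print(coef(fit))

# Rough over-/under-dispersion check: Pearson chi-square divided by the
# residual degrees of freedom. Values far from 1 suggest the Poisson
# variance assumption is doubtful.
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(dispersion)
```

This model has fixed effects only, so it does not account for clustering within Channel.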