Solved – Twitter data and regression time series

regression, time series

I have been collecting the number of new followers every day for about 6 months for 30 different Twitter accounts, and I know the exact time that each follower started following. I have also been collecting all tweets from these accounts during this time period.

I'm interested in whether some independent variables (tweet rate, number of mentions, retweets, links, sentiment of the tweets) are related to an increase in followers (the dependent variable).

I'm wondering what the appropriate approach is for this time series data.

For example, I could use linear regression to see whether the total number of tweets per day predicts the number of new followers per day. However, I don't think that would be appropriate, because the actions people take don't immediately affect the number of followers. I'm not sure what the time delay would be, or whether a different approach would be more appropriate for this kind of data and the question I am asking. I am using R.

Best Answer

You can use an ARMAX model to relate the number of new followers (y) to the number of tweets per day (x). This model will suggest the appropriate delay and response mechanism. Care should be taken to ensure that outliers, level shifts, and local time trends are correctly identified and incorporated. You may also need to take into account particular days of the week, particular days of the month, holiday effects, and so on.
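As a rough illustration in R, the sketch below fits a regression on lagged tweet counts with AR(1) errors using `arima()` and its `xreg` argument from base R. This is a close relative of ARMAX rather than a full transfer-function model, and it does not handle outliers, level shifts, or calendar effects. The data are simulated for a single account, and the 2-day delay, the AR(1) error structure, the maximum lag of 3, and all variable names are assumptions made for the example, not properties of the real data.

```r
# Simulated data for ONE account (hypothetical; replace with the real daily series).
set.seed(42)
n <- 180                                  # roughly 6 months of daily observations
tweets <- rpois(n, lambda = 5)            # tweets per day

# Assumed data-generating process: followers respond to tweets 2 days later,
# with autocorrelated (AR(1)) noise.
tweets_lag2 <- c(NA, NA, tweets[1:(n - 2)])
noise <- as.numeric(arima.sim(list(ar = 0.5), n = n))
followers <- 10 + 2 * tweets_lag2 + noise

# Build a matrix of lagged regressors: tweets at lags 0, 1, ..., max_lag.
max_lag <- 3
X <- embed(tweets, max_lag + 1)           # row t holds tweets[t], tweets[t-1], ...
colnames(X) <- paste0("tweets_lag", 0:max_lag)

# Drop the first max_lag days so the response lines up with the lagged rows.
y <- followers[(max_lag + 1):n]

# Regression on the lagged tweet counts with AR(1) errors.
fit <- arima(y, order = c(1, 0, 0), xreg = X)
print(fit)
```

In the printed fit, the coefficient on `tweets_lag2` should stand out while the other lag coefficients stay near zero, which is how the delay shows up in this setup. With real data the ARMA order and the set of lags would need to be chosen by comparing candidate models (for example by AIC), and day-of-week or holiday dummies could be added as extra columns of `X`.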
