[Math] Probability of Event Occurring in Time-Series

probability, probability distributions, probability theory, statistics

So I'm trying to model email openings over the course of a day.

For example, I have a data set for one individual that simply has a bunch of time-stamps of when he has opened an email. I don't care about the date, just the time of day.

So over the course of a 24-hour day, there will be events along the time (X) axis. I want to model this so that I can plug in a time and get the probability of the event (an email opening) occurring at that time, given the historical data. The Y axis has no magnitude: either an event happened at that time or it didn't.

Not sure how to represent this mathematically. Any help would be appreciated!

Best Answer

You may be interested in this article about Stephen Wolfram's analysis of two decades' worth of timestamped emails: http://www.nytimes.com/2012/04/08/business/mining-our-personal-data-for-our-own-good.html.

The state space is a cylinder, since the X axis representing time of day is a periodic (phase) variable, but I disagree that the Y axis has no magnitude. Rather, the Y axis will initially be a count of the number of emails opened/sent within each narrow time window, say, every hour or every minute.

This count histogram by time of day can then be normalized (by the total count) to obtain the rate (i.e. the fraction of all emails opened/sent in each time window), which has a direct interpretation as a probability distribution over time of day.
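
The count-then-normalize idea can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the question: the bins are one hour wide, the function name `open_probability_by_hour` is made up, and the sample timestamps are invented. Note that the result is the share of all opens that fall in each hour, not the chance that an email gets opened during that hour on any given day.

```python
from collections import Counter
from datetime import datetime


def open_probability_by_hour(timestamps):
    """Return 24 values: the fraction of all opens that fell in each hour of the day.

    The date part of each timestamp is ignored; only the hour is used.
    One-hour bins are an assumption made for this sketch.
    """
    total = len(timestamps)
    if total == 0:
        raise ValueError("need at least one timestamp")
    # Count histogram: number of opens per hour of day (0..23).
    counts = Counter(ts.hour for ts in timestamps)
    # Normalize by the total count so the 24 values sum to 1.
    return [counts.get(hour, 0) / total for hour in range(24)]


# Invented sample data, for illustration only.
opens = [
    datetime(2012, 4, 2, 8, 15),
    datetime(2012, 4, 2, 8, 50),
    datetime(2012, 4, 3, 13, 5),
    datetime(2012, 4, 4, 22, 40),
]

probs = open_probability_by_hour(opens)
print(probs[8])  # 0.5, since two of the four opens fall in the 08:00 hour
```

Narrower bins (say, one minute) give finer resolution but need much more data before the histogram stops looking noisy.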
