What does weighted cumulative frequency distribution mean?

cumulative distribution function, distributions, weights

I have two sets of data (temperature and catch), and I am following a method proposed in an article on empirical cumulative distribution function (ECDF) analysis. First, I derived the ECDF of my temperature data using the ecdf function in R. The second step is to compute the catch-weighted cumulative distribution for temperature, a concept I do not understand.

$$
W(t)=\frac1n\sum_{i=1}^n\frac{y_i}{\bar{y}}I(x_i \le t)
$$

where $x_i$ is the temperature and $y_i$ is the catch for day $i$; $\bar{y}$ is the mean catch over all $n$ days; and $I(x_i \le t)$ is the indicator function, equal to 1 if $x_i \le t$ and 0 otherwise.

How do I compute this curve?

Best Answer

Here are a couple of base R suggestions: one for weights that are integers and not too large, and a second for weights that are simply positive.

# example data
df <- data.frame(temp=c(50,20,10,40), weight=c(3,1,4,2))

# unweighted empirical CDF
plot.ecdf(df$temp,
  main="unweighted ecdf")

# weighted empirical CDF if weights are positive integers or counts
plot.ecdf(rep(df$temp, df$weight),
  main="weighted ecdf 1 - using counts")

# weighted empirical CDF if weights are positive (not necessarily integers)
dfsorted <- df[order(df$temp), ]
dfsorted$cumfreq <- cumsum(dfsorted$weight) / sum(dfsorted$weight)
dfsorted2 <- dfsorted[rep(1:nrow(df), each=2),]
dfsorted2$cumfreq <- c(0,dfsorted2$cumfreq[-2*nrow(df)])
plot(dfsorted2$temp, dfsorted2$cumfreq, type="l",
  main="weighted ecdf 2 - general weights", xlab="temp", ylab="cumfreq")
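
To connect the second approach to the formula in the question: since $n\bar{y}=\sum_i y_i$, the term $\frac1n\frac{y_i}{\bar{y}}$ is just $y_i/\sum_j y_j$, so $W(t)$ is the share of the total catch taken at temperatures up to $t$. The following is a minimal sketch that evaluates the formula directly and compares it with the normalised cumulative weights. The function name `catch_weighted_cdf` and its arguments are hypothetical and not part of the original answer; the `weight` column is assumed to play the role of the catch.

```r
# Hypothetical helper (not from the original answer): evaluates the
# catch-weighted CDF W(t) from the question's formula at one or more t.
catch_weighted_cdf <- function(t, x, y) {
  n <- length(x)
  y_bar <- mean(y)
  # For each t: (1/n) * sum over i of (y_i / y_bar) * I(x_i <= t)
  vapply(t, function(tt) sum((y / y_bar) * (x <= tt)) / n, numeric(1))
}

# Same example data as above; weight is assumed to stand in for catch
df <- data.frame(temp = c(50, 20, 10, 40), weight = c(3, 1, 4, 2))

# W(t) evaluated at each observed temperature, in increasing order
w_at_obs <- catch_weighted_cdf(sort(df$temp), df$temp, df$weight)
print(w_at_obs)

# Because n * y_bar == sum(y), this should match the normalised
# cumulative weights used for the "general weights" plot
ord <- order(df$temp)
cum_w <- cumsum(df$weight[ord]) / sum(df$weight)
print(all.equal(w_at_obs, cum_w))
```

Evaluating the formula one $t$ at a time is slower than `cumsum`, but it makes the role of the indicator function explicit.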

So the unweighted ecdf looks like

[Figure: unweighted ecdf]

and the first weighted ecdf looks like

[Figure: weighted ecdf 1 - using counts]

and the second weighted ecdf looks like

[Figure: weighted ecdf 2 - general weights]
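
As a possible alternative to duplicating rows for plotting, the weighted ECDF can also be wrapped in a step function with base R's `stepfun`, which lets you evaluate the curve at any temperature as well as plot it. This is a sketch rather than part of the original answer; the name `w_fun` is illustrative, and it assumes the temperatures are distinct (with tied temperatures, sum the weights per temperature first).

```r
# Same example data as in the answer
df <- data.frame(temp = c(50, 20, 10, 40), weight = c(3, 1, 4, 2))

# Sort by temperature and normalise the cumulative weights
ord <- order(df$temp)
x_sorted <- df$temp[ord]
cum_w <- cumsum(df$weight[ord]) / sum(df$weight)

# stepfun(x, y) needs length(y) == length(x) + 1; the leading 0 is the
# value to the left of the smallest temperature. With the default
# right = FALSE the steps are right-continuous, like ecdf().
# Assumption: temperatures in x_sorted are distinct.
w_fun <- stepfun(x_sorted, c(0, cum_w))

# Evaluate the weighted ECDF at arbitrary temperatures
print(w_fun(c(5, 10, 30, 50, 60)))

# Plot it as a step function
plot(w_fun, main = "weighted ecdf as a step function",
  xlab = "temp", ylab = "cumfreq")
```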
