Nameless is wrong.
Math example
Raw data: (1,1,0,0,1,1); the first three entries are observations of Item 1 (three trials), the last three of Item 2 (likewise three trials).
Corresponding predictions, Model 1: (.5, .5, .5, .6, .6, .6)
Corresponding predictions, Model 2: (.9, .9, .9, .5, .5, .5)
-log-likelihood of Model 1: 4.01
-log-likelihood of Model 2: 4.59
The BIC penalty for each additional parameter is ln(6) ≈ 1.79.
BIC, Model 1 (2 parameters): 11.62
BIC, Model 2 (4 parameters): 16.35
Difference: 4.73 (rounded)
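For reproducibility, here is a minimal Python sketch of the raw-data computation (assuming independent Bernoulli trials and BIC = 2·(-log L) + k·ln(n); the variable names are mine):

```python
import numpy as np

# Raw data: one Bernoulli trial per entry, and each model's predicted P(success)
y  = np.array([1, 1, 0, 0, 1, 1])
p1 = np.array([.5, .5, .5, .6, .6, .6])  # Model 1 (2 parameters)
p2 = np.array([.9, .9, .9, .5, .5, .5])  # Model 2 (4 parameters)

def nll(y, p):
    # Bernoulli -log-likelihood
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

n = len(y)                                # total sample size: 6
print(nll(y, p1), nll(y, p2))             # 4.017..., 4.593...
bic1 = 2 * nll(y, p1) + 2 * np.log(n)     # 11.62
bic2 = 2 * nll(y, p2) + 4 * np.log(n)     # 16.35
print(bic1, bic2, bic2 - bic1)            # difference: 4.73
```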
Aggregated data (the same data as before): (2,2); the first entry is Item 1 (2 successes out of 3 trials), the second is Item 2 (likewise 2 out of 3).
Corresponding predictions, Model 1: (.5, .6)
Corresponding predictions, Model 2: (.9, .5)
-log-likelihood of Model 1: 1.82
-log-likelihood of Model 2: 2.40
When using n = 2 (instead of 6 as with the raw data), the BIC penalty for each additional parameter is ln(2) ≈ .69.
BIC, Model 1 (2 parameters): 5.03
BIC, Model 2 (4 parameters): 7.56
Difference: 2.54
Note: this is a different result than with the raw data above, and it underestimates the penalty!
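The same check for the aggregated (binomial) case, again as a sketch; note that the -log-likelihoods are smaller than the raw-data ones only because of the ln C(3,2) terms:

```python
import numpy as np
from math import comb, log

k = np.array([2, 2])      # successes per item
m = np.array([3, 3])      # trials per item
q1 = np.array([.5, .6])   # Model 1 predictions
q2 = np.array([.9, .5])   # Model 2 predictions

def nll_binom(k, m, q):
    # Binomial -log-likelihood, including the ln C(m, k) constants
    binom_const = sum(log(comb(mi, ki)) for mi, ki in zip(m, k))
    return -(binom_const + np.sum(k * np.log(q) + (m - k) * np.log(1 - q)))

nll1, nll2 = nll_binom(k, m, q1), nll_binom(k, m, q2)
print(nll1, nll2)                 # 1.82, 2.40
bic1 = 2 * nll1 + 2 * log(2)      # 5.03
bic2 = 2 * nll2 + 4 * log(2)      # 7.56
print(bic2 - bic1)                # difference: 2.54 -- the penalty is underestimated
```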
When using n = 6 in this case:
BIC, Model 1 (2 parameters): 7.22
BIC, Model 2 (4 parameters): 11.96
Difference: 4.73!
Note: this is the same BIC difference as with the raw data, even though the data were aggregated and the log-likelihoods differ.
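Rescoring the same aggregated fits with the total sample size n = 6 (a short continuation of the sketch above, with the -log-likelihoods hard-coded from it):

```python
from math import log

nll1, nll2 = 1.8201, 2.3955       # aggregated -log-likelihoods from above
bic1 = 2 * nll1 + 2 * log(6)      # 7.22
bic2 = 2 * nll2 + 4 * log(6)      # 11.96
print(bic2 - bic1)                # difference: 4.73, matching the raw-data result
```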
The reason: summing the 6 per-trial -log-likelihoods instead of the 2 aggregated ones gives a larger total -log-likelihood for each model, but the difference between the two models' -log-likelihoods is exactly the same whether you use raw or aggregated data, as long as it is the same data and the same model predictions. Aggregation only adds the log binomial coefficients, and those are the same constants for both models, so they cancel in the difference.
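In symbols (assuming Bernoulli trials aggregated into binomial counts, with $k_i$ successes in $m_i$ trials for item $i$): aggregation changes the -log-likelihood of every model $M$ by the same constant,

$$-\ell_{\text{raw}}(M) = -\ell_{\text{agg}}(M) + \sum_i \ln\binom{m_i}{k_i}.$$

Here that constant is $2\ln 3 \approx 2.20$, which is exactly $4.01 - 1.82$ for Model 1 and $4.59 - 2.40$ for Model 2; because it does not depend on the model, it cancels in any difference of -log-likelihoods, and hence in BIC differences computed with the same $n$.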
Use the TOTAL SAMPLE SIZE (or you will underestimate the penalty).
I wonder how many people have gotten this wrong so far... :)
To at least make sure this has an answer:
The origin is effectively arbitrary. Smaller BIC (further left on the number line) is better by the criterion.
However, one thing to beware of if you compare BICs produced in different ways (using different models, or even the same model run in different software) is the constants involved in the likelihood; it's common to drop constant terms, but if different models or software packages don't treat them in an equivalent way, the BICs won't be comparable. Sometimes software will tell you exactly what computation is being performed, in which case you can usually sort these issues out. When it doesn't, some detective work may be required. With the same model in different software this is usually easy to spot (and adjust for). With different models you may be able to figure out what is happening if there's a subset of models in common.
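A toy illustration of the dropped-constants issue (a sketch, not any particular package's convention): the Gaussian log-likelihood is often reported with or without its $-\frac{n}{2}\ln 2\pi$ term, and the two conventions give "BICs" for the very same fit that differ by $n\ln 2\pi$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)         # any data will do for the illustration
n = len(x)
s2 = x.var()                    # ML estimate of the variance

# Full Gaussian log-likelihood at the ML estimates (mean and variance)...
ll_full = -0.5 * n * (np.log(2 * np.pi * s2) + 1)
# ...and a common shortcut that drops the constant -n/2 * ln(2*pi)
ll_short = ll_full + 0.5 * n * np.log(2 * np.pi)

k = 2                           # parameters: mean and variance
bic_full  = -2 * ll_full  + k * np.log(n)
bic_short = -2 * ll_short + k * np.log(n)
print(bic_full - bic_short)     # n * ln(2*pi) ~ 91.9: same fit, different "BIC"
```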
Best Answer
Given your further comment, I am not surprised at this result. BIC is a penalized log likelihood. It is useful for comparing models on one data set (here, each participant), but not for comparing across data sets.
What this result is telling you, in essence, is that the model fits very differently for different people, but that the amount of improvement in the fit by adding two parameters is about the same for each person.