Solved – Generating random numbers based on ‘rule-of-thumb’ proportions

python, r, random-generation

I will try to be as clear as I possibly can:

  • I want to populate a matrix of about 10 million rows by 43 columns to test an application
  • The first task is to generate a random number around 10 million
  • Second, I need to fill in the columns. Say column 2 can only hold random integers between 1 and 25. The value 1 should make up about 80% of the 10 million rows; values 9 to 25 make up about 15%, with 9 alone accounting for 90% of that 15%; and the remaining 5% is spread randomly over everything else.
  • Say that for columns 5 to 10 we know that if column_2 = 10, there is about an 80% chance that the values will be roughly some_int_1.
  • I have a lot of these little rules.
  • None of them are discrete.
  • How can I go about generating such matrix?

Best Answer

If I've understood what you're asking, here's some R code for the basics:

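ss <- 1000  # small size in case of problems; increase towards 10 million for your purposes
a1 <- rnorm(ss)  # column 1: standard-normal values
# column 2: 1 gets 80%, 2:8 share 5%, 9 gets 13.5% (90% of 15%), 10:25 share the remaining 1.5%
a2 <- sample(1:25, ss, replace = TRUE,
  prob = c(0.8, rep(0.05/length(2:8), length(2:8)), 0.135,
  rep((0.15 - 0.135)/length(10:25), length(10:25))))
# column 5: draw ss values in each branch so every row gets its own draw;
# sample(1:5, 1) would produce a single value recycled across all rows
a5 <- ifelse(a2 == 10,
  sample(1:5, ss, replace = TRUE, prob = c(0.8, rep(0.2/4, 4))),
  sample(1:5, ss, replace = TRUE))

df1 <- cbind(a1, a2, a5)  # numeric matrix with one column per variable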
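
Since the question is also tagged python, here is a rough standard-library sketch of the same idea. The names column2_weights and generate_rows are made up for illustration, and the column 5 rule uses 1 as a stand-in for some_int_1, as the R code above does. Treat it as one possible reading of the rules, not a definitive implementation.

```python
import random

def column2_weights():
    """Weights for the values 1..25, following the proportions in the question."""
    weights = [0.80]                  # value 1: about 80%
    weights += [0.05 / 7] * 7         # values 2..8 share about 5%
    weights += [0.135]                # value 9: 90% of the 15% block
    weights += [0.015 / 16] * 16      # values 10..25 share the remaining 1.5%
    return weights

def generate_rows(n_rows, seed=None):
    """Return a list of (col1, col2, col5) tuples (hypothetical column layout)."""
    rng = random.Random(seed)
    values = list(range(1, 26))
    col2 = rng.choices(values, weights=column2_weights(), k=n_rows)
    skewed = [0.8, 0.05, 0.05, 0.05, 0.05]   # assumed: 1 stands in for some_int_1
    rows = []
    for c2 in col2:
        c1 = rng.gauss(0.0, 1.0)
        if c2 == 10:
            # conditional rule: about 80% of the mass on one value
            c5 = rng.choices([1, 2, 3, 4, 5], weights=skewed, k=1)[0]
        else:
            # otherwise uniform over 1..5
            c5 = rng.randint(1, 5)
        rows.append((c1, c2, c5))
    return rows
```

Calling generate_rows(1000, seed=42) returns 1000 tuples; each extra rule becomes another branch or another weight vector. A plain Python loop like this will be slow at 10 million rows.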

If you need computational efficiency at the full 10 million rows, that's a somewhat different problem.
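
One way to keep memory under control at that size, sketched here in standard-library Python, is to generate the rows in chunks and stream them to a file instead of holding the whole matrix at once. The function names, the chunk size, and the two-column layout are assumptions for illustration; a vectorised library would probably be faster still.

```python
import csv
import random

def iter_chunks(total_rows, chunk_size, seed=None):
    """Yield lists of (col1, col2) rows, at most chunk_size rows at a time."""
    rng = random.Random(seed)
    values = list(range(1, 26))
    # same proportions as in the question: 80% / 5% / 13.5% / 1.5%
    weights = [0.80] + [0.05 / 7] * 7 + [0.135] + [0.015 / 16] * 16
    remaining = total_rows
    while remaining > 0:
        k = min(chunk_size, remaining)
        col1 = [rng.gauss(0.0, 1.0) for _ in range(k)]
        col2 = rng.choices(values, weights=weights, k=k)
        yield list(zip(col1, col2))
        remaining -= k

def write_csv(out, total_rows, chunk_size=100_000, seed=None):
    """Stream the generated rows to an open text file; return the row count."""
    writer = csv.writer(out)
    written = 0
    for chunk in iter_chunks(total_rows, chunk_size, seed):
        writer.writerows(chunk)
        written += len(chunk)
    return written
```

Only one chunk lives in memory at a time, so the peak footprint depends on chunk_size and not on the total row count.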
