Find one-dimensional multidimensional scaling solutions for the rows and for the columns (separately), using whatever similarity measures you like (such as correlation). Sort the rows and columns according to their MDS positions. This will bring similar genes together and similar samples together. The whole thing can then easily be visualized as an array plot (e.g., normalize the values to the range 0..255 and display it as a grayscale image).
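A minimal R sketch of that recipe, assuming a numeric matrix `x` with genes in rows and samples in columns; the helper names `reorder_by_mds` and `plot_array` are mine, and classical MDS via `cmdscale` on Euclidean distances is just one possible choice of proximity measure:

```r
## One-dimensional classical MDS on row and column proximities, then sort both.
reorder_by_mds <- function(x) {
  row_pos <- cmdscale(dist(x),    k = 1)   # 1-D MDS coordinates for the rows
  col_pos <- cmdscale(dist(t(x)), k = 1)   # 1-D MDS coordinates for the columns
  x[order(row_pos), order(col_pos)]        # similar rows/columns end up adjacent
}

## Array plot: rescale to [0, 1] and draw as a grayscale image.
plot_array <- function(y) {
  z <- (y - min(y)) / (max(y) - min(y))
  image(t(z)[, nrow(z):1],                 # row 1 of the matrix at the top
        col = gray(seq(0, 1, length.out = 256)), axes = FALSE)
}
```

Any other distance measure (or any other way of getting a 1-D ordination) could be substituted for `dist`/`cmdscale` without changing the rest.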
A 50 by 6 array of standard normal variates was processed in this way (using Euclidean distances as the proximity measures):
There's not much to see--after all, these data are iid--but look at the correlation matrices of the reordered columns and rows:
(red = positive, blue = negative). The concentrations of positive correlations along the diagonals and of negative correlations off the diagonals demonstrate that the method has worked as advertised. (With the original data, the correlation matrices look random too, so the red and blue cells are more evenly interspersed throughout.)
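Something like the following reproduces that check, reusing the `reorder_by_mds` and `plot_array` helpers sketched above (the blue-white-red palette is an arbitrary choice):

```r
set.seed(17)
x <- matrix(rnorm(50 * 6), nrow = 50, ncol = 6)   # 50 x 6 standard normal variates
y <- reorder_by_mds(x)
plot_array(y)                                     # the grayscale array plot

## Correlation matrices of the reordered rows and of the reordered columns
pal <- colorRampPalette(c("blue", "white", "red"))(255)
image(cor(t(y)), col = pal, zlim = c(-1, 1), axes = FALSE)  # row-row correlations
image(cor(y),    col = pal, zlim = c(-1, 1), axes = FALSE)  # column-column correlations
```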
In my experience, when there are even subtle nonzero correlations among rows and/or columns, this method does an excellent job of bringing them out in the original array plot (grayscale, above) and providing a visual display of clustering along both dimensions. Larger blocks along the diagonals of the corresponding correlation matrix plots help identify strongly clustered groups of rows or columns.
I'm assuming all training/CV/test performance is bad, and therefore that the problem is not overfitting. In a nutshell, you could then try the following to meaningfully reduce your features (a short sketch of the first three ideas follows the list):
- Use feature correlation to remove highly correlated features,
- Feature selection techniques such as filters and wrappers,
- Feature reduction using techniques like PCA, or
- Models that internally "weight" features themselves.
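A hedged R sketch of those first three ideas, assuming a numeric feature matrix `X` and a numeric target `y`; the caret package is assumed to be available for `findCorrelation()`, and the 0.9 cutoff, the top-20 filter, and the 95% variance threshold are arbitrary illustrative values:

```r
library(caret)   # assumed available for findCorrelation()

## 1. Drop one feature from every highly correlated pair (|r| > 0.9 here).
drop_idx <- findCorrelation(cor(X), cutoff = 0.9)
X1 <- if (length(drop_idx) > 0) X[, -drop_idx, drop = FALSE] else X

## 2. Simple filter: rank features by |correlation with the target| and keep the top k.
k      <- min(20, ncol(X1))
scores <- abs(apply(X1, 2, cor, y = y))
X2     <- X1[, order(scores, decreasing = TRUE)[1:k], drop = FALSE]

## 3. PCA: keep just enough components to explain 95% of the variance.
pca    <- prcomp(X2, center = TRUE, scale. = TRUE)
n_comp <- which(cumsum(pca$sdev^2) / sum(pca$sdev^2) >= 0.95)[1]
X3     <- pca$x[, 1:n_comp, drop = FALSE]
```

For the last bullet, penalized models such as the lasso (e.g. via the glmnet package) do this kind of feature weighting/selection internally as part of fitting.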
Things you should consider:
As @KeithHughitt mentioned, the problem might be that the relation you seek is simply not present in your data. In such a case it might be impossible for any model to perform and generalize well. There is no single perfect solution for those cases, but, as you already mentioned, deriving features (the same information, differently processed) and/or adding information (new information, e.g. by recording more features) might help.
Another explanation for bad predictive performance with big data/many features might be that the feature-target relation is too complex to be represented adequately by your model (e.g. trying to model circular data with a linear model). In such cases, another option besides adding preprocessing/feature derivation would be to employ more complex models. But those usually come at the cost of increased computational effort, as with deep learning.
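As a small, self-contained illustration of that point (synthetic data, nothing to do with your actual problem): a class label that depends on the distance from the origin cannot be captured by a linear decision boundary, but adding quadratic terms already fixes it.

```r
set.seed(1)
x1 <- runif(500, -1, 1)
x2 <- runif(500, -1, 1)
y  <- as.integer(x1^2 + x2^2 + rnorm(500, sd = 0.1) < 0.5)  # roughly: inside a circle

linear_fit   <- glm(y ~ x1 + x2,                   family = binomial)  # linear boundary
flexible_fit <- glm(y ~ poly(x1, 2) + poly(x2, 2), family = binomial)  # quadratic terms

## The linear model can do little better than predicting the majority class;
## the quadratic terms recover the circular boundary.
mean((predict(linear_fit,   type = "response") > 0.5) == y)
mean((predict(flexible_fit, type = "response") > 0.5) == y)
```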
Best Answer
If I've got what you're saying, here's some R code for the basic stuff:
If you want computational efficiency, well that's a little different.