I have data on various drugs and p values describing how strongly these drugs are associated with a given disease (e.g., type 2 diabetes; p values calculated with gene set analysis). I want to calculate an enrichment area under the curve (AUC) using a ROC-style curve, where the x axis runs from the lowest p values on the left (one p value per drug) to the highest p values on the right, and the y axis gives the percentage of approved type 2 diabetes drugs found as you move from left to right along the x axis. I know this sort of thing is done for drugs in virtual screening etc.; however, I have not done this before and am a little lost on how to start. Is it possible to use the AUC functions from scikit-learn for something like this?
drug | p-value | disease | approved for disease i |
---|---|---|---|
drug1 | 0.0032 | type2d | 1 |
drug2 | 0.004 | type2d | 1 |
… | … | … | … |
drug100 | 0.87 | type2d | 0 |
Thanks in advance for any help!
Best Answer
For those interested, this should work:
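A minimal sketch of how this could look with scikit-learn, not necessarily the exact code intended here. The column names `p_value` and `indication`, the helper `enrichment_auc`, and the toy data are all assumptions made up for illustration. Note that `roc_curve` treats higher scores as more likely positive, so the p values are negated, and that the ROC x axis is the fraction of non-approved drugs passed so far rather than the raw p value.

```python
import pandas as pd
from sklearn.metrics import roc_curve, auc


def enrichment_auc(df, indication, p_col="p_value", ind_col="indication"):
    """Hypothetical helper: ROC curve and AUC for drugs approved for `indication`.

    Assumes `df` has one row per drug, a p-value column and a column naming
    the disease each drug is approved for. Both classes (approved and not
    approved) must be present in the data.
    """
    # 1 if the drug is approved for the disease of interest, else 0
    y_true = (df[ind_col] == indication).astype(int)
    # Negate p values so the smallest p value gets the highest score
    y_score = -df[p_col]
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return fpr, tpr, auc(fpr, tpr)


# Toy data, invented for illustration only
df = pd.DataFrame({
    "drug": ["drug1", "drug2", "drug3", "drug4", "drug5", "drug6"],
    "p_value": [0.0032, 0.004, 0.05, 0.2, 0.5, 0.87],
    "indication": ["type2d", "type2d", "other", "type2d", "other", "other"],
})

fpr, tpr, roc_auc = enrichment_auc(df, "type2d")
print(round(roc_auc, 3))  # 0.889 for this toy data
```

To draw the curve, `fpr` and `tpr` can be passed directly to a plotting call such as matplotlib's `plt.plot(fpr, tpr)`.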
'indication' here is the clinical indication of a drug. So this will create the enrichment curve for a group of drugs in your data set, where the group is all drugs in your data that are approved for whatever disease you want to look at.
'df' is a data frame that contains, for each drug, a p value for its association with a phenotype and the disease the drug is approved for.