Imagine that you are planning a study about risk behaviours among HIV positive injecting drug users.

All the individuals included in the sample are injecting drugs and all are HIV positive. The main “exposure” in the study is the person’s awareness about their HIV serostatus (some of the drug users do know that they are HIV positive and some don’t know). The outcome is having had unprotected intercourse during the past 4 weeks.

Your imaginary study has not been performed yet and you are planning for data collection.

Which variables (age, place of residence, relationship status, education, etc.; you can choose whatever variables you think are relevant) would you add to the data in order to assess confounding, mediation and effect modification of the association of interest in the study?

Your question is actually a very hard one to answer. It is good, however, that you are asking before the study has been conducted, preferably well before. So this answer comes in a few parts:

  1. As many as you can possibly collect, given constraints of time and money. There is (almost) no such thing as too much data, and it's harder to collect a variable you realized you needed after the fact than to ignore a variable you already have in your data.
  2. Read the literature. What are similar studies including? There are two reasons to do this. First, those authors thought about what variables to include in their study, and there's no reason not to co-opt their work and expertise for your own ends. Second, it gives you a feel for the standard, "must have" variables in your field that, if they're missing, may earn you reviewer blowback.
  3. Do you own a copy of Modern Epidemiology, 3rd Edition by Rothman, Greenland and Lash? You should if you're considering running this kind of study. You may find Chapters 9 & 12 illuminating, especially Chapter 12, on causal diagrams, which can be used as a study planning tool to identify sets of confounders you will likely need to control for.
  4. Read up on the use of Directed Acyclic Graphs (DAGs) as study planning tools - again, Chapter 12 of Modern Epi 3. A Google search for "DAG Confounding" will yield a wealth of potential resources. Once you have a feel for how they work, sit down with the rest of your study team - preferably in a room with a very large white board - and start making a causal diagram for your study. Try to work everything you can into the graph, then reduce it down, because again, it's better to overshoot than to miss something. There are software tools like DAGitty to help with the more labor-intensive parts of the analysis.
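
To make the causal-diagram step a bit more concrete, here is a minimal Python sketch of how a drawn DAG can be read mechanically. Everything in it is an assumption made for illustration: the variable names (including `risk_perception`), the edges, and the helper functions are hypothetical, not findings about HIV-positive injecting drug users. It only flags common causes of exposure and outcome as candidate confounders, and variables lying on a directed path from exposure to outcome as candidate mediators. That is a simplification of the full back-door logic that a tool such as DAGitty applies, and it says nothing about effect modification, which a DAG does not encode.

```python
# Hypothetical causal diagram: every edge below is an illustrative assumption.
# Each key lists the variables it is assumed to directly cause.
DAG = {
    "age": ["awareness", "unprotected_sex"],
    "relationship_status": ["awareness", "unprotected_sex"],
    "access_to_testing": ["awareness"],
    "number_of_partners": ["unprotected_sex"],
    "awareness": ["risk_perception"],
    "risk_perception": ["unprotected_sex"],
    "unprotected_sex": [],
}


def descendants(dag, node):
    """Return every variable reachable from `node` along directed edges."""
    seen = set()
    stack = list(dag.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dag.get(current, []))
    return seen


def ancestors(dag, node):
    """Return every variable with a directed path into `node`."""
    return {v for v in dag if node in descendants(dag, v)}


def classify(dag, exposure, outcome):
    """Split variables into candidate confounders and candidate mediators."""
    # Drop the exposure so that a path to the outcome running only through
    # the exposure does not count as a separate route to the outcome.
    without_exposure = {
        v: [c for c in children if c != exposure]
        for v, children in dag.items()
        if v != exposure
    }
    confounders = ancestors(dag, exposure) & ancestors(without_exposure, outcome)
    mediators = descendants(dag, exposure) & ancestors(dag, outcome)
    return {"confounders": sorted(confounders), "mediators": sorted(mediators)}


result = classify(DAG, "awareness", "unprotected_sex")
print(result)
```

Under these assumed edges, `age` and `relationship_status` come out as candidate confounders and `risk_perception` as a candidate mediator. `access_to_testing` affects the outcome only through the exposure, and `number_of_partners` affects only the outcome, so neither is flagged. Change the edges and the classification changes with them, which is why the diagram itself deserves most of the team's effort.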

Planning for potential confounding and effect modification is a long process that relies fairly heavily on subject matter expertise. Make sure you have a good team. If you don't feel like you do, or could use more expertise, see if someone in your department or organization can help you out - there are lots of HIV/AIDS epidemiologists out there. I can think of some variables I consider important (number of sexual partners, access to testing facilities, etc.), but you'd be better served by understanding the process than by just having a list.

