One-way Table vs Two-way Table. What’s the difference?

differences, self-study, tables, variable

The following definition is representative of what you find if you google the difference between one-way and two-way tables:
Sometimes we talk about data tables in terms of independent and dependent variables. In the case of one-way data, we had one independent variable, called the individuals, and one or more dependent variables, called the variables. In the case of two-way data, we have two independent variables on which the variables are dependent.

Reference: Two-Way Data Compared To One-Way Data

Now, take a look over these two tables, i.e., one-way and two-way tables respectively:
One-way table
Two-way table

My questions:

  1. Is this reference credible?
  2. If yes, what do "dependent" and "independent" variables mean in this context? In the ordinary sense of these two terms, the dependent variable is the one caused by the independent variable, so it would not exist without it. Is that the meaning here?
  3. We usually collect dependent variables, called just variables in stats, around independent variables, called individuals in stats, for instance: students and their grades, products and their revenue, and so forth. In the first table, the one-way table, we summarize the total earnings monthly, and we can do that for years as well. In the second table, the two-way table, we combine the two one-way tables, but what do we gain? Why merge two one-way tables into one two-way table? The reference author suggests it is to gain more visibility, which to me sounds like a generic reason.

Best Answer

That reference is terrible and it's just confusing you. I recommend pretending you never saw it. Here's a better way to think about it.

A dataset is basically just a table with rows and columns. The rows are called "observations" and usually represent individual "entities." Often these are people, but could also be other things like schools, countries, plants, fish, or even time periods like days or months.

The columns are called "variables" and represent characteristics of the observations. So if your observations are people you might have a variable called "age," and another one called "has job," and another called "female." For a given person the value of the variable "age" will store the number of years old they are (this is called a continuous variable), and the value of "has job" will be a 1 if they have a job, and 0 if they don't (this is called a binary categorical variable). "Female" will likewise be 1 for females and 0 for males.

A one-way table is just a summary of the full range of responses for a single variable across the entire dataset. For age, the table would show the number (or percentage) of ALL people in the dataset who are 18 years old, the number who are 19, the number who are 20, and so on. A one-way table for "has job" would just be the number of people who have a job and the number of people who don't.
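To make that concrete, here is a minimal sketch in plain Python. The six people and their values are made up purely for illustration, and `one_way_table` is a hypothetical helper name, not something from a stats package.

```python
from collections import Counter

# Hypothetical toy dataset: one dict per person (row), one key per variable.
people = [
    {"age": 18, "has_job": 0, "female": 1},
    {"age": 19, "has_job": 1, "female": 0},
    {"age": 18, "has_job": 1, "female": 1},
    {"age": 20, "has_job": 1, "female": 0},
    {"age": 19, "has_job": 0, "female": 1},
    {"age": 18, "has_job": 1, "female": 0},
]


def one_way_table(rows, variable):
    """Count how many rows take each value of a single variable."""
    return Counter(row[variable] for row in rows)


age_counts = one_way_table(people, "age")
job_counts = one_way_table(people, "has_job")

# Show the age table as counts and as percentages of everyone in the dataset.
for value, count in sorted(age_counts.items()):
    share = 100 * count / len(people)
    print(f"age {value}: {count} people ({share:.1f}%)")
```

Each one-way table looks at a single column and ignores all the others.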

A two-way table compares two different variables, showing the number of people who gave a particular combination of responses to each one. So if I did a two-way table of "female" and "has job" it would give me a two-by-two table, with one cell showing the number of people who are "male, no job," another showing the number who are "female, no job," etc.
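A rough sketch of that two-by-two table, again in plain Python. The eight people are invented for illustration, and the labels are just my own naming.

```python
from collections import Counter

# Made-up toy data: one (female, has_job) pair per person; 1 = yes, 0 = no.
people = [
    (0, 1), (0, 1), (0, 1), (0, 0),  # four males, three with a job
    (1, 1), (1, 1), (1, 0), (1, 0),  # four females, two with a job
]

# A two-way table counts every combination of the two variables.
two_way = Counter(people)

sex_label = {0: "male", 1: "female"}
job_label = {0: "no job", 1: "job"}

# Print the 2x2 grid: rows are sex, columns are job status.
print(f"{'':8}{job_label[0]:>8}{job_label[1]:>8}")
for female in (0, 1):
    cells = "".join(f"{two_way[(female, job)]:>8}" for job in (0, 1))
    print(f"{sex_label[female]:8}{cells}")
```

The four cells always add up to the number of people in the dataset, since every person falls into exactly one combination.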

Here's where the distinction between dependent and independent variables actually starts to matter. Just looking at the number of people in each cell doesn't tell us much about what these two variables have to do with one another. We need to transform these numbers into percentages, but how do we do that? Do we want the percent of males who have jobs? Or the percent of people who have jobs who are male? Or the percent of all people who are male and have jobs? These are all different numbers. What do we do?
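To see that these really are three different numbers, here is a small sketch with made-up cell counts. The only thing that changes between the three percentages is the denominator.

```python
# Made-up cell counts from a hypothetical 2x2 table of sex by job status.
counts = {
    ("male", "job"): 3, ("male", "no job"): 1,
    ("female", "job"): 2, ("female", "no job"): 2,
}

# Three possible denominators for the same "male, job" cell.
males = sum(n for (sex, _), n in counts.items() if sex == "male")
employed = sum(n for (_, job), n in counts.items() if job == "job")
everyone = sum(counts.values())
male_with_job = counts[("male", "job")]

pct_of_males = 100 * male_with_job / males        # percent of males who have jobs
pct_of_employed = 100 * male_with_job / employed  # percent of job holders who are male
pct_of_everyone = 100 * male_with_job / everyone  # percent of all people who are male with a job

print(f"of males: {pct_of_males:.1f}%")
print(f"of job holders: {pct_of_employed:.1f}%")
print(f"of everyone: {pct_of_everyone:.1f}%")
```

With these invented counts the same cell comes out as 75%, 60%, or 37.5% depending on which group you divide by.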

In analysis the dependent variable is the thing we are trying to explain or change, and the independent variable is the thing we think might explain or change the dependent variable. This is something that we bring to the table with our own theory and knowledge, so a variable might be a dependent variable in one analysis and independent in another. Here though, it's clear that job is the dependent variable and sex is the independent variable: being female clearly might impact your employment status, but it seems really unlikely that not having a job would cause your sex to change.

Given that knowledge we want to calculate percentages within the independent variable. We want to know the percentage OF females who have a job, and then compare that with the percentage of males who have a job. If those two percentages are different, we know that sex is related to having a job in this dataset. Of course, that doesn't mean that sex is CAUSING people to have jobs or not, or even that the relationship really exists out in the world.
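As a sketch of that comparison, using invented counts and a hypothetical helper called `job_rate`: the percentages are computed within each sex (the independent variable) and then compared.

```python
# Made-up cell counts from a hypothetical 2x2 table of sex by job status.
counts = {
    ("male", "job"): 3, ("male", "no job"): 1,
    ("female", "job"): 2, ("female", "no job"): 2,
}


def job_rate(counts, sex):
    """Percentage of people of the given sex who have a job."""
    with_job = counts[(sex, "job")]
    group_total = with_job + counts[(sex, "no job")]
    return 100 * with_job / group_total


male_rate = job_rate(counts, "male")
female_rate = job_rate(counts, "female")
gap = male_rate - female_rate  # difference in percentage points

print(f"males with a job:   {male_rate:.1f}%")
print(f"females with a job: {female_rate:.1f}%")
print(f"gap: {gap:.1f} percentage points")
```

A nonzero gap in a toy table like this only describes these made-up rows. Whether a gap in real data reflects anything beyond chance is a separate question that a table alone does not answer.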
