The best way to visualise a panel regression

data visualizationggplot2panel datascatterplot

Simple scatter plots or a correlation matrix can be misleading since they aggregate all years together. What would be te best way to visualise relationships between variables taking time into account? For example, the scatter matrix below shows relations between certain variables, but aggregates the data for all the years.
enter image description here

What are common ways to visualise relations for panel data? Is it simply to make multiple scatter plots for each year?

The data set is a panel data set containing data for 10 countries for 20 years. I created some sample data with:

# Sample data frame
df_Q <- data.frame(country = rep(c("A","B","C","D","E","F","G","H","I","J"),each=20),
               year = rep(2001:2020, times = 10),
               gini = rnorm(200, mean = 25, sd = 7),
               total_trade = rnorm(200, mean = 14, sd = 8), 
               intraEU_trade = rnorm(200, mean = 7, sd = 6),
               inward_FDI = rnorm(200, mean = 10, sd = 5),
               outward_FDI = rnorm(200, mean = 20, sd = 4),
               unemployment = rnorm(200, mean = 15, sd = 10))

The data looks like this:

  # country year     gini total_trade intraEU_trade inward_FDI outward_FDI unemployment
# 1       A 2001 21.94901   22.717850      9.536230  14.539323    22.84906    20.408894
# 2       A 2002 12.77493   15.837161      8.702562  12.178246    16.76624     9.992739
# 3       A 2003 30.79006    0.750060      3.924690   7.463981    24.93762    25.092569
# 4       A 2004 24.92870   15.447970     12.882917  13.696632    21.89461    20.729777
# 5       A 2005 31.19249    8.476526     10.160960  10.759731    20.35886    25.562575
# 6       A 2006 20.23424   18.060654      8.271882  13.904214    22.95711    16.132112

I estimated some models with the plm package, but I would like to know how to visualise the bivariate relationships between the variables, taking the year into account.

The plot was created with:

library(corrmorant) 
library(ggplot2)
ggcorrm(df_Q[,c("gini", "total_trade", "intraEU_trade", "inward_FDI", "outward_FDI", "unemployment")]) +
  lotri(geom_point(alpha = 0.7, size = 0.2)) + 
  lotri(geom_smooth(method = "lm", size = 0.6, color = "#ce5348")) + 
  utri_corrtext(nrow = 2, corr_size=FALSE, size=4) +
  dia_names(y_pos = 0.15, size = 2.5) +
  dia_density(lower = 0.3, color = 1, alpha=0.7, fill = "#a2a2a2") + 
  scale_y_continuous(n.breaks = 3) +
  scale_x_continuous(n.breaks = 3) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

Best Answer

Specific to panel data: You may want to assign color to year. Since year has a natural ordering, choose a sequential color scheme so that it's easy to tell which points are from earlier vs later years. If you wanted to group by something like country instead, it would be fine to use a qualitative color scheme, in which there's no natural ordering to the colors and they are just chosen to be easy to tell apart.

Not specific to panel data: I like the 2 main approaches laid out in William Cleveland's The Elements of Graphing Data. In general, when trying to visualize how a relationship between 2 quantitative variables depends on a 3rd variable:

  • "superpose" the groups: show a scatterplot, but use another visual variable like color, point shape, etc. to show the different groups. Regarding your example above, I haven't worked with ggcorrm before but presumably you could add mapping=aes(color=year) somewhere.
  • "juxtapose" the groups: show the same plot repeated several times, once for each value of the grouping variable. In ggplot terms, just facet on the year (assuming ggcorrm allows it).

On p.59 of Rafe Donahue's "Fundamental Statistical Concepts in Presenting Data" there's a nice extension of the "juxtapose" idea: Make each facet also show data from the other facets, just greyed out. This essentially highlights one group at a time, but keeps it in context of all the other groups.
Update---here's an example of how to do this in ggplot:

# Generate fake data, with a trend in `total_trade` over time
df_Q <- data.frame(country = rep(c("A","B","C","D","E","F","G","H","I","J"),each=20),
                   year = rep(2001:2020, times = 10),
                   gini = rnorm(200, mean = 25, sd = 7))
df_Q$total_trade = rnorm(200, mean = 14, sd = 8) + 2*(df_Q$year-2000)

# Make a version of the dataset without `year`
df_NoYear <- df_Q
df_NoYear$year <- NULL

# Add `geom_point()` twice:
# FIRST as grey points in the dataset without `year`
# (so it repeats as background in every panel)
# THEN the real data with `year`
library(ggplot2)
ggplot(df_Q, aes(x = gini, y = total_trade)) +
  geom_point(data = df_NoYear, color = "grey85") +
  geom_point() +
  facet_wrap(~ year)

FacetsWithGreyBackground

I hope this illustrates how the "all-other-years" background points can be helpful as reference points, making it easier to notice a subtle trend across facets.

Related Question