class: center, middle, inverse, title-slide # PADP 7120 Data Applications in PA ## RLab 9: Omitted Variable Bias ### Alex Combs ### UGA | SPIA | PADP ### Last updated: October 14, 2021 --- # Objectives By the end of this lab, you will have learned how to... - Compare two models on the basis of their omitted variable bias - Hypothesize the direction of omitted variable bias --- # Set-up > **Start a new project and Rmd** > **Change YAML** ```r --- title: "RLab 6: Omitted Variable Bias" author: "Your Name" output: html_document: theme: spacelab df_print: paged --- ``` --- # Set-up > **Download `predicting_happiness.csv` from eLC and import into RStudio** > **Load following packages** ```r library(tidyverse) library(moderndive) ``` --- # Introduction - Governments generally want citizens to be happy. - Suppose we are government officials interested in whether citizens' social network diversity affects their happiness. Perhaps we have concerns about isolation during the pandemic. --- # Data .pull-left[ ```r head(predicting_happiness, n = 4) %>% kable() ``` | Id| SWB| ND| PSS| |--:|---:|--:|---:| | 1| 33| 5| 93| | 2| 33| 6| 95| | 3| 29| 5| 89| | 4| 21| 6| 74| ] .pull-right[ - Subjective Well Being (SWB) - individual's own level of happiness from 0 to 40. - Network Diversity (ND) - scope and variability of a person's social capital from 0 to 10. - Perceived Social Support (PSS) - perception of how much support received from social networks from 0 to 100. ] --- # Correlations - Let's examine the correlations between the three numerical variables > **In a new code chunk, compute correlations between the three numerical variables using the `cor()` function.** > **Outside of the code chunk, complete the following sentences:** - ND is _ correlated with SWB. - PSS is _ correlated with SWB. - PSS is _ correlated with ND. --- # DAG - Suppose we theorize the effect of ND and PSS on SWB like so <img src="labs_files/happydag.png" width="407" style="display: block; margin: auto;" /> -- - We want to estimate the effect of ND on SWB. - How many backdoor paths between ND and SWB? Is it open or closed? -- - Therefore, what is the regression model needed to eliminate OVB? `$$SWB_i=\beta_0+ \dots ?$$` --- # Dueling Models `$$SWB_i=\beta_0+\beta_1ND_i+\beta_2PSS_i+\epsilon$$` - But suppose we choose to estimate the following model instead `$$SWB_i=\beta_0+\beta_1ND_i+\epsilon$$` --- # Biased regression `$$SWB_i=\beta_0+\beta_1ND_i+\epsilon$$` > **Estimate the biased model and produce a basic regression table** ```r biased <- ``` --- # Biased regression results |term | estimate| std_error| statistic| p_value| lower_ci| upper_ci| |:---------|--------:|---------:|---------:|-------:|--------:|--------:| |intercept | 20.995| 1.189| 17.651| 0.000| 18.656| 23.334| |ND | 0.523| 0.197| 2.657| 0.008| 0.136| 0.910| - What is the interpretation of the ND estimate? -- - **Haven't covered yet:** the estimate is also statistically significant at standard threshold of p-value < 0.05. -- - In other words, our estimate of 0.523 is too far (above) from 0 to be due to random noise. --- # OVB - But we know our estimate for ND is biased :( - Bias causes our estimate to be higher or lower than what it should be. -- - So we can't trust our estimate of 0.523. Maybe it should be closer to 0 and enough so that it could be random noise (i.e., statistically insignificant). --- # OVB - It would be useful to predict whether the bias was causing our estimate to be too high or low. - If it were causing our estimate to be too low, and given that 0.523 is already far enough above 0 to be statistically significant, we could still make useful conclusions. --- # OVB Direction <img src="labs_files/ovb-direction.png" width="399" style="display: block; margin: auto;" /> - Direction of OVB is similar to correlation between two variable - If Z causes X and Y to move in same direction, OVB is positive - If Z causes X and Y to move in opposite directions, OVB is negative --- # Revisiting correlations .pull-left[ ```r predicting_happiness %>% select(-Id) %>% cor() %>% kable(digits = 2) ``` | | SWB| ND| PSS| |:---|----:|----:|----:| |SWB | 1.00| 0.13| 0.47| |ND | 0.13| 1.00| 0.31| |PSS | 0.47| 0.31| 1.00| ] .pull-right[ - ND is positively correlated with SWB. - PSS is positively correlated with SWB. - PSS is positively correlated with ND. ] Based on the correlations between these three variables, what direction is the OVB for `\(ND\)` for the biased model? --- # OVB Direction - PSS affects SWB and ND according to our theoretical DAG - PSS effect is the same direction on SWB and ND according to the correlations - We can predict this OVB will be positive - Knowing the direction of the OVB, can we make a valid conclusion concerning the effect of network diversity (ND) on happiness (SWB)? -- - No. Our estimate is positive and our OVB is predicted to be positive. Therefore, our estimate for ND may be statistically insignificant if not for the positive OVB pushing it above 0. --- # Valid regression - Since we have the data for the omitted variable, let's confirm that this works the way we predicted it to > **Estimate the valid model and provide the results** `$$SWB_i=\beta_0+\beta_1ND_i+\beta_2PSS_i+\epsilon$$` ```r valid <- ``` --- # Valid regression |term | estimate| std_error| statistic| p_value| lower_ci| upper_ci| |:---------|--------:|---------:|---------:|-------:|--------:|--------:| |intercept | -2.350| 2.589| -0.908| 0.365| -7.439| 2.740| |ND | -0.050| 0.185| -0.271| 0.787| -0.415| 0.314| |PSS | 0.316| 0.032| 9.892| 0.000| 0.253| 0.379| - Does our conclusion for ND change? -- - Yes. The estimate is negative AND it is no longer statistically significant p-value = 0.787 > 0.05 --- # Computing OVB - OVB can be calculated by subtracting the valid estimate from the biased estimate: `$$b^{biased}-b^{valid}$$` > **Calculate the magnitude of the OVB on the estimate for ND between the two models** - Is it in the direction we predicted? --- # Computing OVB ```r 0.523 - (-0.050) ``` ``` ## [1] 0.573 ``` - OVB is positive, just as we predicted. - You now understand bias. Congratulations! > **Knit and/or save your Rmd and upload Rmd to eLC** --- # What is the story? - What is the story of our valid model?