class: center, middle, inverse, title-slide

.title[
# PADP 7120 Data Applications in PA
]
.subtitle[
## RLab 7: Reporting Regression Results
]
.author[
### Alex Combs
]
.institute[
### UGA | SPIA | PADP
]
.date[
### Last updated: April 09, 2024
]

---
# Outline

- Report regression results using more common tables and graphs

- This lab involves a lot of copying and pasting code, so keep these slides open on your computer.

---
# Setup

> **Start a new project and Rmd**

> **Change YAML.**

```r
---
title: "RLab 7: Reporting Regression"
author: "Your Name"
output:
  html_document:
    theme: spacelab
    df_print: paged
---
```

---
# Setup

> **In the setup code chunk, load the following packages**

```r
library(tidyverse)
library(moderndive)
library(fivethirtyeight)
library(stargazer) # for better regression tables
library(jtools) # for better regression visualizations
library(interactions) # for better regression visualizations
```

---
# Setup

> **Add the code below to the setup code chunk**

```r
state_data <- hate_crimes %>%
  filter(state != "District of Columbia") %>%
  mutate(urbanization = if_else(share_pop_metro<=0.75, "low", "high")) %>%
  select(state, median_house_inc, share_pop_hs, share_unemp_seas, urbanization) %>%
  mutate(share_pop_hs = share_pop_hs*100,
         share_unemp_seas = share_unemp_seas*100)
```

---
# Regression Model

`$$MedInc = \beta_0 + \beta_1 \%HS + \beta_2Urbanization + \beta_3Unemp + \epsilon$$`

> **In a new code chunk, add code that runs the above regression model. Save the results as `inc_mod`, the name used in the rest of the lab.**

---
# Reporting Results

- We have covered how to produce a standard table of results using `get_regression_table`

|term              |   estimate| std_error| statistic| p_value|    lower_ci|   upper_ci|
|:-----------------|----------:|---------:|---------:|-------:|-----------:|----------:|
|intercept         | -93885.933| 32564.060|    -2.883|   0.006| -159433.985| -28337.880|
|share_pop_hs      |   1804.975|   333.078|     5.419|   0.000|    1134.524|   2475.427|
|urbanization: low |  -6845.794|  1834.261|    -3.732|   0.001|  -10537.969|  -3153.619|
|share_unemp_seas  |  -1017.102|  1105.597|    -0.920|   0.362|   -3242.552|   1208.349|

- This table is okay, but it is not how results are typically reported

---
# Regression Tables

- Regression results tables usually include:
  - estimate
  - standard error
  - asterisks (stars) for statistical significance based on p-value
  - clear variable names
  - number of observations
  - Sometimes a goodness-of-fit measure like `\(R^2\)`
  - Sometimes confidence intervals instead of standard errors

---
# Regression Tables

> **In a new code chunk, add the code below. Include the code chunk option `results='asis'`. Knit to HTML.**

```r
stargazer(inc_mod,
          type="html",
          title="Regression Results",
          dep.var.labels=c("Median Household Income"),
          covariate.labels=c("Percent High School", "Low Urbanization", "Unemployment Rate"),
          keep.stat=c("n"), # reports number of observations
          digits = 0, # sets the number of decimals to report
          digits.extra = 3) # number of decimals if rounding results in 0
```

---
# Better Regression Tables

> **Change options to include confidence intervals instead of standard errors and include goodness-of-fit. Knit to HTML.**

```r
stargazer(inc_mod,
*         ci=TRUE, ci.separator = " : ",
          type="html",
          title="Regression Results",
          dep.var.labels=c("Median Household Income"),
          covariate.labels=c("Percent High School", "Low Urbanization", "Unemployment Rate"),
*         keep.stat=c("n","rsq","ser"),
          digits = 0,
          digits.extra = 3)
```

---
# Better Regression Tables

> **Add the last line below. Knit to HTML.**

```r
stargazer(inc_mod,
          ci=TRUE, ci.separator = " : ",
          type="html",
          title="Regression Results",
          dep.var.labels=c("Median Household Income"),
          covariate.labels=c("Percent High School", "Low Urbanization", "Unemployment Rate"),
          keep.stat=c("n","rsq","ser"),
          digits = 0,
          digits.extra = 3,
*         out = "inc_mod.htm")
```

- This will generate a new document named `inc_mod.htm` that you can open in Word.

---
class: inverse, center, middle

# For presentations, we may want to use graphs instead of tables.
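---
# Regression Model (sketch)

The plotting functions in the rest of the lab all take the fitted model `inc_mod`, which the Regression Model slide asks you to write yourself. For reference, here is a minimal sketch of one way to fit it. It assumes the formula follows that slide term by term and uses the column names in `state_data`; the setup code is repeated so the chunk runs on its own.

```r
library(tidyverse)
library(fivethirtyeight)

# Same data preparation as the setup chunk
state_data <- hate_crimes %>%
  filter(state != "District of Columbia") %>%
  mutate(urbanization = if_else(share_pop_metro <= 0.75, "low", "high")) %>%
  select(state, median_house_inc, share_pop_hs, share_unemp_seas, urbanization) %>%
  mutate(share_pop_hs = share_pop_hs * 100,
         share_unemp_seas = share_unemp_seas * 100)

# Assumed formula: income on education, urbanization, and unemployment
inc_mod <- lm(median_house_inc ~ share_pop_hs + urbanization + share_unemp_seas,
              data = state_data)

summary(inc_mod)
```

Because `urbanization` is a character variable, `lm` treats "high" as the reference category, which is why the results table reports a single "low" term.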
---
# Visualizing Regression

- One option is to use `ggplot`

.pull-left[

```r
ggplot(state_data, aes(x = share_pop_hs, y = median_house_inc, color = urbanization)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  theme_minimal() +
  theme(title = element_text(size = 16),
        axis.text = element_text(size = 12))
```
]

.pull-right[
<img src="rlab8-reporting-reg_files/figure-html/unnamed-chunk-11-1.png" width="99%" />
]

---
# Visualizing Regression Line

- `ggplot` cannot easily visualize regression models that include more than:
  - **ONE** numerical explanatory variable
  - **ONE** categorical explanatory variable (with or without interaction)

--

- `ggplot` cannot show us the relationship between education and income at high and low urbanization while controlling for the unemployment rate

---
# Visualizing Regression (effect_plot)

> **Add the code below in a new code chunk**

```r
effect_plot(inc_mod, # Saved reg results
            pred = share_pop_hs, # an explanatory variable
            plot.points = TRUE, # show points or not
            data = state_data, # data used for regression
            y.label = "Median household income",
            x.label = "% High School or Above")
```

---
# Visualizing Regression (interact_plot)

- To include urbanization, we can use `interact_plot` from the `interactions` package

> **In a new code chunk, add the code below**

```r
interact_plot(inc_mod,
              pred = share_pop_hs,
              modx = urbanization, # the interaction variable
              plot.points = TRUE,
              data = state_data,
              y.label = "Median household income",
              x.label = "% High School or Above")
```

---
# Visualizing Coefficients

- We may want to visualize the estimates instead of using a table.

> **Add the following code in a new code chunk.**

```r
plot_coefs(inc_mod)
```

![](rlab8-reporting-reg_files/figure-html/unnamed-chunk-14-1.png)<!-- -->

---
# Wrap up

> **Knit to HTML. Let's examine what all this code did.**

> **Upload your Rmd to eLC**
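
---
# Significance Stars (sketch)

The stars in a regression table are just labels for p-value ranges. As a rough sketch, the base R code below assigns stars to the p-values from the `get_regression_table` output, assuming stargazer's default cutoffs (`*` for p<0.1, `**` for p<0.05, `***` for p<0.01). The `results` data frame and its `stars` column are names made up for this illustration.

```r
# Estimates and p-values copied from the get_regression_table output
results <- data.frame(
  term = c("intercept", "share_pop_hs", "urbanization: low", "share_unemp_seas"),
  estimate = c(-93885.933, 1804.975, -6845.794, -1017.102),
  p_value = c(0.006, 0.000, 0.001, 0.362)
)

# Assumed cutoffs (stargazer's defaults): *** p<0.01, ** p<0.05, * p<0.1
results$stars <- as.character(cut(
  results$p_value,
  breaks = c(-Inf, 0.01, 0.05, 0.1, Inf),
  labels = c("***", "**", "*", ""),
  right = FALSE # intervals are [lower, upper), so p = 0.05 gets one star
))

print(results)
```

With these cutoffs the intercept, percent high school, and low urbanization each get three stars, while the unemployment rate (p = 0.362) gets none.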