22 R Nonlinear Regression

22.1 Learning Objectives

In this chapter, you will learn how to:

  • Run a regression with a quadratic term
  • Run a regression with log transformations

22.2 Set-up

To complete this chapter, you need to

  • Start an R Markdown document
  • Change the YAML to at least the following. Feel free to add arguments.
---
title: 'R Chapter 22'
author: 'Your name'
output: 
  html_document:
    theme: spacelab
    df_print: paged
---
  • Load the following packages
library(tidyverse)
library(moderndive)
library(carData)

We will use the Mroz dataset within the carData package. Be sure to view the documentation for these data in the Help tab of the bottom-right pane by typing the name of the dataset in the search bar.
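You can also take a first look at Mroz from code. The sketch below lists every variable with its type using glimpse() (from dplyr, which tidyverse loads) and previews the columns the exercises in this chapter use. The help page itself is best opened interactively, so that call is left as a comment.

```r
library(tidyverse)
library(carData)

# In the console, ?Mroz opens the same help page as searching "Mroz" in the Help tab

# One row per variable: name, type, and the first few values
glimpse(Mroz)

# Preview the variables used in the exercises later in this chapter
Mroz %>%
  select(lfp, k5, age, wc, lwg, inc) %>%
  head()
```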

22.3 Quadratic term

Recall the following regression model from Chapter 8, which includes a squared term for Age. The squared term allows the regression line to change direction once as Age changes. We included it because Figure 8.1 suggested that wages initially increase with age and then decrease.

\[\begin{equation} Wage = \beta_0 + \beta_1Age + \beta_2Age^2 + \beta_3Educ + \epsilon \end{equation}\]

The following code shows how to include a quadratic term within the lm() function.

quad_mod <- lm(Wage ~ Age + I(Age^2) + Educ, data = wages)

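In this case, the code mirrors the equation only partly: Age^2 must be wrapped in I(). Inside a formula, ^ is an operator for crossing terms rather than arithmetic exponentiation, so without I() R reads Age^2 as simply Age and silently drops the squared term from the regression. I() tells R to evaluate Age^2 as written, that is, to square Age.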
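To see what I() changes, the sketch below fits the same formula with and without it on a small simulated dataset. The data frame toy and its values are made up for illustration; only the coefficient names matter here. It also shows poly(Age, 2, raw = TRUE), an equivalent way to add the squared term that gives the same fitted values under different coefficient labels.

```r
# Hypothetical simulated data: wages rise with age, then fall
set.seed(22)
toy <- data.frame(Age = 20:60)
toy$Wage <- -20 + 1.3 * toy$Age - 0.013 * toy$Age^2 + rnorm(nrow(toy))

with_I    <- lm(Wage ~ Age + I(Age^2), data = toy)
without_I <- lm(Wage ~ Age + Age^2, data = toy)

# With I(), the squared term gets its own coefficient:
# "(Intercept)" "Age" "I(Age^2)"
names(coef(with_I))

# Without I(), the formula operator ^ reduces Age^2 to Age:
# "(Intercept)" "Age"
names(coef(without_I))

# Equivalent alternative: a raw (not orthogonal) polynomial of degree 2
poly_mod <- lm(Wage ~ poly(Age, 2, raw = TRUE), data = toy)
all.equal(fitted(with_I), fitted(poly_mod))  # TRUE
```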

Now we can obtain results in the usual manner.

get_regression_table(quad_mod)
term       estimate  std_error  statistic  p_value  lower_ci  upper_ci
intercept   -22.722      3.023     -7.517        0   -28.742   -16.701
Age           1.350      0.134     10.077        0     1.083     1.617
I(Age^2)     -0.013      0.001     -9.840        0    -0.016    -0.011
Educ          1.254      0.090     13.990        0     1.075     1.432
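Because the coefficient on the squared term is negative, the fitted relationship between Age and Wage is an inverted U. Setting the slope of the fitted curve with respect to Age, \(\beta_1 + 2\beta_2 Age\), equal to zero gives the age at which predicted wages peak (holding Educ constant): \(Age = -\beta_1 / (2\beta_2)\). The sketch below plugs in the rounded estimates from the table, so the result is approximate.

```r
# Rounded estimates copied from the regression table above
b_age  <- 1.350    # coefficient on Age
b_age2 <- -0.013   # coefficient on I(Age^2)

# Peak of the fitted curve: -b1 / (2 * b2)
peak_age <- -b_age / (2 * b_age2)
round(peak_age, 1)  # 51.9

# With quad_mod in memory, full-precision estimates avoid rounding error:
# b <- coef(quad_mod)
# -b["Age"] / (2 * b["I(Age^2)"])
```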

We need to alter the Mroz data slightly before running a regression. Run the following code, which creates a new variable, lfp_numeric, that equals 1 if lfp equals “yes” and 0 if lfp equals “no.” This is necessary because our outcome variable, even though categorical, must be represented numerically for the regression to work.

my_Mroz <- Mroz %>% 
  mutate(lfp_numeric = if_else(lfp == "yes", 1, 0))
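Before fitting a model, it is worth confirming that the recode did what we intended. A sketch that cross-tabulates the original factor against the new variable with dplyr's count():

```r
library(tidyverse)
library(carData)

my_Mroz <- Mroz %>%
  mutate(lfp_numeric = if_else(lfp == "yes", 1, 0))

# Every "yes" should pair with 1 and every "no" with 0 (two rows in total)
my_Mroz %>% count(lfp, lfp_numeric)

# The mean of a 0/1 variable is the share of women in the labor force
mean(my_Mroz$lfp_numeric)
```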

Exercise 1: Suppose we want to examine factors that explain whether married women participate in the labor force, which is a binary outcome. We use the following model:

\[\begin{equation} lfp = \beta_0 + \beta_1k5 + \beta_2age + \beta_3age^2 + \beta_4wc + \beta_5lwg + \beta_6inc + \epsilon \end{equation}\]

Run this regression model and obtain the results.

22.4 Log Transformation

In Chapter 8, the following log-log regression model was run.

\[\begin{equation} \ln(LifeExp) = \beta_0 + \beta_1\ln(GDPpercap) + \beta_2Continent + \epsilon \end{equation}\]

The following code shows how to transform a variable into its natural log within the lm() function.

library(gapminder)  # provides the gapminder data frame used below
loglog <- lm(log(lifeExp) ~ log(gdpPercap) + continent, data = gapminder)

Note that all we need to do is place the appropriate variables within the log() function, which R interprets as the natural log. This temporarily transforms the variables; it does not create new variables in the dataset equal to the natural log of the variables.
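The sketch below checks both points on the gapminder data (loaded from the gapminder package, which the Set-up section does not list): log() uses base e unless told otherwise, and putting log() inside the formula gives the same estimates as first creating logged columns with mutate(), while gapminder itself keeps its original columns. The column names ln_lifeExp and ln_gdpPercap are hypothetical, chosen only for this illustration.

```r
library(tidyverse)
library(gapminder)

log(exp(1))  # 1: log() is the natural log unless a base is given

# Transform inside the formula; gapminder is not modified
loglog <- lm(log(lifeExp) ~ log(gdpPercap) + continent, data = gapminder)
ncol(gapminder)  # still 6

# Equivalent: store the logged variables as new (hypothetically named) columns
gap_logged <- gapminder %>%
  mutate(ln_lifeExp = log(lifeExp), ln_gdpPercap = log(gdpPercap))
loglog2 <- lm(ln_lifeExp ~ ln_gdpPercap + continent, data = gap_logged)

# Same estimates; only the coefficient names differ
all.equal(unname(coef(loglog)), unname(coef(loglog2)))  # TRUE
```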

Now we can obtain results in the usual manner.

get_regression_table(loglog)
term                 estimate  std_error  statistic  p_value  lower_ci  upper_ci
intercept               3.062      0.026    117.692        0     3.011     3.113
log(gdpPercap)          0.112      0.004     31.843        0     0.105     0.119
continent: Americas     0.133      0.011     12.519        0     0.112     0.154
continent: Asia         0.110      0.009     12.037        0     0.092     0.128
continent: Europe       0.166      0.012     14.357        0     0.143     0.189
continent: Oceania      0.152      0.029      5.187        0     0.095     0.210
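In a log-log model, the slope on log(gdpPercap) is an elasticity: a 1% increase in GDP per capita is associated with roughly a 0.112% increase in life expectancy, holding continent constant. Because the outcome is logged, each continent coefficient is an approximate proportional difference from the omitted baseline, Africa; exponentiating gives the exact implied percentage. A sketch using the rounded estimates from the table:

```r
# Rounded estimates copied from the table above
b_gdp      <- 0.112  # log(gdpPercap)
b_americas <- 0.133  # continent: Americas (baseline is Africa)

# Predicted % change in lifeExp for a 10% rise in gdpPercap
pct_gdp <- (1.10^b_gdp - 1) * 100
round(pct_gdp, 2)  # 1.07

# % difference in predicted lifeExp, Americas vs. Africa, at the same gdpPercap
pct_americas <- (exp(b_americas) - 1) * 100
round(pct_americas, 1)  # 14.2
```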

Exercise 2: Suppose we decide to use the natural log of family income exclusive of the wife’s income, inc, resulting in the following model:

\[\begin{equation} lfp = \beta_0 + \beta_1k5 + \beta_2age + \beta_3age^2 + \beta_4wc + \beta_5lwg + \beta_6\ln(inc) + \epsilon \end{equation}\]

Run this regression model and obtain the results.

22.5 Save and Upload

Knit your Rmd to save it and check for errors. If you are satisfied with your work, upload to eLC. Once you upload, answers will become available for download.