22 R Nonlinear Regression
22.1 Learning Objectives
In this chapter, you will learn how to:
- Run a regression with a quadratic term
- Run a regression with log transformations
22.2 Set-up
To complete this chapter, you need to
- Start an R Markdown document
- Change the YAML to at least the following. Feel free to add arguments.
---
title: 'R Chapter 22'
author: 'Your name'
output:
  html_document:
    theme: spacelab
    df_print: paged
---
- Load the following packages
We will use the `Mroz` dataset from the `carData` package. Be sure to view the documentation for these data in the Help tab of the bottom-right pane by typing the name of the dataset in the search bar.
22.3 Quadratic Term
Recall the regression model below from Chapter 8, which includes a squared term for `Age`. The squared term allows the regression line to change direction once as `Age` changes. We included this term because Figure 8.1 suggested that wages initially increase with age, then decrease.
\[\begin{equation} Wage = \beta_0 + \beta_1Age + \beta_2Age^2 + \beta_3Educ + \epsilon \end{equation}\]
The code below demonstrates how to include a quadratic term within the `lm` function.
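In this case, the code reflects the equation only somewhat; the `I()` wrapper is necessary to tell R to evaluate `Age^2` arithmetically, that is, as the square of `Age`. Inside a model formula, R otherwise reads `^` as a formula operator rather than as exponentiation, so `Age^2` reduces to `Age` and the squared term is excluded from the regression.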
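The role of `I()` described above can be checked directly. The following is a minimal sketch using simulated data rather than the chapter's wage data; the data frame `toy_wage` and its columns `age` and `wage` are hypothetical names chosen for illustration.

```r
# Simulated stand-in data; `toy_wage`, `age`, and `wage` are hypothetical names.
set.seed(22)
toy_wage <- data.frame(age = runif(200, min = 20, max = 65))
toy_wage$wage <- -20 + 1.3 * toy_wage$age - 0.013 * toy_wage$age^2 +
  rnorm(200, sd = 2)

# Without I(), `^` is treated as a formula operator, so age^2 reduces to age.
fit_no_I <- lm(wage ~ age + age^2, data = toy_wage)

# With I(), age^2 is evaluated arithmetically and enters as its own term.
fit_I <- lm(wage ~ age + I(age^2), data = toy_wage)

names(coef(fit_no_I))  # "(Intercept)" "age"
names(coef(fit_I))     # "(Intercept)" "age" "I(age^2)"
```

Comparing the two sets of coefficient names shows that only the second model contains a squared term.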
Now we can obtain results in the usual manner.
term | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
---|---|---|---|---|---|---|
intercept | -22.722 | 3.023 | -7.517 | 0 | -28.742 | -16.701 |
Age | 1.350 | 0.134 | 10.077 | 0 | 1.083 | 1.617 |
I(Age^2) | -0.013 | 0.001 | -9.840 | 0 | -0.016 | -0.011 |
Educ | 1.254 | 0.090 | 13.990 | 0 | 1.075 | 1.432 |
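Because the estimate on the squared term is negative, the fitted curve rises and then falls, and the estimates imply an age at which predicted wage peaks. As a rough worked calculation, holding Educ fixed and using the rounded coefficients from the table (so the result is only approximate), setting the derivative with respect to Age to zero gives:

\[\begin{equation} \frac{\partial Wage}{\partial Age} = \beta_1 + 2\beta_2Age = 0 \quad \Rightarrow \quad Age^{*} = -\frac{\beta_1}{2\beta_2} \approx \frac{1.350}{2(0.013)} \approx 51.9 \end{equation}\]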
We need to alter the `Mroz` data slightly before running a regression. Run the following code, which creates a new variable that equals 1 if `lfp` equals “yes” and 0 if `lfp` equals “no.” This is necessary because our outcome variable, even though categorical, must be represented numerically for the regression to work.
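One way to do this kind of recoding is sketched below in base R. To keep the example self-contained, it uses a small toy data frame that mimics only the `lfp` column of `Mroz`; the names `toy_mroz` and `lfp_num` are hypothetical, and the same `ifelse()` line would apply to the real data.

```r
# Toy stand-in for Mroz; only the `lfp` factor is mimicked here.
toy_mroz <- data.frame(lfp = factor(c("yes", "no", "no", "yes", "yes")))

# Hypothetical new column: 1 if lfp is "yes", 0 if lfp is "no".
toy_mroz$lfp_num <- ifelse(toy_mroz$lfp == "yes", 1, 0)

# Cross-tabulate to confirm that each category maps to a single number.
table(toy_mroz$lfp, toy_mroz$lfp_num)
```

The cross-tabulation is a quick check that every “yes” became 1 and every “no” became 0 before the new variable is used as an outcome.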
Exercise 1: Suppose we want to examine factors that explain whether married women participate in the labor force, which is a binary outcome. We use the following model:
\[\begin{equation} lfp = \beta_0 + \beta_1k5 + \beta_2age + \beta_3age^2 + \beta_4wc + \beta_5lwg + \beta_6inc + \epsilon \end{equation}\]
Run this regression model and obtain the results.
22.4 Log Transformation
In Chapter 8, the following log-log regression model was run.
\[\begin{equation} ln(LifeExp)=\beta_0 + \beta_1ln(GDPpercap) + \beta_2Continent + \epsilon \end{equation}\]
The code below demonstrates how to transform a variable into its natural log within the `lm` function.
Note that all we need to do is place the appropriate variables within the `log()` function, which R interprets as the natural log. The transformation applies only within the model; it does not create new variables in the dataset equal to the natural log of the variables.
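Both points can be verified with a small sketch. It uses simulated data rather than the Chapter 8 data; the data frame `toy_gdp` and its columns `gdp` and `life` are hypothetical names.

```r
# Simulated stand-in data; `toy_gdp`, `gdp`, and `life` are hypothetical names.
set.seed(22)
toy_gdp <- data.frame(gdp = exp(runif(100, min = 6, max = 11)))
toy_gdp$life <- exp(3 + 0.1 * log(toy_gdp$gdp) + rnorm(100, sd = 0.05))

# log() inside the formula is applied on the fly when the model is fit.
fit_log <- lm(log(life) ~ log(gdp), data = toy_gdp)

names(coef(fit_log))  # the slope is labelled "log(gdp)"
names(toy_gdp)        # still only "gdp" and "life"; no logged columns were added
log(exp(1))           # 1, confirming that log() is the natural log
```

If the logged variables are needed elsewhere, such as in a plot, they would have to be created explicitly as new columns.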
Now we can obtain results in the usual manner.
term | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
---|---|---|---|---|---|---|
intercept | 3.062 | 0.026 | 117.692 | 0 | 3.011 | 3.113 |
log(gdpPercap) | 0.112 | 0.004 | 31.843 | 0 | 0.105 | 0.119 |
continent: Americas | 0.133 | 0.011 | 12.519 | 0 | 0.112 | 0.154 |
continent: Asia | 0.110 | 0.009 | 12.037 | 0 | 0.092 | 0.128 |
continent: Europe | 0.166 | 0.012 | 14.357 | 0 | 0.143 | 0.189 |
continent: Oceania | 0.152 | 0.029 | 5.187 | 0 | 0.095 | 0.210 |
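Because both life expectancy and GDP per capita are logged, the slope on `log(gdpPercap)` is commonly read as an elasticity: holding continent fixed, a 1% increase in GDP per capita is associated with roughly a 0.112% increase in life expectancy. The short sketch below, which assumes that standard log-log interpretation and uses the rounded estimate from the table, computes the implied change for a larger, 10% increase; `beta_gdp` and `pct_change` are hypothetical names.

```r
# Slope on log(gdpPercap), copied from the rounded estimate in the table.
beta_gdp <- 0.112

# Implied percent change in life expectancy for a 10% rise in GDP per capita,
# holding continent fixed.
pct_change <- (1.10^beta_gdp - 1) * 100
round(pct_change, 2)  # about 1.07
```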
Exercise 2: Suppose we decide to use the natural log of family income exclusive of wife’s income, `inc`, resulting in the following model:
\[\begin{equation} lfp = \beta_0 + \beta_1k5 + \beta_2age + \beta_3age^2 + \beta_4wc + \beta_5lwg + \beta_6ln(inc) + \epsilon \end{equation}\]
Run this regression model and obtain the results.