22 R Nonlinear Regression
22.1 Learning Objectives
In this chapter, you will learn how to:
- Run a regression with a quadratic term
- Run a regression with log transformations
22.2 Set-up
To complete this chapter, you need to:
- Start an R Markdown document
- Change the YAML to at least the following. Feel free to add arguments.
---
title: 'R Chapter 22'
author: 'Your name'
output:
  html_document:
    theme: spacelab
    df_print: paged
---
- Load the following packages
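A minimal sketch of the loading step. Only carData is named explicitly in this chapter; moderndive is an assumption, inferred from the layout of the regression tables shown later, which matches the output of its get_regression_table() function.

```r
# Assumed package list; only carData is named explicitly in this chapter.
library(moderndive)  # assumption: get_regression_table() for tidy regression output
library(carData)     # provides the Mroz dataset
```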
We will use the Mroz dataset within the carData package. Be sure to view the documentation for these data in the Help tab of the bottom-right pane by typing the name of the dataset in the search bar.
22.3 Quadratic term
Recall the regression model below from Chapter 8. It includes a squared term for Age, which allows the regression line to change direction once as Age changes. We included this term because Figure 8.1 suggested that wages initially increase with age, then decrease.
\[\begin{equation} Wage = \beta_0 + \beta_1Age + \beta_2Age^2 + \beta_3Educ + \epsilon \end{equation}\]
The code below demonstrates how to include a quadratic term within the lm function.
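A minimal sketch of such a call. The Chapter 8 dataset is not named in this section, so the example simulates a stand-in data frame; `wage_data`, the model names, and the simulation coefficients are all hypothetical and serve only to make the code runnable.

```r
# Hypothetical stand-in data: the Chapter 8 dataset is not named here,
# so we simulate columns called Wage, Age, and Educ to match the equation.
set.seed(22)
n <- 500
wage_data <- data.frame(
  Age  = sample(18:65, n, replace = TRUE),
  Educ = sample(8:20, n, replace = TRUE)
)
wage_data$Wage <- -20 + 1.3 * wage_data$Age - 0.013 * wage_data$Age^2 +
  1.2 * wage_data$Educ + rnorm(n, sd = 5)

# I(Age^2) adds the squared term alongside the linear Age term
quad_mod <- lm(Wage ~ Age + I(Age^2) + Educ, data = wage_data)

# Without I(), the formula reduces Age^2 to Age and the squared term is dropped
no_I_mod <- lm(Wage ~ Age + Age^2 + Educ, data = wage_data)

names(coef(quad_mod))  # "(Intercept)" "Age" "I(Age^2)" "Educ"
names(coef(no_I_mod))  # "(Intercept)" "Age" "Educ"
```

Comparing the two sets of coefficient names is a quick check that the squared term actually entered the model.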
In this case, the code mirrors the equation only partly: I() is needed to tell R to treat Age^2 arithmetically, as the square of Age. Inside a formula, ^ is otherwise read as a formula operator rather than as exponentiation, so Age^2 without I() reduces to Age and the squared term is silently left out of the regression.
Now we can obtain results in the usual manner.
| term | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
|---|---|---|---|---|---|---|
| intercept | -22.722 | 3.023 | -7.517 | 0 | -28.742 | -16.701 |
| Age | 1.350 | 0.134 | 10.077 | 0 | 1.083 | 1.617 |
| I(Age^2) | -0.013 | 0.001 | -9.840 | 0 | -0.016 | -0.011 |
| Educ | 1.254 | 0.090 | 13.990 | 0 | 1.075 | 1.432 |
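One way to read the two Age coefficients together is to compute the age at which predicted wage stops rising and starts falling, which for a quadratic is the negative of the Age coefficient divided by twice the Age^2 coefficient. A short sketch using the rounded estimates from the table above, so the result is only approximate:

```r
# Rounded estimates copied from the regression table above
b_age  <- 1.350
b_age2 <- -0.013

# Vertex of the quadratic in Age, holding Educ fixed
turning_point <- -b_age / (2 * b_age2)
turning_point
```

With these rounded values the turning point is roughly age 52. Because the Age^2 estimate is rounded to three decimals, the figure from the unrounded coefficients may differ somewhat.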
We need to alter the Mroz data slightly before running a regression. Run the following code, which creates a new variable that equals 1 if lfp equals “yes” and 0 if lfp equals “no.” This is necessary because our outcome variable, even though categorical, must be represented numerically for the regression to work.
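A minimal sketch of that recoding step in base R. The new column name `lfp_num` is hypothetical; any name works, as long as the same name is used as the outcome in the model.

```r
library(carData)  # provides the Mroz data frame

# lfp is a factor with levels "no" and "yes"; the comparison yields TRUE/FALSE,
# and as.numeric() converts that to 1/0. lfp_num is a hypothetical column name.
Mroz$lfp_num <- as.numeric(Mroz$lfp == "yes")

# Check the recoding against the original factor
table(Mroz$lfp, Mroz$lfp_num)
```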
Exercise 1: Suppose we want to examine factors that explain whether married women participate in the labor force, which is a binary outcome. We use the following model:
\[\begin{equation} lfp = \beta_0 + \beta_1k5 + \beta_2age + \beta_3age^2 + \beta_4wc + \beta_5lwg + \beta_6inc + \epsilon \end{equation}\]
Run this regression model and obtain the results.
22.4 Log Transformation
In Chapter 8, the following log-log regression model was run.
\[\begin{equation} ln(LifeExp)=\beta_0 + \beta_1ln(GDPpercap) + \beta_2Continent + \epsilon \end{equation}\]
The code below demonstrates how to transform a variable into its natural log within the lm function.
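A minimal sketch of such a call, assuming the data are the gapminder data frame from the gapminder package. That source is an assumption, inferred from the variable names gdpPercap and continent in the results table, and the model name `log_mod` is hypothetical.

```r
library(gapminder)  # assumption: supplies lifeExp, gdpPercap, and continent

# log() inside the formula applies the natural log on the fly
log_mod <- lm(log(lifeExp) ~ log(gdpPercap) + continent, data = gapminder)

names(coef(log_mod))

# The data frame itself is unchanged: no logged columns were added
names(gapminder)
```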
Note that all we need to do is place the appropriate variables within the log() function, which R interprets as the natural log. This temporarily transforms the variables; it does not create new variables in the dataset equal to the natural log of the variables.
Now we can obtain results in the usual manner.
| term | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
|---|---|---|---|---|---|---|
| intercept | 3.062 | 0.026 | 117.692 | 0 | 3.011 | 3.113 |
| log(gdpPercap) | 0.112 | 0.004 | 31.843 | 0 | 0.105 | 0.119 |
| continent: Americas | 0.133 | 0.011 | 12.519 | 0 | 0.112 | 0.154 |
| continent: Asia | 0.110 | 0.009 | 12.037 | 0 | 0.092 | 0.128 |
| continent: Europe | 0.166 | 0.012 | 14.357 | 0 | 0.143 | 0.189 |
| continent: Oceania | 0.152 | 0.029 | 5.187 | 0 | 0.095 | 0.210 |
Exercise 2: Suppose we decide we want to use the natural log of family income exclusive of wife’s income, inc, resulting in the following model:
\[\begin{equation} lfp = \beta_0 + \beta_1k5 + \beta_2age + \beta_3age^2 + \beta_4wc + \beta_5lwg + \beta_6ln(inc) + \epsilon \end{equation}\]
Run this regression model and obtain the results.