23 R Evaluations
23.1 Learning Outcomes
In this chapter, you will learn how to:
- Conduct a chi-square test
- Conduct an independent t-test
23.2 Set-up
To complete this chapter, you need to
- Start an R Markdown document
- Change the YAML to at least the following. Feel free to add arguments.
---
title: 'R Chapter 23'
author: 'Your name'
output:
  html_document:
    theme: spacelab
    df_print: paged
---
- Load the following packages
For the chi-square test, we will use the MplsStops dataset within the carData package. For the t-tests, we will use the UScrime dataset within the MASS package. Be sure to view the documentation for these datasets in the Help tab of the bottom-right pane by typing the name of the dataset in the search bar.
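A minimal sketch of the loading step, assuming carData and MASS are already installed (for example with `install.packages(c("carData", "MASS"))`). It loads only the two packages named above, so add any others your set-up uses. The `str()` calls are just a quick way to confirm that both datasets are available.

```r
# Assumes both packages are already installed
library(carData)  # provides the MplsStops data
library(MASS)     # provides the UScrime data

# Quick look at the variables used in this chapter's exercises
str(MplsStops[, c("citationIssued", "race")])
str(UScrime[, c("Prob", "So")])
```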
23.3 Chi-square test
A chi-square test, like the one demonstrated in Chapter 11, requires two steps:
- Create a cross-tabulation table using the table function
- Run the chi-square test on the cross-tabulation using the chisq.test function
23.3.1 Cross-tab
Below is the code used to produce the cross-tab from Chapter 11. I save the new table as polltable. Using the table function, I tell R which two variables from the poll dataset to cross-tabulate. The $ is how we identify a specific variable within a dataset. The levels of the first variable, response, form the rows of the table, while the levels of the second variable, party, form the columns.
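To see how `table()` and `$` behave without the Chapter 11 poll data, here is a sketch on a small hypothetical data frame, `toy_poll`, whose values are invented for illustration only. The first argument to `table()` supplies the rows and the second supplies the columns, so swapping the arguments transposes the table.

```r
# Hypothetical toy data standing in for the poll dataset (invented values)
toy_poll <- data.frame(
  response = c("Guest worker", "Leave the country", "Guest worker",
               "Apply for citizenship", "Leave the country", "Guest worker"),
  party    = c("Republican", "Democrat", "Independent",
               "Democrat", "Republican", "Republican")
)

# $ picks out one variable; first variable -> rows, second -> columns
toy_table <- table(toy_poll$response, toy_poll$party)
toy_table

# Reversing the arguments swaps rows and columns
table(toy_poll$party, toy_poll$response)
```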
| | Republican | Democrat | Independent |
|---|---|---|---|
| Apply for citizenship | 57 | 101 | 120 |
| Guest worker | 121 | 28 | 113 |
| Leave the country | 179 | 45 | 126 |
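If you want to follow along without the original poll data, one option is to rebuild the printed table from its counts. This sketch uses `matrix()` and `as.table()`; the object name `immigration_table` is hypothetical. `addmargins()` appends row and column totals, which make the table easier to read.

```r
# Counts copied from the cross-tab above (rows = response, columns = party)
immigration_table <- as.table(matrix(
  c( 57, 101, 120,
    121,  28, 113,
    179,  45, 126),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    response = c("Apply for citizenship", "Guest worker", "Leave the country"),
    party    = c("Republican", "Democrat", "Independent")
  )
))

# Add row and column totals
addmargins(immigration_table)
```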
23.3.2 Run chi-square
Now that we have a cross-tabulation table, we can run the chi-square test. The code below demonstrates how.
Pearson's Chi-squared test
data: immigration_poll
X-squared = 100.95, df = 4, p-value < 0.00000000000000022
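Then, it is simply a matter of interpreting the results. Here, the p-value is far below 0.05, so we reject the null hypothesis that response and party are independent.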
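As a check on the output above, this sketch runs `chisq.test()` on the same counts, rebuilt as a hypothetical `immigration_table` object, and then pulls pieces out of the result. The statistic should match the printed X-squared of about 100.95 with 4 degrees of freedom. Inspecting `expected` is a common check on the chi-square approximation; R issues a warning when any expected count falls below 5.

```r
# Counts copied from the cross-tab above (rows = response, columns = party)
immigration_table <- as.table(matrix(
  c( 57, 101, 120,
    121,  28, 113,
    179,  45, 126),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    response = c("Apply for citizenship", "Guest worker", "Leave the country"),
    party    = c("Republican", "Democrat", "Independent")
  )
))

result <- chisq.test(immigration_table)
result               # X-squared about 100.95, df = 4

# Decision at the conventional 0.05 significance level
result$p.value < 0.05   # TRUE, so we reject independence

# Expected counts under independence; all are well above 5 here
result$expected
all(result$expected >= 5)
```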
Exercise 1: Using the MplsStops data, suppose we wanted to test whether receiving a citation after being stopped by the police, citationIssued, is independent of race. Both are nominal variables, so a chi-square test can be used. Run this chi-square test.
Exercise 2: Are the two variables independent? Why?
23.4 T-tests
To reiterate, if the two groups in a t-test consist of different subjects, we use an independent t-test. If they consist of the same subjects, such as the same people measured twice, we use a dependent (paired) t-test.
23.4.1 Independent t-test
The code below demonstrates how the independent t-test from Chapter 11 was conducted. The t.test function works a lot like the lm function: the outcome is entered first, followed by the variable that identifies the groups, which is essentially an explanatory variable. The two variables are separated by ~. Then, we tell R which dataset to use, which is called jobtrain in this case.
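The jobtrain data are not reproduced here, so this sketch simulates a hypothetical stand-in, `fake_jobtrain`, with an `earnings` outcome and a 0/1 `treatment` group. It uses the same outcome ~ group formula and shows a few components of the object `t.test()` returns. Because `var.equal = FALSE` by default, R runs Welch's version of the test, which is why the output below is labeled Welch Two Sample t-test.

```r
set.seed(1)  # make the simulated data reproducible

# Hypothetical stand-in for jobtrain: simulated earnings for two groups
fake_jobtrain <- data.frame(
  treatment = rep(c(0, 1), each = 50),
  earnings  = c(rnorm(50, mean = 20000, sd = 8000),
                rnorm(50, mean = 24000, sd = 12000))
)

# Outcome first, then the grouping variable, then the dataset
result <- t.test(earnings ~ treatment, data = fake_jobtrain)
result

# Pieces of the returned "htest" object
result$estimate   # mean in group 0 and mean in group 1
result$conf.int   # 95% CI for (group 0 mean) - (group 1 mean)
result$p.value
```

Setting `var.equal = TRUE` would instead run the classic pooled-variance Student's t-test.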
Welch Two Sample t-test
data: earnings by treatment
t = -1.1921, df = 275.58, p-value = 0.2342
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
-11629.708 2856.939
sample estimates:
mean in group 0 mean in group 1
21645.10 26031.49
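The output can also be read arithmetically. This short sketch copies the printed values into R (the variable names are hypothetical) to show that the confidence interval is centred on the difference between the two group means and that it contains 0, which agrees with a p-value above 0.05.

```r
# Values copied from the Welch output above
mean_group0 <- 21645.10
mean_group1 <- 26031.49
ci <- c(-11629.708, 2856.939)
p_value <- 0.2342

mean_group0 - mean_group1   # about -4386.39, the estimated difference
mean(ci)                    # the interval is centred on that difference
ci[1] < 0 && ci[2] > 0      # TRUE: the interval contains 0
p_value < 0.05              # FALSE: fail to reject at the 0.05 level
```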
Exercise 3: Using the UScrime data, suppose we wanted to test whether the probability of imprisonment, Prob, differs between Southern and non-Southern states, So. The outcome is numerical and the explanatory variable is nominal. Therefore, a t-test can be used. Run this t-test.
Exercise 4: Is there an association between the two variables? Why?
23.4.2 Dependent t-test
To conduct a dependent t-test, add the option paired=TRUE inside the t.test code like so:
However, this code will not work because the number of observations in the treatment and control groups is not equal. If we truly had a paired sample, where the same subjects were measured twice, then we would have the same number of observations in both groups.
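Because the jobtrain groups are not paired, here is a sketch of a dependent t-test on hypothetical before/after measurements for the same eight subjects (invented values). It uses the two-vector form of `t.test()` with `paired = TRUE` and confirms that a paired t-test is equivalent to a one-sample t-test on the within-subject differences.

```r
# Hypothetical paired data: the same 8 subjects measured twice
before <- c(12.1, 14.3, 11.8, 15.0, 13.2, 12.7, 14.8, 13.9)
after  <- c(13.0, 14.9, 12.5, 15.8, 13.1, 13.6, 15.2, 14.7)

# Dependent (paired) t-test using the two-vector interface
paired_result <- t.test(after, before, paired = TRUE)
paired_result

# Equivalent: one-sample t-test on the differences against 0
diff_result <- t.test(after - before, mu = 0)
diff_result$statistic   # same t as the paired test
```

If `before` and `after` had different lengths, R would stop with an error instead of running the test, which mirrors the problem described above.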