23 R Evaluations

23.1 Learning Outcomes

In this chapter, you will learn how to:

  • Conduct a chi-square test
  • Conduct an independent t-test

23.2 Set-up

To complete this chapter, you need to:

  • Start an R Markdown document
  • Change the YAML header to include at least the following (feel free to add more options):
---
title: 'R Chapter 23'
author: 'Your name'
output: 
  html_document:
    theme: spacelab
    df_print: paged
---
  • Load the following packages:
library(tidyverse)
library(carData)
library(MASS)

For the chi-square test, we will use the MplsStops dataset within the carData package. For the t-tests, we will use the UScrime dataset within the MASS package. Be sure to view the documentation for these data in the Help tab of the bottom-right pane by typing the name of the dataset in the search bar.

23.3 Chi-square test

A chi-square test, like the one demonstrated in Chapter 11, requires two steps:

  • Create a cross-tabulation table using the table function
  • Run the chi-square on the cross-tabulation using the chisq.test function
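The two steps above can be sketched with base R on a small, entirely hypothetical data frame. The names `toy`, `group`, `outcome`, `toy_table`, and `toy_test` are made up for illustration and do not come from any dataset in this chapter.

```r
# Hypothetical data: 200 people, two nominal variables
toy <- data.frame(
  group   = rep(c("A", "B"), each = 100),
  outcome = c(rep(c("Yes", "No"), times = c(70, 30)),   # group A
              rep(c("Yes", "No"), times = c(40, 60)))   # group B
)

# Step 1: cross-tabulate; levels of group become rows, levels of outcome columns
toy_table <- table(toy$group, toy$outcome)
toy_table

# Step 2: run the chi-square test on the cross-tab
toy_test <- chisq.test(toy_table)
toy_test
```

For a 2 x 2 table like this one, chisq.test applies Yates' continuity correction by default; adding correct = FALSE turns it off. Larger tables, such as the 3 x 3 one below, are not affected.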

23.3.1 Cross-tab

Below is the code used to produce the cross-tab from Chapter 11. I save the new table as polltable. Using the table function, I tell R which two variables from the poll dataset to cross-tabulate; the $ identifies a specific variable within a dataset. The levels of the first variable, response, form the rows, and the levels of the second variable, party, form the columns, so each cell counts the respondents with that combination of response and party.

polltable <- table(poll$response, poll$party)
Table 23.1: Response by political party

                        Republican  Democrat  Independent
Apply for citizenship           57       101          120
Guest worker                   121        28          113
Leave the country              179        45          126
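If you do not have the poll data loaded, a table with the same counts as Table 23.1 can be rebuilt directly from the numbers above. This is only a sketch: the object name poll_counts is hypothetical, and addmargins is used just to check the row and column totals.

```r
# Counts from Table 23.1, entered row by row
poll_counts <- matrix(
  c( 57, 101, 120,
    121,  28, 113,
    179,  45, 126),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    response = c("Apply for citizenship", "Guest worker", "Leave the country"),
    party    = c("Republican", "Democrat", "Independent")
  )
)
poll_counts <- as.table(poll_counts)

# Add row and column totals to check the entries
addmargins(poll_counts)
```

Because chisq.test only uses the cell counts, passing poll_counts to it gives the same statistic as a table built from the raw data with table.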

23.3.2 Run chi-square

Now that we have a cross-tabulation table, we can run the chi-square test. The code below demonstrates how.

chisq.test(polltable)

    Pearson's Chi-squared test

data:  polltable
X-squared = 100.95, df = 4, p-value < 0.00000000000000022

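Then, it is simply a matter of interpreting the results. Here the p-value is far below any conventional significance level, such as 0.05, so we reject the null hypothesis that response is independent of party.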
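One way to support the interpretation is to pull the pieces out of the object that chisq.test returns instead of reading the printout. The sketch below rebuilds the Table 23.1 counts (the names poll_counts, poll_test, and alpha are hypothetical) and compares the p-value with a 0.05 significance level, which is an assumed convention you can change.

```r
# Counts from Table 23.1, entered row by row
poll_counts <- as.table(matrix(
  c(57, 101, 120, 121, 28, 113, 179, 45, 126),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    response = c("Apply for citizenship", "Guest worker", "Leave the country"),
    party    = c("Republican", "Democrat", "Independent")
  )
))

poll_test <- chisq.test(poll_counts)

poll_test$statistic   # X-squared
poll_test$parameter   # degrees of freedom: (3 - 1) * (3 - 1) = 4
poll_test$p.value

# Expected counts under independence; the chi-square approximation is
# usually considered reliable when these are all at least 5
round(poll_test$expected, 1)

# Standardized residuals: cells far from 0 (roughly beyond +/-2)
# contribute most to the result
round(poll_test$stdres, 2)

alpha <- 0.05
poll_test$p.value < alpha   # TRUE means reject independence
```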

Exercise 1: Using the MplsStops data, suppose we wanted to test whether receiving a citation after being stopped by the police, citationIssued, is independent of race. Both are nominal variables, so a chi-square test can be used. Run this chi-square test.

Exercise 2: Are the two variables independent? Why?

23.4 T-tests

To reiterate, if the two groups in a t-test consist of different subjects, we use an independent t-test. If they consist of the same subjects (for example, the same people measured twice), we use a dependent, or paired, t-test.

23.4.1 Independent t-test

The code below demonstrates how the independent t-test from Chapter 11 was conducted. The t.test function uses the same formula syntax as the lm function: the outcome goes on the left of ~, and the variable that identifies the groups, which is essentially an explanatory variable, goes on the right. The data argument tells R which dataset to use, which is jobtrain in this case.

t.test(earnings ~ treatment, data = jobtrain)

    Welch Two Sample t-test

data:  earnings by treatment
t = -1.1921, df = 275.58, p-value = 0.2342
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -11629.708   2856.939
sample estimates:
mean in group 0 mean in group 1 
       21645.10        26031.49 
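To see what the formula interface does, the sketch below runs the same kind of test on R's built-in mtcars data, comparing mpg between automatic (am = 0) and manual (am = 1) cars, in two equivalent ways. Here mtcars simply stands in for jobtrain. By default, t.test runs Welch's test, which does not assume equal variances in the two groups; that is why the output above says "Welch Two Sample t-test" and why its df is not a whole number. The object names are hypothetical.

```r
# Formula interface, as in the jobtrain example (Welch test by default)
welch_formula <- t.test(mpg ~ am, data = mtcars)

# Equivalent two-vector interface: group 0 first, group 1 second
mpg_auto   <- mtcars$mpg[mtcars$am == 0]
mpg_manual <- mtcars$mpg[mtcars$am == 1]
welch_vectors <- t.test(mpg_auto, mpg_manual)

# Pieces usually reported
welch_formula$statistic   # t
welch_formula$parameter   # Welch df (not a whole number)
welch_formula$p.value
welch_formula$estimate    # group means

# Student's t-test, only if equal variances can be assumed
student <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
student$parameter         # n1 + n2 - 2 = 19 + 13 - 2 = 30
```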

Exercise 3: Using the UScrime data, suppose we want to test whether the probability of imprisonment, Prob, differs between Southern and non-Southern states, So. The outcome is numerical and the explanatory variable is nominal. Therefore, a t-test can be used. Run this t-test.

Exercise 4: Is there an association between the two variables? Why?

23.4.2 Dependent t-test

To conduct a dependent t-test, add the option paired = TRUE inside the t.test call, like so:

t.test(earnings ~ treatment, data = jobtrain, paired = TRUE)

However, this code will not work because the number of observations in the treatment and control groups is not equal. If we truly had a paired sample, in which the same subjects were measured twice, we would have the same number of observations in both groups.
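For data that really are paired, one option is to pass the two sets of measurements to t.test as separate vectors, lined up so that position i in both vectors is the same subject. The sketch below uses R's built-in sleep data, in which the same 10 patients (ID) were measured under two drugs (group); the vector names drug1 and drug2 are made up. A dependent t-test is equivalent to a one-sample t-test on the within-subject differences, which the last line checks.

```r
# Extra hours of sleep for each patient under each drug, lined up by ID
drug1 <- with(sleep, extra[group == "1"][order(ID[group == "1"])])
drug2 <- with(sleep, extra[group == "2"][order(ID[group == "2"])])

# A paired design needs one measurement per subject in each condition
length(drug1) == length(drug2)

# Dependent (paired) t-test
paired_res <- t.test(drug2, drug1, paired = TRUE)
paired_res

# Same test as a one-sample t-test on the differences
diff_res <- t.test(drug2 - drug1)
all.equal(unname(paired_res$statistic), unname(diff_res$statistic))
```

Subtracting in the other order, t.test(drug1, drug2, paired = TRUE), only flips the sign of t; the p-value is unchanged.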

23.5 Save and Upload

Knit your Rmd to save it and check for errors. If you are satisfied with your work, upload to eLC. Once you upload, answers will become available for download.