class: center, middle, inverse, title-slide

.title[
# PADP 7120 Data Applications in PA
]
.subtitle[
## Hypothesis Testing
]
.author[
### Alex Combs
]
.institute[
### UGA | SPIA | PADP
]
.date[
### Last updated: April 08, 2024
]

---
# Outline

- Hypothesis testing
- p-values
- Null distribution
- Reporting statistically significant results
- Practical significance

---
# Regression table

`$$HealthExpenditure = \beta_0 + \beta_1TreatmentLocality + \epsilon$$`

|term           | estimate| std_error| statistic| p_value| lower_ci| upper_ci|
|:--------------|--------:|---------:|---------:|-------:|--------:|--------:|
|intercept      |    20.06|      0.16|   123.322|       0|    19.75|    20.38|
|treatment: Yes |    -6.41|      0.23|   -27.850|       0|    -6.86|    -5.96|

- Our goal is to better understand the `statistic` (less important) and `p_value` columns.

---
class: inverse, center, middle
# What is a hypothesis test & p-value?

---
# Hypotheses and p-values

A **hypothesis test** asks the following question: "Is my result so unlikely that I can conclude, with a sufficient level of confidence, that there is evidence in support of my hypothesis?"

**p-value:** the probability of obtaining my result, or one more extreme, in a world where the **null hypothesis**, which states that a variable has no relationship with (no effect on) the outcome, is actually true.

- The p-value enables us to answer the hypothesis test's question

---
# Structure of hypothesis test

- Alternative hypothesis `\(H_A\)`
  - Claims there is evidence for the phenomenon you are interested in testing
  - For example: `\(\beta_k \neq 0\)`

--

- Null hypothesis `\(H_0\)`
  - Claims there is no relationship or effect; the opposite of the alternative
  - For example: `\(\beta_k = 0\)`

---
# Example

<img src="Hypothesis-Testing_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

> **Run all code chunks down to and including `promo-estimates`**

---
# Example

- Descriptive:
  - For our one sample, we have calculated the proportion of males promoted, `\(p_M=0.88\)`, and the proportion of females promoted, `\(p_F=0.58\)`
  - In this sample, the difference in proportions promoted is `\(p_M-p_F=0.88-0.58=0.3\)`, or 30 percentage points

--

- Inference:
  - This is a sample of an unobserved population
  - And/or the counterfactual is unobservable
  - This sample provides us estimates of `\(P_M\)`, `\(P_F\)`, and `\(P_M-P_F\)`

---
# Example

- Following this case of promotions by gender, the difference in rates of promotion between males and females, `\(P_M-P_F\)`, is either 0 or something other than 0
- What are our null and alternative hypotheses for this analysis?
  - `\(H_A\)`:
  - `\(H_0\)`:
- What are the possible conclusions of this and all hypothesis tests?

---
# Example

1. If results reject the null, then
  - We found statistically significant evidence that `\(P_M-P_F \neq 0\)`
  - Means the result of 0.3 is too improbable to be due to random chance alone

--

2. If results fail to reject the null
  - We did not find statistically significant evidence that `\(P_M-P_F \neq 0\)`
  - Means the result of 0.3 is not improbable enough to rule out random chance
  - `\(P_M-P_F \lesseqqgtr 0\)`; we don't know which

--

- We can *never* conclude that males and females *are* promoted equally, `\(P_M-P_F = 0\)`, from a hypothesis test. That would be equivalent to accepting the null.

---
# What If We're Wrong

- Suppose our estimate of the unknown parameter `\(P_M-P_F\)`, namely `\(p_M-p_F=0.3\)`, is so unlikely under the null that we reject the null
  - We conclude that males and females are not promoted in equal proportions, `\(P_M-P_F \neq 0\)`
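For intuition, here is a minimal sketch of the 95% CI behind such a decision, using a normal approximation. The promotion counts (21 of 24 males, 14 of 24 females) are assumed purely for illustration; they round to the 0.88 and 0.58 above.

```r
# Assumed counts for illustration; they round to p_M = 0.88 and p_F = 0.58
promoted_m <- 21; n_m <- 24
promoted_f <- 14; n_f <- 24

p_m <- promoted_m / n_m            # 0.875
p_f <- promoted_f / n_f            # 0.583
diff_hat <- p_m - p_f              # about 0.29

# Standard error of a difference in proportions (normal approximation)
se <- sqrt(p_m * (1 - p_m) / n_m + p_f * (1 - p_f) / n_f)

# 95% CI; 0 falls outside the interval, so we would reject the null
c(lower = diff_hat - 1.96 * se, upper = diff_hat + 1.96 * se)
```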
--

- Our estimate of 0.3 is a scientific guess around which we construct a range of plausible values that we assume captures the parameter

--

- Our confidence interval could be one of the 5 out of every 100 (assuming a 95% CI) expected to fail to capture the parameter

---
# What If We're Wrong

- We may reject the null because our 95% CI does **not** contain 0, thus `\(P_M-P_F \neq 0\)`

--

- But our CI could be wrong, and the true value is that `\(P_M-P_F = 0\)`
  - Type I error
  - False positive

---
# What If We're Wrong

- We may fail to reject the null because our 95% CI does include 0, thus we can't rule out with high enough confidence that `\(P_M-P_F \neq 0\)`

--

- But our CI could be wrong, and the true value is that `\(P_M-P_F \neq 0\)`
  - Type II error
  - False negative

---
# Decision rule

- Common to choose a **significance level** of 5%, which is the same as choosing a **confidence level** of 95%
  - Sometimes denoted `\(\alpha = 0.05\)`, where `\(\alpha\)` is the significance level

--

- If the p-value is less than 0.05, we **reject** the null. If it is greater than or equal to 0.05, we **fail to reject** the null.
  - If p-value `\(< \alpha\)`, reject `\(H_0\)`
  - If p-value `\(\geq \alpha\)`, fail to reject `\(H_0\)`

---
class: inverse, center, middle
# How do we know the probability of our result in a world where the null is true without knowing which world we live in?

---
# Null distribution

<img src="lectures_files/nulldist.png" width="1128" height="66%" style="display: block; margin: auto;" />

- We assume a null distribution, similar to the sampling distribution
- The null distribution is centered at 0, as if the null were actually true
- Assuming the null distribution is normal, we can then calculate the probability of our result

---
# Example

- If the null were true, then promotions would be random with respect to gender
  - On average, `\(P_M-P_F = 0\)`
  - Promotions and gender would share no correlation

--

- Let's use R to simulate a world where the null is true.

> **Run all code chunks from `one-shuffle` down to and including `null-estimate`**

- Our null estimate is the difference in promotion rates between males and females, based on one sample in which promotions were shuffled to be truly random with respect to gender
- This simulates a world where the null is true

---
# Example

- Let's repeat this random shuffling 1,000 times, calculating the difference in promotions between males and females each time.
- Then we could plot the 1,000 values as a histogram, giving us a distribution of differences between males and females based on 1,000 samples in a world where promotions are random with respect to gender

> **Run the `null-distribution` code chunk**

---
# Null distribution

- The **LLN** tells us the center of the null distribution will settle around 0, and the **CLT** tells us that the null distribution will be normal, just like the sampling distribution.
- Therefore, we can calculate the percent of values expected to fall outside some chosen number of standard errors by applying the **68-95-99.7 rule** to the null distribution.

--

> **Run the `null-center`, `null-se`, and `null-ci` code chunks.**

---
# Example p-value

- Our observed difference was 0.29 (the 0.3 used above reflects rounding of `\(p_M\)` and `\(p_F\)`)
- How likely is this result if the null were actually true?
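Conceptually, the p-value is just the share of null-world differences at least as extreme as ours. Here is a minimal sketch, where `null_diffs` stands in for the 1,000 shuffled differences; below it is faked with random draws purely so the snippet runs.

```r
# Stand-in for the 1,000 shuffled differences from the `null-distribution`
# chunk; replace with the real values (the sd of 0.12 is illustrative only)
null_diffs <- rnorm(1000, mean = 0, sd = 0.12)

observed_diff <- 0.29

# Two-sided p-value: share of null differences at least as extreme as observed
mean(abs(null_diffs) >= abs(observed_diff))
```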
> **Run the `p-value-viz` and `p-value` code chunks**

---
# Chi-square test

- A typical analysis does not simulate a null distribution to obtain the p-value
- Instead, statistical tests based on standard theoretical formulas are used

--

- We have two nominal variables to test `\(\rightarrow\)` Chi-square test

> **Run the `cross-tab` and `chi-square` code chunks**

- Does our conclusion change?

---
# Using an LPM

- We *could* use regression (a linear probability model, LPM) for this

> **Run the `lpm` code chunk**

- Does our conclusion change?

--

- Note the estimate is the same but the p-value differs. This case pushes the limits of when a regression is appropriate to use.
- But if we wanted to control for additional variables, such as age or years employed, we are back to needing a regression model.

---
# Back to regression table

|term           | estimate| std_error| statistic| p_value| lower_ci| upper_ci|
|:--------------|--------:|---------:|---------:|-------:|--------:|--------:|
|intercept      |    20.06|      0.16|    123.32|       0|    19.75|    20.38|
|treatment: Yes |    -6.41|      0.23|    -27.85|       0|    -6.86|    -5.96|

- What are the null and alternative hypotheses? What is the result of the test?
- How many standard errors is our estimate from the center (0) of the null distribution?
- What is the probability of obtaining our estimate, or one more extreme, in a world where the null is true (treatment effect = 0)?

---
# Back to regression table

|term           | estimate| std_error| statistic| p_value| lower_ci| upper_ci|
|:--------------|--------:|---------:|---------:|-------:|--------:|--------:|
|intercept      |    20.06|      0.16|    123.32|       0|    19.75|    20.38|
|treatment: Yes |    -6.41|      0.23|    -27.85|       0|    -6.86|    -5.96|

There is statistically significant evidence that the health insurance subsidy reduced out-of-pocket expenditures on health care. On average, expenditures among treated households declined by about $6 per person.

---
# Another example

`$$Pr(Treatment=1) = \beta_0 + \beta_1Age + \beta_2Educ + \beta_3DirtFloor + \beta_4Bathroom + \beta_5HospDist + \epsilon$$`

|term              | estimate| std_error| statistic| p_value| lower_ci| upper_ci|
|:-----------------|--------:|---------:|---------:|-------:|--------:|--------:|
|intercept         |    0.477|      0.03|    18.486|   0.000|    0.427|    0.528|
|age_hh            |   -0.001|      0.00|    -2.024|   0.043|   -0.002|    0.000|
|educ_hh           |    0.000|      0.00|     0.134|   0.894|   -0.004|    0.004|
|dirtfloor         |    0.023|      0.01|     2.179|   0.029|    0.002|    0.044|
|bathroom          |   -0.004|      0.01|    -0.418|   0.676|   -0.025|    0.016|
|hospital_distance |    0.000|      0.00|     3.676|   0.000|    0.000|    0.001|

---
class: inverse, center, middle
# Practical significance

---
# Practical significance steps

1. What is the typical change in the explanatory variable of interest?

--

2. Is the predicted change in the outcome due to a typical change in the explanatory variable meaningful?

--

3. Do the bounds of the confidence interval for the explanatory variable potentially change the answer to step 2?

--

- What is considered a meaningful change is somewhat subjective. A sketch of these steps appears after the tables below.

---
# Practical significance from PS2

|                        | Overall (N=9914) |
|:-----------------------|:----------------:|
|**health_expenditures** |                  |
|Mean (SD)               |     17 (12)      |
|Range                   |     0 - 117      |
|**treatment**           |                  |
|Mean (SD)               |      1 (1)       |
|Range                   |      0 - 1       |

|term           | estimate| std_error| statistic| p_value| lower_ci| upper_ci|
|:--------------|--------:|---------:|---------:|-------:|--------:|--------:|
|intercept      |   20.064|     0.163|   123.322|       0|   19.745|   20.383|
|treatment: Yes |   -6.406|     0.230|   -27.850|       0|   -6.857|   -5.955|
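As promised above, here is a minimal sketch of the three practical-significance steps applied to these tables. All numbers come from the tables; the threshold for "meaningful" remains subjective.

```r
# Values taken from the regression table above
estimate <- -6.406   # treatment coefficient
lower_ci <- -6.857
upper_ci <- -5.955

# Step 1: treatment is binary, so the typical change is 0 -> 1
typical_change <- 1

# Step 2: predicted change in health expenditures for that change,
# judged against the outcome's scale (mean 17, SD 12 from the summary table)
predicted_change <- estimate * typical_change
predicted_change / 17   # about -0.38: a 38% drop relative to the mean
predicted_change / 12   # about -0.53: roughly half a standard deviation

# Step 3: repeat with the CI bounds; both imply a drop of roughly $6-$7
# per person, so the bounds do not change the answer to step 2
c(lower_ci, upper_ci) * typical_change
```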