class: center, middle, inverse, title-slide

.title[
# PADP 7120 Data Applications in PA
]
.subtitle[
## RLab 9: Forecasting
]
.author[
### Alex Combs
]
.institute[
### UGA | SPIA | PADP
]
.date[
### Last updated: April 19, 2024
]

---

# Set-up

> **Start a new project and Rmd**

> **Change YAML**

```r
---
title: "RLab 9: Forecasting"
author: "Your Name"
output:
  html_document:
    theme: spacelab
    df_print: paged
---
```

---

# Packages

> **In the setup code chunk, load the following packages and add the option that stops R from using scientific notation for large values**

```r
library(tidyverse)     # data import and wrangling
library(forecast)      # forecasting functions
library(fpp2)          # companion package with forecasting tools and data
options(scipen = 999)  # avoid scientific notation for large values
```

---

# Set-up

> **Download `acc_select_tax_rev` from eLC and import it. You can copy and paste the code below into your setup code chunk.**

```r
acc_taxrev <- read_csv("acc_select_tax_rev.csv")
```

---

# Learning objectives

- Convert a dataset to a time series dataset in R
- Explore time series for patterns
- Use simple forecasting models like naive, average, and trend
- Use exponential smoothing forecasts
- Compare goodness-of-fit across forecast models

---

# Steps of forecasting

1. Preliminary/exploratory analysis
2. Choosing and fitting models
3. Using and evaluating a forecasting model

---

# Creating a time series (TS) object

- First, we need to tell R that the data are a time series
- Generic syntax:

```r
ts_name <- ts(data_set_name[,-1],
              start = year_of_first_observation,
              end = year_of_last_observation,
              frequency = number_of_observations_per_year)
```

- The `-1` in the brackets drops the column that contains the time variable, because `ts()` builds the time index from `start`, `end`, and `frequency`
- In this case the time variable is the first column in the data

---

# TS object examples

- annual time series

| year| revenue|
|----:|-------:|
| 2016|      45|
| 2017|      67|
| 2018|      61|
| 2019|      40|
| 2020|      50|

```r
ts_data <- ts(data[,-1], start = 2016, end = 2020, frequency = 1)
```

---

# TS object examples

- quarterly data (top & bottom rows shown)

| year| quarter| revenue|
|----:|-------:|-------:|
| 2016|       1|      52|
| 2016|       2|      67|

| year| quarter| revenue|
|----:|-------:|-------:|
| 2020|       3|      54|
| 2020|       4|      58|

```r
# Drop both time columns; give start and end as c(year, quarter)
ts_quarterly <- ts(quarterly[,-(1:2)], start = c(2016, 1), end = c(2020, 4),
*                  frequency = 4)
```

---

# TS object examples

- monthly data (top & bottom rows shown)

| year| month| revenue|
|----:|-----:|-------:|
| 2016|     1|      47|
| 2016|     2|      49|

| year| month| revenue|
|----:|-----:|-------:|
| 2020|    11|      33|
| 2020|    12|      42|

```r
# Drop both time columns; give start and end as c(year, month)
ts_monthly <- ts(monthly[,-(1:2)], start = c(2016, 1), end = c(2020, 12),
*                frequency = 12)
```

---

# TS object examples

- biennial data (top & bottom rows shown)

| year| revenue|
|----:|-------:|
| 2000|      47|
| 2002|      56|

| year| revenue|
|----:|-------:|
| 2018|      45|
| 2020|      46|

```r
ts_biennial <- ts(biennial[,-1], start = 2000, end = 2020,
*                 frequency = 0.5)
```

---

# Frequencies

|Data        |frequency |
|:-----------|:---------|
|Quadrennial |0.25      |
|Biennial    |0.5       |
|Annual      |1         |
|Quarterly   |4         |
|Monthly     |12        |
|Weekly      |52        |
|Daily       |365       |

---

# Creating TS object

- `acc_taxrev` contains annual property tax and sales tax revenues for Athens-Clarke County from 1992-2022

> **In a new code chunk, set `acc_taxrev` as a TS object.**

```r
ts_taxrev <- ts(acc_taxrev[,-1], start = 1992, end =
2022, frequency = 1)
```

---

# Exploratory plotting

- The purpose of exploratory plotting is to detect **patterns**
- Patterns provide information for predicting future values
- Certain forecast models are better suited to certain patterns

---

# Types of patterns

1. Trend
    - A long-term increase or decrease in the variable
2. Seasonal
    - A periodic pattern that follows the calendar (e.g. quarter, month, day of the week)
3. Cyclic
    - Variable rises and falls, but not according to a fixed calendar (e.g. recessions)

- Time series can exhibit any combination of the above patterns
- A time series must be observed more often than annually to show seasonality

---

# Patterns

<img src="rlab10-forecasting_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" />

- Trend with seasonality

---

# Patterns

<img src="labs_files/patterns-colfinance.png" style="display: block; margin: auto;" />

---

# Exploratory plots (line graphs)

- If R knows the data are a time series, we can use `autoplot` to quickly generate some useful graphs.

> **In a new code chunk, add and run each line of the following code**

```r
autoplot(ts_taxrev)
```

```r
autoplot(ts_taxrev[,1])
```

```r
autoplot(ts_taxrev[,2])
```

---

class: inverse, center, middle

# Choosing and fitting forecast models

---

# Sources of information

- There are two sources of information for making future predictions
  - Past values of the outcome
  - Variables that explain and/or predict the outcome

--

- Models that use only past values of the outcome
  - Simple models: naive, mean, trend (drift)
  - Exponential smoothing (common; we will cover this)
  - ARIMA (common; we will not cover)

--

- Regression models that use past outcome and explanatory variables (not covered)
  - Useful for forecasts based on scenarios where explanatory variables change

---

# Autocorrelation

- Recall that the correlation coefficient measures the linear association between two variables
- Models using past values rely on **autocorrelation**: the linear association between a variable and past values of the *same* variable
- If past values are correlated with future values, then the past informs the future
- **Lag**: term used to refer to a past value
  - For an annual time series, the 3rd lag of the 2020 value is the 2017 value

---

# Autocorrelation

<img src="rlab10-forecasting_files/figure-html/unnamed-chunk-25-1.png" style="display: block; margin: auto;" />

- Clear trend; we should expect autocorrelation to be present

---

# Autocorrelation Plot

- Can test for autocorrelation using `ggAcf()`

```r
ggAcf(ts_taxrev[,1])
```

<img src="rlab10-forecasting_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" />

- Bars that extend beyond the blue dashed lines indicate statistically significant autocorrelation
- Past property tax revenue significantly correlates with future property tax revenue

---

# White noise

- A time series that shows no autocorrelation is called **white noise**
- The time series is random and past values do not correlate with future values
- Athens property tax revenue is not white noise

---

# Autocorrelation Plot

> **Use `ggAcf` to generate an autocorrelation plot for `sales_use_tax` in the `ts_taxrev` dataset**

---

# Simple Forecasting Models

1. Mean - future values predicted to equal the average over time
    - No trend pattern
    - Cyclic or seasonal pattern around a stable average level

--

2. Naive - future values predicted to equal the most recent value
    - No trend pattern; last period is the best guess

--

3. Drift - draw a line from the first to the last value and extrapolate
    - Strong trend from start to end

--

4. Seasonal Naive - same as naive but predicts each season as equal to its most recent season

---

# Simple forecast models

<img src="labs_files/forcastcompare1.png" style="display: block; margin: auto;" />

---

# Simple forecast models

<img src="labs_files/forecastcompare2.png" style="display: block; margin: auto;" />

---

# Simple Forecasts in R

- Generic syntax

```r
new_object <- forecast_model(tsdata, h = periods_into_future)
```

- Replace `forecast_model` with:

```r
meanf()
naive()
rwf(drift = TRUE)
snaive()
```

- Default for `h`orizon is 10 periods (`snaive()` defaults to two seasonal cycles instead)
  - `h` is counted in periods of the data; `h = 4` in quarterly data is one year

---

# In R

- Running `meanf`, `naive`, and `rwf` on property tax revenue in the `ts_taxrev` data to forecast the next 4 years

```r
proprev_meanf <- meanf(ts_taxrev[,1], h = 4)
proprev_naive <- naive(ts_taxrev[,1], h = 4)
proprev_drift <- rwf(ts_taxrev[,1], drift = TRUE, h = 4)
```

- These new objects are like the regression results we have saved many times
- We can use functions on these new objects to view our results

---

# Viewing forecasts in R

- We can use `autoplot` on a saved forecast object to visualize the forecast.
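Before plotting, it can help to see what the three simple methods compute. The following is a minimal base-R sketch on made-up values (the series `y` and the horizon `h` are illustrative assumptions, not the ACC data); its results should match the point forecasts that `meanf()`, `naive()`, and `rwf(drift = TRUE)` return for the same series.

```r
# Made-up annual revenue series (illustrative values, not the ACC data)
y <- ts(c(45, 67, 61, 40, 50), start = 2016, frequency = 1)
h <- 3            # forecast horizon: 3 years ahead
n <- length(y)    # number of observed years

# Mean method: every future value equals the historical average
mean_fc <- rep(mean(y), h)

# Naive method: every future value equals the last observed value
naive_fc <- rep(y[n], h)

# Drift method: extend the straight line joining the first and last values
slope    <- (y[n] - y[1]) / (n - 1)   # average change per year
drift_fc <- y[n] + slope * (1:h)

# Collect the three sets of point forecasts side by side
data.frame(year = 2021:2023, mean_fc, naive_fc, drift_fc)
```

The drift forecast adds the average historical change (`slope`) once for each period ahead, which is why it suits a series with a strong trend.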
```r
autoplot(proprev_meanf)
```

<img src="rlab10-forecasting_files/figure-html/unnamed-chunk-33-1.png" style="display: block; margin: auto;" />

- The mean forecast is typically the worst of these models for data with a strong trend

---

# Viewing forecasts in R

```r
autoplot(proprev_naive)
```

<img src="rlab10-forecasting_files/figure-html/unnamed-chunk-34-1.png" style="display: block; margin: auto;" />

---

# Viewing forecasts in R

```r
autoplot(proprev_drift)
```

<img src="rlab10-forecasting_files/figure-html/unnamed-chunk-35-1.png" style="display: block; margin: auto;" />

---

# Reporting plausible ranges

- Can use `summary` on saved models to obtain point forecasts with 80% and 95% prediction intervals.

```r
summary(proprev_drift)
```

|     | Point Forecast|    Lo 80|    Hi 80|    Lo 95|    Hi 95|
|:----|--------------:|--------:|--------:|--------:|--------:|
|2023 |       76485236| 74356555| 78613917| 73229699| 79740773|
|2024 |       78291986| 75233407| 81350565| 73614293| 82969679|
|2025 |       80098736| 76294676| 83902796| 74280929| 85916543|
|2026 |       81905486| 77446879| 86364093| 75086635| 88724337|

---

# Practice

> **In a new code chunk, use the drift forecast on sales tax for the next 3 years**

> **Produce the graph and the numerical results**

---

class: inverse, center, middle

# Exponential Smoothing

---

# Exponential smoothing

- Assigns exponentially decreasing weights to past values: the more recent the value, the larger its weight

<img src="labs_files/expsmooth.png" style="display: block; margin: auto;" />

---

# Exponential smoothing

- Can also incorporate trend, damped or linear

<img src="labs_files/expsmooth-trend.png" style="display: block; margin: auto;" />

---

# Exponential smoothing

- Can also incorporate seasonality that is additive (seasonal swings stay roughly constant over time) or multiplicative (seasonal swings grow with the level of the series)

<img src="labs_files/expsmooth-season.png" style="display: block; margin: auto;" />

---

# Exponential smoothing combinations

- Error: `A`dditive, `M`ultiplicative
- Trend: `N`one, `A`dditive, `A_d` (additive damped)
- Seasonal: `N`one, `A`dditive, `M`ultiplicative
- 2 x 3 x 3 = 18 possible combinations of exponential smoothing models

---

# Exponential smoothing in R

- The `ets` (error, trend, seasonal) function fits candidate models from these combinations and chooses the best one (by default, the one with the lowest AICc)
- Generic syntax

```r
ets(ts_data) %>%
  forecast(h = forecast_horizon) %>%
  autoplot()   # or summary()
```

---

# Exponential smoothing in R

```r
ets(ts_taxrev[,1]) %>%
  forecast(h = 4) %>%
  autoplot()
```

<img src="rlab10-forecasting_files/figure-html/unnamed-chunk-45-1.png" style="display: block; margin: auto;" />

- `ets` chose `M`ultiplicative error, `A`dditive trend, and `N`o seasonality, i.e. an ETS(M,A,N) model

---

# Exponential smoothing in R

```r
ets(ts_taxrev[,1]) %>%
  forecast(h = 4) %>%
  summary()
```

|     | Point Forecast|    Lo 80|     Hi 80|    Lo 95|     Hi 95|
|:----|--------------:|--------:|---------:|--------:|---------:|
|2023 |       78804125| 75158599|  82449652| 73228774|  84379477|
|2024 |       82929945| 76541329|  89318562| 73159399|  92700492|
|2025 |       87055766| 77656647|  96454884| 72681055| 101430477|
|2026 |       91181586| 78445166| 103918005| 71702913| 110660259|

---

# Practice

> **Use the `ets` function on sales tax for the next 3 years to produce a graph and table.**

---

class: inverse, center, middle

# Assessing goodness-of-fit

---

# Goodness of fit

- Can use `accuracy()` to compare the fit of competing models

```r
accuracy(proprev_drift)
```

```
##              ME    RMSE     MAE        MPE     MAPE      MASE      ACF1
## Training set  0 1606544 1157909 -0.7642346 2.878888 0.5846304 0.4373541
```

```r
ets(ts_taxrev[,1]) %>%
  accuracy()
```

```
##                    ME    RMSE     MAE       MPE    MAPE     MASE       ACF1
## Training set 244500.5 1434558 1128638 0.4201145 2.85475 0.569851 0.07795879
```

- The ets model has a lower RMSE (1434558 vs. 1606544), so it fits the training data better

---

# Goodness of fit

> **Use `accuracy` to compare drift and ets models for sales tax**

- Which model fits the data better? How far off is it, on average?

> **Upload your Rmd to eLC**
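---

# Appendix: fit measures by hand

To make the "how far off, on average" question concrete, here is a minimal base-R sketch of the RMSE, MAE, and MAPE measures reported by `accuracy()`. It uses made-up values and the naive method's one-step-ahead fitted values; the series `y` and all object names are illustrative assumptions, not part of the lab data.

```r
# Made-up annual series (illustrative values, not the ACC data)
y <- c(45, 67, 61, 40, 50)
n <- length(y)

# Naive one-step-ahead fitted values: each year is predicted by the year before
fitted_naive <- y[-n]   # predictions for periods 2..n
actual       <- y[-1]   # observed values for periods 2..n
errors       <- actual - fitted_naive

rmse <- sqrt(mean(errors^2))               # penalizes large misses more heavily
mae  <- mean(abs(errors))                  # average miss, in the units of y
mape <- mean(abs(errors / actual)) * 100   # average miss, as a percent of actual

round(c(RMSE = rmse, MAE = mae, MAPE = mape), 2)
```

MAE and RMSE are in the same units as the series (dollars for the tax revenue data), while MAPE is a percentage, which makes it easier to compare across series of different sizes.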