Quick Start

This section covers the following topics:

  • a forecast task on the iclaims dataset

  • a simple Bayesian ETS Model using PyStan

  • posterior distribution extraction

  • tools to visualize the forecast

Load Library

%matplotlib inline
import matplotlib.pyplot as plt

import orbit
from orbit.utils.dataset import load_iclaims
from orbit.models import ETS
from orbit.diagnostics.plot import plot_predicted_data


The iclaims data contains weekly initial claims for US unemployment benefits (obtained from the Federal Reserve Bank of St. Louis) alongside a few related Google Trends queries (unemploy, filling and job) from January 2010 to June 2018. It is intended to mimic the dataset used to demonstrate the Bayesian Structural Time Series (BSTS) model (Scott and Varian 2014).

Note that the numbers are log-log transformed for fitting purposes. The use of the regressors is discussed in later chapters with the Damped Local Trend (DLT) model.
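Because the response is stored on a log scale, a value has to be exponentiated before it can be read in original units. The following is a minimal sketch using only the standard library. It assumes the response was transformed with the natural log, which the text above does not state explicitly, and uses the first `claims` value from the dataset as input.

```python
import math

# First `claims` value in the dataset, on the log scale.
log_claims = 13.386595

# Assumption: the transform is a natural log, so exp() inverts it.
claims = math.exp(log_claims)

# Round-tripping through log() recovers the stored value.
assert math.isclose(math.log(claims), log_claims)
print(f"{claims:,.0f} weekly initial claims")
```

The same back-transform would apply to any forecast produced on the log scale.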

# load data
df = load_iclaims()
date_col = 'week'
response_col = 'claims'
df.dtypes
week              datetime64[ns]
claims                   float64
trend.unemploy           float64
trend.filling            float64
trend.job                float64
sp500                    float64
vix                      float64
dtype: object

df.head()
        week     claims  trend.unemploy  trend.filling  trend.job      sp500       vix
0 2010-01-03  13.386595        0.219882      -0.318452   0.117500  -0.417633  0.122654
1 2010-01-10  13.624218        0.219882      -0.194838   0.168794  -0.425480  0.110445
2 2010-01-17  13.398741        0.236143      -0.292477   0.117500  -0.465229  0.532339
3 2010-01-24  13.137549        0.203353      -0.194838   0.106918  -0.481751  0.428645
4 2010-01-31  13.196760        0.134360      -0.242466   0.074483  -0.488929  0.487404

Split the data into training and test sets, holding out the last 52 weeks for testing.

test_size = 52
train_df = df[:-test_size]
test_df = df[-test_size:]
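The split above relies on negative slicing, which keeps the rows in time order and holds out the tail. One pitfall is that a `test_size` of 0 makes `[:-0]` an empty slice. A small sketch of the same idea on a plain list, with a guard for that case, might look like this; the helper name `split_tail` and the series length are hypothetical.

```python
def split_tail(rows, test_size):
    """Hold out the last `test_size` rows as the test set, preserving order."""
    if not 0 < test_size < len(rows):
        raise ValueError("test_size must be between 1 and len(rows) - 1")
    return rows[:-test_size], rows[-test_size:]


# Stand-in for a weekly series; the length 120 is arbitrary.
weeks = list(range(120))
train, test = split_tail(weeks, 52)

print(len(train), len(test))
```

The training rows all precede the test rows, so no future observations leak into the fit.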

Forecasting Using Orbit

Orbit aims to provide an intuitive initialize-fit-predict interface for forecasting tasks. Under the hood, it uses probabilistic modeling APIs such as PyStan and Pyro. We first illustrate a Bayesian implementation of Rob Hyndman’s ETS (Error, Trend, and Seasonality) model (Hyndman et al., 2008) using PyStan.

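ets = ETS(
    response_col=response_col,
    date_col=date_col,
    # assumption: yearly seasonality for weekly data
    seasonality=52,
)
ets.fit(df=train_df)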
INFO:orbit:Sampling (PyStan) with chains: 4, cores: 8, temperature: 1.000, warmups (per chain): 225 and samples(per chain): 25.
WARNING:pystan:Maximum (flat) parameter count (1000) exceeded: skipping diagnostic tests for n_eff and Rhat.
To run all diagnostics call pystan.check_hmc_diagnostics(fit)
CPU times: user 46.7 ms, sys: 45.2 ms, total: 91.9 ms
Wall time: 862 ms
<orbit.forecaster.full_bayes.FullBayesianForecaster at 0x17b6e8e50>
predicted_df = ets.predict(df=test_df)
_ = plot_predicted_data(train_df, predicted_df, date_col, response_col, title='Prediction with ETS')
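To give some intuition for what the level component of an ETS model does, here is a sketch of the textbook simple exponential smoothing recursion in plain Python. It is not Orbit's Stan implementation, which also handles seasonality and estimates the parameters by sampling. The function name `ses_level` is hypothetical, and the parameter name `lev_sm` is borrowed from the posterior keys of the fitted model.

```python
def ses_level(y, lev_sm, init_level=None):
    """Simple exponential smoothing: l_t = lev_sm * y_t + (1 - lev_sm) * l_{t-1}."""
    # Assumption: start the level at the first observation unless told otherwise.
    level = y[0] if init_level is None else init_level
    levels = []
    for obs in y:
        level = lev_sm * obs + (1.0 - lev_sm) * level
        levels.append(level)
    return levels


# A few log-scale values, rounded from the first rows of the data.
y = [13.4, 13.6, 13.4, 13.1, 13.2]
levels = ses_level(y, lev_sm=0.5)

# Without trend or seasonality, the forecast is flat at the last level.
forecast = levels[-1]
print(forecast)
```

A `lev_sm` close to 1 makes the level track the latest observation, while a value close to 0 keeps it near its starting point.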

Extract and Analyze Posterior Samples

Users can use .get_posterior_samples() to extract posterior samples in an OrderedDict format.

posterior_samples = ets.get_posterior_samples()
posterior_samples.keys()
odict_keys(['l', 'lev_sm', 'obs_sigma', 's', 'sea_sm'])
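Once the samples are in a dict keyed by parameter name, summarizing them is straightforward. The sketch below uses synthetic draws as a stand-in for the output of `.get_posterior_samples()`, since the real values depend on the fit. The distributions, the sample count, and the helper name `summarize` are all assumptions for illustration, and real Orbit samples would be arrays rather than lists.

```python
import random
import statistics

random.seed(0)

# Hypothetical stand-in for posterior draws of two scalar parameters.
posterior_samples = {
    "lev_sm": [random.betavariate(2, 5) for _ in range(100)],
    "obs_sigma": [random.gammavariate(2.0, 0.05) for _ in range(100)],
}


def summarize(draws):
    """Return the posterior mean and an empirical 5%-95% interval."""
    # quantiles(n=20) yields 19 cut points: the first is 5%, the last is 95%.
    cuts = statistics.quantiles(draws, n=20)
    return {"mean": statistics.fmean(draws), "lower": cuts[0], "upper": cuts[-1]}


summary = {name: summarize(draws) for name, draws in posterior_samples.items()}
for name, stats in summary.items():
    print(name, stats)
```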

The extracted posterior samples are largely compatible with the diagnostic tools in arviz. To use them, set permute=False so that chain information is preserved.

import arviz as az

posterior_samples = ets.get_posterior_samples(permute=False)

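# example from https://arviz-devs.github.io/arviz/index.html
# assumption: a pair plot of the selected parameters via az.plot_pair
az.plot_pair(
    posterior_samples,
    var_names=["sea_sm", "lev_sm", "obs_sigma"],
)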
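Chain information matters because convergence diagnostics compare chains against each other. As a rough illustration, here is the classic (non-split) Gelman-Rubin R-hat computed by hand on synthetic chains. This is a sketch only: arviz uses a more robust rank-normalized split variant, and the chains below are simulated rather than taken from the fitted model.

```python
import random
import statistics


def gelman_rubin(chains):
    """Classic R-hat for equal-length chains of one scalar parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # Average within-chain variance.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # Between-chain variance, scaled by the chain length.
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


random.seed(1)
# Four well-mixed chains drawn from the same distribution.
mixed = [[random.gauss(0.0, 1.0) for _ in range(200)] for _ in range(4)]
# Same chains, but the last one is shifted to mimic a chain stuck elsewhere.
stuck = mixed[:3] + [[x + 3.0 for x in mixed[3]]]

print(gelman_rubin(mixed), gelman_rubin(stuck))
```

Values near 1 suggest the chains agree, while the shifted chain pushes the statistic well above 1. Once samples are permuted across chains, this comparison is no longer possible.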

A later section covers model diagnostics and visualization in more detail.