Other Utilities

Generating the Full Span of Multiple Time Series

[1]:
import pandas as pd
import numpy as np
from orbit.utils.general import expand_grid, regenerate_base_df

import warnings
warnings.filterwarnings('ignore')

Define the series keys and datetime array.

[2]:
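# Month-end dates. Since pandas 2.2 the 'M' alias is deprecated in favor of 'ME';
# the deprecation warning is silenced above, and the output reports freq='ME'.
dt = pd.date_range('2020-01-31', '2022-12-31', freq='M')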
keys = ['x' + str(x) for x in range(10)]
print(keys)
print(dt)
['x0', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9']
DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30',
               '2020-05-31', '2020-06-30', '2020-07-31', '2020-08-31',
               '2020-09-30', '2020-10-31', '2020-11-30', '2020-12-31',
               '2021-01-31', '2021-02-28', '2021-03-31', '2021-04-30',
               '2021-05-31', '2021-06-30', '2021-07-31', '2021-08-31',
               '2021-09-30', '2021-10-31', '2021-11-30', '2021-12-31',
               '2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30',
               '2022-05-31', '2022-06-30', '2022-07-31', '2022-08-31',
               '2022-09-30', '2022-10-31', '2022-11-30', '2022-12-31'],
              dtype='datetime64[ns]', freq='ME')

Use expand_grid to generate a dataframe with one row for every combination of key and dt, i.e. 10 keys × 36 dates = 360 rows.

[3]:
df_base = expand_grid({
    'key': keys,
    'dt': dt,
})
x = np.random.normal(0, 1, len(df_base))  # one standard-normal draw per (key, dt) row
df_base['x'] = x
print(df_base.shape)
df_base.head(5)
(360, 3)
[3]:
  key          dt          x
0  x0  2020-01-31   0.357236
1  x0  2020-02-29  -1.172618
2  x0  2020-03-31  -0.852877
3  x0  2020-04-30   1.080290
4  x0  2020-05-31  -0.641044
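For intuition, the grid above is a Cartesian product of the key and date levels. A minimal pandas-only sketch of the same idea uses pandas.MultiIndex.from_product. cartesian_grid is a hypothetical helper for illustration, not orbit's expand_grid implementation, and it uses pd.offsets.MonthEnd() so it runs whether or not the 'ME' alias is available.

```python
import pandas as pd


def cartesian_grid(levels):
    """Hypothetical helper: one row per combination of the given levels.

    The first key varies slowest, so with {'key': ..., 'dt': ...} all dates
    for 'x0' come first. A sketch, not orbit's expand_grid implementation.
    """
    index = pd.MultiIndex.from_product(list(levels.values()), names=list(levels.keys()))
    return index.to_frame(index=False)


keys = ['x' + str(i) for i in range(10)]
# MonthEnd() avoids depending on whether the 'M' or 'ME' alias is current.
dt = pd.date_range('2020-01-31', '2022-12-31', freq=pd.offsets.MonthEnd())
grid = cartesian_grid({'key': keys, 'dt': dt})
print(grid.shape)  # (360, 2)
print(grid.head(3))
```

The row order (all dates for x0, then x1, and so on) matches the head of df_base shown above.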

Regenerating Multiple Time Series with Missing Rows

First, create missing rows by dropping 5 randomly chosen observations, leaving 355 of the original 360.

[4]:
np.random.seed(2022)
drop_idx = np.random.choice(df_base.index, 5, replace=False)
df_missing = df_base.drop(drop_idx).reset_index(drop=True)
print(df_missing.shape)
df_missing.head(5)
(355, 3)
[4]:
  key          dt          x
0  x0  2020-01-31   0.357236
1  x0  2020-02-29  -1.172618
2  x0  2020-03-31  -0.852877
3  x0  2020-04-30   1.080290
4  x0  2020-05-31  -0.641044
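When the gaps are not known in advance, one way to list the absent (key, dt) pairs is to left-merge the complete grid against the observed rows with indicator=True. The sketch below rebuilds a comparable panel with its own random values; seeding via numpy.random.default_rng is an assumption made for illustration, so the drawn numbers and the dropped rows differ from the cell above.

```python
import numpy as np
import pandas as pd

# Rebuild a 10-key x 36-month panel with illustrative random values.
keys = ['x' + str(i) for i in range(10)]
dt = pd.date_range('2020-01-31', '2022-12-31', freq=pd.offsets.MonthEnd())
full = pd.MultiIndex.from_product([keys, dt], names=['key', 'dt']).to_frame(index=False)
rng = np.random.default_rng(2022)
full['x'] = rng.normal(0, 1, len(full))

# Drop five random rows, mirroring the cell above.
drop_idx = rng.choice(full.index.to_numpy(), 5, replace=False)
df_missing = full.drop(drop_idx).reset_index(drop=True)

# Left-merge the complete grid against the observed (key, dt) pairs.
# indicator=True adds a '_merge' column: 'both' if observed, 'left_only' if absent.
check = full[['key', 'dt']].merge(
    df_missing[['key', 'dt']], on=['key', 'dt'], how='left', indicator=True
)
missing_pairs = check.loc[check['_merge'] == 'left_only', ['key', 'dt']]
print(missing_pairs)
```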

Use regenerate_base_df to rebuild the full key × dt grid from df_missing; val_cols lists the value columns to carry over into the regenerated dataframe.

[5]:
time_col = "dt"
key_col = "key"
new_df_base = regenerate_base_df(df_missing, time_col, key_col, val_cols=['x'])
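Conceptually, regeneration amounts to reindexing the observed rows onto a full key × time grid. The sketch below is a pandas-only approximation under stated assumptions: regenerate_full_span is a hypothetical name, the full span is assumed to be every combination of keys and timestamps that appear at least once in the input (so a date missing for every key would not be restored), and (key, time) pairs are assumed unique. It is not orbit's implementation, and its column order (key, dt, x) differs from new_df_base's (dt, key, x).

```python
import numpy as np
import pandas as pd


def regenerate_full_span(df, time_col, key_col, val_cols, fill_na=None):
    """Hypothetical pandas-only sketch; not orbit's regenerate_base_df.

    Assumes the full span is every combination of the keys and timestamps
    that appear at least once in df, and that (key, time) pairs are unique.
    """
    full_index = pd.MultiIndex.from_product(
        [df[key_col].drop_duplicates(), df[time_col].drop_duplicates().sort_values()],
        names=[key_col, time_col],
    )
    out = (
        df.set_index([key_col, time_col])[val_cols]
        .reindex(full_index)  # absent (key, time) pairs get NaN
        .reset_index()
    )
    if fill_na is not None:
        out[val_cols] = out[val_cols].fillna(fill_na)
    return out


# Usage on a panel like df_missing above (values are illustrative).
keys = ['x' + str(i) for i in range(10)]
dt = pd.date_range('2020-01-31', '2022-12-31', freq=pd.offsets.MonthEnd())
df_base = pd.MultiIndex.from_product([keys, dt], names=['key', 'dt']).to_frame(index=False)
rng = np.random.default_rng(2022)
df_base['x'] = rng.normal(0, 1, len(df_base))
drop_idx = rng.choice(df_base.index.to_numpy(), 5, replace=False)
df_missing = df_base.drop(drop_idx).reset_index(drop=True)

rebuilt = regenerate_full_span(df_missing, 'dt', 'key', ['x'])
print(rebuilt.shape)  # (360, 3)
```

Reindexing onto a MultiIndex keeps the logic declarative: any (key, time) pair absent from the input simply has no match and comes back as NaN.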

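By default, the regenerated entries are filled with NaN. Because new_df_base has the same row order as df_base here, the positions in drop_idx select exactly the regenerated rows.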

[6]:
new_df_base.iloc[drop_idx]
[6]:
             dt  key    x
286  2022-11-30   x7  NaN
274  2021-11-30   x7  NaN
 75  2020-04-30   x2  NaN
135  2022-04-30   x3  NaN
 43  2020-08-31   x1  NaN
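Outside of this demo, the dropped positions are usually unknown. A simple alternative is to locate regenerated entries through their NaN values, as in this sketch on a small hypothetical frame (regen stands in for a regenerated dataframe). Note that NaNs already present in the source data would be picked up as well.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a regenerated frame: two keys, three month-ends,
# one NaN gap per key (column order mirrors new_df_base above).
regen = pd.DataFrame({
    'dt': pd.to_datetime(['2020-01-31', '2020-02-29', '2020-03-31'] * 2),
    'key': ['x0'] * 3 + ['x1'] * 3,
    'x': [0.5, np.nan, -1.2, np.nan, 0.3, 0.8],
})

# Locate gaps from the values themselves rather than from a known drop_idx.
gaps = regen[regen['x'].isna()]
print(gaps)

# Number of gaps per series.
print(gaps.groupby('key').size())
```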

Alternatively, pass fill_na to replace the regenerated missing values with a constant (here 0).

[7]:
new_df_base = regenerate_base_df(df_missing, time_col, key_col, val_cols=['x'], fill_na=0)
[8]:
new_df_base.iloc[drop_idx]
[8]:
             dt  key    x
286  2022-11-30   x7  0.0
274  2021-11-30   x7  0.0
 75  2020-04-30   x2  0.0
135  2022-04-30   x3  0.0
 43  2020-08-31   x1  0.0
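One caveat with fill_na=0 is that a regenerated row becomes indistinguishable from a genuine observation of 0. If that matters, one option is a pandas-side workaround (not an orbit option): flag the regenerated rows before filling, for example via a left merge with indicator=True. The data and the is_regenerated column name below are hypothetical.

```python
import pandas as pd

# Hypothetical observed data: x0 has no 2020-02-29 row, x1 has no 2020-01-31 row,
# and x0 has a genuine 0.0 on 2020-03-31.
dt = pd.date_range('2020-01-31', '2020-03-31', freq=pd.offsets.MonthEnd())
observed = pd.DataFrame({
    'key': ['x0', 'x0', 'x1', 'x1'],
    'dt': [dt[0], dt[2], dt[1], dt[2]],
    'x': [0.5, 0.0, 0.3, 0.8],
})

# Full key x dt grid.
full = pd.MultiIndex.from_product([['x0', 'x1'], dt], names=['key', 'dt']).to_frame(index=False)

# Left-merge with indicator=True so regenerated rows are flagged before filling.
out = full.merge(observed, on=['key', 'dt'], how='left', indicator=True)
out['is_regenerated'] = out['_merge'] == 'left_only'
out = out.drop(columns='_merge')
out['x'] = out['x'].fillna(0)  # mirrors the fill_na=0 output above
print(out)
```

After filling, the genuine 0.0 for x0 on 2020-03-31 can still be told apart from the two regenerated zeros through is_regenerated.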