Temporal Rolling CV: Splits Visualization

This notebook shows how TemporalRollingCV works. We start by importing the necessary packages and defining a small utility function to plot the splits.

[1]:

import matplotlib.pyplot as plt
import numpy as np
from elliptic_toolkit import TemporalRollingCV, temporal_split

[2]:

# Utility function to visualize the splits
def show_splits(cv, train_val_times, test_times):
    """Plot the train, validation, and test time steps of every fold produced by `cv`."""
    n_splits = cv.get_n_splits()
    plt.figure(figsize=(10, 6))

    # The time index is also passed as `groups`
    for (fold, (train_indices, val_indices)) in enumerate(cv.split(train_val_times, groups=train_val_times)):
        # Collapse sample indices to the unique time steps they cover
        train_times = np.unique(train_val_times[train_indices])
        val_times = np.unique(train_val_times[val_indices])
        # Label only the first fold so each legend entry appears once
        plt.scatter(train_times, [fold+1]*len(train_times), color='blue', label='Train' if fold==0 else None, marker='o', s=100)
        plt.scatter(val_times, [fold+1]*len(val_times), color='orange', label='Validation' if fold==0 else None, marker='s', s=100)
        # The held-out test steps are the same for every fold
        plt.scatter(test_times, [fold+1]*len(test_times), color='red', label='Test' if fold==0 else None, marker='^', s=100)

    folds = range(1, n_splits + 1)  # folds are numbered from 1 on the y axis
    plt.xlabel('Time Step')
    plt.ylabel('Fold')
    plt.title('Train and Validation Time Steps per Fold')
    plt.legend()
    plt.yticks(folds, [f'Fold {i}' for i in folds])
    plt.tight_layout()

We then create an array that mimics a time index with more than one sample per time step, and hold out 20% of the unique time steps. The held-out steps are the ones we would usually reserve for the final test set.

[3]:

times = np.sort(np.random.randint(0, 10, size=30))
train_val_times, test_times = temporal_split(times, test_size=0.2)
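
To make the hold-out step concrete, here is a minimal NumPy sketch of splitting by unique time step. The helper `holdout_last_time_steps` is hypothetical and is not part of elliptic_toolkit; it assumes that the held-out steps are the latest ones and that the count is rounded to the nearest integer, which may differ from what `temporal_split` actually does.

```python
import numpy as np

def holdout_last_time_steps(times, test_size=0.2):
    """Hypothetical helper: hold out the latest fraction of unique time steps."""
    times = np.asarray(times)
    steps = np.unique(times)                        # sorted unique time steps
    n_test = max(1, int(round(len(steps) * test_size)))
    test_steps = steps[-n_test:]                    # assumption: the latest steps are held out
    is_test = np.isin(times, test_steps)            # every sample of a held-out step goes to test
    return times[~is_test], times[is_test]

times = np.repeat(np.arange(10), 3)                 # 10 time steps, 3 samples each
train_val, test = holdout_last_time_steps(times, test_size=0.2)
print(np.unique(train_val))                         # time steps kept for cross-validation
print(np.unique(test))                              # time steps held out for testing
```

Splitting on unique time steps rather than on rows keeps all samples from one time step on the same side of the split.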

Basic TemporalRollingCV

Standard temporal cross-validation with 5 folds. Each fold uses all previous time steps for training and the next available time step for validation. The number of time steps used for training and validation is automatically computed.

[4]:

show_splits(TemporalRollingCV(n_splits=5), train_val_times, test_times)
plt.show()

../_images/examples_temporal_rolling_cv_6_0.svg
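
The expanding-window idea described above can be sketched in plain NumPy. The generator `expanding_time_splits` below is a hypothetical illustration, not the library's implementation. It assumes the validation width defaults to the number of unique time steps divided by `n_splits + 1` (the convention scikit-learn's `TimeSeriesSplit` uses for samples), applied here to time steps.

```python
import numpy as np

def expanding_time_splits(times, n_splits):
    """Yield (train_idx, val_idx) sample indices, split by unique time step."""
    times = np.asarray(times)
    steps = np.unique(times)
    val_size = len(steps) // (n_splits + 1)         # assumed default validation width
    if val_size < 1:
        raise ValueError("not enough unique time steps for n_splits")
    first_val = len(steps) - n_splits * val_size    # position of the first validation block
    for start in range(first_val, len(steps), val_size):
        train_steps = steps[:start]                 # every earlier time step
        val_steps = steps[start:start + val_size]   # the block that follows
        yield (np.flatnonzero(np.isin(times, train_steps)),
               np.flatnonzero(np.isin(times, val_steps)))

times = np.repeat(np.arange(6), 2)                  # 6 time steps, 2 samples each
for fold, (train_idx, val_idx) in enumerate(expanding_time_splits(times, n_splits=5), 1):
    print(fold, np.unique(times[train_idx]), np.unique(times[val_idx]))
```

Each printed row corresponds to one row of the plot: the training steps grow by one block per fold while the validation block moves forward.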

TemporalRollingCV with Gap

Adds a 2-time-step gap between training and validation sets.

[5]:

show_splits(TemporalRollingCV(n_splits=5, gap=2), train_val_times, test_times)
plt.show()

../_images/examples_temporal_rolling_cv_8_0.svg
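
One way to check a gap numerically is to count the distinct time steps that fall strictly between the last training step and the first validation step. The helper `skipped_time_steps` is hypothetical and works on any pair of index arrays; the hand-built indices below stand in for what a splitter would return.

```python
import numpy as np

def skipped_time_steps(times, train_idx, val_idx):
    """Count distinct time steps strictly between the training and validation sets."""
    times = np.asarray(times)
    last_train = times[train_idx].max()
    first_val = times[val_idx].min()
    steps = np.unique(times)
    return int(np.sum((steps > last_train) & (steps < first_val)))

times = np.repeat(np.arange(6), 2)                  # [0 0 1 1 2 2 3 3 4 4 5 5]
train_idx = np.array([0, 1, 2, 3])                  # samples at time steps 0 and 1
val_idx = np.array([8, 9])                          # samples at time step 4
print(skipped_time_steps(times, train_idx, val_idx))  # steps 2 and 3 are skipped
```

The same helper could be applied to each `(train_indices, val_indices)` pair from `cv.split(...)` to confirm that at least `gap` time steps separate the two sets.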

TemporalRollingCV with Limited Training Window

Limits training data to a maximum of 4 time steps, creating a sliding window approach that maintains consistent training set sizes.

[6]:

show_splits(TemporalRollingCV(n_splits=5, max_train_size=4), train_val_times, test_times)
plt.show()

../_images/examples_temporal_rolling_cv_10_0.svg
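
The sliding-window effect comes down to keeping only the most recent training time steps. The sketch below uses a hypothetical helper, `clip_train_window`, and assumes `max_train_size` is a positive number of time steps; the fold boundaries in the loop are illustrative and are not taken from the library.

```python
import numpy as np

def clip_train_window(train_steps, max_train_size=None):
    """Keep only the most recent `max_train_size` training time steps (positive int or None)."""
    train_steps = np.asarray(train_steps)
    if max_train_size is None:
        return train_steps                          # no limit: expanding window
    return train_steps[-max_train_size:]            # limit: sliding window

steps = np.arange(8)
for val_start in range(3, 8):                       # illustrative validation start positions
    window = clip_train_window(steps[:val_start], max_train_size=4)
    print(val_start, window)
```

Early folds with fewer than four available steps keep everything, so the window only starts to slide once enough history has accumulated.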

TemporalRollingCV with Fixed Number of Time Steps for Validation

Fixes the number of time steps used for validation in each fold (here, 2) instead of letting it be computed automatically.

[7]:

show_splits(TemporalRollingCV(n_splits=5, test_size=2), train_val_times, test_times)
plt.show()

../_images/examples_temporal_rolling_cv_12_0.svg
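
As a rough illustration of fixed-width validation blocks, the sketch below computes block boundaries as positions in the sorted array of unique time steps. The function `validation_blocks` is hypothetical. It assumes the end-aligned layout that scikit-learn's `TimeSeriesSplit` documents for `test_size`; whether TemporalRollingCV places its blocks the same way is an assumption, and the numbers are chosen for illustration rather than taken from the notebook data.

```python
def validation_blocks(n_steps, n_splits, test_size):
    """Return (start, stop) positions of each validation block, aligned to the end."""
    first_start = n_steps - n_splits * test_size
    if first_start <= 0:
        # At least one time step must remain for training in the first fold
        raise ValueError("need more than n_splits * test_size time steps")
    return [(s, s + test_size) for s in range(first_start, n_steps, test_size)]

print(validation_blocks(n_steps=12, n_splits=5, test_size=2))
```

Under this layout the blocks do not overlap, and the last one always ends at the final available time step.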
