Note
This page was generated from a Jupyter notebook.
Temporal Rolling CV: Splits Visualization
This notebook shows how TemporalRollingCV works. We start by importing the necessary packages and defining a small utility function to plot the splits.
[1]:
import matplotlib.pyplot as plt
import numpy as np
from elliptic_toolkit import TemporalRollingCV, temporal_split
[2]:
# utility function to visualize the splits
def show_splits(cv, train_val_times, test_times):
    n_splits = cv.get_n_splits()
    plt.figure(figsize=(10, 6))
    # the time index is passed both as the data and as the groups
    splits = cv.split(train_val_times, groups=train_val_times)
    for fold, (train_indices, val_indices) in enumerate(splits):
        # collapse this fold's sample indices to the time steps they cover
        train_times = np.unique(train_val_times[train_indices])
        val_times = np.unique(train_val_times[val_indices])
        y = fold + 1  # one row per fold, starting at 1
        # label only the first fold so each set appears once in the legend
        plt.scatter(train_times, [y] * len(train_times), color='blue', label='Train' if fold == 0 else None, marker='o', s=100)
        plt.scatter(val_times, [y] * len(val_times), color='orange', label='Validation' if fold == 0 else None, marker='s', s=100)
        # the held-out test time steps are the same for every fold
        plt.scatter(test_times, [y] * len(test_times), color='red', label='Test' if fold == 0 else None, marker='^', s=100)
    folds = range(1, n_splits + 1)
    plt.xlabel('Time Step')
    plt.ylabel('Fold')
    plt.title('Train, Validation and Test Time Steps per Fold')
    plt.legend()
    plt.yticks(folds, [f'Fold {i}' for i in folds])
    plt.tight_layout()
Next, we create an array that mimics a time index and has more than one sample per time step. We then hold out 20% of the unique time steps; these are the ones we would usually reserve for the final test set.
[3]:
rng = np.random.default_rng(0)  # fixed seed so the splits are reproducible
times = np.sort(rng.integers(0, 10, size=30))
train_val_times, test_times = temporal_split(times, test_size=0.2)
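The hold-out step can be sketched in plain Python. This is a minimal sketch, assuming temporal_split assigns the last test_size fraction of the unique time steps (rounded up here) to the test set and keeps every sample of a time step on the same side of the split. The name temporal_split_sketch and the rounding rule are hypothetical, not the elliptic_toolkit implementation.

```python
import math


def temporal_split_sketch(times, test_size=0.2):
    """Hypothetical stand-in for temporal_split.

    The test set receives the last ceil(test_size * n_unique) unique time
    steps; all samples that share a time step end up on the same side.
    """
    if not 0 < test_size < 1:
        raise ValueError("test_size must be a fraction between 0 and 1")
    unique_steps = sorted(set(times))
    n_test = math.ceil(test_size * len(unique_steps))
    cutoff = unique_steps[len(unique_steps) - n_test]  # first test time step
    train_val = [t for t in times if t < cutoff]
    test = [t for t in times if t >= cutoff]
    return train_val, test


times = [0, 0, 1, 2, 2, 2, 3, 4, 4, 5, 6, 6, 7, 8, 8, 9, 9, 9]
train_val_times, test_times = temporal_split_sketch(times, test_size=0.2)
print(sorted(set(train_val_times)))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(sorted(set(test_times)))  # [8, 9]
```

Splitting on unique time steps rather than on samples is what keeps a single time step from leaking into both sets.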
Basic TemporalRollingCV
Standard temporal cross-validation with 5 folds. Each fold trains on all time steps that precede its validation block and validates on the time step(s) that immediately follow. The number of time steps in each validation block is computed automatically.
[4]:
show_splits(TemporalRollingCV(n_splits=5), train_val_times, test_times)
plt.show()
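The expanding-window idea can be sketched on the unique time steps and then mapped back to sample indices, so all samples from one time step land in the same set. The default validation size used below, n_unique // (n_splits + 1) time steps, is the rule scikit-learn's TimeSeriesSplit uses; assuming TemporalRollingCV does the same is only an assumption, and expanding_window_splits is a hypothetical name.

```python
def expanding_window_splits(times, n_splits=5):
    """Yield (train_indices, val_indices) over samples, splitting on time steps.

    Hypothetical sketch: each validation block has n_unique // (n_splits + 1)
    time steps and training uses every earlier time step.
    """
    steps = sorted(set(times))
    n_val = len(steps) // (n_splits + 1)
    if n_val == 0:
        raise ValueError("not enough unique time steps for n_splits")
    first_val = len(steps) - n_splits * n_val
    for k in range(n_splits):
        val_start = first_val + k * n_val
        train_steps = set(steps[:val_start])
        val_steps = set(steps[val_start:val_start + n_val])
        # map time steps back to the positions of every sample at those steps
        train_idx = [i for i, t in enumerate(times) if t in train_steps]
        val_idx = [i for i, t in enumerate(times) if t in val_steps]
        yield train_idx, val_idx


times = [0, 0, 1, 2, 2, 3, 4, 4, 4, 5, 6, 7, 7]
for fold, (train_idx, val_idx) in enumerate(expanding_window_splits(times), 1):
    print(fold, sorted({times[i] for i in train_idx}), sorted({times[i] for i in val_idx}))
```

With 8 unique time steps and 5 splits, each validation block is one time step: fold 1 trains on time steps 0 to 2 and validates on 3, and fold 5 trains on 0 to 6 and validates on 7.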
TemporalRollingCV with Gap
Adds a 2-time-step gap between training and validation sets.
[5]:
show_splits(TemporalRollingCV(n_splits=5, gap=2), train_val_times, test_times)
plt.show()
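One way to read the gap, assuming it behaves like the gap parameter of scikit-learn's TimeSeriesSplit: each validation block stays where it is, and the gap time steps just before it are dropped from training. The helper gapped_fold is hypothetical and works on positions in the sorted list of unique time steps.

```python
def gapped_fold(n_steps, n_splits, gap, fold):
    """Return (train_positions, val_positions) for one 0-based fold.

    Hypothetical sketch: validation blocks have n_steps // (n_splits + 1)
    steps, and the `gap` steps right before each block are excluded.
    """
    n_val = n_steps // (n_splits + 1)
    val_start = n_steps - (n_splits - fold) * n_val
    train_end = val_start - gap  # exclusive end of the training block
    if train_end <= 0:
        raise ValueError("gap leaves no training time steps for this fold")
    return list(range(train_end)), list(range(val_start, val_start + n_val))


for fold in range(5):
    print(fold + 1, gapped_fold(n_steps=8, n_splits=5, gap=2, fold=fold))
```

Under this reading the validation blocks match the no-gap case; only the training blocks shrink, so the first fold trains on a single time step.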
TemporalRollingCV with Limited Training Window
Limits the training data to at most 4 time steps, giving a sliding window: once enough history is available, every fold trains on the same number of time steps.
[6]:
show_splits(TemporalRollingCV(n_splits=5, max_train_size=4), train_val_times, test_times)
plt.show()
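A minimal sketch of how a maximum training size turns the expanding window into a sliding one, assuming training simply keeps the most recent max_train_size time steps before the validation block (as scikit-learn's TimeSeriesSplit does). The helper windowed_train_positions is hypothetical and works on positions of unique time steps.

```python
def windowed_train_positions(val_start, max_train_size=None):
    """Return training positions for a validation block starting at val_start.

    Hypothetical sketch: without a limit, training covers every earlier
    step; with a limit, only the most recent max_train_size steps are kept.
    """
    train_start = 0
    if max_train_size is not None:
        train_start = max(0, val_start - max_train_size)
    return list(range(train_start, val_start))


for val_start in range(3, 8):
    print(val_start, windowed_train_positions(val_start, max_train_size=4))
```

The training sizes come out as 3, 4, 4, 4, 4: the first fold has only 3 earlier time steps available, and after that the window stays at 4 and slides forward.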
TemporalRollingCV with Fixed Number of Time Steps for Validation
Fixes the number of time steps in each validation set (here test_size=2) instead of computing it automatically.
[7]:
show_splits(TemporalRollingCV(n_splits=5, test_size=2), train_val_times, test_times)
plt.show()
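With a fixed validation size, the validation blocks can be pictured as laid back to back at the end of the training/validation range. This sketch assumes scikit-learn's TimeSeriesSplit placement rule (the last block ends at the last time step); the helper validation_blocks is hypothetical, and under this rule n_splits * test_size must be smaller than the number of unique time steps, which TemporalRollingCV may handle differently.

```python
def validation_blocks(n_steps, n_splits, test_size):
    """Return the validation positions of each fold, latest block last.

    Hypothetical sketch: blocks of `test_size` steps are placed back to back
    so that the final block ends at the last available time step.
    """
    first = n_steps - n_splits * test_size
    if first <= 0:
        raise ValueError("n_splits * test_size leaves no room for training")
    return [list(range(first + k * test_size, first + (k + 1) * test_size))
            for k in range(n_splits)]


for fold, block in enumerate(validation_blocks(n_steps=12, n_splits=5, test_size=2), 1):
    print(fold, block)
```

With 12 unique time steps, the five blocks are positions 2 and 3, 4 and 5, up to 10 and 11, and fold k trains on every position before its block.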