elliptic_toolkit.temporal_cv module

class elliptic_toolkit.temporal_cv.TemporalRollingCV(n_splits=5, *, test_size=None, max_train_size=None, gap=0, time_col='time')

Bases: TimeSeriesSplit

Time-based cross-validation iterator that extends scikit-learn’s TimeSeriesSplit to work with data that has explicit time step values (like the Elliptic Bitcoin dataset).

The class handles datasets in which multiple samples belong to the same time step: splits are defined over time steps, and the time step indices are then mapped to the corresponding row indices in the dataset.

This CV strategy ensures that, for each fold:

1. Training data comes from earlier time periods.
2. The test set is a continuous time window following the training data.
3. The training window expands and the test window shifts forward relative to the previous fold.
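The three properties above can be illustrated with a minimal pure-Python sketch that operates on time-step positions only. The function name `expanding_window_folds` is hypothetical, and the arithmetic follows the fold layout used by scikit-learn's TimeSeriesSplit; it is an assumption that TemporalRollingCV applies the same layout to time steps, not a copy of its implementation.

```python
def expanding_window_folds(n_steps, n_splits=5, test_size=None, gap=0,
                           max_train_size=None):
    """Yield (train_steps, test_steps) as lists of time-step positions.

    Hypothetical helper: mirrors the TimeSeriesSplit layout, applied to
    time steps instead of rows.
    """
    if test_size is None:
        # Default window size: spread the steps over n_splits + 1 blocks.
        test_size = n_steps // (n_splits + 1)
    first_test_start = n_steps - n_splits * test_size
    if test_size < 1 or first_test_start - gap <= 0:
        raise ValueError("not enough time steps for the requested folds")

    for test_start in range(first_test_start, n_steps, test_size):
        train_end = test_start - gap  # skip `gap` steps before the test window
        train_start = 0
        if max_train_size is not None and max_train_size < train_end:
            train_start = train_end - max_train_size  # cap the training window
        yield (list(range(train_start, train_end)),
               list(range(test_start, test_start + test_size)))


folds = list(expanding_window_folds(n_steps=8, n_splits=3, gap=1))
for train_steps, test_steps in folds:
    print(train_steps, test_steps)
```

With 8 time steps, 3 splits, and a gap of 1, each test window covers two consecutive steps, the training window grows from fold to fold, and one step is always left out between training and test.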

Parameters:

n_splits (int, default=5): Number of splits to generate.
test_size (int, default=None): Size of the test window in time steps. If None, it is calculated from n_splits.
max_train_size (int, default=None): Maximum number of time steps to use for training. If None, all available time steps are used.
gap (int, default=0): Number of time steps to skip between the training and test sets.
time_col (str, default='time'): Name of the column containing time step information.

__init__(n_splits=5, *, test_size=None, max_train_size=None, gap=0, time_col='time')

split(X, y=None, groups=None)

Generate indices to split data into training and test sets.

Unlike standard TimeSeriesSplit, this method works with explicit time step values and maps them to actual row indices in the dataset. This allows it to handle datasets where multiple samples can belong to the same time step.
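The mapping from time steps to row indices can be sketched as follows. This is a rough illustration under stated assumptions, not the toolkit's actual code: the function `time_step_splits` is hypothetical, and it simply runs scikit-learn's TimeSeriesSplit over the sorted unique time values and then looks up the rows that carry each selected time step.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit


def time_step_splits(time_values, n_splits=3, gap=0):
    """Yield (train_rows, test_rows) for per-row time values.

    Hypothetical helper: splits the unique time steps, then maps each
    selected step back to every row that belongs to it.
    """
    time_values = np.asarray(time_values)
    unique_steps = np.unique(time_values)  # sorted distinct time steps
    splitter = TimeSeriesSplit(n_splits=n_splits, gap=gap)
    for train_pos, test_pos in splitter.split(unique_steps):
        # Positions index into unique_steps; convert them to row indices.
        train_rows = np.flatnonzero(np.isin(time_values, unique_steps[train_pos]))
        test_rows = np.flatnonzero(np.isin(time_values, unique_steps[test_pos]))
        yield train_rows, test_rows


# Eight rows spread unevenly over four time steps.
times = [1, 1, 2, 2, 2, 3, 4, 4]
row_folds = [(tr.tolist(), te.tolist()) for tr, te in time_step_splits(times)]
for train_rows, test_rows in row_folds:
    print(train_rows, test_rows)
```

Because whole time steps are assigned to one side of a split, rows that share a time step never end up divided between training and test in this sketch.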

Parameters:

X (array-like or DataFrame): Training data. If a DataFrame, it must contain the column specified by time_col. Otherwise, time values must be passed through the groups parameter.
y (array-like, optional): Targets for the training data (ignored).
groups (array-like, optional): Time values for each sample, used if X does not have the time column specified by time_col.

Yields:

train_index (ndarray): Indices of rows in the training set.
test_index (ndarray): Indices of rows in the test set.

Notes:

The yielded indices refer to rows in the original dataset, not time steps. This makes the cross-validator compatible with scikit-learn’s model selection tools.
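As a small illustration of why row indices matter, scikit-learn's cross_val_score accepts any iterable of (train, test) row-index pairs as its cv argument. The splits below are handcrafted stand-ins for what a time-based splitter would yield, and the data and the DummyClassifier baseline are made up for the example.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Toy data: 8 rows, 2 features (values are arbitrary).
X = np.arange(16, dtype=float).reshape(8, 2)
y = np.array([0, 0, 1, 0, 0, 1, 0, 0])

# Handcrafted row-index splits, earlier rows for training, later for testing.
row_splits = [
    (np.array([0, 1, 2]), np.array([3, 4])),
    (np.array([0, 1, 2, 3, 4]), np.array([5, 6, 7])),
]

# Baseline that always predicts the most frequent training label.
model = DummyClassifier(strategy="most_frequent")
scores = cross_val_score(model, X, y, cv=row_splits)
print(scores)
```

Presumably an instance of TemporalRollingCV can be passed as cv in the same position; the handcrafted splits here only stand in for it.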