elliptic_toolkit.temporal_cv module
- class elliptic_toolkit.temporal_cv.TemporalRollingCV(n_splits=5, *, test_size=None, max_train_size=None, gap=0, time_col='time')
Bases: TimeSeriesSplit
Time-based cross-validation iterator that extends scikit-learn’s TimeSeriesSplit to work with data that has explicit time step values (like the Elliptic Bitcoin dataset).
The class adds support for datasets in which multiple samples belong to the same time step: it maps the time step indices of each fold to the actual row indices in the dataset.
This CV strategy ensures that for each fold:
1. Training data comes from earlier time periods.
2. The test set is a continuous time window following the training data.
3. Each fold expands the training window and shifts the test window forward.
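The mapping from time steps to row indices described above can be sketched roughly as follows. This is a minimal illustration, not the toolkit's actual implementation: the function name rolling_time_splits and the toy time_values array are hypothetical, and the sketch assumes the folds are computed by running scikit-learn's TimeSeriesSplit over the sorted unique time steps.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit


def rolling_time_splits(time_values, n_splits=5, test_size=None,
                        max_train_size=None, gap=0):
    """Hypothetical helper: yield (train_rows, test_rows) for each fold."""
    time_values = np.asarray(time_values)
    # Sorted unique time steps; several rows may share one step.
    unique_steps = np.unique(time_values)
    splitter = TimeSeriesSplit(n_splits=n_splits, test_size=test_size,
                               max_train_size=max_train_size, gap=gap)
    # Split over time steps, then translate each fold back to row indices.
    for train_pos, test_pos in splitter.split(unique_steps.reshape(-1, 1)):
        train_rows = np.flatnonzero(np.isin(time_values, unique_steps[train_pos]))
        test_rows = np.flatnonzero(np.isin(time_values, unique_steps[test_pos]))
        yield train_rows, test_rows


# Toy data: 10 rows spread over 6 time steps (made up for illustration).
time_values = np.array([1, 1, 2, 2, 2, 3, 4, 4, 5, 6])
folds = list(rolling_time_splits(time_values, n_splits=3))
for train_rows, test_rows in folds:
    print("train rows:", train_rows, "test rows:", test_rows)
```

In every fold of this sketch the latest training time step is strictly earlier than the earliest test time step, which is the property the list above describes.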
Parameters:
- n_splits : int, default=5
Number of splits to generate.
- test_size : int, default=None
Size of the test window in time steps. If None, it is calculated based on n_splits.
- max_train_size : int, default=None
Maximum number of time steps to use for training. If None, all available time steps are used.
- gap : int, default=0
Number of time steps to skip between the training and test sets.
- time_col : str, default='time'
Name of the column containing time step information.
- split(X, y=None, groups=None)
Generate indices to split data into training and test sets.
Unlike standard TimeSeriesSplit, this method works with explicit time step values and maps them to actual row indices in the dataset. This allows it to handle datasets where multiple samples can belong to the same time step.
Parameters:
- X : array-like or DataFrame
Training data. If a DataFrame, it must contain the column specified by time_col. Otherwise, time values must be passed through the groups parameter.
- y : array-like, optional
Targets for the training data (ignored).
- groups : array-like, optional
Time values for each sample, used if X doesn’t have the time column specified by time_col.
Yields:
- train_index : ndarray
Indices of rows in the training set.
- test_index : ndarray
Indices of rows in the test set.
Notes:
The yielded indices refer to rows in the original dataset, not time steps. This makes the cross-validator compatible with scikit-learn’s model selection tools.
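Because the indices are row indices, pairs of this shape can be handed directly to scikit-learn utilities that accept an iterable of (train, test) index arrays as cv. The sketch below illustrates only that point. It builds two folds by hand rather than calling TemporalRollingCV, and the data, the fold boundaries, and the choice of DummyClassifier are all assumptions made for the example.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Toy data (hypothetical): 8 rows, two rows per time step.
time_steps = np.array([1, 1, 2, 2, 3, 3, 4, 4])
X = np.arange(16, dtype=float).reshape(8, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

# Hand-written row-index folds in the same format that split() yields:
# train on earlier time steps, test on the next one.
row_folds = [
    (np.flatnonzero(time_steps <= 2), np.flatnonzero(time_steps == 3)),
    (np.flatnonzero(time_steps <= 3), np.flatnonzero(time_steps == 4)),
]

# cross_val_score accepts an iterable of (train, test) index arrays as cv.
scores = cross_val_score(DummyClassifier(strategy="most_frequent"),
                         X, y, cv=row_folds)
print(scores)
```

If the indices referred to time steps instead of rows, each fold would need to be translated before scikit-learn could index X and y with it.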