Note
This page was generated from a Jupyter notebook.
Elliptic Bitcoin Dataset: Graph Convolutional Network Example
This notebook demonstrates how to use Graph Convolutional Networks (GCNs) with the Elliptic Bitcoin dataset. It covers:
Loading and preparing the graph data
Training a GCN model for binary classification
Evaluating model performance
Hyperparameter tuning with temporal cross-validation
This example uses PyTorch Geometric for graph neural network operations.
[1]:
from elliptic_toolkit import download_dataset, process_dataset, temporal_split, TemporalRollingCV, GNNBinaryClassifier
from torch_geometric.data import Data
import torch
Loading the Dataset
First, download the Elliptic Bitcoin dataset. This will automatically save the data files in the correct location for further processing.
[2]:
download_dataset()
Downloading https://data.pyg.org/datasets/elliptic/elliptic_txs_features.csv.zip
Downloading https://data.pyg.org/datasets/elliptic/elliptic_txs_edgelist.csv.zip
Downloading https://data.pyg.org/datasets/elliptic/elliptic_txs_classes.csv.zip
Preparing Graph Data
Process the dataset to create a PyTorch Geometric Data object containing:
Node features (transaction features)
Edge indices (transaction connections)
Node labels (illicit/licit classification)
Time information for temporal splitting
We also create training and test splits based on temporal ordering, focusing only on labeled transactions.
[3]:
nodes_df, edges_df = process_dataset()
data = Data(
    x=torch.tensor(nodes_df.drop(columns=['time', 'class']).values, dtype=torch.float),
    edge_index=torch.tensor(edges_df.values.T, dtype=torch.long),
    y=torch.tensor(nodes_df['class'].values, dtype=torch.long),
    time=torch.tensor(nodes_df['time'].values, dtype=torch.long)
)
train_val_idx, test_idx = temporal_split(data.time)
labeled_mask = data.y != -1
train_val_idx = train_val_idx[labeled_mask[train_val_idx]]
test_idx = test_idx[labeled_mask[test_idx]]
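To make the splitting step concrete, here is a minimal pure-Python sketch of a time-ordered split followed by removal of unlabeled nodes. The function name temporal_split_sketch and its test_fraction parameter are hypothetical, and the real temporal_split in elliptic_toolkit may choose its cutoff differently. The only detail taken from the notebook is that unlabeled nodes carry the class -1.

```python
def temporal_split_sketch(times, test_fraction=0.2):
    """Split node indices so that every test time step is later than every training one.

    Hypothetical helper for illustration; not the elliptic_toolkit implementation.
    """
    steps = sorted(set(times))
    n_test_steps = max(1, round(len(steps) * test_fraction))
    cutoff = steps[-n_test_steps]  # first time step assigned to the test set
    train_idx = [i for i, t in enumerate(times) if t < cutoff]
    test_idx = [i for i, t in enumerate(times) if t >= cutoff]
    return train_idx, test_idx


# Toy data: five time steps with two nodes each; -1 marks an unlabeled node.
toy_times = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
toy_labels = [0, -1, 1, 0, -1, 0, 1, -1, 0, -1]

toy_train_idx, toy_test_idx = temporal_split_sketch(toy_times, test_fraction=0.2)

# Keep only labeled nodes, mirroring the labeled_mask filtering in the cell above.
toy_train_idx = [i for i in toy_train_idx if toy_labels[i] != -1]
toy_test_idx = [i for i in toy_test_idx if toy_labels[i] != -1]
```

On this toy input the last time step becomes the test set, so toy_train_idx is [0, 2, 3, 5, 6] and toy_test_idx is [8]. Unlabeled nodes are dropped from the index lists only; in the notebook they still remain part of the graph stored in data.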
Training a GCN Model
Create and train a Graph Convolutional Network using the GNNBinaryClassifier wrapper.
Note: The model uses a low number of iterations (max_iter=50) for demonstration purposes, which may cause convergence warnings. In practice, you would use more iterations for better convergence.
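For readers unfamiliar with GCNs, the sketch below shows one propagation step in the commonly cited form ReLU(D^-1/2 (A + I) D^-1/2 X W) on a tiny graph, written in plain Python. It is an illustration only: the function gcn_layer is hypothetical, edges are treated as undirected, and it is not how torch_geometric.nn.GCN or GNNBinaryClassifier is implemented internally.

```python
import math


def gcn_layer(features, edges, weight):
    """Apply one GCN step, ReLU(D^-1/2 (A + I) D^-1/2 X W), using plain lists."""
    n_nodes = len(features)
    in_dim = len(features[0])
    out_dim = len(weight[0])

    # Neighbor sets with a self-loop on every node (the "+ I" term).
    # Assumption: edges are treated as undirected for this illustration.
    neighbors = {i: {i} for i in range(n_nodes)}
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    degree = [len(neighbors[i]) for i in range(n_nodes)]

    output = []
    for i in range(n_nodes):
        # Symmetrically normalized sum of neighbor features.
        aggregated = [0.0] * in_dim
        for j in neighbors[i]:
            coeff = 1.0 / math.sqrt(degree[i] * degree[j])
            for k in range(in_dim):
                aggregated[k] += coeff * features[j][k]
        # Linear transform followed by ReLU.
        row = [
            max(0.0, sum(aggregated[k] * weight[k][d] for k in range(in_dim)))
            for d in range(out_dim)
        ]
        output.append(row)
    return output


# Path graph 0 - 1 - 2 with one scalar feature per node and an identity weight.
toy_features = [[1.0], [2.0], [3.0]]
toy_edges = [(0, 1), (1, 2)]
toy_output = gcn_layer(toy_features, toy_edges, weight=[[1.0]])
```

Each output row mixes a node's own feature with those of its neighbors. Stacking several such layers (num_layers=3 in the cell below) lets information travel several hops across the transaction graph.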
[4]:
from torch_geometric.nn import GCN
gcn_model = GNNBinaryClassifier(
    data,
    GCN,
    hidden_dim=8,
    num_layers=3,
    dropout=0.3,
    verbose=True,
    device='cpu',
    max_iter=50,
)
gcn_model.fit(train_val_idx)
Using device: cpu
Epoch 1: Loss = 1.263269
Epoch 2: Loss = 1.187056
Epoch 3: Loss = 1.144522
Epoch 4: Loss = 1.119214
Epoch 5: Loss = 1.088413
Epoch 6: Loss = 1.063648
Epoch 7: Loss = 1.048369
Epoch 8: Loss = 1.015733
Epoch 9: Loss = 1.011191
Epoch 10: Loss = 0.987076
Epoch 11: Loss = 0.970269
Epoch 12: Loss = 0.964926
Epoch 13: Loss = 0.959509
Epoch 14: Loss = 0.938973
Epoch 15: Loss = 0.933214
Epoch 16: Loss = 0.922311
Epoch 17: Loss = 0.907486
Epoch 18: Loss = 0.907295
Epoch 19: Loss = 0.898458
Epoch 20: Loss = 0.888472
Epoch 21: Loss = 0.884580
Epoch 22: Loss = 0.871965
Epoch 23: Loss = 0.864441
Epoch 24: Loss = 0.856292
Epoch 25: Loss = 0.843583
Epoch 26: Loss = 0.844607
Epoch 27: Loss = 0.833212
Epoch 28: Loss = 0.829192
Epoch 29: Loss = 0.829044
Epoch 30: Loss = 0.818572
Epoch 31: Loss = 0.800273
Epoch 32: Loss = 0.797820
Epoch 33: Loss = 0.784595
Epoch 34: Loss = 0.783999
Epoch 35: Loss = 0.780688
Epoch 36: Loss = 0.777762
Epoch 37: Loss = 0.779501
Epoch 38: Loss = 0.759684
Epoch 39: Loss = 0.758704
Epoch 40: Loss = 0.736232
Epoch 41: Loss = 0.732223
Epoch 42: Loss = 0.743988
Epoch 43: Loss = 0.735906
Epoch 44: Loss = 0.727239
Epoch 45: Loss = 0.711375
Epoch 46: Loss = 0.716075
Epoch 47: Loss = 0.704705
Epoch 48: Loss = 0.717037
Epoch 49: Loss = 0.708850
Epoch 50: Loss = 0.700287
/home/runner/work/EllipticBitcoinToolkit/EllipticBitcoinToolkit/elliptic_toolkit/model_wrappers.py:357: UserWarning: Training stopped before reaching convergence. Consider increasing max_iter (currently 50) or decreasing tol (currently 0.0001) for better results.
warnings.warn(
[4]:
GNNBinaryClassifier(data=Data(x=[203769, 165], edge_index=[2, 234355], y=[203769], time=[203769]), device=device(type='cpu'), dropout=0.3, hidden_dim=8, max_iter=50, model=<class 'torch_geometric.nn.models.basic_gnn.GCN'>, verbose=True)
Parameters
data | Data(x=[20376...time=[203769])
model | <class 'torch...asic_gnn.GCN'>
hidden_dim | 8
num_layers | 3
dropout | 0.3
norm | None
jk | 'last'
learning_rate_init | 0.01
weight_decay | 0.0005
balance_loss | True
max_iter | 50
verbose | True
n_iter_no_change | 10
tol | 0.0001
device | device(type='cpu')
heads | None
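The convergence warning above mentions max_iter and tol, and the parameter table lists n_iter_no_change. In scikit-learn estimators such as MLPClassifier, parameters with these names drive a loss-based stopping rule. The sketch below assumes GNNBinaryClassifier follows a similar rule, which this page does not confirm, and the function stopping_epoch is hypothetical.

```python
def stopping_epoch(losses, max_iter=50, tol=1e-4, n_iter_no_change=10):
    """Return (epochs_run, converged) for a given sequence of per-epoch losses.

    Assumed rule: stop once the loss has failed to improve on the best value
    by more than `tol` for `n_iter_no_change` consecutive epochs.
    """
    best_loss = float("inf")
    stale_epochs = 0
    epochs_run = 0
    for epoch, loss in enumerate(losses[:max_iter], start=1):
        epochs_run = epoch
        if loss < best_loss - tol:
            best_loss = loss
            stale_epochs = 0
        else:
            stale_epochs += 1
        if stale_epochs >= n_iter_no_change:
            return epochs_run, True  # plateau reached: treated as converged
    return epochs_run, False  # iteration budget exhausted first


# A loss that keeps falling never plateaus within 50 epochs.
still_improving = [1.0 - 0.01 * i for i in range(100)]
budget_result = stopping_epoch(still_improving, max_iter=50)

# A loss that flattens after two epochs triggers the stopping rule.
plateau = [1.0, 0.5] + [0.5] * 20
plateau_result = stopping_epoch(plateau, max_iter=50)
```

Under this assumed rule, the first case corresponds to the situation in the training log: the loss is still decreasing at epoch 50, so the budget runs out before any plateau is detected.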
Model Evaluation
Evaluate the trained GCN model using a Precision-Recall curve on the test set. This provides insight into the model’s ability to distinguish between illicit and licit transactions.
[5]:
from matplotlib import pyplot as plt
from sklearn.metrics import PrecisionRecallDisplay

PrecisionRecallDisplay.from_estimator(
    gcn_model,
    test_idx,
    data.y[test_idx],
    name="GCN Model",
)
plt.show()
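A precision-recall curve is often summarized by average precision, which is also the scoring metric named in the grid search later on this page. As a rough illustration of what that number measures, the sketch below computes it by hand for a ranked list. It assumes binary 0/1 labels and no tied scores, and the helper average_precision_sketch is hypothetical, not a scikit-learn function.

```python
def average_precision_sketch(y_true, scores):
    """Average of the precision values observed at each correctly ranked positive.

    Assumes labels are 0/1 and that no two scores are tied.
    """
    n_positive = sum(y_true)
    if n_positive == 0:
        raise ValueError("average precision is undefined without positive labels")

    # Rank samples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / n_positive


# Two positives ranked first and third among five samples.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
toy_ap = average_precision_sketch(toy_labels, toy_scores)
```

Here the positives are found at ranks 1 and 3, giving precisions 1 and 2/3 and an average of 5/6. Because the metric rewards ranking illicit transactions near the top, it is a common choice when the positive class is rare.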
Hyperparameter Tuning
Perform grid search to find optimal hyperparameters using temporal cross-validation. This ensures the model evaluation respects the temporal nature of the data.
The GCN model sees the full graph at training time; we only pass the indices over which the loss is computed. Note that the time steps must be passed to the fit method as groups so that the cross-validation splitter can access them.
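The exact splitting rule of TemporalRollingCV is not described on this page. As a rough mental model, the sketch below applies an expanding-window scheme, similar to scikit-learn's TimeSeriesSplit, to the unique time steps supplied as groups. The function rolling_time_splits is hypothetical and the toolkit's splitter may behave differently.

```python
def rolling_time_splits(groups, n_splits=3):
    """Yield (train_positions, test_positions) with every test block later in time.

    Splits are made over unique time steps, so all samples that share a
    time step always land on the same side of a split.
    """
    steps = sorted(set(groups))
    test_size = len(steps) // (n_splits + 1)
    if test_size == 0:
        raise ValueError("not enough distinct time steps for the requested n_splits")

    for fold in range(n_splits):
        test_start = len(steps) - (n_splits - fold) * test_size
        train_steps = set(steps[:test_start])
        test_steps = set(steps[test_start:test_start + test_size])
        train_pos = [i for i, g in enumerate(groups) if g in train_steps]
        test_pos = [i for i, g in enumerate(groups) if g in test_steps]
        yield train_pos, test_pos


# Four time steps with two samples each.
toy_groups = [1, 1, 2, 2, 3, 3, 4, 4]
toy_folds = list(rolling_time_splits(toy_groups, n_splits=3))
```

Each fold trains on all earlier time steps and validates on the next one, so no fold is scored on data that precedes its training set.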
[6]:
from sklearn.model_selection import GridSearchCV
gcn_model.set_params(verbose=False)
grid_search = GridSearchCV(
    gcn_model,
    param_grid={
        'hidden_dim': [2, 4, 8, 16],
    },
    cv=TemporalRollingCV(3),
    scoring='average_precision',
    n_jobs=-1,
    verbose=1,
)
grid_search.fit(train_val_idx, data.y[train_val_idx], groups=data.time[train_val_idx])
Fitting 3 folds for each of 4 candidates, totalling 12 fits
/home/runner/work/EllipticBitcoinToolkit/EllipticBitcoinToolkit/elliptic_toolkit/model_wrappers.py:357: UserWarning: Training stopped before reaching convergence. Consider increasing max_iter (currently 50) or decreasing tol (currently 0.0001) for better results.
warnings.warn(
[6]:
GridSearchCV(cv=TemporalRollingCV(gap=0, max_train_size=None, n_splits=3, test_size=None, time_col='time'), estimator=GNNBinaryClassifier(data=Data(x=[203769, 165], edge_index=[2, 234355], y=[203769], time=[203769]), device=device(type='cpu'), dropout=0.3, hidden_dim=8, max_iter=50, model=<class 'torch_geometric.nn.models.basic_gnn.GCN'>), n_jobs=-1, param_grid={'hidden_dim': [2, 4, 8, 16]}, scoring='average_precision', verbose=1)
Parameters
estimator | GNNBinaryClas...sic_gnn.GCN'>)
param_grid | {'hidden_dim': [2, 4, ...]}
scoring | 'average_precision'
n_jobs | -1
refit | True
cv | TemporalRolli...me_col='time')
verbose | 1
pre_dispatch | '2*n_jobs'
error_score | nan
return_train_score | False
GNNBinaryClassifier(data=Data(x=[203769, 165], edge_index=[2, 234355], y=[203769], time=[203769]), device=device(type='cpu'), dropout=0.3, hidden_dim=16, max_iter=50, model=<class 'torch_geometric.nn.models.basic_gnn.GCN'>)
Parameters
data | Data(x=[20376...time=[203769])
model | <class 'torch...asic_gnn.GCN'>
hidden_dim | 16
num_layers | 3
dropout | 0.3
norm | None
jk | 'last'
learning_rate_init | 0.01
weight_decay | 0.0005
balance_loss | True
max_iter | 50
verbose | False
n_iter_no_change | 10
tol | 0.0001
device | device(type='cpu')
heads | None
Visualizing Results
Plot the marginal effects of hyperparameters and temporal evaluation results to understand model performance and parameter sensitivity.
[7]:
from elliptic_toolkit import plot_marginals, plot_evals

for fig in plot_marginals(grid_search.cv_results_):
    plt.show()
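One common reading of a "marginal effect" is the mean cross-validation score for each value of one hyperparameter, averaged over all other settings. The sketch below computes that from a dictionary shaped like scikit-learn's cv_results_, using its param_<name> and mean_test_score keys. Whether plot_marginals computes exactly this is an assumption, the helper marginal_means is hypothetical, and the scores are placeholders rather than results from this notebook.

```python
from collections import defaultdict


def marginal_means(cv_results, param_name):
    """Average mean_test_score over all candidates sharing a value of one parameter."""
    grouped = defaultdict(list)
    values = cv_results[f"param_{param_name}"]
    for value, score in zip(values, cv_results["mean_test_score"]):
        grouped[value].append(score)
    return {value: sum(scores) / len(scores) for value, scores in grouped.items()}


# Placeholder grid with two hyperparameters; the scores are made up for illustration.
toy_cv_results = {
    "param_hidden_dim": [2, 2, 4, 4],
    "param_dropout": [0.1, 0.3, 0.1, 0.3],
    "mean_test_score": [0.2, 0.4, 0.5, 0.7],
}
toy_marginals = marginal_means(toy_cv_results, "hidden_dim")
```

With a single-parameter grid like the one searched above, each value appears once, so the marginal mean is simply that candidate's own mean test score.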
[8]:
for fig in plot_evals(
    grid_search,
    test_idx,
    data.y[test_idx].numpy(),
    data.y[train_val_idx].numpy(),
    time_steps_test=data.time[test_idx].numpy(),
):
    plt.show()