# Elliptic Bitcoin Dataset: Graph Convolutional Network Example

This notebook demonstrates how to use Graph Convolutional Networks (GCNs) with the Elliptic Bitcoin dataset. It covers:

- Loading and preparing the graph data
- Training a GCN model for binary classification
- Evaluating model performance
- Hyperparameter tuning with temporal cross-validation

This example uses PyTorch Geometric for graph neural network operations.

In [None]:
from elliptic_toolkit import download_dataset, process_dataset, temporal_split, TemporalRollingCV, GNNBinaryClassifier
from torch_geometric.data import Data
import torch

# Loading the Dataset

First, download the Elliptic Bitcoin dataset. `download_dataset()` saves the data files where the later processing step expects to find them.

In [None]:
download_dataset()

# Preparing Graph Data

Process the dataset to create a PyTorch Geometric `Data` object containing:

- Node features (transaction features)
- Edge indices (transaction connections)
- Node labels (illicit/licit classification)
- Time information for temporal splitting

We also split the nodes into training/validation and test index sets based on temporal ordering, then keep only the labeled transactions (class other than `-1`) in each set.
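
To make the idea of a temporal split followed by a label filter concrete, here is a minimal NumPy sketch on toy data. `temporal_split_sketch` and its `test_fraction` parameter are hypothetical names introduced for illustration; this is not the implementation of `temporal_split` in `elliptic_toolkit`, whose exact cutoff rule may differ.

```python
import numpy as np

def temporal_split_sketch(time_steps, test_fraction=0.2):
    """Hypothetical helper: put the latest time steps in the test set."""
    time_steps = np.asarray(time_steps)
    unique_steps = np.unique(time_steps)  # sorted ascending
    n_test_steps = max(1, int(round(len(unique_steps) * test_fraction)))
    cutoff = unique_steps[-n_test_steps]  # first time step assigned to the test set
    train_idx = np.flatnonzero(time_steps < cutoff)
    test_idx = np.flatnonzero(time_steps >= cutoff)
    return train_idx, test_idx

# Toy data (made up): 10 nodes over 5 time steps; -1 marks an unlabeled node.
toy_time = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
toy_labels = np.array([0, -1, 1, 0, -1, 0, 1, -1, 0, -1])

toy_train_idx, toy_test_idx = temporal_split_sketch(toy_time, test_fraction=0.2)

# Drop unlabeled nodes from both index sets, mirroring the mask used in the notebook.
toy_labeled = toy_labels != -1
toy_train_idx = toy_train_idx[toy_labeled[toy_train_idx]]
toy_test_idx = toy_test_idx[toy_labeled[toy_test_idx]]
```

Because the filter is applied to the index sets rather than to the graph, unlabeled nodes would still be available as neighbors during message passing.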

In [None]:
# Process the dataset into node and edge tables
nodes_df, edges_df = process_dataset()
data = Data(
    x=torch.tensor(nodes_df.drop(columns=['time', 'class']).values, dtype=torch.float),
    edge_index=torch.tensor(edges_df.values.T, dtype=torch.long),
    y=torch.tensor(nodes_df['class'].values, dtype=torch.long),
    time=torch.tensor(nodes_df['time'].values, dtype=torch.long)
)

# Split node indices by time step
train_val_idx, test_idx = temporal_split(data.time)

# Keep only labeled nodes (unlabeled nodes have class -1)
labeled_mask = data.y != -1
train_val_idx = train_val_idx[labeled_mask[train_val_idx]]
test_idx = test_idx[labeled_mask[test_idx]]



# Training a GCN Model

Create and train a Graph Convolutional Network using the `GNNBinaryClassifier` wrapper. 

**Note:** The model uses a low number of iterations (`max_iter=50`) for demonstration purposes, which may cause convergence warnings. In practice, you would use more iterations for better convergence.

In [None]:
from torch_geometric.nn import GCN

gcn_model = GNNBinaryClassifier(
    data,
    GCN,
    hidden_dim=8,
    num_layers=3,
    dropout=0.3,
    verbose=True,
    device='cpu',
    max_iter=50,
)

gcn_model.fit(train_val_idx)

# Model Evaluation

Evaluate the trained GCN model using a Precision-Recall curve on the test set. This provides insight into the model's ability to distinguish between illicit and licit transactions.
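
As a rough illustration of what the curve plots, the sketch below computes precision and recall at each rank of a score-sorted list, plus the resulting average precision, using only NumPy. The function names and the toy labels and scores are made up for illustration; the sketch assumes distinct scores and at least one positive label, and it is not how scikit-learn handles tied scores.

```python
import numpy as np

def precision_recall_points(y_true, scores):
    """Precision and recall after each prediction, ranked by descending score."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = y_true[order] == 1                   # True where the ranked item is positive
    true_positives = np.cumsum(hits)
    n_predicted = np.arange(1, len(hits) + 1)
    precision = true_positives / n_predicted
    recall = true_positives / hits.sum()
    return precision, recall

def average_precision_sketch(y_true, scores):
    """Sum of precision values weighted by the recall gained at each rank."""
    precision, recall = precision_recall_points(y_true, scores)
    recall_gain = np.diff(np.concatenate([[0.0], recall]))
    return float(np.sum(precision * recall_gain))

# Toy example (made-up values): 1 = illicit, 0 = licit.
toy_y = np.array([1, 0, 1, 0, 0])
toy_scores = np.array([0.9, 0.8, 0.7, 0.3, 0.1])

toy_precision, toy_recall = precision_recall_points(toy_y, toy_scores)
toy_ap = average_precision_sketch(toy_y, toy_scores)
```

With heavily imbalanced classes, precision and recall on the positive class tend to be more informative than accuracy, which is one reason to look at this curve.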

In [None]:
from matplotlib import pyplot as plt
from sklearn.metrics import PrecisionRecallDisplay

# Plot the precision-recall curve on the labeled test nodes
PrecisionRecallDisplay.from_estimator(
    gcn_model,
    test_idx,
    data.y[test_idx],
    name="GCN Model",
)
plt.show()

# Hyperparameter Tuning

Perform a grid search over `hidden_dim` using temporal cross-validation, so that model evaluation respects the temporal ordering of the data.

The GCN model sees the full graph at training time; the indices passed to `fit` only select the nodes over which the loss is computed. The time steps must also be passed to `fit` as `groups` so that the cross-validation splitter can see them.
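
To show why the splitter needs `groups`, here is a minimal sketch of an expanding-window temporal splitter with a scikit-learn-style `split(X, y, groups)` interface. `RollingTimeSplitSketch` is a hypothetical class written for illustration; it is an assumption about the general technique, not the implementation of `TemporalRollingCV`, which may choose its windows differently.

```python
import numpy as np

class RollingTimeSplitSketch:
    """Hypothetical expanding-window splitter driven by time steps in `groups`."""

    def __init__(self, n_splits=3):
        self.n_splits = n_splits

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits

    def split(self, X, y=None, groups=None):
        if groups is None:
            raise ValueError("groups (one time step per sample) are required")
        groups = np.asarray(groups)
        unique_steps = np.unique(groups)  # sorted ascending
        # Cut the sorted time steps into n_splits + 1 consecutive blocks.
        blocks = np.array_split(unique_steps, self.n_splits + 1)
        for k in range(1, self.n_splits + 1):
            train_steps = np.concatenate(blocks[:k])  # all earlier blocks
            val_steps = blocks[k]                     # the next block in time
            train_idx = np.flatnonzero(np.isin(groups, train_steps))
            val_idx = np.flatnonzero(np.isin(groups, val_steps))
            yield train_idx, val_idx

# Toy example (made up): 16 samples, two per time step, time steps 1..8.
toy_groups = np.repeat(np.arange(1, 9), 2)
toy_X = np.arange(16)  # stands in for the array of node indices

toy_folds = list(RollingTimeSplitSketch(n_splits=3).split(toy_X, groups=toy_groups))
```

The splitter yields positions into `X`. When `X` is itself an array of node indices, as in this notebook, each fold selects a subset of node indices, and validation nodes always come from later time steps than training nodes.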

In [None]:
from sklearn.model_selection import GridSearchCV

gcn_model.set_params(verbose=False)

grid_search = GridSearchCV(
    gcn_model,
    param_grid={
        'hidden_dim': [2, 4, 8, 16],
    },
    cv=TemporalRollingCV(3),
    scoring='average_precision',
    n_jobs=-1,
    verbose=1,
)

grid_search.fit(train_val_idx, data.y[train_val_idx], groups=data.time[train_val_idx])

# Visualizing Results

Plot the marginal effects of hyperparameters and temporal evaluation results to understand model performance and parameter sensitivity.

In [None]:
from elliptic_toolkit import plot_marginals, plot_evals
for fig in plot_marginals(grid_search.cv_results_):
    plt.show()

In [None]:
for fig in plot_evals(grid_search, test_idx, data.y[test_idx].numpy(), data.y[train_val_idx].numpy(), time_steps_test=data.time[test_idx].numpy()):
    plt.show()