Using PyTorch (via skorch) in DataOps#
This example shows how to wrap a PyTorch model with skorch and plug it into a skrub DataOps plan.
Note
This example requires the optional dependencies torch and skorch.
The main goal here is to show the integration pattern:
PyTorch defines the model (an nn.Module)
skorch wraps it as a scikit-learn compatible estimator
skrub DataOps builds a plan and can tune skorch (and therefore PyTorch) hyperparameters using the skrub choices.
Loading the data#
We use scikit-learn’s digits dataset because it is small and ships with scikit-learn. Each sample is an 8x8 grayscale image of a handwritten digit, encoded as 64 pixel intensity values and labeled with a digit from 0 to 9.
from sklearn.datasets import load_digits
digits = load_digits()
X, y = digits.data, digits.target
print(f"Dataset shape: {X.shape}")
print(f"Number of classes: {len(set(y))}")
Dataset shape: (1797, 64)
Number of classes: 10
Start of the DataOps plan#
We start the DataOps plan by creating the skrub variables X and y.
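The code for this step might look like the following (a minimal sketch using skrub.X() and skrub.y(); the skrub variables shadow the NumPy arrays of the same name):

import skrub

# Declare the inputs of the DataOps plan. Everything derived from these
# variables is recorded in the plan and replayed at prediction time.
X = skrub.X(X)
y = skrub.y(y)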
Data preprocessing#
We start by normalizing the pixel values to [0, 1] by first computing the global max value and then dividing the pixel values by this max value. Importantly, we freeze the max value (scaling factor) after fitting so that the same rescaling is applied later when we use our dataop for prediction on new (test) data.
A convolutional network expects images with shape (N, C, H, W) where:
N: number of samples
C: number of color channels (1 for grayscale)
H, W: image height and width
So we reshape the images to (N, 1, 8, 8) for the CNN. The -1 means the first dimension (N) is inferred automatically from the array size.
The advantage of using DataOps is that the preprocessing steps are tracked in the plan and will be automatically applied during prediction.
max_value = X.max().skb.freeze_after_fit()
X_scaled = X / max_value
X_reshaped = X_scaled.reshape(-1, 1, 8, 8).astype("float32")
X_reshaped.skb.draw_graph()
Building a NN Classifier#
We’ll build a tiny CNN using PyTorch and wrap it with skorch to make it scikit-learn compatible. The architecture uses two small convolution layers, a single 2x2 max-pooling step, and a small MLP head. The architectural choices below are meant to be:
standard: 3x3 convolutions and 2x2 max-pooling are very common
small: the dataset and images are tiny, so we keep the model tiny too
If you want more background on CNN building blocks and how convolution/pooling changes tensor shapes, see the CS231n notes: https://cs231n.github.io/convolutional-networks/
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
class TinyCNN(nn.Module):
    def __init__(self, conv_channels: int = 8, hidden_units: int = 32):
        super().__init__()
        self.conv_channels = conv_channels
        self.hidden_units = hidden_units

        # 2-level CNN with 2x2 max-pooling
        self.conv1 = nn.Conv2d(
            in_channels=1, out_channels=conv_channels, kernel_size=3, padding=1
        )
        self.conv2 = nn.Conv2d(conv_channels, conv_channels, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2)
        # input shape = (8, 8) -> conv1: (8, 8) -> conv2: (8, 8) -> pool: (4, 4)
        image_shape_after_conv = 4 * 4

        # MLP head
        self.fc1 = nn.Linear(conv_channels * image_shape_after_conv, hidden_units)
        self.dropout = nn.Dropout(p=0.25)  # Regularization to avoid overfitting
        self.fc2 = nn.Linear(hidden_units, 10)  # 10 digit classes (0..9)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.flatten(start_dim=1)
        x = self.dropout(F.relu(self.fc1(x)))
        return self.fc2(x)
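As a quick, illustrative sanity check of the shapes above (this snippet is a sketch, not part of the training pipeline), a batch of four 8x8 grayscale images should yield one logit per digit class:

import torch

# Four dummy grayscale 8x8 images -> logits for the 10 digit classes.
dummy = torch.zeros(4, 1, 8, 8)
print(TinyCNN()(dummy).shape)  # torch.Size([4, 10])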
Skorch provides scikit-learn compatible wrappers around torch training loops. That makes the torch model usable by skrub DataOps (and scikit-learn tools in general).
We use skrub.choose_from() to define hyperparameters that the DataOps
grid search will tune: conv_channels, hidden_units, and max_epochs.
The other parameters are set to common choices for this task and training data size.
from skorch import NeuralNetClassifier
device = "cpu" # use "cuda" or "mps" if available
net = NeuralNetClassifier(
    module=TinyCNN,
    # These choices are intentionally small so the example runs quickly.
    module__conv_channels=skrub.choose_from([8, 16], name="conv_channels"),
    module__hidden_units=skrub.choose_from([8, 16, 32], name="hidden_units"),
    max_epochs=skrub.choose_from([10, 15], name="max_epochs"),
    optimizer__lr=0.01,
    optimizer=optim.Adam,
    criterion=nn.CrossEntropyLoss,
    device=device,
    train_split=None,  # We'll use skrub's grid search for validation
    verbose=0,
)
Tuning the model’s hyperparameters with DataOps#
We integrate the model into the DataOps plan. First, we convert the target labels to integers for the loss computation and apply the model to the preprocessed X and y.
Finally, we run a grid search with 4-fold cross-validation to tune the hyperparameters of our DataOps plan.
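A minimal sketch of what this looks like, assuming skrub's .skb.apply() and .skb.make_grid_search() helpers (the names y_int and predictions are illustrative):

# CrossEntropyLoss expects integer class labels.
y_int = y.astype("int64")

# Apply the skorch-wrapped CNN to the preprocessed images within the plan.
predictions = X_reshaped.skb.apply(net, y=y_int)

# Grid search over the skrub choices defined above, using 4-fold cross-validation.
search = predictions.skb.make_grid_search(fitted=True, scoring="accuracy", cv=4)
print(search.results_)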
Search results:
max_epochs conv_channels hidden_units mean_test_score
15 16 32 0.978295
10 16 32 0.969949
15 8 32 0.969943
15 8 16 0.968275
15 16 16 0.967165
10 8 32 0.959371
10 8 16 0.934885
10 16 16 0.928750
15 16 8 0.878707
15 8 8 0.853070
10 16 8 0.762852
10 8 8 0.692296
Let’s take a closer look at the well-performing models with a parallel coordinates plot. We filter to models with a score >= 0.94 to focus on the top-performing configurations.
fig = search.plot_results(min_score=0.94)
fig
Interpreting the results#
Looking at the search results, we can observe several patterns:
Model capacity matters: Larger configurations with conv_channels=16 and hidden_units=32 tend to perform best. Smaller models with conv_channels=8 and/or hidden_units=8 perform significantly worse, indicating that the task benefits from increased model capacity.
More epochs generally help: Configurations with max_epochs=15 tend to perform slightly better than those with max_epochs=10, though the gains are modest compared to architectural changes.
Conclusion#
In this example, we’ve shown how to use PyTorch and skorch within skrub DataOps. The key steps were:
Define a PyTorch nn.Module (our TinyCNN)
Wrap it with skorch's NeuralNetClassifier to make it scikit-learn compatible
Use skrub.choose_from() to specify hyperparameters for tuning
Integrate it into a DataOps plan and use grid search to find the best configuration
This pattern lets you leverage PyTorch’s flexibility for model definition while benefiting from skrub’s hyperparameter tuning and data preprocessing capabilities.
See also
Hyperparameter tuning with DataOps: Learn more about using skrub.choose_from() and other choice objects to tune hyperparameters in DataOps plans.
Tuning DataOps with Optuna: Discover how to use Optuna as a backend for more sophisticated hyperparameter search strategies with skrub DataOps.
Total running time of the script: (0 minutes 20.085 seconds)
Estimated memory usage: 538 MB