Do you know how to concatenate two tensors in PyTorch?
Do you know how to get the maximum elements of a 2D tensor along the column dimension?
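If you hesitated, here is a minimal sketch of both operations:
import torch

a = torch.randn(2, 3)
b = torch.randn(2, 3)

# Concatenate along an existing dimension with torch.cat.
rows = torch.cat([a, b], dim=0)   # shape: (4, 3)
cols = torch.cat([a, b], dim=1)   # shape: (2, 6)

# torch.max over a dimension returns (values, indices).
max_vals, max_idxs = torch.max(a, dim=0)  # column-wise maximum, shape: (3,)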
One day, I realized that I was relying on web search or AI services like ChatGPT. What if somebody asked me to write AI code? Could I write it by myself?
So I asked Claude to generate an AI problem to solve, and I found that solving it was very helpful because it reminded me of useful functions and their usage.
I'd like to share the boilerplate code below that I used to solve the problem Claude posed.
Problem
Convolutional Neural Network (CNN): Implement and train a simple CNN using the CIFAR-10 dataset.
1. Data Preparation:
- Use torchvision to download and load the CIFAR-10 dataset.
- Split the data into training (80%) and validation (20%) sets.
- Apply data normalization and augmentation (use RandomHorizontalFlip).
- Create DataLoaders for both training and validation sets with a batch size of 64.
2. CNN Model Implementation: Implement a CNN with the following architecture:
- Convolutional layer 1: 3 input channels, 32 filters of size 3x3, ReLU activation
- Max pooling layer: 2x2 size
- Convolutional layer 2: 32 input channels, 64 filters of size 3x3, ReLU activation
- Max pooling layer: 2x2 size
- Fully connected layer 1: 1600 input neurons, 512 output neurons, ReLU activation
- Dropout layer: 50% dropout rate
- Fully connected layer 2: 512 input neurons, 10 output neurons (CIFAR-10 classes)
3. Training Setup:
- Loss function: CrossEntropyLoss
- Optimizer: Adam (learning rate: 0.001)
- Number of epochs: 20
4. Training Loop Implementation:
- For each epoch, train on the training data and evaluate on the validation data.
- Print training loss, training accuracy, validation loss, and validation accuracy for each epoch.
- Save the model with the best validation accuracy.
5. Model Evaluation:
- Evaluate the final model’s performance on the test dataset.
- Calculate and report overall accuracy.
6. Results Visualization:
- Plot training and validation loss curves.
- Plot training and validation accuracy curves.
- Randomly select and display 10 images from the test set along with their true labels and model predictions.
Solutions
Library Import
# import libraries
# common part
import numpy as np
import torch
import torch.nn as nn
# data prep part
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, random_split
# visualization part
import matplotlib.pyplot as plt
To be honest, I do not use vision models often, so torchvision is not very familiar to me.
Data Prep
# transform functions
train_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.RandomHorizontalFlip(0.5),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
test_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
# get CIFAR10 dataset
train_dataset = torchvision.datasets.CIFAR10(root='./data/', train=True, download=True, transform=train_transforms)
test_dataset = torchvision.datasets.CIFAR10(root='./data/', train=False, download=True, transform=test_transforms)
train_size = int(0.8 * len(train_dataset))
val_size = len(train_dataset) - train_size
train_dataset, val_dataset = random_split(train_dataset, [train_size, val_size])  # use random_split to keep the transform function.
# build data loaders
batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
Basically, I download the CIFAR10 data from the web. The train argument decides whether it is the training set or the test set. The transform argument is the interesting part. We build a transform function that is made of a series of functions. (Think of Compose as nn.Sequential!) However, the functions are not called when we build the datasets; they are called when a data loader fetches items from the dataset.
In the beginning, I thought the RandomHorizontalFlip must be executed in advance to increase the number of training images. However, the data loader applies the function dynamically while we are training. Note that RandomHorizontalFlip is included only in train_transforms, not in test_transforms.
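A small way to observe this lazy behavior (a sketch using the datasets built above):
# The transform pipeline runs every time an item is fetched, so fetching
# the same index twice can give different tensors when a random transform
# such as RandomHorizontalFlip is involved.
img1, _ = train_dataset[0]
img2, _ = train_dataset[0]
print(torch.equal(img1, img2))  # may print False: the flip is re-sampled on each access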
ToTensor converts a PIL image or ndarray into a tensor. While converting, it rescales the value range from [0, 255] to [0.0, 1.0]. This is why we can normalize with means of 0.5 and standard deviations of 0.5. We could compute the exact per-channel statistics (in the spirit of scikit-learn's StandardScaler) to be more thorough, but rough scaling still works well.
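For reference, here is a sketch of how the exact per-channel statistics could be computed; it reuses the imports above, and the printed values are approximate:
# Compute exact per-channel mean/std of the raw training images.
raw = torchvision.datasets.CIFAR10(root='./data/', train=True, download=True,
                                   transform=transforms.ToTensor())
loader = DataLoader(raw, batch_size=1000, shuffle=False)
mean = torch.zeros(3)
sq = torch.zeros(3)
n = 0
for x, _ in loader:
    mean += x.mean(dim=(0, 2, 3)) * x.size(0)   # accumulate per-channel sums of means
    sq += (x ** 2).mean(dim=(0, 2, 3)) * x.size(0)
    n += x.size(0)
mean /= n
std = (sq / n - mean ** 2).sqrt()               # std = sqrt(E[x^2] - E[x]^2)
print(mean, std)  # roughly (0.49, 0.48, 0.45) and (0.25, 0.24, 0.26)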
To keep the attributes of the dataset, such as the transform functions, we need to use random_split from the torch library instead of train_test_split from scikit-learn: train_test_split would return lists, not datasets.
Actually, val_loader should not apply RandomHorizontalFlip, but since random_split only wraps the original dataset, the validation subset inherits train_transforms. For simplicity, I keep it this way. To apply the flip only to the training split, we would need a custom Dataset, sketched below.
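A hypothetical wrapper (the class name and structure are my own, not part of the original solution): build the base dataset without a transform, split it, then give each split its own transform.
from torch.utils.data import Dataset

class TransformedSubset(Dataset):
    """Apply a split-specific transform to items of a wrapped subset."""
    def __init__(self, subset, transform):
        self.subset = subset
        self.transform = transform

    def __len__(self):
        return len(self.subset)

    def __getitem__(self, idx):
        img, label = self.subset[idx]   # img is still a PIL image here
        return self.transform(img), label

# Build the base dataset WITHOUT a transform, split it, then wrap each split.
base = torchvision.datasets.CIFAR10(root='./data/', train=True, download=True)
n_train = int(0.8 * len(base))
train_sub, val_sub = random_split(base, [n_train, len(base) - n_train])
train_ds = TransformedSubset(train_sub, train_transforms)  # flip applied only here
val_ds = TransformedSubset(val_sub, test_transforms)       # no flip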
CNN Model
class VisionModel(nn.Module):
    def __init__(self):
        super(VisionModel, self).__init__()
        self.net_2d = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),  # 32 * 32 * 32
            nn.ReLU(),
            nn.MaxPool2d(2),  # 32 * 16 * 16
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),  # 64 * 16 * 16
            nn.ReLU(),
            nn.MaxPool2d(2),  # 64 * 8 * 8
        )
        self.net_mlp = nn.Sequential(
            nn.Linear(64*8*8, 512),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        out_2d = self.net_2d(x)
        out_2d = out_2d.view(-1, 64*8*8)
        out = self.net_mlp(out_2d)
        return out
As instructed, the model is made of 2D convolution layers followed by MLP layers. After the 2D layers, we need to flatten the intermediate tensor to feed it into the MLP.
The final output dimension is 10, since each output is a score (logit) for one of the CIFAR-10 classes.
Note that padding is applied to keep the spatial dimensions: padding=1 means one pixel of padding is added to each side (top, bottom, left, and right). As a result, the flattened feature size is 64*8*8 = 4096 rather than the 1600 stated in the problem.
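A quick sanity check of the shape comments in the model (a throwaway instance, not part of the training code):
check_model = VisionModel()
dummy = torch.randn(1, 3, 32, 32)         # one CIFAR-10-sized image
print(check_model.net_2d(dummy).shape)    # torch.Size([1, 64, 8, 8])
print(check_model(dummy).shape)           # torch.Size([1, 10])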
Training Prep
from torch.optim import Adam
from torch.nn import CrossEntropyLoss
model = VisionModel()
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)
optim = Adam(model.parameters(), lr=1e-3)
num_epoch = 20
criterion = CrossEntropyLoss()
I instantiated my VisionModel and loaded it onto the GPU. As instructed, I used the Adam optimizer and CrossEntropyLoss.
Note that CrossEntropyLoss can take two tensors of different shapes. For example, my model output has 10 dimensions per sample, while the label from the torchvision dataset is a single integer indicating the class index. Without one-hot encoding, CrossEntropyLoss processes these different formats automatically.
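A tiny example of this behavior:
logits = torch.randn(4, 10)             # model outputs for a batch of 4 images
targets = torch.tensor([3, 7, 0, 9])    # integer class indices, no one-hot needed
print(nn.CrossEntropyLoss()(logits, targets).item())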
Training and Validation Iteration
train_losses, train_accs = [], []
val_losses, val_accs = [], []
best_val_acc = 0.0
for epoch in range(num_epoch):
    model.train()
    train_loss = 0.0
    train_correct = 0
    for x, y_true in train_loader:
        y_pred = model(x.to(device))
        y_true = y_true.to(device)
        loss = criterion(y_pred, y_true)
        optim.zero_grad()
        loss.backward()
        optim.step()
        train_loss += loss.item()
        _, prediction = y_pred.max(1)  # torch.max returns 2 items. Check which class has the highest score.
        train_correct += prediction.eq(y_true).sum().item()  # compare with the integer label.
    train_loss /= len(train_loader)
    train_acc = train_correct / len(train_dataset)
    train_losses.append(train_loss)
    train_accs.append(train_acc)

    model.eval()
    with torch.no_grad():
        val_loss = 0.0
        val_correct = 0
        for x, y_true in val_loader:
            y_pred = model(x.to(device))
            y_true = y_true.to(device)
            loss = criterion(y_pred, y_true)
            val_loss += loss.item()
            _, prediction = y_pred.max(1)
            val_correct += prediction.eq(y_true).sum().item()
        val_loss /= len(val_loader)
        val_acc = val_correct / len(val_dataset)
        val_losses.append(val_loss)
        val_accs.append(val_acc)

    print(f"Epoch {epoch+1}/{num_epoch}:")
    print(f"Train Loss: {train_loss:.4f}, Train Accuracy: {train_acc:.4f}")
    print(f"Val Loss: {val_loss:.4f}, Val Accuracy: {val_acc:.4f}")

    if val_acc > best_val_acc:  # save the model whenever a new best validation score happens.
        best_val_acc = val_acc
        torch.save(model.state_dict(), 'best_model.pth')
Testing the Trained Model
test_model = VisionModel().to(device)
test_model.load_state_dict(torch.load('best_model.pth'))
test_model.eval()  # disable dropout for evaluation
test_correct = 0
all_preds = []
all_labels = []
with torch.no_grad():
    for x, y_true in test_loader:
        x, y_true = x.to(device), y_true.to(device)
        y_pred = test_model(x)
        _, prediction = y_pred.max(1)
        test_correct += prediction.eq(y_true).sum().item()
        all_preds.extend(prediction.cpu().numpy())
        all_labels.extend(y_true.cpu().numpy())
test_accuracy = test_correct / len(test_dataset)
print(f"Test Accuracy: {test_accuracy:.4f}")
I only checked the accuracy, and it is 0.73, which seems high enough to me.
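As an aside, the all_preds and all_labels lists collected above are not needed for the overall accuracy; one optional use, sketched here (not part of the original solution), is a per-class breakdown:
preds = np.array(all_preds)
labels = np.array(all_labels)
for cls in range(10):
    mask = labels == cls
    print(f"Class {cls} accuracy: {(preds[mask] == cls).mean():.4f}")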
Visualization
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(train_losses, label='Train Loss')
plt.plot(val_losses, label='Val Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(train_accs, label='Train Accuracy')
plt.plot(val_accs, label='Val Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.tight_layout()
plt.show()
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
dataiter = iter(test_loader)
images, labels = next(dataiter)
model.eval()
outputs = model(images[:10].to(device))
_, predicted = torch.max(outputs, 1)
plt.figure(figsize=(15, 10))
for i in range(10):
    plt.subplot(2, 5, i+1)
    imshow(images[i])
    plt.title(f'True: {classes[labels[i]]}\nPred: {classes[predicted[i]]}')
plt.tight_layout()
plt.show()
The images below are the results of the visualization. Overfitting happens within 10 epochs, but I got a quite accurate classifier, confirmed by the test images.
Conclusion
Even though this code looks really simple and tutorial-like, when it comes to hands-on coding you will be surprised by how much you don't know the functions. This was a really good opportunity for me to check out many documentation pages and learn accurate usage.