# MNIST Variational Autoencoder (VAE) Challenge

## Learning Objectives
- Understand Variational Autoencoder architecture
- Implement encoder-decoder networks
- Learn the reparameterization trick
- Train VAE on MNIST dataset
- Generate new digits from latent space

## What is a VAE?
A Variational Autoencoder is a generative model that learns to:
1. **Encode** images into a lower-dimensional latent space
2. **Decode** latent vectors back to images
3. **Generate** new images by sampling from the latent space

The key idea is the **reparameterization trick**, which makes the sampling step differentiable so the whole model can be trained with backpropagation.


## Environment Setup

Before building the Variational Autoencoder, we need to prepare the environment and import the required libraries.
Each library has a specific purpose in our implementation:

* **`torch`** – Core PyTorch package for tensor operations and GPU computing;
* **`torch.nn`** – Tools for building neural networks (layers, loss functions, etc.);
* **`torch.nn.functional` (`F`)** – Stateless functions such as activations (`relu`, `sigmoid`) and convolutions;
* **`torch.optim`** – Optimization algorithms like Adam and SGD used during training;
* **`torch.utils.data.DataLoader`** – Efficient data batching, shuffling, and loading;
* **`torchvision`** – A PyTorch package that provides common datasets (like MNIST) and image utilities;
* **`torchvision.transforms`** – Preprocessing tools (converting to tensors, normalizing, etc.);
* **`matplotlib.pyplot`** – For visualizing results, reconstructions, and generated digits;
* **`numpy`** – For general numerical operations and array manipulations outside PyTorch.

Finally, we specify the **device** that the model will use:

* If a **GPU** is available, training will run on it (`cuda`);
* Otherwise, the code will automatically use the **CPU**.

This step ensures compatibility across any environment — whether you’re running locally, on Google Colab, or in a cloud notebook.

In [None]:
!pip install git+https://github.com/codefinity-arsenii-dr/test_vae_mnist.git

In [2]:
from testVAE.test import *

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")


## Step 1: Building the VAE Architecture

A **Variational Autoencoder (VAE)** consists of three connected parts — an **Encoder**, a **Reparameterization block**, and a **Decoder**.
Together, they learn how to compress and reconstruct images, enabling both **reconstruction** and **generation**.

### Encoder

The encoder takes flattened 28×28 images (784 values) and compresses them into a smaller **latent representation**.
It outputs two vectors:

* **μ (mean)** — defines the center of the distribution;
* **log σ² (log variance)** — defines the spread of the distribution.

These two parameters describe how each image maps into the latent space.
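
To make the role of the log variance concrete, here is a small sketch in plain Python. The sample values are made up for illustration and are not taken from a trained model. A network output can be any real number, so predicting log σ² and exponentiating it gives a standard deviation that is always positive:

```python
import math

# Hypothetical raw encoder outputs for log(sigma^2); any real number is allowed
log_vars = [-4.0, 0.0, 2.0]

# sigma = sqrt(exp(log_var)) = exp(0.5 * log_var), which is always positive
sigmas = [math.exp(0.5 * lv) for lv in log_vars]

for lv, s in zip(log_vars, sigmas):
    print(f"log_var = {lv:+.1f}  ->  sigma = {s:.4f}")
```

A log variance of 0 corresponds to σ = 1, negative values give a narrower distribution, and positive values give a wider one.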

### Reparameterization Trick

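Sampling `z` directly from the learned distribution is a random operation, so gradients cannot flow back through it to the encoder.
To keep the model differentiable, we move the randomness into a separate noise term: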

$$
z = \mu + \sigma \times \varepsilon, \quad \text{where } \varepsilon \sim \mathcal{N}(0, 1)
$$

This operation lets gradients pass through the sampling process during training.
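
The following sketch checks this claim on a toy example. It assumes a two-dimensional latent vector with illustrative values for `mu` and `log_var`; it is separate from the `VAE` class you will write below. Because the noise `eps` does not depend on the parameters, autograd can compute gradients of `z` with respect to both of them:

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

# Toy 2-dimensional distribution parameters (illustrative values)
mu = torch.tensor([0.5, -1.0], requires_grad=True)
log_var = torch.tensor([0.0, -2.0], requires_grad=True)

std = torch.exp(0.5 * log_var)  # sigma, derived from the log variance
eps = torch.randn_like(std)     # noise, independent of mu and log_var
z = mu + std * eps              # reparameterized sample

z.sum().backward()

print(mu.grad)       # dz/dmu is 1 for every dimension
print(log_var.grad)  # dz/dlog_var equals 0.5 * sigma * eps
```

Both gradients are well defined even though `z` is random, which is what allows the encoder to be trained through the sampling step.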

### Decoder

The decoder reconstructs the input image from the latent vector `z`.
It mirrors the encoder — expanding the compressed vector back to 784 pixels.
The final `Sigmoid` layer ensures output values are between **0 and 1**, matching grayscale pixel intensity.

---

## Task

1. Define an **Encoder** that takes 784-dimensional inputs and outputs a 400-dimensional hidden representation using `ReLU` activations.
2. Define two **latent layers** — one for `mu` (mean) and one for `log_var` (log variance).
3. Implement the **Reparameterization Trick** inside `reparameterize()`.
4. Build a **Decoder** that reconstructs the input back to 784 dimensions using `ReLU` activations and a final `Sigmoid`.
5. Complete the **forward pass**, connecting all the parts together.
6. Create the model and move it to the selected device.
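
As a sanity check for the parameter count printed at the end of the next cell, the expected number can be worked out by hand. This sketch assumes the layer sizes given in the code comments (784 → 400 → 400 for the encoder and the mirrored decoder) and the default `latent_dim=20`. The helper `linear_params` is a hypothetical name used only here:

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

latent_dim = 20  # default value in the VAE class signature

layers = [
    (784, 400), (400, 400),                     # encoder
    (400, latent_dim), (400, latent_dim),       # mu and log_var heads
    (latent_dim, 400), (400, 400), (400, 784),  # decoder
]

total = sum(linear_params(i, o) for i, o in layers)
print(f"Expected parameters: {total:,}")  # Expected parameters: 973,624
```

If your model reports a different number, one of the layer sizes probably does not match the sizes assumed above.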

In [None]:
# Defining the Variational Autoencoder architecture
class VAE(nn.Module):
    def __init__(self, latent_dim=20):
        super(VAE, self).__init__()

        # Encoder: from 784 → 400 → 400
        self.encoder = nn.Sequential(
            nn.Linear(___, ___),  # Input layer
            nn.ReLU(),
            nn.Linear(___, ___),  # Hidden layer
            nn.ReLU()
        )

        # Latent space: mean (mu) and log variance (log_var)
        self.fc_mu = nn.Linear(___, ___)
        self.fc_log_var = nn.Linear(___, ___)

        # Decoder: from latent_dim → 400 → 400 → 784
        self.decoder = nn.Sequential(
            nn.Linear(___, ___),
            nn.ReLU(),
            nn.Linear(___, ___),
            nn.ReLU(),
            nn.Linear(___, ___),
            nn.Sigmoid()
        )

    def reparameterize(self, mu, log_var):
        """Reparameterization trick: z = mu + sigma * epsilon"""
        std = torch.exp(0.5 * log_var)
        eps = torch.randn_like(___)
        z = ___
        return z

    def forward(self, x):
        # Encode input
        h = self.encoder(___)
        # Compute latent variables
        mu = self.fc_mu(___)
        log_var = self.fc_log_var(___)
        # Sample latent vector
        z = self.reparameterize(___, ___)
        # Decode reconstruction
        recon_x = self.decoder(___)
        return recon_x, mu, log_var


# Create the model and move it to the selected device
model = ___

# Print the total number of trainable parameters
print(f"Model parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")

In [None]:
hint1()

In [None]:
solution1()

In [None]:
check1()

## Step 2: VAE Loss Function

Training a Variational Autoencoder requires combining two objectives — **reconstruction quality** and **latent regularization**.
Together, these terms push the model to reproduce its inputs *and* to keep the latent space smooth and meaningful.

### 1. Reconstruction Loss (Binary Cross-Entropy)

* Measures how close the reconstructed output is to the original input;
* Implemented with `F.binary_cross_entropy`, which expects both predictions and targets in the [0, 1] range;
* Summed over all pixels in the batch (`reduction='sum'`).
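
To get a feel for the numbers, here is a plain-Python sketch of summed binary cross-entropy on three made-up "pixels". The function `bce_sum` is a hypothetical helper that mirrors the formula, not the PyTorch implementation:

```python
import math

def bce_sum(recon, target):
    """Binary cross-entropy summed over all elements."""
    return -sum(
        t * math.log(r) + (1 - t) * math.log(1 - r)
        for r, t in zip(recon, target)
    )

target = [1.0, 0.0, 1.0]  # made-up target pixels
good = [0.9, 0.1, 0.8]    # reconstruction close to the target
bad = [0.4, 0.6, 0.3]     # reconstruction far from the target

print(f"good reconstruction: {bce_sum(good, target):.4f}")
print(f"bad reconstruction:  {bce_sum(bad, target):.4f}")
```

The closer reconstruction gets the smaller loss. Because the loss is a sum rather than a mean, its magnitude grows with the number of pixels and with the batch size.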

### 2. KL Divergence Loss

* Regularizes the latent distribution so it stays close to a standard normal N(0, 1);
* Keeps the latent space continuous and well covered, so vectors sampled from N(0, 1) decode to plausible digits;
* Formula:

$$
\text{KL}(q(z|x) \parallel p(z)) = -0.5 \sum (1 + \log(\sigma^2) - \mu^2 - \sigma^2)
$$
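
A quick worked example in plain Python shows how this term behaves. The sum runs over the latent dimensions, and σ² is recovered from the log variance as `exp(log_var)`. The function name `kl_to_standard_normal` is hypothetical, and the expression is the formula above with the factor of -0.5 folded into the signs:

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    # Same as -0.5 * sum(1 + log_var - mu^2 - sigma^2), with sigma^2 = exp(log_var)
    return 0.5 * sum(
        m ** 2 + math.exp(lv) - lv - 1
        for m, lv in zip(mu, log_var)
    )

print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0, already standard normal
print(kl_to_standard_normal([1.0, 0.0], [0.0, 0.0]))  # 0.5, one mean shifted by 1
print(kl_to_standard_normal([0.0], [math.log(4.0)]))  # positive, variance too large
```

The penalty is zero only when every latent dimension has mean 0 and variance 1, and it grows as the encoder moves away from that target.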

### 3. Total Loss

The final loss combines both terms:

$$
\text{Total Loss} = \text{BCE} + \text{KLD}
$$

---

## Task

1. Implement the **reconstruction loss** using `F.binary_cross_entropy`.
2. Compute the **KL Divergence** term using the provided formula.
3. Return the **total loss** as their sum.
4. Test your loss function with random normalized data (`torch.rand()` with values between 0 and 1).

In [None]:
def vae_loss(recon_x, x, mu, log_var):
    """VAE loss = BCE + KL Divergence"""

    # Reconstruction loss (Binary Cross-Entropy)
    BCE = ___  # Use F.binary_cross_entropy with reduction='sum'

    # KL Divergence loss
    KLD = -0.5 * torch.sum(___)  # Implement the KL formula using mu and log_var

    # Total loss
    return ___


# Test the loss function with normalized data (values between 0 and 1)
x = torch.rand(10, 784).to(device)  # torch.rand() yields values in [0, 1), as BCE targets require; torch.randn() would not
recon_x, mu, log_var = model(x)
loss = vae_loss(___)
print(f"Sample loss: {loss.item():.2f}")


In [None]:
hint2()

In [None]:
solution2()

In [None]:
check2()

## Step 3: Load and Prepare the MNIST Dataset

The **MNIST dataset** contains 70,000 grayscale images of handwritten digits (28×28 pixels).
It’s a standard benchmark for testing image-based models like autoencoders.

* **Training set:** 60,000 images;
* **Test set:** 10,000 images;
* **Preprocessing:**

  * Convert each image to a tensor;
  * Flatten it to a vector of 784 pixels;
  * Scale all pixel values to the [0, 1] range (`transforms.ToTensor()` already does this for 8-bit images).
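
The preprocessing steps above can be sketched without any libraries. This example uses a made-up 28×28 image stored as nested lists, not real MNIST data, and only illustrates what scaling and flattening do to the values and the shape:

```python
# A fake 28x28 grayscale image with integer intensities 0..255 (made-up data)
height, width = 28, 28
image = [[(row * width + col) % 256 for col in range(width)] for row in range(height)]

# Step 1: scale to [0, 1] by dividing 8-bit intensities by 255
scaled = [[pixel / 255.0 for pixel in row] for row in image]

# Step 2: flatten row by row into one vector of 784 values
flat = [pixel for row in scaled for pixel in row]

print(len(flat))             # 784
print(min(flat), max(flat))  # 0.0 1.0
```

In the notebook, the same two steps are carried out on tensors inside the `transforms.Compose()` pipeline.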

---

## Task

1. Define a transformation pipeline using `transforms.Compose()` to convert images into tensors and flatten them.
2. Load both the training and test MNIST datasets.
3. Create `DataLoader` objects for batching and shuffling.
4. Print the number of samples in each dataset.
5. Visualize several images to verify that loading and preprocessing work correctly.

In [None]:
# Data preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: ___)  # Flatten each image
])

# Load datasets
train_dataset = torchvision.datasets.MNIST(
    root='./data', train=True, download=True, transform=___
)
test_dataset = torchvision.datasets.MNIST(
    root='./data', train=False, download=True, transform=___
)

# Create data loaders
train_loader = DataLoader(___, batch_size=___, shuffle=True, num_workers=0)
test_loader = DataLoader(___, batch_size=___, shuffle=False, num_workers=0)

# Print dataset sizes
print(f"Training samples: {___}")
print(f"Test samples: {___}")

# Visualize a few samples
fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for i in range(5):
    img, _ = ___[i]
    axes[i].imshow(___.view(28, 28), cmap='gray')
    axes[i].set_title(f'Sample {i+1}')
    axes[i].axis('off')
plt.tight_layout()
plt.show()

In [None]:
hint3()

In [None]:
solution3()

In [None]:
check3()

## Step 4: Training the VAE

Now it’s time to train your Variational Autoencoder!
The training process consists of repeating the same sequence over multiple epochs:

1. **Forward pass:** Encode → Reparameterize → Decode;
2. **Compute loss:** Combine reconstruction loss and KL divergence;
3. **Backward pass:** Propagate the error and update model weights;
4. **Repeat** for multiple epochs to minimize the total loss.

We’ll use the **Adam optimizer** with a learning rate of **1e-3**, which is the PyTorch default for Adam and a common starting point.

---

## Task

1. Initialize the optimizer using `optim.Adam()`.
2. Loop through the dataset for a given number of epochs.
3. Perform a **forward pass**, compute the **VAE loss**, and then **backpropagate**.
4. Accumulate and print the loss for progress tracking.
5. Append the average loss for each epoch to the `losses` list.
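
One detail is worth checking when you compute the average loss. With `reduction='sum'`, each batch loss is a total over the samples in that batch, so the divisor determines what the reported number means. The sketch below uses made-up batch sizes and loss values. Which divisor the notebook's checker expects is not stated here, so treat the comparison as an illustration only:

```python
# Hypothetical summed losses from three batches (reduction='sum' style)
batch_sizes = [128, 128, 64]
batch_losses = [19200.0, 17920.0, 8320.0]  # made-up totals per batch

num_samples = sum(batch_sizes)  # 320 samples in this toy "epoch"
num_batches = len(batch_sizes)  # 3 batches

per_sample = sum(batch_losses) / num_samples
per_batch = sum(batch_losses) / num_batches

print(f"Average per sample: {per_sample:.2f}")  # Average per sample: 142.00
print(f"Average per batch:  {per_batch:.2f}")   # Average per batch:  15146.67
```

The per-sample average is on the same scale as the progress line printed inside the loop, which divides each batch loss by `len(data)`.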

In [None]:
# Training setup
optimizer = optim.Adam(___, lr=___)
model.train()

# Training loop
epochs = ___
losses = []

print("Starting training...")
for epoch in range(epochs):
    epoch_loss = 0
    for batch_idx, (data, _) in enumerate(___):
        data = data.to(device)

        # Forward pass
        optimizer.___()  # Reset gradients
        recon_batch, mu, log_var = ___(data)
        loss = ___(recon_batch, data, mu, log_var)

        # Backward pass
        ___  # Compute gradients
        ___  # Update model parameters

        epoch_loss += loss.item()

        # Print progress
        if batch_idx % 100 == 0:
            print(f'Epoch {epoch+1}/{epochs}, Batch {batch_idx}/{len(train_loader)}, '
                  f'Loss: {loss.item()/len(data):.4f}')

    avg_loss = epoch_loss / len(___)
    losses.append(avg_loss)
    print(f'Epoch {epoch+1} completed. Average Loss: {avg_loss:.4f}')
    print("-" * 50)

print("Training completed!")


In [None]:
hint4()

In [None]:
solution4()

In [None]:
check4()

## Final Step: Visualize Training and Generated Results

After training, it’s time to evaluate your Variational Autoencoder.
We’ll first visualize the **training loss curve**, then display how well the model can **reconstruct** and **generate** digits.

### Training Progress

Plot the training loss to observe how the model improved over epochs.
A consistent downward trend indicates successful optimization.

In [None]:
# Plot training loss
plt.figure(figsize=(8, 5))
plt.plot(range(1, len(losses) + 1), losses, 'b-', linewidth=2, marker='o')
plt.title('VAE Training Loss', fontsize=14, fontweight='bold')
plt.xlabel('Epoch')
plt.ylabel('Average Loss')
plt.grid(True, alpha=0.3)
plt.xticks(range(1, len(losses) + 1))
plt.show()

print(f"Final loss: {losses[-1]:.2f}")
print(f"Loss improvement: {((losses[0] - losses[-1]) / losses[0] * 100):.1f}%")


### Reconstruct and Generate Digits

Now, let’s visually inspect what the VAE learned:

* **Top row:** Original test images
* **Middle row:** Reconstructed versions
* **Bottom row:** Completely new digits generated from random latent vectors

In [None]:
# Set model to evaluation mode
model.eval()

with torch.no_grad():
    # Get a batch of test images
    data, _ = next(iter(test_loader))
    data = data.to(device)

    # Reconstruct test images
    recon_data, _, _ = model(data)

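    # Generate new images from random latent vectors
    # (the latent size is read from the model so it always matches latent_dim)
    z = torch.randn(16, model.fc_mu.out_features, device=device)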
    generated = model.decoder(z)

    # Convert tensors to numpy for visualization
    original_imgs = data[:16].cpu().numpy().reshape(-1, 28, 28)
    recon_imgs = recon_data[:16].cpu().numpy().reshape(-1, 28, 28)
    gen_imgs = generated.cpu().numpy().reshape(-1, 28, 28)

    # Plot results
    fig, axes = plt.subplots(3, 8, figsize=(16, 6))
    fig.suptitle('VAE Results: Original | Reconstructed | Generated', fontsize=16, fontweight='bold')

    for i in range(8):
        axes[0, i].imshow(original_imgs[i], cmap='gray')
        axes[0, i].set_title('Original', fontweight='bold')
        axes[0, i].axis('off')

        axes[1, i].imshow(recon_imgs[i], cmap='gray')
        axes[1, i].set_title('Reconstructed', fontweight='bold')
        axes[1, i].axis('off')

        axes[2, i].imshow(gen_imgs[i], cmap='gray')
        axes[2, i].set_title('Generated', fontweight='bold')
        axes[2, i].axis('off')

    plt.tight_layout()
    plt.show()

print("VAE training and generation complete.")
print("Compare the three rows above to judge reconstruction and generation quality.")


## Challenge Summary

Congratulations on completing the Variational Autoencoder project. You have successfully built, trained, and evaluated a fully functional VAE capable of generating new data samples.

### What You Accomplished

* Built a VAE with an encoder–decoder architecture
* Implemented the reparameterization trick
* Trained the model on the MNIST dataset
* Generated new digits from random latent vectors
* Visualized and compared original, reconstructed, and generated images

### Key Concepts Learned

* **Variational Autoencoders (VAEs)** encode data as distributions over a latent space rather than as single points
* The **reparameterization trick** enables differentiable sampling during training
* The **VAE loss** combines reconstruction and KL divergence terms
* **Generative models** learn to create realistic data from underlying distributions