
🎵 Music Playing

👋 Welcome! Today, we're learning about Convolution in Neural Networks! 🧠🖼️

🤔 What is Convolution?

Convolution helps computers understand pictures by looking at patterns instead of exact positions! 🖼️🔍

Imagine you have two images that look almost the same, but one is shifted a little.
A computer might think they are totally different! 😲
Convolution fixes this problem! ✅


๐Ÿ› ๏ธ How Convolution Works

We use something called a kernel (a small filter ๐Ÿ”ฒ) that slides over an image.
It checks different parts of the picture and creates a new image called an activation map!

1๏ธโƒฃ The image is a grid of numbers ๐Ÿ–ผ๏ธ
2๏ธโƒฃ The kernel is a small grid ๐Ÿ”ณ that moves across the image
3๏ธโƒฃ It multiplies numbers in the image with the numbers in the kernel โœ–๏ธ
4๏ธโƒฃ The results are added together โž•
5๏ธโƒฃ We move to the next spot and repeat! ๐Ÿ”„
6๏ธโƒฃ The final result is the activation map ๐ŸŽฏ


๐Ÿ“ How Big is the Activation Map?

The size of the activation map depends on:

  • M (image size) ๐Ÿ“
  • K (kernel size) ๐Ÿ”ณ
  • Stride (how far the kernel moves) ๐Ÿ‘ฃ

Formula:

New size = (Image size - Kernel size) + 1

Example:

  • 4ร—4 image ๐Ÿ“ท
  • 2ร—2 kernel ๐Ÿ”ณ
  • Activation map = 3ร—3 โœ…

👣 What is Stride?

Stride is how far the kernel moves each time!

  • Stride = 1 ➝ Moves one step at a time 🐢
  • Stride = 2 ➝ Moves two steps at a time 🚶‍♂️
  • Bigger stride = Smaller activation map! 📏
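We can check the sizes with a small helper function (a sketch; the floor division matches how convolution output sizes are computed):

def activation_map_size(image_size, kernel_size, stride=1):
    # Output size of a convolution with no padding
    return (image_size - kernel_size) // stride + 1

print(activation_map_size(4, 2, stride=1))  # 3 → a 3×3 activation map
print(activation_map_size(4, 2, stride=2))  # 2 → bigger stride, smaller map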

🛑 What is Zero Padding?

Sometimes, the kernel doesn't fit perfectly in the image. 😕
So, we add extra rows and columns of zeros around the image! 0️⃣0️⃣0️⃣

This makes sure the kernel covers everything! ✅

Formula:

New Image Size = Old Size + 2 × Padding
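For example (a quick sketch): without padding, a 3×3 kernel shrinks a 4×4 image to 2×2, but with padding=1 the padded image is 6×6 and the output stays 4×4.

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)
image = torch.randn(1, 1, 4, 4)  # a random 4×4 image
print(conv(image).shape)  # torch.Size([1, 1, 4, 4]) → same size as the input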

🎨 What About Color Images?

For black & white images, we use Conv2D with one channel (grayscale). 🌑
For color images, we use three channels (Red, Green, Blue - RGB)! 🎨🌈
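In PyTorch, this is just the in_channels argument (a minimal sketch):

import torch.nn as nn

gray_conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)  # grayscale: 1 channel
rgb_conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)   # color: R, G, B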


๐Ÿ† Summary

โœ… Convolution helps computers find patterns in images!
โœ… We use a kernel to create an activation map!
โœ… Stride & padding change how the convolution works!
โœ… This is how computers "see" images! ๐Ÿ‘€๐Ÿค–


๐ŸŽ‰ Great job! Now, letโ€™s try convolution in the lab! ๐Ÿ—๏ธ๐Ÿค–โœจ


🎵 Music Playing

👋 Welcome! Today, we're learning about Activation Functions and Max Pooling! 🚀🔢

🤖 What is an Activation Function?

Activation functions help a neural network decide what's important! 🧠
They change the values in the activation map to help the model learn better.


🔥 Example: ReLU Activation Function

1️⃣ We take an input image 🖼️
2️⃣ We apply convolution to create an activation map 📊
3️⃣ We apply ReLU (Rectified Linear Unit):

  • If a value is negative ➝ Change it to 0 ❌
  • If a value is positive ➝ Keep it ✅

🛠 Example Calculation

Before ReLU    After ReLU
-4             0
0              0
4              4

All negative numbers become zero! ✨

In PyTorch, we apply the ReLU function after convolution:

import torch.nn.functional as F

output = F.relu(conv_output)
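A quick self-contained check on the values from the table above:

import torch
import torch.nn.functional as F

values = torch.tensor([-4., 0., 4.])
print(F.relu(values))  # tensor([0., 0., 4.]) → negatives become zero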

🌊 What is Max Pooling?

Max Pooling helps the network focus on important details while making images smaller! 📏🔍

🏗 How It Works

1️⃣ We divide the image into small regions (e.g., 2×2 squares)
2️⃣ We keep only the largest value in each region
3️⃣ We move the window and repeat until we've covered the whole image

📊 Example: 2×2 Max Pooling

Before Pooling       After Pooling
1  6  2  3           8  7
5  8  7  4           9  7
9  2  3  7
4  1  5  2

Only the biggest number in each 2×2 section is kept! ✅


๐Ÿ† Why Use Max Pooling?

โœ… Reduces image size โž Makes training faster! ๐Ÿš€
โœ… Ignores small changes in images โž More stable results! ๐Ÿ”„
โœ… Helps find important features in the picture! ๐Ÿ–ผ๏ธ

In PyTorch, we apply Max Pooling like this:

import torch.nn.functional as F

output = F.max_pool2d(activation_map, kernel_size=2, stride=2)
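Running it on the 4×4 example above (the last row of that matrix is an assumed filler to complete the 4×4 grid):

import torch
import torch.nn.functional as F

image = torch.tensor([[[[1., 6., 2., 3.],
                        [5., 8., 7., 4.],
                        [9., 2., 3., 7.],
                        [4., 1., 5., 2.]]]])
print(F.max_pool2d(image, kernel_size=2, stride=2))
# tensor([[[[8., 7.],
#           [9., 7.]]]])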

🎉 Great job! Now, let's try using activation functions and max pooling in our own models! 🏗️🤖✨


🎵 Music Playing

👋 Welcome! Today, we're learning about Convolution with Multiple Channels! 🖼️🤖

🤔 What's a Channel?

A channel is like a layer of an image! 🌈

  • Black & White Images ➝ 1 channel (grayscale) 🏳️
  • Color Images ➝ 3 channels (Red, Green, Blue - RGB) 🎨

Neural networks see images by looking at these channels separately! 👀


🎯 1. Multiple Output Channels

Usually, we use one kernel to create one activation map 📊
But what if we want to detect different things in an image? 🤔

  • Solution: Use multiple kernels! Each kernel finds different features! 🔍

🔥 Example: Detecting Lines

1️⃣ A vertical line kernel finds vertical edges 📏
2️⃣ A horizontal line kernel finds horizontal edges 📏

More kernels = More ways to see the image! 👀✅
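Here's a sketch with two hand-made edge kernels (the exact values are illustrative, not the ones from the lesson):

import torch
import torch.nn.functional as F

vertical_kernel = torch.tensor([[-1., 0., 1.],
                                [-1., 0., 1.],
                                [-1., 0., 1.]])   # responds to vertical edges
horizontal_kernel = vertical_kernel.t()           # responds to horizontal edges

# Stack into one weight tensor: (out_channels=2, in_channels=1, 3, 3)
weights = torch.stack([vertical_kernel, horizontal_kernel]).unsqueeze(1)

image = torch.randn(1, 1, 8, 8)  # a toy grayscale image
maps = F.conv2d(image, weights)  # one activation map per kernel
print(maps.shape)  # torch.Size([1, 2, 6, 6])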


🎨 2. Multiple Input Channels

Color images have 3 channels (Red, Green, Blue).
To process them, we use a separate kernel for each channel! 🎨

1️⃣ Apply a Red kernel to the Red part of the image 🔴
2️⃣ Apply a Green kernel to the Green part of the image 🟢
3️⃣ Apply a Blue kernel to the Blue part of the image 🔵
4️⃣ Add the results together to get one activation map!

This helps the neural network understand colors and patterns! 🌈
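This "one kernel per channel, then add" rule is exactly what a 2D convolution does. A small sketch that checks it by hand:

import torch
import torch.nn.functional as F

rgb_image = torch.randn(1, 3, 8, 8)  # one color image: 3 input channels
kernels = torch.randn(1, 3, 3, 3)    # one kernel per channel (R, G, B)

combined = F.conv2d(rgb_image, kernels)  # PyTorch sums the per-channel results

# The same thing step by step: one convolution per channel, then add
manual = sum(F.conv2d(rgb_image[:, i:i+1], kernels[:, i:i+1]) for i in range(3))
print(torch.allclose(combined, manual))  # True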


🔄 3. Multiple Input & Output Channels

Now, let's combine everything! 🚀

  • Multiple input channels (like RGB images)
  • Multiple output channels (different filters detecting different things)

Each output channel gets its own set of kernels, one for each input channel.
We apply the kernels, add the results for each output channel, and get multiple activation maps! 🎯


๐Ÿ— Example in PyTorch

import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=3)  

This means:
✅ 3 input channels (Red, Green, Blue)
✅ 5 output channels (5 different filters detecting different things)
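You can confirm this layout by inspecting the layer's weights: each of the 5 filters holds one 3×3 kernel per input channel.

print(conv.weight.shape)  # torch.Size([5, 3, 3, 3]) → 5 filters × 3 channels × 3×3
print(conv.bias.shape)    # torch.Size([5]) → one bias per output channel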


๐Ÿ† Why is This Important?

โœ… Helps the neural network find different patterns ๐ŸŽจ
โœ… Works for color images and complex features ๐Ÿค–
โœ… Makes the network more powerful! ๐Ÿ’ช


๐ŸŽ‰ Great job! Now, letโ€™s try convolution with multiple channels in our own models! ๐Ÿ—๏ธ๐Ÿค–โœจ

🎵 Music Playing

👋 Welcome! Today, we're building a CNN for MNIST! 🏗️🔢
MNIST is a dataset of handwritten numbers (0-9). ✍️🖼️


🏗 CNN Structure

📏 Image Size: 16×16 (to make training faster)
🔄 Layers:

  • First Convolution Layer ➝ 16 output channels
  • Second Convolution Layer ➝ 32 output channels
  • Final Layer ➝ 10 output neurons (one for each digit)

🛠 Building the CNN in PyTorch

📌 Step 1: Define the CNN

import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, padding=2)
        self.fc = nn.Linear(32 * 4 * 4, 10)  # Fully connected layer (512 inputs, 10 outputs)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # First layer: Conv + ReLU + Pool (16×16 → 8×8)
        x = self.pool(F.relu(self.conv2(x)))  # Second layer: Conv + ReLU + Pool (8×8 → 4×4)
        x = x.view(-1, 32 * 4 * 4)  # Flatten the 32×4×4 output to 1D (512 elements)
        x = self.fc(x)  # Fully connected layer for classification
        return x

๐Ÿ” Understanding the Output Shape

After Max Pooling, the image shrinks to 4ร—4 pixels.
Since we have 32 channels, the total output is:

4 ร— 4 ร— 32 = 512 elements

Each neuron in the final layer gets 512 inputs, and since we have 10 digits (0-9), we use 10 neurons.
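A quick way to sanity-check these shapes is to push a dummy 16×16 image through the model:

import torch

model = CNN()
x = torch.randn(1, 1, 16, 16)  # one fake grayscale image
print(model(x).shape)  # torch.Size([1, 10]) → one score per digit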


🔄 Forward Step

1️⃣ Apply First Convolution Layer ➝ Activation ➝ Max Pooling
2️⃣ Apply Second Convolution Layer ➝ Activation ➝ Max Pooling
3️⃣ Flatten the Output (4×4×32 → 512)
4️⃣ Apply the Final Output Layer (10 Neurons for 10 Digits)


๐Ÿ‹๏ธโ€โ™‚๏ธ Training the Model

Check the lab to see how we train the CNN using:
โœ… Backpropagation
โœ… Stochastic Gradient Descent (SGD)
โœ… Loss Function & Accuracy Check
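As a rough sketch of that setup (the lab's exact learning rate and settings may differ):

import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()  # loss function
optimizer = optim.SGD(model.parameters(), lr=0.1)  # stochastic gradient descent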


🎉 Great job! Now, let's train our CNN to recognize handwritten digits! 🏗️🔢🤖

🎵 Music Playing

👋 Welcome! Today, we're learning about Convolutional Neural Networks (CNNs)! 🤖🖼️

🤔 What is a CNN?

A Convolutional Neural Network (CNN) is a special type of neural network that understands images! 🎨
It learns to find patterns, like:
✅ Edges (lines & shapes)
✅ Textures (smooth or rough areas)
✅ Objects (faces, animals, letters)


๐Ÿ— How Does a CNN Work?

A CNN is made of three main steps:

1๏ธโƒฃ Convolution Layer ๐Ÿ–ผ๏ธโž๐Ÿ”

  • Uses kernels (small filters) to detect patterns in an image
  • Creates an activation map that highlights important features

2๏ธโƒฃ Pooling Layer ๐Ÿ”„โž๐Ÿ“

  • Shrinks the activation map to keep only the most important parts
  • Max Pooling picks the biggest values in each small region

3๏ธโƒฃ Fully Connected Layer ๐Ÿ—๏ธโž๐ŸŽฏ

  • The final layer makes a decision (like cat ๐Ÿฑ or dog ๐Ÿถ)

🎨 Example: Detecting Lines

We train a CNN to recognize horizontal and vertical lines:

1️⃣ Input Image (X)
2️⃣ First Convolution Layer

  • Uses two kernels to create two activation maps
  • Applies ReLU (activation function) to remove negative values
  • Uses Max Pooling to make learning easier

3️⃣ Second Convolution Layer

  • Takes two input channels from the first layer
  • Uses two new kernels to create one activation map
  • Again, applies ReLU + Max Pooling

4️⃣ Flattening ➝ Turns the 2D image into 1D data
5️⃣ Final Prediction ➝ Uses a fully connected layer to decide:

  • 0 = Vertical Line
  • 1 = Horizontal Line

🔄 How to Build a CNN in PyTorch

🏗 CNN Constructor

import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=2, out_channels=1, kernel_size=3, padding=1)
        self.fc = nn.Linear(49, 2)  # Fully connected layer (1 channel × 7 × 7 = 49 inputs, 2 outputs)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # First layer: Conv + ReLU + Pool
        x = self.pool(F.relu(self.conv2(x)))  # Second layer: Conv + ReLU + Pool
        x = x.view(-1, 49)  # Flatten to 1D
        x = self.fc(x)  # Fully connected layer
        return x
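The 49 inputs of the fully connected layer imply a 28×28 input image (28 → 14 → 7 after the two poolings, and 1 channel × 7 × 7 = 49). A dummy-input check:

import torch

model = CNN()
x = torch.randn(1, 1, 28, 28)  # one fake grayscale image
print(model(x).shape)  # torch.Size([1, 2]) → scores for vertical vs. horizontal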

๐Ÿ‹๏ธโ€โ™‚๏ธ Training the CNN

We train the CNN using backpropagation and gradient descent:

1๏ธโƒฃ Load the dataset (images of lines) ๐Ÿ“Š
2๏ธโƒฃ Create a CNN model ๐Ÿ—๏ธ
3๏ธโƒฃ Define a loss function (to measure mistakes) โŒ
4๏ธโƒฃ Choose an optimizer (to improve learning) ๐Ÿ”„
5๏ธโƒฃ Train the model until it gets better! ๐Ÿš€

As training progresses:
๐Ÿ“‰ Loss goes down โž Model makes fewer mistakes!
๐Ÿ“ˆ Accuracy goes up โž Model gets better at predictions!
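Putting those five steps together, a minimal training-loop sketch (train_loader is a placeholder for your DataLoader of line images, and the learning rate is an assumption):

import torch.nn as nn
import torch.optim as optim

model = CNN()
criterion = nn.CrossEntropyLoss()  # measures mistakes
optimizer = optim.SGD(model.parameters(), lr=0.1)  # improves learning

for epoch in range(10):
    for images, labels in train_loader:
        optimizer.zero_grad()  # clear old gradients
        outputs = model(images)  # forward pass
        loss = criterion(outputs, labels)  # measure mistakes
        loss.backward()  # backpropagation
        optimizer.step()  # gradient descent update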


๐Ÿ† Why Use CNNs?

โœ… Finds patterns in images ๐Ÿ”
โœ… Works with real-world data (faces, animals, objects) ๐Ÿ–ผ๏ธ
โœ… More efficient than regular neural networks ๐Ÿ’ก


๐ŸŽ‰ Great job! Now, letโ€™s build and train our own CNN! ๐Ÿ—๏ธ๐Ÿค–โœจ


🎵 Music Playing

👋 Welcome! Today, we're learning how to use Pretrained TorchVision Models! 🤖🖼️

🤔 What is a Pretrained Model?

A pretrained model is a neural network that has already been trained by experts on a large dataset.
✅ Saves time (no need to train from scratch) ⏳
✅ Works better (already optimized) 🎯
✅ We only train the final layer for our own images! 🔄


🔄 Using ResNet18 (A Pretrained Model)

We will use ResNet18, a powerful model trained on color images. 🎨
It has skip connections (we won't go into the details, but they make deep networks easier to train).

We only replace the last layer to match our dataset! 🔍


🛠 Steps to Use a Pretrained Model

📌 Step 1: Load the Pretrained Model

import torchvision.models as models

model = models.resnet18(pretrained=True)  # Load pretrained ResNet18

📌 Step 2: Normalize Images (Required for ResNet18)

import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize image
    transforms.ToTensor(),  # Convert to tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # Normalize
])

📌 Step 3: Prepare the Dataset

Create a dataset object for your own images with training and testing data. 📊
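One common way to do this (a sketch; "data/train" and "data/test" are placeholder paths for folders with one subfolder per class):

import torchvision

train_dataset = torchvision.datasets.ImageFolder("data/train", transform=transform)
test_dataset = torchvision.datasets.ImageFolder("data/test", transform=transform)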

📌 Step 4: Replace the Output Layer

  • The last hidden layer has 512 neurons
  • We create a new output layer for our dataset

Example: If we have 7 classes, we create a layer with 7 outputs:

import torch.nn as nn

for param in model.parameters():
    param.requires_grad = False  # Freeze pretrained layers

model.fc = nn.Linear(512, 7)  # Replace output layer (512 inputs → 7 outputs)

๐Ÿ‹๏ธโ€โ™‚๏ธ Training the Model

๐Ÿ“Œ Step 5: Create Data Loaders

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=15, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=10, shuffle=False)

📌 Step 6: Set Up Training

import torch.optim as optim

criterion = nn.CrossEntropyLoss()  # Loss function
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)  # Optimizer (only for last layer)

📌 Step 7: Train the Model

1️⃣ Set model to training mode 🏋️

model.train()

2️⃣ Train for 20 epochs
3️⃣ Set model to evaluation mode when predicting 📊

model.eval()
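A minimal sketch of that loop, using the pieces set up in Steps 5 and 6:

for epoch in range(20):
    model.train()  # training mode
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)  # CrossEntropyLoss from Step 6
        loss.backward()
        optimizer.step()  # updates only the new final layer

model.eval()  # evaluation mode for predictions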

๐Ÿ† Why Use Pretrained Models?

โœ… Saves time (no need to train from scratch)
โœ… Works better (pretrained on millions of images)
โœ… We only change one layer for our dataset!


๐ŸŽ‰ Great job! Now, try using a pretrained model for your own images! ๐Ÿ—๏ธ๐Ÿค–โœจ

🎵 Music Playing

👋 Welcome! Today, we're learning how to use GPUs in PyTorch! 🚀💻

🤔 Why Use a GPU?

A Graphics Processing Unit (GPU) can train models MUCH faster than a CPU!
✅ Faster computation ⏩
✅ Better for large datasets 📊
✅ Helps train deep learning models efficiently 🤖


🔥 What is CUDA?

CUDA is a platform made by NVIDIA that lets programs run computations on the GPU. 🎮🚀
In PyTorch, we use torch.cuda to work with GPUs.


🛠 Step 1: Check if a GPU is Available

import torch

# Check if a GPU is available
torch.cuda.is_available()  # Returns True if a GPU is detected

🎯 Step 2: Set Up the GPU

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

  • "cuda:0" = First available GPU 🎮
  • "cpu" = Use the CPU if no GPU is found

๐Ÿ— Step 3: Sending Tensors to the GPU

In PyTorch, data is stored in Tensors.
To move data to the GPU, use .to(device).

tensor = torch.randn(3, 3)  # Create a random tensor
tensor = tensor.to(device)  # Move it to the GPU

โœ… Faster processing on the GPU! โšก


🔄 Step 4: Using a GPU with a CNN

You don't need to change your CNN code! Just move the model to the GPU after creating it:

model = CNN()  # Create CNN model
model.to(device)  # Move the model to the GPU

This moves all of the model's parameters and buffers to the GPU as CUDA tensors! 🎮


๐Ÿ‹๏ธโ€โ™‚๏ธ Step 5: Training a Model on a GPU

Training is the same, but you must send your data to the GPU!

for images, labels in train_loader:
    images, labels = images.to(device), labels.to(device)  # Move data to GPU
    optimizer.zero_grad()  # Clear gradients
    outputs = model(images)  # Forward pass (on GPU)
    loss = criterion(outputs, labels)  # Compute loss
    loss.backward()  # Backpropagation
    optimizer.step()  # Update weights

โœ… The model trains much faster! ๐Ÿš€


🎯 Step 6: Testing the Model

For testing, only the images need to go to the GPU. The labels can stay on the CPU, since we only use them there to check the predictions:

for images, labels in test_loader:
    images = images.to(device)  # Move images to GPU
    outputs = model(images)  # Get predictions (on GPU)

✅ Saves memory and speeds up testing! ⚡


๐Ÿ† Summary

โœ… GPUs make training faster ๐ŸŽฎ
โœ… Use torch.cuda to work with GPUs
โœ… Move data & models to the GPU with .to(device)
โœ… Training & testing are the same, but data must be on the GPU


๐ŸŽ‰ Great job! Now, try training a model using a GPU in PyTorch! ๐Ÿ—๏ธ๐Ÿš€