
🎵 Music Playing

👋 Welcome! Today, we're learning about Convolution in Neural Networks! 🧠🖼️

🤔 What is Convolution?

Convolution helps computers understand pictures by looking at patterns instead of exact positions! 🖼️🔍

Imagine you have two images that look almost the same, but one is shifted a little.
A computer might think they are totally different! 😲
Convolution fixes this problem! ✅


๐Ÿ› ๏ธ How Convolution Works

We use something called a kernel (a small filter ๐Ÿ”ฒ) that slides over an image.
It checks different parts of the picture and creates a new image called an activation map!

1๏ธโƒฃ The image is a grid of numbers ๐Ÿ–ผ๏ธ
2๏ธโƒฃ The kernel is a small grid ๐Ÿ”ณ that moves across the image
3๏ธโƒฃ It multiplies numbers in the image with the numbers in the kernel โœ–๏ธ
4๏ธโƒฃ The results are added together โž•
5๏ธโƒฃ We move to the next spot and repeat! ๐Ÿ”„
6๏ธโƒฃ The final result is the activation map ๐ŸŽฏ


๐Ÿ“ How Big is the Activation Map?

The size of the activation map depends on:

  • M (image size) ๐Ÿ“
  • K (kernel size) ๐Ÿ”ณ
  • Stride (how far the kernel moves) ๐Ÿ‘ฃ

Formula:

New size = (Image size - Kernel size) + 1

Example:

  • 4ร—4 image ๐Ÿ“ท
  • 2ร—2 kernel ๐Ÿ”ณ
  • Activation map = 3ร—3 โœ…

👣 What is Stride?

Stride is how far the kernel moves each time!

  • Stride = 1 ➝ Moves one step at a time 🐢
  • Stride = 2 ➝ Moves two steps at a time 🚶‍♂️
  • Bigger stride = Smaller activation map! 📏
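We can check the sizes with a small helper function (a sketch; the floor division matches how convolution output sizes are computed):

def activation_map_size(image_size, kernel_size, stride=1):
    # Output size of a convolution with no padding
    return (image_size - kernel_size) // stride + 1

print(activation_map_size(4, 2, stride=1))  # 3 → a 3×3 activation map
print(activation_map_size(4, 2, stride=2))  # 2 → bigger stride, smaller map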

🛑 What is Zero Padding?

Sometimes, the kernel doesn't fit perfectly in the image. 😕
So, we add extra rows and columns of zeros around the image! 0️⃣0️⃣0️⃣

This makes sure the kernel covers everything! ✅

Formula:

New Image Size = Old Size + 2 × Padding
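For example (a quick sketch): without padding, a 3×3 kernel shrinks a 4×4 image to 2×2, but with padding=1 the padded image is 6×6 and the output stays 4×4.

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)
image = torch.randn(1, 1, 4, 4)  # a random 4×4 image
print(conv(image).shape)  # torch.Size([1, 1, 4, 4]) → same size as the input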

🎨 What About Color Images?

For black & white images, we use Conv2D with one channel (grayscale). 🌑
For color images, we use three channels (Red, Green, Blue - RGB)! 🎨🌈
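In PyTorch, this is just the in_channels argument (a minimal sketch):

import torch.nn as nn

gray_conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)  # grayscale: 1 channel
rgb_conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)   # color: R, G, B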


๐Ÿ† Summary

โœ… Convolution helps computers find patterns in images!
โœ… We use a kernel to create an activation map!
โœ… Stride & padding change how the convolution works!
โœ… This is how computers "see" images! ๐Ÿ‘€๐Ÿค–


๐ŸŽ‰ Great job! Now, letโ€™s try convolution in the lab! ๐Ÿ—๏ธ๐Ÿค–โœจ


🎵 Music Playing

👋 Welcome! Today, we're learning about Activation Functions and Max Pooling! 🚀🔢

🤖 What is an Activation Function?

Activation functions help a neural network decide what's important! 🧠
They change the values in the activation map to help the model learn better.


🔥 Example: ReLU Activation Function

1️⃣ We take an input image 🖼️
2️⃣ We apply convolution to create an activation map 📊
3️⃣ We apply ReLU (Rectified Linear Unit):

  • If a value is negative ➝ Change it to 0 ❌
  • If a value is positive ➝ Keep it ✅

🛠 Example Calculation

Before ReLU    After ReLU
-4             0
0              0
4              4

All negative numbers become zero! ✨

In PyTorch, we apply the ReLU function after convolution:

import torch.nn.functional as F

output = F.relu(conv_output)
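A quick self-contained check on the values from the table above:

import torch
import torch.nn.functional as F

values = torch.tensor([-4., 0., 4.])
print(F.relu(values))  # tensor([0., 0., 4.]) → negatives become zero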

🌊 What is Max Pooling?

Max Pooling helps the network focus on important details while making images smaller! 📏🔍

🏗 How It Works

1️⃣ We divide the image into small regions (e.g., 2×2 squares)
2️⃣ We keep only the largest value in each region
3️⃣ We move the window and repeat until we've covered the whole image

📊 Example: 2×2 Max Pooling

Before Pooling       After Pooling
1  6  2  3           8  7
5  8  7  4           9  7
9  2  3  7
4  1  5  2

Only the biggest number in each 2×2 section is kept! ✅


๐Ÿ† Why Use Max Pooling?

โœ… Reduces image size โž Makes training faster! ๐Ÿš€
โœ… Ignores small changes in images โž More stable results! ๐Ÿ”„
โœ… Helps find important features in the picture! ๐Ÿ–ผ๏ธ

In PyTorch, we apply Max Pooling like this:

import torch.nn.functional as F

output = F.max_pool2d(activation_map, kernel_size=2, stride=2)
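Running it on the 4×4 example above (the last row of that matrix is an assumed filler to complete the 4×4 grid):

import torch
import torch.nn.functional as F

image = torch.tensor([[[[1., 6., 2., 3.],
                        [5., 8., 7., 4.],
                        [9., 2., 3., 7.],
                        [4., 1., 5., 2.]]]])
print(F.max_pool2d(image, kernel_size=2, stride=2))
# tensor([[[[8., 7.],
#           [9., 7.]]]])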

🎉 Great job! Now, let's try using activation functions and max pooling in our own models! 🏗️🤖✨


🎵 Music Playing

👋 Welcome! Today, we're learning about Convolution with Multiple Channels! 🖼️🤖

🤔 What's a Channel?

A channel is like a layer of an image! 🌈

  • Black & White Images ➝ 1 channel (grayscale) 🏳️
  • Color Images ➝ 3 channels (Red, Green, Blue - RGB) 🎨

Neural networks see images by looking at these channels separately! 👀


🎯 1. Multiple Output Channels

Usually, we use one kernel to create one activation map 📊
But what if we want to detect different things in an image? 🤔

  • Solution: Use multiple kernels! Each kernel finds different features! 🔍

🔥 Example: Detecting Lines

1️⃣ A vertical line kernel finds vertical edges 📏
2️⃣ A horizontal line kernel finds horizontal edges 📏

More kernels = More ways to see the image! 👀✅
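Here's a sketch with two hand-made edge kernels (the exact values are illustrative, not the ones from the lesson):

import torch
import torch.nn.functional as F

vertical_kernel = torch.tensor([[-1., 0., 1.],
                                [-1., 0., 1.],
                                [-1., 0., 1.]])   # responds to vertical edges
horizontal_kernel = vertical_kernel.t()           # responds to horizontal edges

# Stack into one weight tensor: (out_channels=2, in_channels=1, 3, 3)
weights = torch.stack([vertical_kernel, horizontal_kernel]).unsqueeze(1)

image = torch.randn(1, 1, 8, 8)  # a toy grayscale image
maps = F.conv2d(image, weights)  # one activation map per kernel
print(maps.shape)  # torch.Size([1, 2, 6, 6])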


🎨 2. Multiple Input Channels

Color images have 3 channels (Red, Green, Blue).
To process them, we use a separate kernel for each channel! 🎨

1️⃣ Apply a Red kernel to the Red part of the image 🔴
2️⃣ Apply a Green kernel to the Green part of the image 🟢
3️⃣ Apply a Blue kernel to the Blue part of the image 🔵
4️⃣ Add the results together to get one activation map!

This helps the neural network understand colors and patterns! 🌈
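This "one kernel per channel, then add" rule is exactly what a 2D convolution does. A small sketch that checks it by hand:

import torch
import torch.nn.functional as F

rgb_image = torch.randn(1, 3, 8, 8)  # one color image: 3 input channels
kernels = torch.randn(1, 3, 3, 3)    # one kernel per channel (R, G, B)

combined = F.conv2d(rgb_image, kernels)  # PyTorch sums the per-channel results

# The same thing step by step: one convolution per channel, then add
manual = sum(F.conv2d(rgb_image[:, i:i+1], kernels[:, i:i+1]) for i in range(3))
print(torch.allclose(combined, manual))  # True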


🔄 3. Multiple Input & Output Channels

Now, let's combine everything! 🚀

  • Multiple input channels (like RGB images)
  • Multiple output channels (different filters detecting different things)

Each output channel gets its own set of kernels, one for each input channel.
We apply the kernels, add the results for each output channel, and get multiple activation maps! 🎯


๐Ÿ— Example in PyTorch

import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=3)  

This means:
✅ 3 input channels (Red, Green, Blue)
✅ 5 output channels (5 different filters detecting different things)
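You can confirm this layout by inspecting the layer's weights: each of the 5 filters holds one 3×3 kernel per input channel.

print(conv.weight.shape)  # torch.Size([5, 3, 3, 3]) → 5 filters × 3 channels × 3×3
print(conv.bias.shape)    # torch.Size([5]) → one bias per output channel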


๐Ÿ† Why is This Important?

โœ… Helps the neural network find different patterns ๐ŸŽจ
โœ… Works for color images and complex features ๐Ÿค–
โœ… Makes the network more powerful! ๐Ÿ’ช


๐ŸŽ‰ Great job! Now, letโ€™s try convolution with multiple channels in our own models! ๐Ÿ—๏ธ๐Ÿค–โœจ

🎵 Music Playing

👋 Welcome! Today, we're building a CNN for MNIST! 🏗️🔢
MNIST is a dataset of handwritten numbers (0-9). ✍️🖼️


🏗 CNN Structure

📏 Image Size: 16×16 (to make training faster)
🔄 Layers:

  • First Convolution Layer ➝ 16 output channels
  • Second Convolution Layer ➝ 32 output channels
  • Final Layer ➝ 10 output neurons (one for each digit)

🛠 Building the CNN in PyTorch

📌 Step 1: Define the CNN

import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, padding=2)
        self.fc = nn.Linear(32 * 4 * 4, 10)  # Fully connected layer (512 inputs, 10 outputs)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # First layer: Conv + ReLU + Pool (16×16 → 8×8)
        x = self.pool(F.relu(self.conv2(x)))  # Second layer: Conv + ReLU + Pool (8×8 → 4×4)
        x = x.view(-1, 32 * 4 * 4)  # Flatten the 32×4×4 output to 1D (512 elements)
        x = self.fc(x)  # Fully connected layer for classification
        return x

๐Ÿ” Understanding the Output Shape

After Max Pooling, the image shrinks to 4ร—4 pixels.
Since we have 32 channels, the total output is:

4 ร— 4 ร— 32 = 512 elements

Each neuron in the final layer gets 512 inputs, and since we have 10 digits (0-9), we use 10 neurons.
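A quick way to sanity-check these shapes is to push a dummy 16×16 image through the model:

import torch

model = CNN()
x = torch.randn(1, 1, 16, 16)  # one fake grayscale image
print(model(x).shape)  # torch.Size([1, 10]) → one score per digit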


🔄 Forward Step

1️⃣ Apply First Convolution Layer ➝ Activation ➝ Max Pooling
2️⃣ Apply Second Convolution Layer ➝ Activation ➝ Max Pooling
3️⃣ Flatten the Output (4×4×32 → 512)
4️⃣ Apply the Final Output Layer (10 Neurons for 10 Digits)


๐Ÿ‹๏ธโ€โ™‚๏ธ Training the Model

Check the lab to see how we train the CNN using:
โœ… Backpropagation
โœ… Stochastic Gradient Descent (SGD)
โœ… Loss Function & Accuracy Check
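As a rough sketch of that setup (the lab's exact learning rate and settings may differ):

import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()  # loss function
optimizer = optim.SGD(model.parameters(), lr=0.1)  # stochastic gradient descent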


🎉 Great job! Now, let's train our CNN to recognize handwritten digits! 🏗️🔢🤖

🎵 Music Playing

👋 Welcome! Today, we're learning about Convolutional Neural Networks (CNNs)! 🤖🖼️

🤔 What is a CNN?

A Convolutional Neural Network (CNN) is a special type of neural network that understands images! 🎨
It learns to find patterns, like:
✅ Edges (lines & shapes)
✅ Textures (smooth or rough areas)
✅ Objects (faces, animals, letters)


๐Ÿ— How Does a CNN Work?

A CNN is made of three main steps:

1๏ธโƒฃ Convolution Layer ๐Ÿ–ผ๏ธโž๐Ÿ”

  • Uses kernels (small filters) to detect patterns in an image
  • Creates an activation map that highlights important features

2๏ธโƒฃ Pooling Layer ๐Ÿ”„โž๐Ÿ“

  • Shrinks the activation map to keep only the most important parts
  • Max Pooling picks the biggest values in each small region

3๏ธโƒฃ Fully Connected Layer ๐Ÿ—๏ธโž๐ŸŽฏ

  • The final layer makes a decision (like cat ๐Ÿฑ or dog ๐Ÿถ)

🎨 Example: Detecting Lines

We train a CNN to recognize horizontal and vertical lines:

1️⃣ Input Image (X)
2️⃣ First Convolution Layer

  • Uses two kernels to create two activation maps
  • Applies ReLU (activation function) to remove negative values
  • Uses Max Pooling to make learning easier

3️⃣ Second Convolution Layer

  • Takes two input channels from the first layer
  • Uses two new kernels to create one activation map
  • Again, applies ReLU + Max Pooling

4️⃣ Flattening ➝ Turns the 2D image into 1D data
5️⃣ Final Prediction ➝ Uses a fully connected layer to decide:

  • 0 = Vertical Line
  • 1 = Horizontal Line

🔄 How to Build a CNN in PyTorch

🏗 CNN Constructor

import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=2, out_channels=1, kernel_size=3, padding=1)
        self.fc = nn.Linear(49, 2)  # Fully connected layer (1 channel × 7 × 7 = 49 inputs, 2 outputs)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # First layer: Conv + ReLU + Pool
        x = self.pool(F.relu(self.conv2(x)))  # Second layer: Conv + ReLU + Pool
        x = x.view(-1, 49)  # Flatten to 1D
        x = self.fc(x)  # Fully connected layer
        return x
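The 49 inputs of the fully connected layer imply a 28×28 input image (28 → 14 → 7 after the two poolings, and 1 channel × 7 × 7 = 49). A dummy-input check:

import torch

model = CNN()
x = torch.randn(1, 1, 28, 28)  # one fake grayscale image
print(model(x).shape)  # torch.Size([1, 2]) → scores for vertical vs. horizontal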

๐Ÿ‹๏ธโ€โ™‚๏ธ Training the CNN

We train the CNN using backpropagation and gradient descent:

1๏ธโƒฃ Load the dataset (images of lines) ๐Ÿ“Š
2๏ธโƒฃ Create a CNN model ๐Ÿ—๏ธ
3๏ธโƒฃ Define a loss function (to measure mistakes) โŒ
4๏ธโƒฃ Choose an optimizer (to improve learning) ๐Ÿ”„
5๏ธโƒฃ Train the model until it gets better! ๐Ÿš€

As training progresses:
๐Ÿ“‰ Loss goes down โž Model makes fewer mistakes!
๐Ÿ“ˆ Accuracy goes up โž Model gets better at predictions!
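Putting those five steps together, a minimal training-loop sketch (train_loader is a placeholder for your DataLoader of line images, and the learning rate is an assumption):

import torch.nn as nn
import torch.optim as optim

model = CNN()
criterion = nn.CrossEntropyLoss()  # measures mistakes
optimizer = optim.SGD(model.parameters(), lr=0.1)  # improves learning

for epoch in range(10):
    for images, labels in train_loader:
        optimizer.zero_grad()  # clear old gradients
        outputs = model(images)  # forward pass
        loss = criterion(outputs, labels)  # measure mistakes
        loss.backward()  # backpropagation
        optimizer.step()  # gradient descent update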


๐Ÿ† Why Use CNNs?

โœ… Finds patterns in images ๐Ÿ”
โœ… Works with real-world data (faces, animals, objects) ๐Ÿ–ผ๏ธ
โœ… More efficient than regular neural networks ๐Ÿ’ก


๐ŸŽ‰ Great job! Now, letโ€™s build and train our own CNN! ๐Ÿ—๏ธ๐Ÿค–โœจ


🎵 Music Playing

👋 Welcome! Today, we're learning how to use Pretrained TorchVision Models! 🤖🖼️

🤔 What is a Pretrained Model?

A pretrained model is a neural network that has already been trained by experts on a large dataset.
✅ Saves time (no need to train from scratch) ⏳
✅ Works better (already optimized) 🎯
✅ We only train the final layer for our own images! 🔄


🔄 Using ResNet18 (A Pretrained Model)

We will use ResNet18, a powerful model trained on color images. 🎨
It has skip connections (we won't go into the details, but they make deep networks easier to train).

We only replace the last layer to match our dataset! 🔍


🛠 Steps to Use a Pretrained Model

📌 Step 1: Load the Pretrained Model

import torchvision.models as models

model = models.resnet18(pretrained=True)  # Load pretrained ResNet18

📌 Step 2: Normalize Images (Required for ResNet18)

import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize image
    transforms.ToTensor(),  # Convert to tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # Normalize
])

📌 Step 3: Prepare the Dataset

Create a dataset object for your own images with training and testing data. 📊
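One common way to do this (a sketch; "data/train" and "data/test" are placeholder paths for folders with one subfolder per class):

import torchvision

train_dataset = torchvision.datasets.ImageFolder("data/train", transform=transform)
test_dataset = torchvision.datasets.ImageFolder("data/test", transform=transform)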

📌 Step 4: Replace the Output Layer

  • The last hidden layer has 512 neurons
  • We create a new output layer for our dataset

Example: If we have 7 classes, we create a layer with 7 outputs:

import torch.nn as nn

for param in model.parameters():
    param.requires_grad = False  # Freeze pretrained layers

model.fc = nn.Linear(512, 7)  # Replace output layer (512 inputs → 7 outputs)

๐Ÿ‹๏ธโ€โ™‚๏ธ Training the Model

๐Ÿ“Œ Step 5: Create Data Loaders

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=15, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=10, shuffle=False)

📌 Step 6: Set Up Training

import torch.optim as optim

criterion = nn.CrossEntropyLoss()  # Loss function
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)  # Optimizer (only for last layer)

📌 Step 7: Train the Model

1️⃣ Set model to training mode 🏋️

model.train()

2️⃣ Train for 20 epochs
3️⃣ Set model to evaluation mode when predicting 📊

model.eval()
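A minimal sketch of that loop, using the pieces set up in Steps 5 and 6:

for epoch in range(20):
    model.train()  # training mode
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)  # CrossEntropyLoss from Step 6
        loss.backward()
        optimizer.step()  # updates only the new final layer

model.eval()  # evaluation mode for predictions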

๐Ÿ† Why Use Pretrained Models?

โœ… Saves time (no need to train from scratch)
โœ… Works better (pretrained on millions of images)
โœ… We only change one layer for our dataset!


๐ŸŽ‰ Great job! Now, try using a pretrained model for your own images! ๐Ÿ—๏ธ๐Ÿค–โœจ

🎵 Music Playing

👋 Welcome! Today, we're learning how to use GPUs in PyTorch! 🚀💻

🤔 Why Use a GPU?

A Graphics Processing Unit (GPU) can train models MUCH faster than a CPU!
✅ Faster computation ⏩
✅ Better for large datasets 📊
✅ Helps train deep learning models efficiently 🤖


🔥 What is CUDA?

CUDA is a platform made by NVIDIA that lets programs run computations on the GPU. 🎮🚀
In PyTorch, we use torch.cuda to work with GPUs.


🛠 Step 1: Check if a GPU is Available

import torch

# Check if a GPU is available
torch.cuda.is_available()  # Returns True if a GPU is detected

🎯 Step 2: Set Up the GPU

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

  • "cuda:0" = First available GPU 🎮
  • "cpu" = Use the CPU if no GPU is found

๐Ÿ— Step 3: Sending Tensors to the GPU

In PyTorch, data is stored in Tensors.
To move data to the GPU, use .to(device).

tensor = torch.randn(3, 3)  # Create a random tensor
tensor = tensor.to(device)  # Move it to the GPU

โœ… Faster processing on the GPU! โšก


🔄 Step 4: Using a GPU with a CNN

You don't need to change your CNN code! Just move the model to the GPU after creating it:

model = CNN()  # Create CNN model
model.to(device)  # Move the model to the GPU

This moves all of the model's parameters and buffers to the GPU as CUDA tensors! 🎮


๐Ÿ‹๏ธโ€โ™‚๏ธ Step 5: Training a Model on a GPU

Training is the same, but you must send your data to the GPU!

for images, labels in train_loader:
    images, labels = images.to(device), labels.to(device)  # Move data to GPU
    optimizer.zero_grad()  # Clear gradients
    outputs = model(images)  # Forward pass (on GPU)
    loss = criterion(outputs, labels)  # Compute loss
    loss.backward()  # Backpropagation
    optimizer.step()  # Update weights

โœ… The model trains much faster! ๐Ÿš€


🎯 Step 6: Testing the Model

For testing, only the images need to go to the GPU. The labels can stay on the CPU, since we only use them there to check the predictions:

for images, labels in test_loader:
    images = images.to(device)  # Move images to GPU
    outputs = model(images)  # Get predictions (on GPU)

✅ Saves memory and speeds up testing! ⚡


๐Ÿ† Summary

โœ… GPUs make training faster ๐ŸŽฎ
โœ… Use torch.cuda to work with GPUs
โœ… Move data & models to the GPU with .to(device)
โœ… Training & testing are the same, but data must be on the GPU


๐ŸŽ‰ Great job! Now, try training a model using a GPU in PyTorch! ๐Ÿ—๏ธ๐Ÿš€