Welcome! Today, we're learning about Convolution in Neural Networks!
What is Convolution?
Convolution helps computers understand pictures by looking at patterns instead of exact positions!
Imagine you have two images that look almost the same, but one is shifted a little.
A computer comparing them pixel by pixel might think they are totally different!
Convolution helps with this problem, because the same kernel looks for the same pattern at every position.
How Convolution Works
We use something called a kernel (a small filter) that slides over an image.
It checks different parts of the picture and creates a new image called an activation map!
1. The image is a grid of numbers
2. The kernel is a small grid that moves across the image
3. At each spot, it multiplies the numbers in the image with the matching numbers in the kernel
4. The results are added together to give one number in the activation map
5. We move to the next spot and repeat!
6. The final result is the activation map
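The six steps above can be sketched in plain Python. This is only an illustration, assuming a square image, a square kernel, and a kernel that moves one step at a time; the function name `convolve2d` and the sample numbers are made up for this example.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over a square image, one step at a time."""
    m, k = len(image), len(kernel)
    out_size = m - k + 1  # number of spots where the kernel fits
    activation_map = []
    for row in range(out_size):
        out_row = []
        for col in range(out_size):
            total = 0
            # Multiply each image value by the matching kernel value and add up.
            for i in range(k):
                for j in range(k):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        activation_map.append(out_row)
    return activation_map


# Hypothetical 4x4 image and 2x2 kernel (the kernel responds to diagonals).
image = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
kernel = [
    [1, 0],
    [0, 1],
]
activation_map = convolve2d(image, kernel)
print(activation_map)  # [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
```

The largest values in the result sit where the image contains the same diagonal pattern as the kernel.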
How Big is the Activation Map?
The size of the activation map depends on:
- M (image size)
- K (kernel size)
- Stride (how far the kernel moves)
Formula:
New size = (Image size - Kernel size) / Stride + 1 (rounded down)
With a stride of 1, this is simply (Image size - Kernel size) + 1.
Example:
- 4×4 image
- 2×2 kernel
- Stride = 1
- Activation map = 3×3
What is Stride?
Stride is how far the kernel moves each time!
- Stride = 1 → Moves one step at a time
- Stride = 2 → Moves two steps at a time
- Bigger stride = Smaller activation map!
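To see how stride changes the result, here is a small sketch of the size calculation. It assumes the usual rule of dividing by the stride and rounding down; the function name `activation_map_size` is only for illustration.

```python
def activation_map_size(image_size, kernel_size, stride=1):
    """Size of one side of the activation map (no padding)."""
    return (image_size - kernel_size) // stride + 1


# The 4x4 image and 2x2 kernel from the example, with two different strides.
sizes = {stride: activation_map_size(4, 2, stride) for stride in (1, 2)}
print(sizes)  # {1: 3, 2: 2}
```

With stride 1 the map is 3×3, and with stride 2 it shrinks to 2×2.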
What is Zero Padding?
Sometimes, the kernel doesn't fit perfectly in the image.
So, we add extra rows and columns of zeros around the image!
This makes sure the kernel covers everything, including the edges!
Formula:
New Image Size = Old Size + 2 × Padding
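A minimal sketch of zero padding on a grid stored as a list of lists. The helper name `zero_pad` and the tiny 2×2 image are assumptions for illustration only.

```python
def zero_pad(image, padding):
    """Surround the image with `padding` rows and columns of zeros."""
    width = len(image[0]) + 2 * padding
    top = [[0] * width for _ in range(padding)]
    middle = [[0] * padding + list(row) + [0] * padding for row in image]
    bottom = [[0] * width for _ in range(padding)]
    return top + middle + bottom


image = [
    [1, 2],
    [3, 4],
]
padded = zero_pad(image, padding=1)
for row in padded:
    print(row)
# [0, 0, 0, 0]
# [0, 1, 2, 0]
# [0, 3, 4, 0]
# [0, 0, 0, 0]
```

The 2×2 image becomes 4×4, which matches Old Size + 2 × Padding = 2 + 2 × 1.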
What About Color Images?
For black & white images, we use Conv2d with one input channel (grayscale).
For color images, we use three input channels (Red, Green, Blue - RGB)!
Summary
- Convolution helps computers find patterns in images!
- We use a kernel to create an activation map!
- Stride & padding change how the convolution works!
- This is how computers "see" images!
Great job! Now, let's try convolution in the lab!
Welcome! Today, we're learning about Activation Functions and Max Pooling!
What is an Activation Function?
Activation functions help a neural network decide what's important!
They change the values in the activation map to help the model learn better.
Example: ReLU Activation Function
1. We take an input image
2. We apply convolution to create an activation map
3. We apply ReLU (Rectified Linear Unit):
- If a value is negative → Change it to 0
- If a value is positive → Keep it
Example Calculation
Before ReLU | After ReLU |
---|---|
-4 | 0 |
0 | 0 |
4 | 4 |
All negative numbers become zero!
In PyTorch, we apply the ReLU function after convolution:
import torch.nn.functional as F
output = F.relu(conv_output)
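The rule behind ReLU is small enough to write by hand. The sketch below uses plain Python instead of PyTorch, and the helper name `relu` is just for illustration; it reproduces the three values from the table.

```python
def relu(value):
    """Negative values become 0, everything else is kept."""
    return max(0, value)


before = [-4, 0, 4]
after = [relu(value) for value in before]
print(after)  # [0, 0, 4]
```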
What is Max Pooling?
Max Pooling helps the network focus on important details while making images smaller!
How It Works
1. We divide the image into small regions (e.g., 2×2 squares)
2. We keep only the largest value in each region
3. We move the window and repeat until we've covered the whole image
Example: 2×2 Max Pooling
Before Pooling | After Pooling |
---|---|
1, 6, 2, 3 | 8, 7 |
5, 8, 7, 4 | 9, 7 |
9, 2, 3, 7 | |
Only the biggest number in each section is kept! For example, the top-left 2×2 section (1, 6, 5, 8) becomes 8.
Why Use Max Pooling?
- Reduces image size → Makes training faster!
- Ignores small changes in images → More stable results!
- Helps find important features in the picture!
In PyTorch, we apply Max Pooling like this:
import torch.nn.functional as F
output = F.max_pool2d(activation_map, kernel_size=2, stride=2)
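To make the windowing explicit, here is a rough plain-Python version of 2×2 max pooling with stride 2. The function name `max_pool` and the 4×4 grid are hypothetical values chosen for this sketch, not taken from the lab.

```python
def max_pool(grid, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    rows = (len(grid) - size) // stride + 1
    cols = (len(grid[0]) - size) // stride + 1
    pooled = []
    for r in range(rows):
        pooled_row = []
        for c in range(cols):
            window = [
                grid[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 activation map.
grid = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]
pooled = max_pool(grid)
print(pooled)  # [[4, 5], [6, 7]]
```

Each 2×2 block of the input contributes exactly one number, so the 4×4 grid shrinks to 2×2.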
Great job! Now, let's try using activation functions and max pooling in our own models!
Welcome! Today, we're learning about Convolution with Multiple Channels!
What's a Channel?
A channel is like a layer of an image!
- Black & White Images → 1 channel (grayscale)
- Color Images → 3 channels (Red, Green, Blue - RGB)
Neural networks see images by looking at these channels separately!
1. Multiple Output Channels
Usually, we use one kernel to create one activation map.
But what if we want to detect different things in an image?
- Solution: Use multiple kernels! Each kernel finds different features!
Example: Detecting Lines
1. A vertical line kernel finds vertical edges
2. A horizontal line kernel finds horizontal edges
More kernels = More ways to see the image!
2. Multiple Input Channels
Color images have 3 channels (Red, Green, Blue).
To process them, we use a separate kernel for each channel!
1. Apply a Red kernel to the Red part of the image
2. Apply a Green kernel to the Green part of the image
3. Apply a Blue kernel to the Blue part of the image
4. Add the results together to get one activation map!
This helps the neural network understand colors and patterns!
3. Multiple Input & Output Channels
Now, let's combine everything!
- Multiple input channels (like RGB images)
- Multiple output channels (different filters detecting different things)
Each output channel gets its own set of kernels, one kernel per input channel.
For each output channel, we apply its kernels and add the results, so we get one activation map per output channel!
Example in PyTorch
import torch.nn as nn
conv = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=3)
This means:
- 3 input channels (Red, Green, Blue)
- 5 output channels (5 different filters detecting different things)
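As a rough check of what this layer contains, the sketch below counts its kernels and learnable numbers in plain Python. It assumes one kernel per (input channel, output channel) pair and one bias per output channel, which is the default for `nn.Conv2d`; the helper names are made up for this example.

```python
def kernel_count(in_channels, out_channels):
    """One kernel for every (input channel, output channel) pair."""
    return in_channels * out_channels


def parameter_count(in_channels, out_channels, kernel_size, bias=True):
    """Total learnable numbers: kernel weights plus one bias per output channel."""
    weights = kernel_count(in_channels, out_channels) * kernel_size * kernel_size
    biases = out_channels if bias else 0
    return weights + biases


kernels = kernel_count(3, 5)
parameters = parameter_count(3, 5, kernel_size=3)
print(kernels)     # 15
print(parameters)  # 140
```

So 3 input channels and 5 output channels mean 15 separate 3×3 kernels, or 135 weights plus 5 biases.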
Why is This Important?
- Helps the neural network find different patterns
- Works for color images and complex features
- Makes the network more powerful!
Great job! Now, let's try convolution with multiple channels in our own models!
Welcome! Today, we're building a CNN for MNIST!
MNIST is a dataset of handwritten digits (0-9).
CNN Structure
Image Size: 16×16 (MNIST images are 28×28; here they are resized to make training faster)
Layers:
- First Convolution Layer → 16 output channels
- Second Convolution Layer → 32 output channels
- Final Layer → 10 output neurons (one for each digit)
Building the CNN in PyTorch
Step 1: Define the CNN
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        # kernel_size=5 with padding=2 keeps the height and width unchanged
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2)
        # 2x2 max pooling halves the height and width (stride defaults to kernel_size)
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, padding=2)
        self.fc = nn.Linear(32 * 4 * 4, 10)  # Fully connected layer (512 inputs, 10 outputs)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # First layer: Conv + ReLU + Pool (16x16 -> 8x8)
        x = self.pool(F.relu(self.conv2(x)))  # Second layer: Conv + ReLU + Pool (8x8 -> 4x4)
        x = x.view(-1, 512)  # Flatten the 4x4x32 output to 1D (512 elements)
        x = self.fc(x)  # Fully connected layer for classification
        return x
Understanding the Output Shape
Each Max Pooling step halves the image (16×16 → 8×8 → 4×4), so after the second pooling step the image is 4×4 pixels.
Since we have 32 channels, the total output is:
4 × 4 × 32 = 512 elements
Each neuron in the final layer gets 512 inputs, and since we have 10 digits (0-9), we use 10 neurons.
Forward Step
1. Apply First Convolution Layer → Activation → Max Pooling
2. Apply Second Convolution Layer → Activation → Max Pooling
3. Flatten the Output (4×4×32 → 512)
4. Apply the Final Output Layer (10 Neurons for 10 Digits)
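The 512 can be double-checked by following the image size through each layer of the CNN above. The sketch below does the arithmetic in plain Python; the helper names `conv_out` and `pool_out` are assumptions for illustration, and pooling is assumed to use a stride equal to its window size.

```python
def conv_out(size, kernel_size, padding=0, stride=1):
    """Side length after a convolution layer."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out(size, kernel_size):
    """Side length after max pooling with stride equal to the window size."""
    return size // kernel_size


size = 16
trace = [size]
size = conv_out(size, kernel_size=5, padding=2)  # first convolution
trace.append(size)
size = pool_out(size, kernel_size=2)             # first max pooling
trace.append(size)
size = conv_out(size, kernel_size=5, padding=2)  # second convolution
trace.append(size)
size = pool_out(size, kernel_size=2)             # second max pooling
trace.append(size)

flattened = 32 * size * size
print(trace)      # [16, 16, 8, 8, 4]
print(flattened)  # 512
```

The convolutions keep the size because of the padding, and only the pooling steps shrink the image.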
Training the Model
Check the lab to see how we train the CNN using:
- Backpropagation
- Stochastic Gradient Descent (SGD)
- Loss Function & Accuracy Check
Great job! Now, let's train our CNN to recognize handwritten digits!
Welcome! Today, we're learning about Convolutional Neural Networks (CNNs)!
What is a CNN?
A Convolutional Neural Network (CNN) is a special type of neural network designed to work with images!
It learns to find patterns, like:
- Edges (lines & shapes)
- Textures (smooth or rough areas)
- Objects (faces, animals, letters)
How Does a CNN Work?
A CNN is made of three main steps:
1. Convolution Layer
- Uses kernels (small filters) to detect patterns in an image
- Creates an activation map that highlights important features
2. Pooling Layer
- Shrinks the activation map to keep only the most important parts
- Max Pooling picks the biggest values in each small region
3. Fully Connected Layer
- The final layer makes a decision (like cat or dog)
Example: Detecting Lines
We train a CNN to recognize horizontal and vertical lines:
1. Input Image (X)
2. First Convolution Layer
- Uses two kernels to create two activation maps
- Applies ReLU (activation function) to remove negative values
- Uses Max Pooling to shrink the activation maps
3. Second Convolution Layer
- Takes two input channels from the first layer
- Uses two new kernels (one per input channel) to create one activation map
- Again, applies ReLU + Max Pooling
4. Flattening → Turns the 2D activation map into 1D data
5. Final Prediction → Uses a fully connected layer to decide:
- 0 = Vertical Line
- 1 = Horizontal Line
How to Build a CNN in PyTorch
CNN Constructor
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=2, out_channels=1, kernel_size=3, padding=1)
        # 49 = 1 channel x 7 x 7, which assumes 28x28 input images (28 -> 14 -> 7)
        self.fc = nn.Linear(49, 2)  # Fully connected layer (49 inputs, 2 outputs)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # First layer: Conv + ReLU + Pool
        x = self.pool(F.relu(self.conv2(x)))  # Second layer: Conv + ReLU + Pool
        x = x.view(-1, 49)  # Flatten to 1D
        x = self.fc(x)  # Fully connected layer
        return x
Training the CNN
We train the CNN using backpropagation and gradient descent:
1. Load the dataset (images of lines)
2. Create a CNN model
3. Define a loss function (to measure mistakes)
4. Choose an optimizer (to update the weights)
5. Train the model for several epochs
As training progresses:
- Loss goes down → Model makes fewer mistakes!
- Accuracy goes up → Model gets better at predictions!
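The five training steps can be sketched as below. This is not the lab's code: the synthetic line images, the 28×28 image size, the class name `LineCNN`, the helper `make_line_images`, the learning rate, and the number of epochs are all assumptions chosen so the example runs on its own. For simplicity it updates the weights once per epoch on the whole dataset.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class LineCNN(nn.Module):
    """Same layer layout as the CNN above (the class name is hypothetical)."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 2, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(2, 1, kernel_size=3, padding=1)
        self.fc = nn.Linear(49, 2)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        return self.fc(x.view(-1, 49))


def make_line_images(count, size=28):
    """Synthetic data: label 0 = vertical line, label 1 = horizontal line."""
    images = torch.zeros(count, 1, size, size)
    labels = torch.arange(count) % 2
    positions = torch.randint(0, size, (count,))
    for index in range(count):
        position = int(positions[index])
        if int(labels[index]) == 0:
            images[index, 0, :, position] = 1.0  # draw a vertical line
        else:
            images[index, 0, position, :] = 1.0  # draw a horizontal line
    return images, labels


images, labels = make_line_images(64)                    # 1. load the dataset
model = LineCNN()                                        # 2. create a CNN model
criterion = nn.CrossEntropyLoss()                        # 3. define a loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # 4. choose an optimizer

losses = []
for epoch in range(5):                                   # 5. train the model
    optimizer.zero_grad()              # clear old gradients
    outputs = model(images)            # forward pass
    loss = criterion(outputs, labels)  # measure mistakes
    loss.backward()                    # backpropagation
    optimizer.step()                   # update the weights
    losses.append(loss.item())

print(losses)  # one loss value per epoch
```

Printing `losses` after each run is a simple way to watch whether the loss is going down.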
Why Use CNNs?
- Finds patterns in images
- Works with real-world data (faces, animals, objects)
- More efficient than fully connected networks on images, because each kernel reuses the same weights at every position
Great job! Now, let's build and train our own CNN!
Welcome! Today, we're learning how to use Pretrained TorchVision Models!
What is a Pretrained Model?
A pretrained model is a neural network that has already been trained on a large dataset.
- Saves time (no need to train from scratch)
- Often works better (it starts from features learned on a large dataset)
- We only train the final layer for our own images!
Using ResNet18 (A Pretrained Model)
We will use ResNet18, a powerful model trained on color images (the ImageNet dataset).
It has skip connections (we won't go into details, but it helps learning).
We only replace the last layer to match our dataset!
Steps to Use a Pretrained Model
Step 1: Load the Pretrained Model
import torchvision.models as models
# Load ResNet18 with ImageNet weights (torchvision 0.13 or newer; older versions use pretrained=True)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
Step 2: Normalize Images (Required for ResNet18)
import torchvision.transforms as transforms
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize image
    transforms.ToTensor(),  # Convert to tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # Normalize
])
Step 3: Prepare the Dataset
Create a dataset object for your own images with training and testing data.
Step 4: Replace the Output Layer
- The last hidden layer has 512 neurons
- We create a new output layer for our dataset
Example: If we have 7 classes, we create a layer with 7 outputs:
import torch.nn as nn
for param in model.parameters():
    param.requires_grad = False  # Freeze pretrained layers
model.fc = nn.Linear(512, 7)  # Replace output layer (512 inputs → 7 outputs); the new layer is trainable
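To check that freezing worked, we can list which parameters will still be trained. The sketch below uses a tiny stand-in network instead of ResNet18 so that it runs without downloading weights; the class `TinyBackbone` and its layer sizes are hypothetical, and only the freeze-then-replace pattern is the point.

```python
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a pretrained model with an `fc` output layer."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(), nn.Linear(64, 512), nn.ReLU())
        self.fc = nn.Linear(512, 1000)

    def forward(self, x):
        return self.fc(self.features(x))


model = TinyBackbone()
for param in model.parameters():
    param.requires_grad = False  # freeze every existing layer
model.fc = nn.Linear(512, 7)     # new layers are trainable by default

trainable = [name for name, param in model.named_parameters() if param.requires_grad]
trainable_count = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)        # ['fc.weight', 'fc.bias']
print(trainable_count)  # 3591
```

Only the new output layer remains trainable: 512 × 7 weights plus 7 biases.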
Training the Model
Step 5: Create Data Loaders
import torch
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=15, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=10, shuffle=False)
Step 6: Set Up Training
import torch.optim as optim
criterion = nn.CrossEntropyLoss() # Loss function
optimizer = optim.Adam(model.fc.parameters(), lr=0.001) # Optimizer (only for last layer)
Step 7: Train the Model
1. Set model to training mode
model.train()
2. Train for 20 epochs
3. Set model to evaluation mode when predicting
model.eval()
Why Use Pretrained Models?
- Saves time (no need to train from scratch)
- Works better (pretrained on millions of images)
- We only change one layer for our dataset!
Great job! Now, try using a pretrained model for your own images!
Welcome! Today, we're learning how to use GPUs in PyTorch!
Why Use a GPU?
A Graphics Processing Unit (GPU) can train models MUCH faster than a CPU!
- Faster computation
- Better for large datasets
- Helps train deep learning models efficiently
What is CUDA?
CUDA is NVIDIA's platform for running general-purpose computations, such as AI tasks, on its GPUs.
In PyTorch, we use torch.cuda to work with GPUs.
Step 1: Check if a GPU is Available
import torch
# Check if a GPU is available
torch.cuda.is_available() # Returns True if a GPU is detected
Step 2: Set Up the GPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
- "cuda:0" = First available GPU
- "cpu" = Use the CPU if no GPU is found
Step 3: Sending Tensors to the GPU
In PyTorch, data is stored in Tensors.
To move data to the GPU, use .to(device).
tensor = torch.randn(3, 3) # Create a random tensor
tensor = tensor.to(device) # Move it to the GPU
Faster processing on the GPU!
Step 4: Using a GPU with a CNN
You don't need to change your CNN code! Just move the model to the GPU after creating it:
model = CNN() # Create CNN model
model.to(device) # Move the model to the GPU
This moves all of the model's parameters to the GPU, so every layer computes there!
Step 5: Training a Model on a GPU
Training is the same, but you must send your data to the GPU!
for images, labels in train_loader:
    images, labels = images.to(device), labels.to(device)  # Move data to GPU
    optimizer.zero_grad()  # Clear gradients
    outputs = model(images)  # Forward pass (on GPU)
    loss = criterion(outputs, labels)  # Compute loss
    loss.backward()  # Backpropagation
    optimizer.step()  # Update weights
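The model trains much faster!
Step 6: Testing the Model
For testing, only the images need to be on the GPU. The labels can stay on the CPU, as long as you move the predictions back before comparing them:
model.eval()  # Switch to evaluation mode
with torch.no_grad():  # No gradients are needed for testing
    for images, labels in test_loader:
        images = images.to(device)  # Move images to GPU
        outputs = model(images)  # Get predictions
        predictions = outputs.argmax(dim=1).cpu()  # Move predictions back to compare with labels
Skipping gradient tracking saves memory and speeds up testing!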
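Putting the testing step into a function makes it easy to get an accuracy number. The sketch below is one possible way to do it, not the lab's code: the helper name `evaluate`, the small linear model, and the random test batches are stand-ins so the example runs on its own, with or without a GPU.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


def evaluate(model, loader, device):
    """Return the fraction of correct predictions, keeping labels on the CPU."""
    model.eval()  # evaluation mode
    correct, total = 0, 0
    with torch.no_grad():  # no gradients needed for testing
        for images, labels in loader:
            outputs = model(images.to(device))         # forward pass on the device
            predictions = outputs.argmax(dim=1).cpu()  # bring predictions back
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return correct / total


# Hypothetical stand-ins for a trained model and a test loader.
model = nn.Linear(4, 3).to(device)
test_loader = [(torch.randn(10, 4), torch.randint(0, 3, (10,))) for _ in range(2)]

accuracy = evaluate(model, test_loader, device)
print(accuracy)  # a value between 0 and 1
```

Because the stand-in model is untrained and the data is random, the accuracy here is meaningless; with a real model and test set, the same function reports how often predictions match the labels.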
Summary
- GPUs make training faster
- Use torch.cuda to work with GPUs
- Move data & models to the GPU with .to(device)
- Training & testing are the same, but data must be on the GPU
Great job! Now, try training a model using a GPU in PyTorch!