👋 **Welcome!** Today, we're learning about **Deep Neural Networks**, a powerful way for computers to learn! 🧠💡

## 🤖 What is a Neural Network?

Imagine a brain made of tiny switches called **neurons**. These neurons work together to make smart decisions!

### 🟢 Input Layer
This is where we give the network information, like pictures or numbers.

### 🔵 Hidden Layers
These layers **figure out patterns** in the data!
- More neurons can capture more patterns 🤓
- Too many neurons can lead to **overfitting** 😵

### 🔴 Output Layer
This is where the network **gives us answers!** 🏆

---

## 🏗 Building a Deep Neural Network in PyTorch

We can **build a deep neural network** using PyTorch, a library that helps computers learn. 🖥️

### 🛠 Layers of Our Network
1️⃣ **First Hidden Layer:** Has `H1` neurons.
2️⃣ **Second Hidden Layer:** Has `H2` neurons.
3️⃣ **Output Layer:** Decides the final answer! 🎯

---

## 🔄 How Does It Work?

1️⃣ **Start with an input (x).**
2️⃣ **Pass it through each layer:**
   - Apply **activation functions** (like `sigmoid`, `tanh`, or `ReLU`).
   - These let the network capture non-linear patterns! 🧩
3️⃣ **Get the final answer!** ✅

---

## 🎨 Different Activation Functions

Activation functions help the network **learn better!** 🧠

- **Sigmoid** → Fine for small, shallow networks 🤏
- **Tanh** → Works better for deeper networks 🌊
- **ReLU** → Often the best choice for big, deep tasks! 🚀

---

## 🔢 Example: Recognizing Handwritten Numbers

We train the network on **MNIST**, a dataset of handwritten digits. 📝🔢

- **Input:** 784 pixels (28x28 images) 📸
- **Hidden Layers:** 50 neurons each 🤖
- **Output:** 10 neurons (digits 0-9) 🔟

---

## 🚀 Training the Network

We use **Stochastic Gradient Descent (SGD)** to train the network! 📚

- **Loss Function:** Measures mistakes so the network can learn from them. ❌➡✅
- **Validation Accuracy:** Checks how well the network does on unseen data! 🎯

---

## 🏆 What We Learned

✅ Deep Neural Networks have **many hidden layers**.
✅ Different **activation functions** can improve performance.
✅ Adding hidden layers can make the network more powerful, but deeper is not always better! 💡

---

🎉 **Great job!** Now, let's build and train our own deep neural networks! 🏗️🤖✨

-------------------------------------------------------------------------------------

👋 **Welcome!** Today, we'll learn how to **build a deep neural network** in PyTorch using `nn.ModuleList`. 🧠💡

## 🤖 Why Use `nn.ModuleList`?

Instead of adding layers **one by one** (which takes a long time ⏳), we can **automate** the process! 🚀

---

## 🏗 Building the Neural Network

We create a **list** called `layers` 📋:

- **First item:** Input size (e.g., `2` features).
- **Second item:** Neurons in the **first hidden layer** (e.g., `3`).
- **Third item:** Neurons in the **second hidden layer** (e.g., `4`).
- **Fourth item:** Output size (number of classes, e.g., `3`).
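For example, the list could simply be `layers = [2, 3, 4, 3]`. Below is a minimal sketch of how such a list can drive the construction; the class name `Net` is illustrative (not taken from the lab), and the forward pass uses ReLU as described in the next sections, which walk through these steps in detail.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    """Fully connected network whose architecture comes from a list of layer sizes."""
    def __init__(self, layers):
        super(Net, self).__init__()
        self.hidden = nn.ModuleList()
        # Walk the list two elements at a time: (input size, output size) per linear layer
        for input_size, output_size in zip(layers, layers[1:]):
            self.hidden.append(nn.Linear(input_size, output_size))

    def forward(self, x):
        n_layers = len(self.hidden)
        for i, layer in enumerate(self.hidden):
            if i < n_layers - 1:
                x = torch.relu(layer(x))   # hidden layers: linear transformation + ReLU
            else:
                x = layer(x)               # output layer: linear transformation only
        return x

model = Net([2, 3, 4, 3])  # 2 inputs, hidden layers with 3 and 4 neurons, 3 output classes
```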
---

## 🔄 Constructing the Network

### 🔹 Step 1: Create Layers
- We loop through the list, taking **two elements at a time**:
  - **First element:** Input size 🎯
  - **Second element:** Output size (number of neurons) 🧩

### 🔹 Step 2: Connecting Layers
- First **hidden layer** → Input size = `2`, Neurons = `3`
- Second **hidden layer** → Input size = `3`, Neurons = `4`
- **Output layer** → Input size = `4`, Output size = `3`

---

## ⚡ Forward Function

We **pass data** through the network:

1️⃣ **Apply the linear transformation** of each layer → Does the calculations 🧮
2️⃣ **Apply the activation function** (`ReLU`) → Helps the network learn 📈
3️⃣ **For the last layer**, apply only the **linear transformation** (its outputs are the class scores for the classification task 🎯).

---

## 🎯 Training the Network

The **training process** is similar to before! We:
- Use a **dataset** 📊
- Try **different combinations** of neurons and layers 🤖
- See which setup gives the **best performance**! 🏆

---

🎉 **Awesome!** Now, let's explore ways to make these networks even **better!** 🚀

-------------------------------------------------------------------------------------

👋 **Welcome!** Today, we're learning about **weight initialization** in neural networks! 🧠⚡

## 🤔 Why Does Weight Initialization Matter?

If we **don't** choose good starting weights, our neural network **won't learn properly**! 🚨
For example, if **all neurons** in a layer start with the **same weights**, they all compute the same thing, which causes problems.

---

## 🚀 How PyTorch Handles Weights

PyTorch **automatically** picks starting weights, but we can also set them **ourselves**! 🔧

Let's see what happens when we:
- Set **all weights to 1** and **bias to 0** → ❌ **Bad idea!**
- Randomly choose weights from a **uniform distribution** → ✅ **Better!**

---

## 🔄 The Problem with Random Weights

We can draw the weights from a **uniform distribution** (random values between -1 and 1). But:

- **Range too small?** → The weights barely differ, and learning is slow 🤏
- **Range too large?** → We run into the **vanishing gradient** problem 😵

### 📉 What's a Vanishing Gradient?
If the weights are **too big**, the activations become **too large** and saturate functions like sigmoid and tanh, where the slope is nearly flat, so the **gradient shrinks towards zero**. That means the network **stops learning**! 🚫

---

## 🛠 Fixing the Problem

### 🎯 Solution: Scale Weights by the Number of Inputs

We scale the weight range by **how many inputs** each neuron receives:

- **2 inputs?** → Scale by **1/2**
- **4 inputs?** → Scale by **1/4**
- **100 inputs?** → Scale by **1/100**

This helps prevent the vanishing gradient issue! ✅

---

## 🔬 Different Weight Initialization Methods

### 🏗 **1. Default PyTorch Method**
- PyTorch **automatically** picks a range based on the number of inputs `L_in`:
  - **Lower bound:** `-1 / sqrt(L_in)`
  - **Upper bound:** `+1 / sqrt(L_in)`

### 🔵 **2. Xavier Initialization**
- Best suited to **tanh** activations
- Uses the **number of input and output neurons**
- We apply `xavier_uniform_()` to set the weights

### 🔴 **3. He Initialization**
- Best suited to **ReLU** activations
- Scales the range using the **number of input neurons**
- In PyTorch we apply `kaiming_uniform_()` (the method is named after Kaiming He)

---

## 🏆 Which One is Best?

We compare:
✅ **PyTorch Default**
✅ **Xavier Method** (tanh)
✅ **He Method** (ReLU)

The **Xavier and He methods** help the network **learn faster**! 🚀
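As a minimal sketch, the three approaches can be applied to standalone `nn.Linear` layers using PyTorch's `nn.init` functions; the layer sizes below are made up for illustration, and He initialization is exposed in PyTorch as `kaiming_uniform_`.

```python
import math
import torch.nn as nn

L_in = 100  # number of inputs feeding each neuron (illustrative size)

# 1. Uniform range scaled by the input size, matching the bounds above
linear_default = nn.Linear(L_in, 50)
linear_default.weight.data.uniform_(-1 / math.sqrt(L_in), 1 / math.sqrt(L_in))

# 2. Xavier initialization -- a good match for tanh activations
linear_tanh = nn.Linear(L_in, 50)
nn.init.xavier_uniform_(linear_tanh.weight)

# 3. He (Kaiming) initialization -- a good match for ReLU activations
linear_relu = nn.Linear(L_in, 50)
nn.init.kaiming_uniform_(linear_relu.weight, nonlinearity='relu')
```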
๐Ÿ—๏ธ๐Ÿ”ฌ ------------------------------------------------------------------------------------------------ ๐ŸŽต **Music Playing** ๐Ÿ‘‹ **Welcome!** Today, weโ€™re learning about **Gradient Descent with Momentum**! ๐Ÿš€๐Ÿ”„ ## ๐Ÿค” Whatโ€™s the Problem? Sometimes, when training a neural network, the model can get **stuck**: - **Saddle Points** โž Flat areas where learning stops ๐Ÿ”๏ธ - **Local Minima** โž Not the best solution, but we get trapped ๐Ÿ˜ž --- ## ๐Ÿƒโ€โ™‚๏ธ What is Momentum? Momentum helps the model **keep moving** even when it gets stuck! ๐Ÿ’จ Itโ€™s like rolling a ball downhill: - **Gradient (Force)** โž Tells us where to go ๐Ÿ€ - **Momentum (Mass)** โž Helps us keep moving even on flat surfaces โšก --- ## ๐Ÿ”„ How Does It Work? ### ๐Ÿ”น Step 1: Compute Velocity - Velocity (`v`) = Old velocity (`v_k`) + Learning step (`gradient * learning rate`) - The **momentum term** (๐œŒ) controls how much we keep from the past. ### ๐Ÿ”น Step 2: Update Weights - New weight (`w_k+1`) = Old weight (`w_k`) - Learning rate * Velocity The bigger the **momentum**, the harder it is to stop moving! ๐Ÿƒโ€โ™‚๏ธ๐Ÿ’จ --- ## โš ๏ธ Why Does It Help? ### ๐Ÿ”๏ธ **Saddle Points** - **Without Momentum** โž Model **stops** moving in flat areas โŒ - **With Momentum** โž Keeps moving **past** the flat spots โœ… ### โฌ‡ **Local Minima** - **Without Momentum** โž Gets **stuck** in a bad spot ๐Ÿ˜– - **With Momentum** โž Pushes through and **finds a better solution!** ๐ŸŽฏ --- ## ๐Ÿ† Picking the Right Momentum - **Too Small?** โž Model gets **stuck** ๐Ÿ˜• - **Too Large?** โž Model **overshoots** the best answer ๐Ÿš€ - **Best Choice?** โž We test different values and pick what works! ๐Ÿ”ฌ --- ## ๐Ÿ›  Using Momentum in PyTorch Just add the **momentum** value to the optimizer! ```python optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5) ``` In the lab, we test **different momentum values** on a dataset and see how they affect learning! ๐Ÿ“Š --- ๐ŸŽ‰ **Great job!** Now, letโ€™s experiment with momentum and see how it helps our model! ๐Ÿ—๏ธโšก ------------------------------------------------------------------------------------------ ๐ŸŽต **Music Playing** ๐Ÿ‘‹ **Welcome!** Today, weโ€™re learning about **Batch Normalization**! ๐Ÿš€๐Ÿ”„ ## ๐Ÿค” Whatโ€™s the Problem? When training a neural network, the activations (outputs) can vary a lot, making learning **slower** and **unstable**. ๐Ÿ˜– Batch Normalization **fixes this** by: โœ… Making activations more consistent โœ… Helping the network learn faster โœ… Reducing problems like vanishing gradients --- ## ๐Ÿ”„ How Does Batch Normalization Work? ### ๐Ÿ— Step 1: Normalize Each Mini-Batch For each neuron in a layer: 1๏ธโƒฃ Compute the **mean** and **standard deviation** of its activations. ๐Ÿ“Š 2๏ธโƒฃ Normalize the outputs using: \[ z' = \frac{z - \text{mean}}{\text{std dev} + \epsilon} \] (We add a **small** value `ฮต` to avoid division by zero.) ### ๐Ÿ— Step 2: Scale and Shift - Instead of leaving activations at 0 and 1, we **scale** and **shift** them: \[ z'' = \gamma \cdot z' + \beta \] - **ฮณ (scale) and ฮฒ (shift)** are **learned** during training! ๐Ÿ‹๏ธโ€โ™‚๏ธ --- ## ๐Ÿ”ฌ Example: Normalizing Activations - **First Mini-Batch (X1)** โž Compute mean & std for each neuron, normalize, then scale & shift - **Second Mini-Batch (X2)** โž Repeat for new batch! โ™ป - **Next Layer** โž Apply batch normalization again! ๐Ÿ”„ ### ๐Ÿ† Prediction Time - During **training**, we compute the mean & std for **each batch**. 
- During **testing**, we use the **population mean & std** (the running estimates collected during training) instead. 📊

---

## 🛠 Using Batch Normalization in PyTorch

```python
import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(10, 3)   # First layer (10 inputs, 3 neurons)
        self.bn1 = nn.BatchNorm1d(3)  # Batch norm for the first layer
        self.fc2 = nn.Linear(3, 4)    # Second layer (3 inputs, 4 neurons)
        self.bn2 = nn.BatchNorm1d(4)  # Batch norm for the second layer

    def forward(self, x):
        x = self.bn1(self.fc1(x))  # Linear transformation, then batch norm
        x = self.bn2(self.fc2(x))  # Linear transformation, then batch norm again
        return x
```

- **Training?** Set the model to **train mode** 🏋️‍♂️

```python
model.train()
```

- **Predicting?** Use **evaluation mode** 📈

```python
model.eval()
```

---

## 🚀 Why Does Batch Normalization Work?

### ✅ Helps Gradient Descent Work Better
- Normalized activations = a **smoother** loss surface 🎯
- Gradients point in the **right** direction = faster learning! 🚀

### ✅ Reduces the Vanishing Gradient Problem
- Sigmoid & tanh activations suffer from small gradients when their inputs are large 😢
- Normalization **keeps activations in a good range** 📊

### ✅ Allows Higher Learning Rates
- Networks can **train faster** without becoming unstable ⏩

### ✅ Reduces the Need for Dropout
- Some studies show **batch norm can replace dropout** 🤯

---

🎉 **Great job!** Now, let's try batch normalization in our own models! 🏗️📈
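To make the train/test distinction concrete, here is a minimal sketch of switching between the two modes, assuming the `NeuralNetwork` class defined above; the loss, learning rate, and dummy tensors are made up purely for illustration.

```python
import torch

model = NeuralNetwork()                    # the batch-norm model defined above
criterion = torch.nn.CrossEntropyLoss()    # illustrative loss for a 4-class output
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Dummy mini-batch for illustration only: 20 samples, 10 features, labels in 0..3
x = torch.randn(20, 10)
y = torch.randint(0, 4, (20,))

model.train()                 # training mode: normalize with each mini-batch's mean & std
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()

model.eval()                  # evaluation mode: normalize with the stored population statistics
with torch.no_grad():
    predictions = model(torch.randn(5, 10)).argmax(dim=1)
```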