Update README.md

---
tags:
- pytorch_model_hub_mixin
---

# Model Card: Time-Conditioned U-Net for MNIST

## Model Details

- **Architecture**: Time-Conditioned U-Net
- **Dataset**: [Comic Faces Paired Synthetic](https://www.kaggle.com/datasets/defileroff/comic-faces-paired-synthetic)
- **Batch Size**: 256
- **Image Size**: 28x28
- **Loss Function**: Mean Squared Error (MSE)
- **Optimizer**: Adam (learning rate = 1e-4)

## Model Architecture

This model is a U-Net-based neural network that incorporates time conditioning through sinusoidal embeddings refined by an MLP. The architecture is designed for small grayscale images (e.g., 28x28 MNIST digits) and consists of:

### Encoder (Contracting Path)

- **Downsampling** through three `DoubleConv` blocks with 32, 64, and 128 output channels, respectively (a sketch of `DoubleConv` follows the U-Net code below).
- The time embedding is added inside each convolution block.
- **Max pooling** reduces the spatial dimensions between blocks.

### Decoder (Expanding Path)

- **Upsampling** via bilinear interpolation.
- Skip connections from encoder layers to the corresponding decoder layers.
- Two `DoubleConv` blocks taking the concatenated 128+64 and 64+32 input channels and producing 64 and 32 output channels, respectively.
- A final `1x1` convolution maps to the output channels.

### Time Embedding

- A sinusoidal positional encoding represents the timestep.
- An MLP refines the embedding before it is passed to the convolutional blocks.
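
For reference, the encoding follows the standard transformer-style form, matching the implementation shown below: with embedding dimension $d$ and $h = d/2$ frequencies,

$$
\omega_i = 10000^{-i/(h-1)}, \qquad
\mathrm{emb}(t) = \big[\sin(t\,\omega_0), \ldots, \sin(t\,\omega_{h-1}),\ \cos(t\,\omega_0), \ldots, \cos(t\,\omega_{h-1})\big]
$$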

## Implementation

### Generator (U-Net)

```python
import torch
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin


class UNet(nn.Module, PyTorchModelHubMixin):
    def __init__(self, in_channels=1, out_channels=1, time_embedding_dim=32):
        super().__init__()

        # Time embedding layer
        self.time_embedding = TimeEmbedding(time_embedding_dim)

        # Encoder
        self.down_conv1 = DoubleConv(in_channels, 32, time_embedding_dim)
        self.down_conv2 = DoubleConv(32, 64, time_embedding_dim)
        self.down_conv3 = DoubleConv(64, 128, time_embedding_dim)

        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True)

        # Decoder (skip connections concatenate encoder features,
        # hence the summed input channels)
        self.up_conv2 = DoubleConv(128 + 64, 64, time_embedding_dim)
        self.up_conv1 = DoubleConv(64 + 32, 32, time_embedding_dim)
        self.final_conv = nn.Conv2d(32, out_channels, kernel_size=1)

    def forward(self, x, timesteps):
        t = self.time_embedding(timesteps)

        # Contracting path
        x1 = self.down_conv1(x, t)
        x2 = self.down_conv2(self.maxpool(x1), t)
        x3 = self.down_conv3(self.maxpool(x2), t)

        # Expanding path with skip connections
        x = self.upsample(x3)
        x = torch.cat([x2, x], dim=1)
        x = self.up_conv2(x, t)

        x = self.upsample(x)
        x = torch.cat([x1, x], dim=1)
        x = self.up_conv1(x, t)

        return self.final_conv(x)
```
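
The `DoubleConv` block is referenced throughout but not shown in this card. Below is a minimal sketch consistent with the description above (two 3x3 convolutions with an additive time-embedding projection); the exact normalization and layer order are assumptions:

```python
class DoubleConv(nn.Module):
    """Hypothetical reconstruction of the DoubleConv block; the actual
    block used for this checkpoint may differ in detail."""

    def __init__(self, in_channels, out_channels, time_embedding_dim):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.act = nn.SiLU()
        # Projects the time embedding to match the feature channels
        self.time_proj = nn.Linear(time_embedding_dim, out_channels)

    def forward(self, x, t):
        x = self.act(self.conv1(x))
        # Broadcast the projected time embedding over the spatial dimensions
        x = x + self.time_proj(t)[:, :, None, None]
        return self.act(self.conv2(x))
```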

### Time Embedding

```python
class TimeEmbedding(nn.Module):
    def __init__(self, embedding_dim):
        super().__init__()
        self.embedding_dim = embedding_dim  # stored for use in forward()
        self.mlp = nn.Sequential(
            nn.SiLU(),
            nn.Linear(embedding_dim, embedding_dim),
        )

    def forward(self, t):
        # Standard sinusoidal encoding: half the dimensions are sines,
        # half are cosines, over geometrically spaced frequencies.
        half_dim = self.embedding_dim // 2
        embeddings = torch.exp(
            torch.arange(half_dim, device=t.device)
            * -(torch.log(torch.tensor(10000.0)) / (half_dim - 1))
        )
        embeddings = t[:, None] * embeddings[None, :]
        embeddings = torch.cat((embeddings.sin(), embeddings.cos()), dim=-1)
        return self.mlp(embeddings)
```
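
As a quick sanity check, the model can be run on a dummy batch matching the 28x28 single-channel configuration above (the timestep range here is an assumption):

```python
model = UNet(in_channels=1, out_channels=1, time_embedding_dim=32)

x = torch.randn(8, 1, 28, 28)             # batch of 8 grayscale 28x28 images
timesteps = torch.randint(0, 1000, (8,))  # one timestep per sample (range assumed)

out = model(x, timesteps)
print(out.shape)  # torch.Size([8, 1, 28, 28])
```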

## Training Configuration

- Batch Size: 256
- Image Size: 28x28
- Loss Function: Mean Squared Error (MSE)
- Optimizer: Adam (learning rate = 1e-4)
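
The training loop itself is not included in this card. A minimal sketch of a single optimization step under the configuration above; what `target` contains depends on the training objective, which the card does not specify:

```python
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(x, timesteps, target):
    # `target` is whatever the MSE objective regresses to
    # (e.g., the added noise in a diffusion-style setup).
    optimizer.zero_grad()
    pred = model(x, timesteps)
    loss = F.mse_loss(pred, target)
    loss.backward()
    optimizer.step()
    return loss.item()
```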

This model has been pushed to the Hub using the [PyTorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:

- Library: [More Information Needed]
- Docs: [More Information Needed]
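
Because `UNet` inherits from `PyTorchModelHubMixin`, the checkpoint can be loaded directly from the Hub (the repo id below is a placeholder):

```python
# Placeholder repo id; substitute the actual repository for this model.
model = UNet.from_pretrained("your-username/time-conditioned-unet-mnist")
```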