Commit 41e7405 by sebastiansarasti · verified · 1 Parent(s): 8f36890

Update README.md

Files changed (1)
  1. README.md +92 -0
README.md CHANGED
@@ -4,6 +4,98 @@ tags:
  - pytorch_model_hub_mixin
  ---
 
+ # Model Card: Time-Conditioned U-Net for MNIST
+
+ ## Model Details
+ - **Architecture**: Time-Conditioned U-Net
+ - **Dataset**: [Comic Faces Paired Synthetic](https://www.kaggle.com/datasets/defileroff/comic-faces-paired-synthetic)
+ - **Batch Size**: 256
+ - **Image Size**: 28x28
+ - **Loss Function**: Mean Squared Error (MSE)
+ - **Optimizer**: Adam (learning rate = 1e-4)
+
+ ## Model Architecture
+ This model is a U-Net that is conditioned on the timestep through sinusoidal embeddings refined by an MLP. The architecture is designed for small grayscale images (e.g., MNIST) and consists of:
+
+ ### **Encoder (Contracting Path)**:
+ - **Downsampling** through three `DoubleConv` blocks with 32, 64, and 128 output channels, respectively.
+ - The time embedding is added at each convolution block.
+ - **Max pooling** (2x2, stride 2) halves the spatial dimensions between blocks.
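The README uses `DoubleConv` without showing its definition. The following is a minimal sketch of what such a block could look like, assuming two 3x3 convolutions with padding 1 (so height and width are preserved) and a linear projection of the time embedding that is added to the feature map after the first convolution. The layer choices, the `time_proj` name, and the point where the embedding is added are assumptions, not the author's code.

```python
import torch
from torch import nn


class DoubleConv(nn.Module):
    """Hypothetical sketch: two 3x3 convs with a time embedding added in between."""

    def __init__(self, in_channels, out_channels, time_embedding_dim):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        # Assumed: project the time embedding to one value per output channel
        self.time_proj = nn.Linear(time_embedding_dim, out_channels)
        self.act = nn.ReLU()

    def forward(self, x, t_emb):
        h = self.act(self.conv1(x))
        # Broadcast (batch, channels) to (batch, channels, 1, 1) over height and width
        h = h + self.time_proj(t_emb)[:, :, None, None]
        return self.act(self.conv2(h))


block = DoubleConv(in_channels=1, out_channels=32, time_embedding_dim=32)
out = block(torch.randn(4, 1, 28, 28), torch.randn(4, 32))
print(out.shape)  # torch.Size([4, 32, 28, 28])
```

The constructor and `forward` signatures match how the model's `UNet` class calls `DoubleConv(in_channels, out_channels, time_embedding_dim)` and `block(x, t)`.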
+
+ ### **Decoder (Expanding Path)**:
+ - **Upsampling** via bilinear interpolation (scale factor 2).
+ - Skip connections concatenate encoder features with the upsampled decoder features at the matching resolution.
+ - Two `DoubleConv` blocks that take 128+64 and 64+32 input channels and produce 64 and 32 output channels, respectively.
+ - A final `1x1` convolution maps the 32 feature channels to the output channels.
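To make the channel counts above concrete, the following sketch traces shapes through the network for a 28x28 input. It uses plain Python arithmetic rather than PyTorch, and it assumes each `DoubleConv` preserves height and width (which the skip-connection concatenation requires) and a single output channel. The helper name `trace_shapes` is hypothetical.

```python
def trace_shapes(size=28, out_channels=1):
    """Return (stage, channels, height/width) tuples for the U-Net described above."""
    enc = [32, 64, 128]
    stages = []

    # Encoder: DoubleConv keeps the size; 2x2 max pooling halves it between blocks
    s1 = size
    stages.append(("down_conv1", enc[0], s1))
    s2 = s1 // 2
    stages.append(("down_conv2", enc[1], s2))
    s3 = s2 // 2
    stages.append(("down_conv3", enc[2], s3))

    # Decoder: upsampling doubles the size, then skip features are concatenated
    stages.append(("concat2", enc[2] + enc[1], s3 * 2))
    stages.append(("up_conv2", enc[1], s3 * 2))
    stages.append(("concat1", enc[1] + enc[0], s3 * 4))
    stages.append(("up_conv1", enc[0], s3 * 4))
    stages.append(("final_conv", out_channels, s3 * 4))
    return stages


for name, channels, hw in trace_shapes():
    print(f"{name:<11} channels={channels:<4} size={hw}x{hw}")
```

Because 28 is divisible by 4, the upsampled maps line up exactly with the skip connections. With two pooling steps, an input size that is not a multiple of 4 (for example 30, which becomes 15, then 7, then 14 after upsampling) would make the first concatenation fail.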
+
+ ### **Time Embedding**:
+ - A sinusoidal positional encoding represents the timestep.
+ - An MLP (SiLU followed by a linear layer) refines the encoding before it is passed to the convolutional blocks.
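The sinusoidal encoding can be illustrated without PyTorch. The sketch below computes the encoding for a single scalar timestep, mirroring the formula in the model's `TimeEmbedding` class: geometrically spaced frequencies between 1 and 1/10000, with all sines followed by all cosines. The function name `sinusoidal_embedding` is illustrative, the default dimension of 32 follows the `time_embedding_dim` default, and the MLP refinement is left out.

```python
import math


def sinusoidal_embedding(t, dim=32):
    """Encode a scalar timestep as [sin(t * f_i)..., cos(t * f_i)...]."""
    half_dim = dim // 2
    # Frequencies decay geometrically from 1 down to 1/10000
    scale = math.log(10000.0) / (half_dim - 1)
    freqs = [math.exp(-scale * i) for i in range(half_dim)]
    angles = [t * f for f in freqs]
    return [math.sin(a) for a in angles] + [math.cos(a) for a in angles]


emb = sinusoidal_embedding(t=0)
print(len(emb))         # 32
print(emb[0], emb[16])  # 0.0 1.0
```

At `t = 0` every sine entry is 0 and every cosine entry is 1; larger timesteps rotate each pair at its own frequency, which gives the network a distinct signature per timestep.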
+
+ ## Implementation
+ ### **Generator (U-Net)**
+ ```python
+ import torch
+ from torch import nn
+ from huggingface_hub import PyTorchModelHubMixin
+
+
+ # DoubleConv (not shown in this README) and TimeEmbedding must be defined
+ # before UNet is instantiated.
+ class UNet(nn.Module, PyTorchModelHubMixin):
+     def __init__(self, in_channels=1, out_channels=1, time_embedding_dim=32):
+         super().__init__()
+
+         # Time embedding layer
+         self.time_embedding = TimeEmbedding(time_embedding_dim)
+
+         # Encoder
+         self.down_conv1 = DoubleConv(in_channels, 32, time_embedding_dim)
+         self.down_conv2 = DoubleConv(32, 64, time_embedding_dim)
+         self.down_conv3 = DoubleConv(64, 128, time_embedding_dim)
+
+         self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
+         self.upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True)
+
+         # Decoder (input channels = upsampled features + skip connection)
+         self.up_conv2 = DoubleConv(128 + 64, 64, time_embedding_dim)
+         self.up_conv1 = DoubleConv(64 + 32, 32, time_embedding_dim)
+         self.final_conv = nn.Conv2d(32, out_channels, kernel_size=1)
+
+     def forward(self, x, timesteps):
+         t = self.time_embedding(timesteps)
+
+         # Encoder: each max pooling step halves the spatial size
+         x1 = self.down_conv1(x, t)
+         x2 = self.down_conv2(self.maxpool(x1), t)
+         x3 = self.down_conv3(self.maxpool(x2), t)
+
+         # Decoder: upsample, concatenate the skip connection, convolve
+         x = self.upsample(x3)
+         x = torch.cat([x2, x], dim=1)
+         x = self.up_conv2(x, t)
+
+         x = self.upsample(x)
+         x = torch.cat([x1, x], dim=1)
+         x = self.up_conv1(x, t)
+
+         return self.final_conv(x)
+ ```
+ ### **Time Embedding**
+ ```python
+ class TimeEmbedding(nn.Module):
+     def __init__(self, embedding_dim):
+         super().__init__()
+         # Stored because forward() uses it to size the sinusoidal encoding
+         self.embedding_dim = embedding_dim
+         self.mlp = nn.Sequential(
+             nn.SiLU(),
+             nn.Linear(embedding_dim, embedding_dim),
+         )
+
+     def forward(self, t):
+         # t: 1-D tensor of timesteps with shape (batch,)
+         half_dim = self.embedding_dim // 2
+         # Frequencies decay geometrically from 1 down to 1/10000
+         embeddings = torch.exp(torch.arange(half_dim, device=t.device) * -(torch.log(torch.tensor(10000.0)) / (half_dim - 1)))
+         embeddings = t[:, None] * embeddings[None, :]
+         # Sines followed by cosines: shape (batch, embedding_dim)
+         embeddings = torch.cat((embeddings.sin(), embeddings.cos()), dim=-1)
+         return self.mlp(embeddings)
+ ```
+
+ ## Training Configuration
+ - Batch Size: 256
+ - Image Size: 28x28
+ - Loss Function: Mean Squared Error (MSE)
+ - Optimizer: Adam (learning rate = 1e-4)
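The settings above translate into PyTorch roughly as follows. This is a sketch of a single optimization step only: a one-layer convolution stands in for the U-Net so the snippet runs on its own, and the input and target tensors are random placeholders, because the README does not state what the model is trained to predict.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the U-Net so the snippet is self-contained (assumption)
model = nn.Conv2d(1, 1, kernel_size=3, padding=1)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Placeholder batch: 256 single-channel 28x28 images and matching targets
inputs = torch.randn(256, 1, 28, 28)
targets = torch.randn(256, 1, 28, 28)

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
print(loss.item())
```

In a real run the stand-in would be replaced by the `UNet`, which also takes a `timesteps` tensor as its second argument.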
+
  This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
  - Library: [More Information Needed]
  - Docs: [More Information Needed]