sebastiansarasti/DDPM_MNIST2

	---
	tags:
	- model_hub_mixin
	- pytorch_model_hub_mixin
	---

	# Model Card: Time-Conditioned U-Net for MNIST

	## Model Details
	- Architecture: Time-Conditioned U-Net
	- Dataset: MNIST
	- Batch Size: 256
	- Image Size: 28x28
	- Loss Function: Mean Squared Error (MSE)
	- Optimizer: Adam (learning rate = 1e-4)

	## Model Architecture
	This model is a U-Net-based neural network that incorporates time conditioning using sinusoidal embeddings and an MLP. The architecture is designed for small grayscale images (e.g., MNIST) and consists of:

	### Encoder (Contracting Path):
	- Three `DoubleConv` blocks with 32, 64, and 128 output channels, respectively.
	- The time embedding is passed to every `DoubleConv` block.
	- 2x2 max pooling (stride 2) between blocks halves the spatial dimensions (28 to 14 to 7).
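
The card does not list the `DoubleConv` block itself, so the following is only a sketch of one common way to build a time-conditioned block with the same constructor and call signature used in this card. The layer names (`conv1`, `conv2`, `time_proj`), the 3x3 kernels, the SiLU activation, and the point where the time embedding is added are all assumptions, not the author's implementation.

```python
import torch
import torch.nn as nn


class DoubleConv(nn.Module):
    """Hypothetical time-conditioned block: two 3x3 convs plus a projected time embedding."""

    def __init__(self, in_channels, out_channels, time_embedding_dim):
        super().__init__()
        # padding=1 keeps the spatial size unchanged (assumed; pooling does the downsampling)
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.act = nn.SiLU()
        # Project the time embedding to one value per output channel.
        self.time_proj = nn.Linear(time_embedding_dim, out_channels)

    def forward(self, x, t_emb):
        h = self.act(self.conv1(x))
        # Broadcast (B, C) -> (B, C, 1, 1) so the same offset is added at every pixel.
        h = h + self.time_proj(t_emb)[:, :, None, None]
        return self.act(self.conv2(h))


# Shape check with random stand-in data.
block = DoubleConv(in_channels=1, out_channels=32, time_embedding_dim=32)
x = torch.randn(4, 1, 28, 28)
t_emb = torch.randn(4, 32)
out = block(x, t_emb)
print(out.shape)
```

Adding the projected embedding as a per-channel offset lets every block see the timestep without changing the spatial layout of the feature maps.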

	### Decoder (Expanding Path):
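	- Bilinear upsampling (scale factor 2) restores the spatial dimensions.
	- Skip connections concatenate encoder feature maps with the upsampled decoder feature maps along the channel dimension.
	- Two `DoubleConv` blocks take 128+64 and 64+32 input channels and produce 64 and 32 output channels, respectively.
	- A final `1x1` convolution maps the 32 channels to the output channels.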
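
To make the channel and resolution bookkeeping of the two paths concrete, here is a small pure-Python trace. It assumes that each `DoubleConv` block preserves the spatial size (for example through padded convolutions), which the card does not state; the helper name `unet_shapes` is hypothetical.

```python
def unet_shapes(size=28, in_channels=1, out_channels=1):
    """Trace (stage, channels, height == width) through the U-Net described above."""
    s1 = size        # resolution of x1
    s2 = s1 // 2     # after the first 2x2 max pool
    s3 = s2 // 2     # after the second 2x2 max pool
    # Upsampling doubles the size, so it must land back on the skip's resolution.
    if s3 * 2 != s2 or s2 * 2 != s1:
        raise ValueError("size must be divisible by 4 for the skip connections to align")
    return [
        ("input", in_channels, s1),
        ("down_conv1", 32, s1),
        ("down_conv2", 64, s2),
        ("down_conv3", 128, s3),
        ("upsample + concat x2", 128 + 64, s2),
        ("up_conv2", 64, s2),
        ("upsample + concat x1", 64 + 32, s1),
        ("up_conv1", 32, s1),
        ("final_conv", out_channels, s1),
    ]


shapes = unet_shapes()
for name, channels, side in shapes:
    print(f"{name:22s} {channels:4d} x {side} x {side}")
```

The check on `size` shows why 28x28 inputs work with two pooling stages: 28 is divisible by 4, so each upsampled map matches the resolution of its skip connection.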

	### Time Embedding:
	- A sinusoidal positional encoding represents the timestep.
	- An MLP (SiLU followed by a linear layer) refines the embedding before it is passed to the convolutional blocks.
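
Written out, the sinusoidal encoding computed by the `TimeEmbedding` code in this card takes the following form, where $d$ is the embedding dimension (32 by default) and $t$ is the timestep. The symbols $\omega_i$ and $\mathrm{PE}$ are notation introduced here for illustration.

$$
\omega_i = 10000^{-\,i / (d/2 - 1)}, \qquad i = 0, 1, \dots, \tfrac{d}{2} - 1
$$

$$
\mathrm{PE}(t) = \bigl[\sin(t\,\omega_0), \dots, \sin(t\,\omega_{d/2-1}),\; \cos(t\,\omega_0), \dots, \cos(t\,\omega_{d/2-1})\bigr] \in \mathbb{R}^{d}
$$

The frequencies run geometrically from 1 down to $10^{-4}$, so the first entries change quickly with $t$ while the last ones change slowly.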

	## Implementation
	### Generator (U-Net)
	```python
	import torch
	import torch.nn as nn
	from huggingface_hub import PyTorchModelHubMixin


	# DoubleConv is not listed in this card; TimeEmbedding is defined below.
	class UNet(nn.Module, PyTorchModelHubMixin):
	    def __init__(self, in_channels=1, out_channels=1, time_embedding_dim=32):
	        super().__init__()

	        # Time embedding layer
	        self.time_embedding = TimeEmbedding(time_embedding_dim)

	        # Encoder
	        self.down_conv1 = DoubleConv(in_channels, 32, time_embedding_dim)
	        self.down_conv2 = DoubleConv(32, 64, time_embedding_dim)
	        self.down_conv3 = DoubleConv(64, 128, time_embedding_dim)

	        # Shared resampling layers (no learnable parameters)
	        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
	        self.upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True)

	        # Decoder (input channels = upsampled features + skip connection)
	        self.up_conv2 = DoubleConv(128 + 64, 64, time_embedding_dim)
	        self.up_conv1 = DoubleConv(64 + 32, 32, time_embedding_dim)
	        self.final_conv = nn.Conv2d(32, out_channels, kernel_size=1)

	    def forward(self, x, timesteps):
	        t = self.time_embedding(timesteps)

	        # Encoder: keep x1 and x2 for the skip connections
	        x1 = self.down_conv1(x, t)
	        x2 = self.down_conv2(self.maxpool(x1), t)
	        x3 = self.down_conv3(self.maxpool(x2), t)

	        # Decoder: upsample, concatenate the skip, then convolve
	        x = self.upsample(x3)
	        x = torch.cat([x2, x], dim=1)
	        x = self.up_conv2(x, t)

	        x = self.upsample(x)
	        x = torch.cat([x1, x], dim=1)
	        x = self.up_conv1(x, t)

	        return self.final_conv(x)
	```
	### Time Embedding
	```python
	import math

	import torch
	import torch.nn as nn


	class TimeEmbedding(nn.Module):
	    def __init__(self, embedding_dim):
	        super().__init__()
	        # Stored because forward() needs it to build the frequencies
	        self.embedding_dim = embedding_dim
	        self.mlp = nn.Sequential(
	            nn.SiLU(),
	            nn.Linear(embedding_dim, embedding_dim),
	        )

	    def forward(self, t):
	        half_dim = self.embedding_dim // 2
	        # Geometric frequencies from 1 down to 1/10000
	        freqs = torch.exp(torch.arange(half_dim, device=t.device) * -(math.log(10000.0) / (half_dim - 1)))
	        # (B, 1) * (1, half_dim) -> (B, half_dim)
	        embeddings = t[:, None] * freqs[None, :]
	        embeddings = torch.cat((embeddings.sin(), embeddings.cos()), dim=-1)
	        return self.mlp(embeddings)
	```

	## Training Configuration
	- Batch Size: 256
	- Image Size: 28x28
	- Loss Function: Mean Squared Error (MSE)
	- Optimizer: Adam (learning rate = 1e-4)
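
The card does not include the training loop. As a rough sketch only, the settings above could be combined with the standard DDPM noise-prediction objective as follows. The number of diffusion steps `T`, the linear beta schedule, and the `TinyDenoiser` stand-in model are assumptions made for illustration; only the batch size, image size, MSE loss, and Adam learning rate come from this card.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000  # assumed number of diffusion steps; not stated in this card
betas = torch.linspace(1e-4, 0.02, T)  # assumed linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)


class TinyDenoiser(nn.Module):
    """Hypothetical stand-in with the same forward(x, timesteps) signature as the U-Net."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)

    def forward(self, x, timesteps):
        # The stand-in ignores timesteps; the real U-Net would use them.
        return self.conv(x)


def training_step(model, optimizer, x0):
    # Sample one random timestep per image.
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    # Forward diffusion: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise
    a_bar = alphas_cumprod[t][:, None, None, None]
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    # The model predicts the added noise; MSE compares it with the true noise.
    loss = F.mse_loss(model(x_t, t), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


model = TinyDenoiser()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
x0 = torch.rand(256, 1, 28, 28) * 2 - 1  # random stand-in for a batch scaled to [-1, 1]
loss = training_step(model, optimizer, x0)
print(f"loss: {loss:.4f}")
```

Replacing `TinyDenoiser` with the `UNet` class from this card should work unchanged, since both take `(x, timesteps)` and return a tensor shaped like `x`.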

	This model has been pushed to the Hub using the [PyTorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
	- Library: [More Information Needed]
	- Docs: [More Information Needed]