---
license: apache-2.0
datasets:
- stanfordnlp/imdb
language:
- en
base_model:
- prajjwal1/bert-tiny
pipeline_tag: fill-mask
library_name: transformers
tags:
- BERT
- Optuna
---

## Overview

This model was fine-tuned using **Optuna-based hyperparameter optimization** on a downstream NLP task with the Hugging Face Transformers library. The objective was to systematically search for optimal training configurations (e.g., learning rate, weight decay, batch size) to maximize model performance on the validation set.

| **Recipe Source** | [Hugging Face Cookbook: Optuna HPO with Transformers](https://huggingface.co/learn/cookbook/optuna_hpo_with_transformers#hyperparameter-optimization-with-optuna-and-transformers) |
| ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Frameworks**    | Transformers, Optuna, PyTorch |
| **Task**          | Text classification (can generalize to other supervised NLP tasks) |

---

### Supported Tasks

* ✅ Text classification
* ✅ Token classification (NER)
* ✅ Sequence-to-sequence (if adapted)
* ✅ Any model supported by Transformers’ Trainer API

---

## Hyperparameter Search Space

The Optuna study explored the following search space (see the code sketch at the end of this card):

* **Learning rate:** LogUniform(5e-6, 5e-4)
* **Weight decay:** Uniform(0.0, 0.3)
* **Per-device train batch size:** Choice([8, 16, 32])

## Optimization Objective

The pipeline optimizes:

* **Metric:** Validation accuracy (can switch to F1, loss, or task-specific metrics)
* **Direction:** Maximize

## Best Trial Example (MRPC)

| Hyperparameter      | Best Value |
| ------------------- | ---------- |
| Learning rate       | ~2.3e-5    |
| Weight decay        | ~0.18      |
| Batch size          | 16         |
| Validation accuracy | ~88%       |

*Note: Results vary by random seed and compute budget.*

---

See the full example in the [Hugging Face Cookbook Recipe](https://huggingface.co/learn/cookbook/optuna_hpo_with_transformers#hyperparameter-optimization-with-optuna-and-transformers).

---
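
## Code Sketch

The snippet below is a minimal sketch of the search described above, assuming a binary sentiment classification fine-tune of `prajjwal1/bert-tiny` on `stanfordnlp/imdb` via Transformers' `Trainer.hyperparameter_search` with the Optuna backend. The subset sizes, epoch count, trial count, and `output_dir` are illustrative placeholders rather than values recorded from the actual run, and argument names such as `eval_strategy` may vary slightly across Transformers versions.

```python
# Requires: pip install transformers datasets optuna torch
import numpy as np
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "prajjwal1/bert-tiny"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

# Small subsets keep the search cheap; sizes here are illustrative only.
dataset = load_dataset("stanfordnlp/imdb")
train_ds = dataset["train"].shuffle(seed=42).select(range(2000)).map(tokenize, batched=True)
eval_ds = dataset["test"].shuffle(seed=42).select(range(1000)).map(tokenize, batched=True)

def model_init(trial):
    # A fresh model per trial, so every hyperparameter set starts from the same weights.
    return AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": (preds == labels).mean()}

def optuna_hp_space(trial):
    # Mirrors the search space listed in this card.
    return {
        "learning_rate": trial.suggest_float("learning_rate", 5e-6, 5e-4, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.3),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [8, 16, 32]
        ),
    }

training_args = TrainingArguments(
    output_dir="hpo-bert-tiny",   # placeholder path
    eval_strategy="epoch",        # "evaluation_strategy" in older Transformers versions
    num_train_epochs=2,           # illustrative training budget
    report_to="none",
)

trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    compute_metrics=compute_metrics,
)

best_trial = trainer.hyperparameter_search(
    direction="maximize",   # maximize validation accuracy
    backend="optuna",
    hp_space=optuna_hp_space,
    n_trials=20,            # illustrative trial count
    compute_objective=lambda metrics: metrics["eval_accuracy"],
)
print(best_trial)
```

The returned `best_trial.hyperparameters` can then be fed back into `TrainingArguments` for a final full fine-tune on the complete training set.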