---
library_name: transformers
pipeline_tag: fill-mask
tags: [gpt-bert, babylm, remote-code]
license: other
---
# jumelet/gptbert-pol-250steps-base
GPT-BERT style BabyBabelLM model for language **pol** (Polish).

This repository includes both the *main* and *EMA* (exponential moving average) weight variants.

**Default variant exposed to generic loaders:** `ema`
## Variants Available

- `ema` (default)
- `main`
## Files

- `model.safetensors` (alias of the default `ema` variant)
- `model_ema.safetensors` (EMA weights)
- `pytorch_model.bin` (legacy PyTorch format)
- `pol-2gpu-250steps.bin` (raw training checkpoint, main variant)
- `pol-2gpu-250steps_ema.bin` (raw training checkpoint, EMA variant)
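
The `model.safetensors` alias is what generic loaders pick up; if you want a specific variant's file instead, it can be fetched directly. A minimal sketch using `huggingface_hub` (the filenames come from the list above):

```python
from huggingface_hub import hf_hub_download

# Fetch the EMA weights explicitly; swap the filename for any file listed above.
ema_path = hf_hub_download(
    repo_id='jumelet/gptbert-pol-250steps-base',
    filename='model_ema.safetensors',
)
print(ema_path)  # local path inside the Hugging Face cache
```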
## Configuration

```json
{
  "attention_probs_dropout_prob": 0.1,
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "intermediate_size": 2560,
  "max_position_embeddings": 512,
  "position_bucket_size": 32,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "vocab_size": 16384,
  "layer_norm_eps": 1e-05,
  "force_causal_mask": true,
  "classifier_dropout": 0.1,
  "classifier_layer_norm_eps": 1e-05,
  "num_labels": 2
}
```

Tokenizer file: `tokenizer_pol_vs16384.json`
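
The values above can also be read programmatically without downloading the full weights. A minimal sketch, assuming the Hub is reachable and you are willing to trust the remote code:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    'jumelet/gptbert-pol-250steps-base',
    trust_remote_code=True,  # custom architecture, so remote code must be trusted
)
print(config.hidden_size, config.num_hidden_layers, config.vocab_size)
```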
## Quick Usage

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = 'jumelet/gptbert-pol-250steps-base'
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)
out = model(**tok('Hello world', return_tensors='pt'))
```
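Since the pipeline tag is `fill-mask`, the typical use is predicting a masked token. A sketch that continues from the snippet above (it assumes the tokenizer defines a mask token; the Polish sentence is just an arbitrary example):

```python
import torch

# 'Warszawa to stolica <mask>.' = 'Warsaw is the capital of <mask>.'
text = f'Warszawa to stolica {tok.mask_token}.'
inputs = tok(text, return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits

# Top-5 candidate tokens for the masked position.
mask_pos = (inputs['input_ids'] == tok.mask_token_id).nonzero(as_tuple=True)[1][0]
top_ids = logits[0, mask_pos].topk(5).indices
print(tok.convert_ids_to_tokens(top_ids.tolist()))
```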
### Forced Causal Attention

Causal attention is enforced at inference time by applying a triangular future mask inside the remote code (this corresponds to the `force_causal_mask: true` setting in the configuration above). The mask prevents the hybrid GPT-BERT layers from attending to future tokens even when a bidirectional attention mask is provided.
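
For intuition, the sketch below shows what such a triangular future mask does to a matrix of attention scores; it illustrates the idea only and is not the repository's remote code:

```python
import torch

seq_len = 5
scores = torch.randn(seq_len, seq_len)  # query x key attention scores
# True above the diagonal marks future positions that must be hidden.
future_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
weights = torch.softmax(scores.masked_fill(future_mask, float('-inf')), dim=-1)
print(weights)  # each row attends only to itself and earlier positions
```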
### Sequence Classification

`GPTBertForSequenceClassification` mirrors the original GLUE classifier head for downstream fine-tuning.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = 'jumelet/gptbert-pol-250steps-base'
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, trust_remote_code=True)
outputs = model(**tok('This movie was great!', return_tensors='pt'))
print(outputs.logits)
```
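Continuing from the snippet above, a short sketch for turning the logits into class probabilities and a predicted index (the bundled head has `num_labels: 2`, per the configuration above):

```python
import torch

probs = torch.softmax(outputs.logits, dim=-1)  # shape: (batch_size, 2)
pred = probs.argmax(dim=-1)
print(probs, pred)
```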
## Notes

- Converted on 2025-10-07T01:14:48.240581+00:00.
- Weights are the exact trained parameters; no new layers were initialized.
- Requires `trust_remote_code=True` due to the custom architecture.