# Titans-OLMo-1B-hf

Titanesque version of `allenai/OLMo-1B-hf` with parallel linearized attention (TPTT 😊) and PEFT. The model was presented in the paper TPTT.

For code, see https://github.com/fabienfrfr/tptt
## Model Details

- Architecture: `TpttModel`
- Base model: `allenai/OLMo-1B-hf`
- LiZA config: `operator=delta_rule`, `mag=0.5`
- LoRA config: `r=8`, `alpha=16`, `dropout=0.05`
- `torch_dtype`: `bfloat16`
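The LoRA settings listed above map directly onto a `peft.LoraConfig`. A minimal sketch follows; note that `target_modules` is an illustrative assumption, since the card does not state which projections the adapters attach to:

```python
from peft import LoraConfig

# LoRA hyperparameters from the model card.
# `target_modules` is an assumption for illustration only;
# the card does not specify the adapted layers.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
```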
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ffurfaro/Titans-OLMo-1B-hf",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("ffurfaro/Titans-OLMo-1B-hf")

prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
# `generate` returns a batch of sequences; decode the first one.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training

- Dataset: `yahma/alpaca-cleaned`
- Platform: Kaggle
- Hardware: 2x NVIDIA T4
- Batch size: 3
- Epochs: 5.0
- Learning rate (final): 1.19e-06
- Loss (final): 1.3068
- Training runtime: 1585.151 s
- Samples per second: 1.577
- Steps per second: 0.265
- Total FLOPs: 6.20e15
- Gradient norm (final): 3.117
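As a rough consistency check, the reported runtime and throughput imply the total number of samples processed across all epochs. This is a back-of-the-envelope sketch derived from the figures above, not a number stated on the card:

```python
# Reported training stats from the model card.
runtime_s = 1585.151
samples_per_s = 1.577
epochs = 5.0

# Samples processed over the whole run, then per epoch.
total_samples = runtime_s * samples_per_s    # ~2500 samples across all epochs
samples_per_epoch = total_samples / epochs   # ~500 training examples per epoch
print(round(total_samples), round(samples_per_epoch))  # → 2500 500
```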
## Evaluation

- Metrics: training loss only; no benchmark results yet (a table with PiQA, ARC, HellaSwag, WinoGrande, GSM8K, and MMLU is planned)
- Results: final training loss of 1.3068
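The planned benchmarks are all available in EleutherAI's lm-evaluation-harness; a plausible invocation is sketched below. The task names and batch size are assumptions, not commands published with this model:

```shell
# Sketch of an evaluation run with lm-evaluation-harness (assumed flags/tasks).
lm_eval --model hf \
  --model_args pretrained=ffurfaro/Titans-OLMo-1B-hf,trust_remote_code=True \
  --tasks piqa,arc_challenge,hellaswag,winogrande,gsm8k,mmlu \
  --batch_size 8
```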
## Citation & Contact

If you use TPTT in your academic work, please cite Furfaro. For questions or support, open an issue on the GitHub repository or contact the maintainer.