Titans-OLMo-1B-hf

Titanesque version of allenai/OLMo-1B-hf with parallel linearized attention (TPTT 😊) and PEFT.

This model was presented in the TPTT paper.

For code, see https://github.com/fabienfrfr/tptt

Model Details

  • Architecture: TpttModel
  • Base model: allenai/OLMo-1B-hf
  • Parameters: ~1.18B
  • LiZA config: operator=delta_rule, mag=0.5
  • LoRA config: r=8, alpha=16, dropout=0.05 (see the sketch after this list)
  • torch_dtype: bfloat16
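
The LoRA values above map directly onto a standard peft.LoraConfig, and operator=delta_rule refers to the classical delta-rule update used in linear-attention memories. The snippet below is only a sketch of that mapping: the task type and the delta-rule formulation shown are illustrative assumptions, not taken from the TPTT/LiZA code.

import torch
from peft import LoraConfig

# LoRA adapter settings matching the values listed above (task_type is an assumption).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

def delta_rule_step(S, k, v, beta=1.0):
    """Textbook delta-rule update of a linear-attention memory S of shape (d_k, d_v).

    S <- S + beta * outer(k, v - S^T k). Shown only to illustrate what
    operator=delta_rule refers to; the actual TPTT/LiZA kernels may differ.
    """
    prediction = S.t() @ k  # what the memory currently recalls for key k
    return S + beta * torch.outer(k, v - prediction)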

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is required: the TPTT/LiZA architecture ships custom
# modeling code with the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "ffurfaro/Titans-OLMo-1B-hf",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("ffurfaro/Titans-OLMo-1B-hf")

prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # decode the first (only) sequence

Training

  • Dataset: yahma/alpaca-cleaned (a reproduction sketch follows this list)
  • Platform: Kaggle
  • Hardware: 2x NVIDIA T4
  • Batch size: 3
  • Epochs: 5
  • Learning rate (final): 1.19e-06
  • Loss (final): 1.3068
  • Training runtime: 1585.15 s
  • Samples per second: 1.577
  • Steps per second: 0.265
  • Total FLOPs: ~6.20e15
  • Gradient norm (final): 3.117
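
For reference, the run above corresponds to a fairly standard causal-LM fine-tuning loop. The following is a minimal sketch, assuming the Hugging Face Trainer, naive Alpaca prompt concatenation, and the published checkpoint as the starting point; the actual TPTT training script (prompt template, scheduler, LiZA-specific setup) may differ.

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "ffurfaro/Titans-OLMo-1B-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# yahma/alpaca-cleaned has "instruction", "input" and "output" columns; this
# naive concatenation is an assumption, not the TPTT prompt template.
def tokenize(example):
    text = example["instruction"] + "\n" + example["input"] + "\n" + example["output"]
    return tokenizer(text, truncation=True, max_length=512)

dataset = load_dataset("yahma/alpaca-cleaned", split="train")
tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="titans-olmo-1b-alpaca",
    per_device_train_batch_size=3,  # matches the reported batch size
    num_train_epochs=5,             # matches the reported epochs
    bf16=True,                      # reported torch_dtype is bfloat16; use fp16=True on GPUs without bf16 support
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()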

Evaluation

  • Metrics: training loss only; no downstream evaluation yet. A benchmark table covering PiQA, ARC, HellaSwag, WinoGrande, GSM8K, and MMLU is planned (see the sketch after this list).
  • Results: final training loss 1.3068
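
Until that table is published, one possible way to produce it is EleutherAI's lm-evaluation-harness (pip install lm-eval). The sketch below is an assumption about tooling, task names, and few-shot settings, not the maintainer's evaluation setup.

import lm_eval  # EleutherAI lm-evaluation-harness

# Task identifiers follow the harness naming; adjust to the intended benchmark variants.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ffurfaro/Titans-OLMo-1B-hf,trust_remote_code=True",
    tasks=["piqa", "arc_challenge", "hellaswag", "winogrande", "gsm8k", "mmlu"],
    batch_size=8,
)
print(results["results"])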

Citation & Contact

If you use TPTT in your academic work, please cite the TPTT paper (Furfaro). For questions or support, open an issue on the GitHub repository or contact the maintainer.

