Titans-OpenELM-1_1B

Titanesque version of apple/OpenELM-1_1B with parallel linearized attention (TPTT 😊) and PEFT.

This model was introduced in the TPTT paper.

Model Details

  • Architecture: TpttModel
  • Base model: apple/OpenELM-1_1B
  • LiZA config: operator=delta_rule, mag=0.5
  • LoRA config: r=8, alpha=16, dropout=0.05
  • torch_dtype: bfloat16
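
For context, the delta_rule operator refers to the delta-rule state update used in linearized attention (as in DeltaNet-style operators): an associative memory S_t is corrected toward each new key/value pair, S_t = S_{t-1} + β_t (v_t − S_{t-1} k_t) k_tᵀ, and queries read from it as o_t = S_t q_t. The snippet below is a generic, unoptimized reference of that recurrence, not the TPTT implementation (which runs it in a parallel form; the exact role of the mag setting is described in the TPTT paper):

import torch

def delta_rule_attention(q, k, v, beta):
    """Reference delta-rule linear-attention recurrence (sketch, not TPTT code).

    q, k, v: (seq_len, d) tensors; beta: (seq_len,) write strengths in [0, 1].
    """
    seq_len, d = q.shape
    S = torch.zeros(d, d)  # associative memory mapping keys to values
    outputs = []
    for t in range(seq_len):
        pred = S @ k[t]                                   # value currently predicted for k_t
        S = S + beta[t] * torch.outer(v[t] - pred, k[t])  # delta-rule correction
        outputs.append(S @ q[t])                          # read with the query
    return torch.stack(outputs)

o = delta_rule_attention(torch.randn(8, 16), torch.randn(8, 16), torch.randn(8, 16), torch.rand(8))

The chunked/parallel forms used in practice compute the same recurrence without the Python loop.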

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ffurfaro/Titans-OpenELM-1_1B",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("ffurfaro/Titans-OpenELM-1_1B")

prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training

  • Dataset: yahma/alpaca-cleaned
  • Platform: Kaggle
  • Hardware: 2× NVIDIA T4
  • Batch size: 3
  • Epochs: 5
  • Learning rate (final): ≈1.19e-06
  • Loss (final): 1.3188
  • Training runtime: 1651 s (≈27.5 min)
  • Samples per second: 1.514
  • Steps per second: 0.254
  • Total FLOPs: ≈5.85e15
  • Gradient norm (final): 0.704
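
As a reproduction sketch only: the run above can be approximated with PEFT and the Hugging Face Trainer. Everything not listed on this card is an assumption, in particular the LoRA target_modules, the Alpaca prompt template, the sequence length, and the peak learning rate (the card only reports the final value); the original run presumably wrapped the apple/OpenELM-1_1B base with TPTT before attaching LoRA.

import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model = AutoModelForCausalLM.from_pretrained(
    "ffurfaro/Titans-OpenELM-1_1B",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # matches the card's torch_dtype
)

# LoRA values from Model Details: r=8, alpha=16, dropout=0.05.
# target_modules is assumed for illustration; adjust to the real projection names.
model = get_peft_model(
    model,
    LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["qkv_proj", "out_proj"],  # assumed names
        task_type="CAUSAL_LM",
    ),
)

tokenizer = AutoTokenizer.from_pretrained("ffurfaro/Titans-OpenELM-1_1B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

dataset = load_dataset("yahma/alpaca-cleaned", split="train")

def tokenize(example):
    # Simple concatenation of the Alpaca fields; the real template is not given on the card.
    text = f"{example['instruction']}\n{example['input']}\n{example['output']}"
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="titans-openelm-1_1b-alpaca",
        per_device_train_batch_size=3,  # matches the card
        num_train_epochs=5,             # matches the card
        learning_rate=2e-5,             # assumed peak LR
        bf16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()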

Evaluation

  • Metrics: training loss only; no held-out evaluation yet. A benchmark table (PiQA, ARC, HellaSwag, Winogrande, GSM8K, MMLU) is planned; a sketch for running those benchmarks follows below.
  • Results: final training loss 1.3188
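
Until that table lands, one way to produce it is EleutherAI's lm-evaluation-harness. The call below is a hedged sketch under the assumption that the installed harness exposes simple_evaluate and these task names (piqa, arc_challenge, hellaswag, winogrande, gsm8k, mmlu); none of it comes from this card.

import lm_eval  # EleutherAI lm-evaluation-harness (pip install lm-eval)

# Few-shot counts fall back to each task's harness default.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ffurfaro/Titans-OpenELM-1_1B,trust_remote_code=True",
    tasks=["piqa", "arc_challenge", "hellaswag", "winogrande", "gsm8k", "mmlu"],
)
print(results["results"])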

Citation & Contact

If you use TPTT in your academic work, please cite the TPTT paper (Furfaro). For questions or support, open an issue on the GitHub repository or contact the maintainer.

