Titans-Llama-3.2-1B

Titanesque version of meta-llama/Llama-3.2-1B with parallel linearized attention (TPTT 😊) and PEFT.

The model was presented in the TPTT paper.

Model Details

  • Architecture: TpttModel
  • Base model: meta-llama/Llama-3.2-1B
  • LiZA config: operator=delta_rule, mag=0.5
  • LoRA config: r=8, alpha=16, dropout=0.05 (see the PEFT sketch after this list)
  • torch_dtype: bfloat16
  • Parameters: ~1.24B (safetensors checkpoint with F32 and BF16 tensors)
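
For reference, the LoRA settings above correspond to a standard PEFT adapter configuration. A minimal sketch using peft.LoraConfig (the task type is an assumption and the target modules are not listed on this card):

from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                    # LoRA rank, as listed above
    lora_alpha=16,          # scaling factor alpha
    lora_dropout=0.05,      # dropout applied to the LoRA layers
    task_type="CAUSAL_LM",  # assumption: causal language modeling
)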

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is required: the TpttModel architecture ships custom modeling code
model = AutoModelForCausalLM.from_pretrained(
    "ffurfaro/Titans-Llama-3.2-1B",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("ffurfaro/Titans-Llama-3.2-1B")

prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # decode the first generated sequence
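
If your GPU supports bfloat16, you can also load the checkpoint in the dtype listed under Model Details to reduce memory use; a minimal sketch, assuming a CUDA device is available:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ffurfaro/Titans-Llama-3.2-1B",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # matches the torch_dtype reported in Model Details
).to("cuda")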

Training

  • Dataset: yahma/alpaca-cleaned
  • Platform: Kaggle
  • Hardware: NVIDIA 2xT4
  • Batch size: 3
  • Epochs: 5.0
  • Learning rate (final): ≈1.19e-06
  • Loss (final): 1.375
  • Training runtime: 1654 s (≈27.6 min)
  • Samples per second: 1.511
  • Steps per second: 0.254
  • Total FLOPs: ≈5.62e15
  • Gradient norm (final): 2.68
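
For readers who want to set up a similar run, a minimal transformers.TrainingArguments sketch consistent with the batch size and epoch count above (the output path is hypothetical, whether the batch size is per device or total is not stated, and only the final decayed learning rate is reported, so no initial learning rate is set here):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="titans-llama-3.2-1b",  # hypothetical output path
    per_device_train_batch_size=3,     # "Batch size: 3" above (per-device vs. total is an assumption)
    num_train_epochs=5,                # "Epochs: 5.0" above
)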

Evaluation

  • Metrics: training loss only; no benchmark evaluation yet (a results table covering PIQA, ARC, HellaSwag, WinoGrande, GSM8K, and MMLU is planned; see the sketch after this list)
  • Results: final training loss 1.375
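
Once benchmark numbers are available, they can be reproduced with EleutherAI's lm-evaluation-harness; a hedged sketch using its Python API (lm_eval 0.4.x; task names and the exact signature may differ in your installed version):

import lm_eval

# Evaluate the released checkpoint on the benchmarks named above
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ffurfaro/Titans-Llama-3.2-1B,trust_remote_code=True",
    tasks=["piqa", "arc_challenge", "hellaswag", "winogrande", "gsm8k", "mmlu"],
)
print(results["results"])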

Citation & Contact

If you use TPTT in your academic work, please cite the TPTT paper (Furfaro). For questions or support, open an issue on the GitHub repository or contact the maintainer.

