Titans-Qwen2.5-1.5B

Titanesque version of Qwen/Qwen2.5-1.5B with parallel linearized attention (TPTT 😊) and PEFT.

The model was presented in the paper TPTT.

Model Details

  • Architecture: TpttModel
  • Base model: Qwen/Qwen2.5-1.5B
  • LiZA config: operator=delta_rule, mag=0.5
  • LoRA config: r=8, alpha=16, dropout=0.05 (see the sketch after this list)
  • torch_dtype: bfloat16
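
The LoRA settings above map directly onto a peft LoraConfig. A minimal sketch, assuming typical Qwen2 attention projection names for target_modules (the card does not list them):

from peft import LoraConfig

# r, lora_alpha and lora_dropout are taken from the card;
# target_modules is an assumed choice for Qwen2-style attention layers.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)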

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ffurfaro/Titans-Qwen2.5-1.5B",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("ffurfaro/Titans-Qwen2.5-1.5B")

prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
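
If you want inference to run in the bfloat16 dtype listed under Model Details, a hedged variant of the loading call is shown below; device_map="auto" is an assumption that the accelerate package is installed and a GPU is available:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ffurfaro/Titans-Qwen2.5-1.5B",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # matches the card's torch_dtype
    device_map="auto",           # assumption: accelerate + GPU; drop if unavailable
)

When device_map is used, move the tokenized inputs to model.device before calling generate.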

Training

  • Dataset: yahma/alpaca-cleaned
  • Platform: Kaggle
  • Hardware: NVIDIA 2xT4
  • Batch size: 3
  • Epochs: 5.0 (see the TrainingArguments sketch after this list)
  • Learning rate (final): 1.1904761904761906e-06
  • Loss (final): 1.2568
  • Training runtime: 1900.6452 sec
  • Samples per second: 1.315
  • Steps per second: 0.221
  • Total FLOPs: 7560113356800000.0
  • Gradient norm (final): 1.9852564334869385
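
A hedged sketch of how the batch size and epoch count above could be passed to the transformers Trainer; every other argument (output_dir, logging cadence, bf16 flag) is an illustrative assumption, not a record of the actual run:

from transformers import TrainingArguments

# Only per_device_train_batch_size and num_train_epochs come from the card;
# the remaining values are assumptions for illustration.
training_args = TrainingArguments(
    output_dir="titans-qwen2.5-1.5b-alpaca",  # hypothetical path
    per_device_train_batch_size=3,
    num_train_epochs=5.0,
    bf16=True,          # consistent with the card's bfloat16 dtype
    logging_steps=10,
)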

Evaluation

  • Metrics: Training loss only (no downstream evaluation yet; a benchmark table covering PiQA, ARC, HellaSwag, WinoGrande, GSM8K, and MMLU is planned)
  • Results: Final training loss: 1.2568

Citation & Contact

If you use TPTT in your academic work, please cite Furfaro. For questions or support, please open an issue on the GitHub repository or contact the maintainer.

