tFINE-680m-e32-d16-infinity_instruct-L2

This is an instruction-tuned version of a pretrained T5 model that uses grouped-query attention (GQA).
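In grouped-query attention, several query heads share a single key/value head, which shrinks the KV projections and KV cache. The following is a minimal PyTorch sketch of that idea only; it is not this model's actual implementation.

import torch

def grouped_query_attention(q, k, v):
    # q: (batch, n_q_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)
    # each key/value head is shared by a group of (n_q_heads // n_kv_heads) query heads
    group_size = q.shape[1] // k.shape[1]
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v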

Model description

This model is a fine-tuned version of BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L1 on the pszemraj/infinity-instruct-7m-T2T_en dataset (config deduped-L2).
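For reference, the fine-tuning data can be loaded with the datasets library; the dataset and config names come from above, while the split layout is an assumption to check on the returned object:

from datasets import load_dataset

# config "deduped-L2" as noted above; inspect the DatasetDict for the actual splits
ds = load_dataset("pszemraj/infinity-instruct-7m-T2T_en", "deduped-L2")
print(ds)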

It achieves the following results on the evaluation set:

  • Loss: 1.3139
  • Num Input Tokens Seen: 361724696

usage

prerequisite: you need the t5-gqa fork of transformers installed, along with accelerate.
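A rough sketch of the install steps; the fork's repository URL and branch below are placeholders, not the real location:

pip install accelerate
# replace the placeholder with the actual t5-gqa fork repo/branch
pip install "git+https://github.com/<t5-gqa-fork>/transformers.git@<branch>"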

from transformers import pipeline

# requires the t5-gqa fork of transformers (see prerequisite above)
pipe = pipeline(
    "text2text-generation",
    model="BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L2",
    device_map="auto",
)
prompt = "Write me a python fn that demonstrates an advanced sorting algorithm"
res = pipe(
    prompt, max_new_tokens=384, num_beams=4, early_stopping=True, repetition_penalty=1.1
)
print(res[0]["generated_text"])
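Equivalently, you can load the tokenizer and model explicitly. Whether trust_remote_code is needed depends on how the fork registers the architecture, so treat that flag as an assumption:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
)

inputs = tokenizer(
    "Write me a python fn that demonstrates an advanced sorting algorithm",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=384, num_beams=4, early_stopping=True, repetition_penalty=1.1
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))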

Quick eval

Quick eval for: BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L2

hf (pretrained=BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L2,trust_remote_code=True,dtype=bfloat16), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8

Tasks          Version  Filter  n-shot  Metric        Value   Stderr
boolq          2        none    0       acc       ↑   0.6364  ± 0.0084
openbookqa     1        none    0       acc       ↑   0.1480  ± 0.0159
                        none    0       acc_norm  ↑   0.2860  ± 0.0202
piqa           1        none    0       acc       ↑   0.6083  ± 0.0114
                        none    0       acc_norm  ↑   0.6132  ± 0.0114
social_iqa     0        none    0       acc       ↑   0.3854  ± 0.0110
tinyArc        0        none    25      acc_norm  ↑   0.3122  ± N/A
tinyHellaswag  0        none    10      acc_norm  ↑   0.3356  ± N/A
tinyMMLU       0        none    0       acc_norm  ↑   0.2793  ± N/A
winogrande     1        none    0       acc       ↑   0.5201  ± 0.0140
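The table above comes from lm-evaluation-harness; a reconstruction of the command (the task list and flags are inferred from the header line above and are an assumption) would look like:

lm_eval --model hf \
    --model_args pretrained=BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L2,trust_remote_code=True,dtype=bfloat16 \
    --tasks boolq,openbookqa,piqa,social_iqa,tinyArc,tinyHellaswag,tinyMMLU,winogrande \
    --batch_size 8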

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2.5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 17868
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 256
  • total_eval_batch_size: 8
  • optimizer: paged_ademamix_32bit (no additional optimizer arguments)
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.02
  • num_epochs: 1.0
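The effective batch size follows from 4 per device × 2 devices × 32 accumulation steps = 256. Below is a hedged sketch of an equivalent Seq2SeqTrainingArguments setup; output_dir is a placeholder, and the bf16 and optim values are assumptions:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="tFINE-680m-instruct-L2",  # placeholder
    learning_rate=2.5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=32,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.02,
    num_train_epochs=1.0,
    seed=17868,
    optim="paged_ademamix_32bit",  # assumption: matches the optimizer reported above
    bf16=True,                     # assumption: eval above was run in bfloat16
)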

Training results

Training Loss  Epoch   Step  Validation Loss  Input Tokens Seen
1.4008         0.2534  1000  1.4020           91375832
1.3456         0.5068  2000  1.3669           182939052
1.3437         0.7602  3000  1.3378           274855796