# Titans-v2-OLMoE-1B-7B-0924
Titanesque version of allenai/OLMoE-1B-7B-0924 with parallel linearized attention (TPTT) and PEFT. The architecture is presented in the TPTT paper.
## Model list

Classic model parameters with LiZA injection:
| Subfolder | Max Self Attn Length | Mag Weight | Cross Gate | Max Chunk Size | Bidirectional | LoRA | Description |
|---|---|---|---|---|---|---|---|
| delta_rule | 8192 (default) | 0.5 | False | 64 | False | Yes | Parallel linearized attention with the delta_rule operator |
| delta_rule_gelu | 8192 (default) | 0.5 | False | 64 | False | Yes | Non-linear operator with gelu activation |
| delta_product | 8192 (default) | 0.5 | False | 64 | False | Yes | Second-order operator with the derivative trick |
| delta_product_r | 8192 (default) | 0.5 | False | 64 | False | Yes | Second-order operator with the rotative trick |
| delta_product_c | 8192 (default) | 0.5 | False | 64 | False | Yes | Second-order operator with the combined trick |
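
For intuition only: the name `delta_rule` is commonly associated with a DeltaNet-style associative-memory update in linearized attention. The sketch below illustrates that update under this assumption; it is not the repository's actual kernel, which lives in the remote code loaded with `trust_remote_code=True`.

```python
# Conceptual sketch of a delta-rule memory update (assumption: DeltaNet-style).
import torch

def delta_rule_step(S, k, v, beta):
    # S: (d_v, d_k) associative memory, k: (d_k,) key, v: (d_v,) value, beta: scalar gate
    pred = S @ k                                 # current readout for this key
    return S + beta * torch.outer(v - pred, k)   # correct the memory toward v

d_k, d_v = 8, 8
S = torch.zeros(d_v, d_k)
for _ in range(4):                               # process a few tokens recurrently
    k, v = torch.randn(d_k), torch.randn(d_v)
    S = delta_rule_step(S, k, v, beta=0.5)
```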
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ffurfaro/Titans-v2-OLMoE-1B-7B-0924",
    subfolder="delta_rule",  # or any other subfolder from the model list (see repo tree)
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMoE-1B-7B-0924")

prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
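
The card mentions PEFT, and the released subfolders already ship with LoRA (see the model list above). If you want to attach your own adapters for further fine-tuning, here is a minimal sketch using the standard `peft` API; the `target_modules` names are assumptions and must match the module names of the loaded model.

```python
# Sketch: add a fresh LoRA adapter on top of the loaded model (assumed module names).
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumption: adjust to the model's actual modules
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # sanity-check how many parameters will train
```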
## Citation & Contact
If you use TPTT in your academic work, please cite Furfaro. For questions or support, please open an issue on the GitHub repository or contact the maintainer.