Together

company

Verified

https://together.ai

togethercompute

togethercomputer

Inference Provider

2,776,159 monthly requests

AI & ML interests

Foundation Models, Decentralized Computing, Open Source AI.

Papers

Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking

Articles

Fine-tune Any LLM from the Hugging Face Hub with Together AI

posted an update 7 months ago

🚀 Full-Quality Wan2.2 Video Generation on a single 24GB GPU — Powered by DFloat11

We just released the DFloat11 compressed Wan2.2 models. Now you can run full-quality Wan2.2 video generation on a single 24GB GPU, thanks to DFloat11 compression and CPU offloading.

🔗 Image-to-Video: DFloat11/Wan2.2-I2V-A14B-DF11
🔗 Text-to-Video: DFloat11/Wan2.2-T2V-A14B-DF11

authored 11 papers 7 months ago

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

Paper • 2303.06865 • Published Mar 13, 2023 • 1

Auto-Differentiation of Relational Computations for Very Large Scale Machine Learning

Paper • 2306.00088 • Published May 31, 2023 • 1

Holistic Evaluation of Language Models

Paper • 2211.09110 • Published Nov 16, 2022 • 1

Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads

Paper • 2410.01805 • Published Oct 2, 2024

Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild

Paper • 2410.05357 • Published Oct 7, 2024

Zero-Indexing Internet Search Augmented Generation for Large Language Models

Paper • 2411.19478 • Published Nov 29, 2024

HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow

Paper • 2505.05286 • Published May 8, 2025 • 1

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Paper • 2505.24298 • Published May 30, 2025 • 30

Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning

Paper • 2506.07227 • Published Jun 8, 2025

Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification

Paper • 2506.07235 • Published Jun 8, 2025 • 3

Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny

Paper • 2507.16331 • Published Jul 22, 2025 • 22

authored a paper 11 months ago

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float

Paper • 2504.11651 • Published Apr 15, 2025 • 31

authored a paper about 1 year ago

METAGENE-1: Metagenomic Foundation Model for Pandemic Monitoring

Paper • 2501.02045 • Published Jan 3, 2025 • 22

authored a paper over 1 year ago

RedPajama: an Open Dataset for Training Large Language Models

Paper • 2411.12372 • Published Nov 19, 2024 • 57

authored 4 papers over 1 year ago

Feedback-Based Self-Learning in Large-Scale Conversational AI Agents

Paper • 1911.02557 • Published Nov 6, 2019

A Vocabulary-Free Multilingual Neural Tokenizer for End-to-End Task Learning

Paper • 2204.10815 • Published Apr 22, 2022

Self-Aware Feedback-Based Self-Learning in Large-Scale Conversational AI

Paper • 2205.00029 • Published Apr 29, 2022

Training-Free Activation Sparsity in Large Language Models

Paper • 2408.14690 • Published Aug 26, 2024