---
title: README
emoji: πŸ“ˆ
colorFrom: yellow
colorTo: green
sdk: static
pinned: false
---


<div class="grid lg:grid-cols-2 gap-x-4 gap-y-7">
	<p class="lg:col-span-3">
	  Welcome to CARROT-LLM-Routing! For a given trade-off between performance and cost,
      CARROT makes it easy to pick the best model among a set of 13 LLMs for any query. Below you can read the CARROT paper, access the CARROT code, or see how to use CARROT out of the box for routing.
	</p>
	<a href="https://arxiv.org/abs/2502.03261" class="block overflow-hidden group">
		<div
			class="w-40 h-39 object-cover mb-2 rounded-lg flex items-center justify-center bg-[#ECFAFF]"
		>
			<img alt="" src="fmselect_gpt4o_comparison.png" class="w-40" />
		</div>
		<div class="underline">Read the paper</div>
	</a>
	<a
		href="https://github.com/somerstep/CARROT"
		class="block overflow-hidden"
	>
		<div
			class="w-40 h-39 object-cover mb-2 rounded-lg flex items-center justify-center bg-[#ECFAFF]"
		>
			<img alt="" src="logo.png" class="w-40" />
		</div>
		<div class="underline">Access code for CARROT</div>
      </a>
      
<p class="lg:col-span-3">
As is, CARROT supports routing to the collection of large language models listed in the table below. Instantiating the CarrotRouter class automatically loads the trained predictors for output token count and performance that are hosted in the CARROT-LLM-Router model repositories. Note that you must provide a Hugging Face token with access to the Llama-3 herd of models. To control the cost-performance trade-off, pass the router an argument mu between 0 and 1; a smaller mu prioritizes performance. Happy routing!

|                      | claude-3-5-sonnet-v1 | titan-text-premier-v1 | openai-gpt-4o | openai-gpt-4o-mini | granite-3-2b-instruct | granite-3-8b-instruct | llama-3-1-70b-instruct | llama-3-1-8b-instruct | llama-3-2-1b-instruct | llama-3-2-3b-instruct | llama-3-3-70b-instruct | mixtral-8x7b-instruct | llama-3-405b-instruct |
|----------------------|---------------------|----------------------|---------------|--------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|
| **Input Token Cost ($ per 1M tokens)**  | 3   | 0.5  | 2.5  | 0.15  | 0.1  | 0.2  | 0.9  | 0.2  | 0.06  | 0.06  | 0.9  | 0.6  | 3.5  |
| **Output Token Cost ($ per 1M tokens)** | 15  | 1.5  | 10   | 0.6   | 0.1  | 0.2  | 0.9  | 0.2  | 0.06  | 0.06  | 0.9  | 0.6  | 3.5  |
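The prices above can be used to estimate what a single query costs on each model. A minimal sketch (the token counts below are hypothetical, and only a few models from the table are included for brevity):

```python
# Estimate per-query dollar cost from the pricing table above.
# Prices are $ per 1M tokens; the token counts used below are hypothetical.
PRICES = {  # model -> (input price, output price), $ per 1M tokens
    "openai-gpt-4o": (2.5, 10.0),
    "openai-gpt-4o-mini": (0.15, 0.6),
    "llama-3-1-8b-instruct": (0.2, 0.2),
}

def query_cost(model, input_tokens, output_tokens):
    """Dollar cost of one query for the given model."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# e.g. a 500-token prompt with a 300-token completion:
for model in PRICES:
    print(model, round(query_cost(model, 500, 300), 6))
```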

<div class="p-4 bg-gray-100 rounded-lg shadow-md">
    <p><strong>Example: Using CARROT for Routing</strong></p>

```python
## Download carrot.py
!git clone https://github.com/somerstep/CARROT.git
%cd CARROT
from carrot import CarrotRouter

# Initialize the router
router = CarrotRouter(hf_token='YOUR_HF_TOKEN')

# Define a query
query = ["What is the value of i^i?"]

# Get the best model for cost-performance tradeoff
best_model = router.route(query, mu = 0.3)

print(f"Recommended Model: {best_model[0]}")
```
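
To build intuition for how mu steers the router, the toy sketch below scores each candidate model with a simple mu-weighted rule. This formula is an illustrative assumption, not CARROT's actual objective (which is defined in the paper), and the performance and cost figures are made up:

```python
# Illustrative sketch of a mu-weighted selection rule. This is an
# assumption for intuition only, not CARROT's exact scoring criterion.
def select_model(perf, cost, mu):
    """Pick the model maximizing (1 - mu) * performance - mu * cost.

    perf, cost: dicts mapping model name -> predicted quality / dollar cost.
    mu: trade-off in [0, 1]; smaller mu prioritizes performance.
    """
    return max(perf, key=lambda m: (1 - mu) * perf[m] - mu * cost[m])

perf = {"big-model": 0.90, "small-model": 0.75}    # hypothetical predictions
cost = {"big-model": 0.004, "small-model": 0.0002}  # hypothetical $ per query

print(select_model(perf, cost, mu=0.0))  # performance only -> "big-model"
print(select_model(perf, cost, mu=1.0))  # cost only -> "small-model"
```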