Upload folder using huggingface_hub

Files changed (13)

.gitattributes +10 -0
Arch-Router-1.5B-Q2_K.gguf +3 -0
Arch-Router-1.5B-Q3_K_L.gguf +3 -0
Arch-Router-1.5B-Q3_K_M.gguf +3 -0
Arch-Router-1.5B-Q3_K_S.gguf +3 -0
Arch-Router-1.5B-Q4_K_M.gguf +3 -0
Arch-Router-1.5B-Q4_K_S.gguf +3 -0
Arch-Router-1.5B-Q5_K_M.gguf +3 -0
Arch-Router-1.5B-Q5_K_S.gguf +3 -0
Arch-Router-1.5B.gguf +3 -0
Arch-Router-Q6_K.gguf +3 -0
LICENSE +77 -0
README.md +165 -0

.gitattributes CHANGED

@@ -33,3 +33,13 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+Arch-Router-1.5B-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+Arch-Router-1.5B-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
+Arch-Router-1.5B-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Arch-Router-1.5B-Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Arch-Router-1.5B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Arch-Router-1.5B-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Arch-Router-1.5B-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Arch-Router-1.5B-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Arch-Router-1.5B.gguf filter=lfs diff=lfs merge=lfs -text
+Arch-Router-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text

Arch-Router-1.5B-Q2_K.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cd68f063804bc3a18cfbc6af2e46686c98ae96cbcbae3d861149f0676a40bbf6
+size 676304320

Arch-Router-1.5B-Q3_K_L.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:07cd34508d7b91660b291f2feb76edba02a99c2f74204326fa0e3ddcb6b9cf1f
+size 880162240

Arch-Router-1.5B-Q3_K_M.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8870132d79e824dd2039edf4fcd145243b1af4ba72cc6b856930d45c6f5289d6
+size 824178112

Arch-Router-1.5B-Q3_K_S.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9cd467cf2de36d0c8cd35b91e9a8a8ce5bfb8678125c274ef233e706d52ee083
+size 760944064

Arch-Router-1.5B-Q4_K_M.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b55666691f14089cf4422b443431e2b783033d99dce6c7df147565192b4deb76
+size 986047936

Arch-Router-1.5B-Q4_K_S.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a135240351dfeb50555a6e779b25d1d1e9917ecd8ee499ca94457d24eadce13b
+size 940312000

Arch-Router-1.5B-Q5_K_M.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:866fd984ec7a4e59449f34f7d2fbf160875281733b0c7d313213de9b95f43245
+size 1125049792

Arch-Router-1.5B-Q5_K_S.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6fb3f313139f00b54deb2823d99de8cb4809357b4a81fc367d56ee171e607798
+size 1098728896

Arch-Router-1.5B.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ece850eba74acbf864baddccfd78278ae2178c8460e15c3b60dbf1ac73dbeee8
+size 3093668800

Arch-Router-Q6_K.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:29c06d7ade24db29c11996f15bbc6333e401b8ddad8669d310846c1b2267aff1
+size 1272739264
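
Each `.gguf` entry above is a Git LFS pointer, not the model weights themselves: three lines giving the pointer spec version, the SHA-256 of the real file, and its size in bytes. The following is a minimal sketch of checking a downloaded file against such a pointer. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of git-lfs or `huggingface_hub`; the pointer text is copied from the `Arch-Router-1.5B-Q2_K.gguf` entry.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest the pointer records."""
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Hash in chunks so multi-gigabyte files are never fully loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the Arch-Router-1.5B-Q2_K.gguf entry in this commit.
Q2_K_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:cd68f063804bc3a18cfbc6af2e46686c98ae96cbcbae3d861149f0676a40bbf6
size 676304320
"""
pointer = parse_lfs_pointer(Q2_K_POINTER)
```

With a local copy of the file, `matches_pointer(Path("Arch-Router-1.5B-Q2_K.gguf"), pointer)` would report whether the download is complete and uncorrupted; the local path here is an assumption.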

LICENSE ADDED

	@@ -0,0 +1,77 @@

+# Katanemo Labs, Inc. COMMUNITY LICENSE AGREEMENT
+**Version Release Date:** September 30th, 2024
+This Katanemo Labs, Inc. COMMUNITY LICENSE AGREEMENT is based on the Llama 3.2 Community License, Copyright © Meta Platforms, Inc. The terms and conditions have been adapted to reflect the proprietary nature of Katanemo Labs' materials.
+---
+1.Definitions
+  a. "Agreement": The terms and conditions for use, reproduction, distribution, and modification of the Katanemo Materials set forth herein.
+  b. "Documentation": The specifications, manuals, and documentation accompanying Katanemo LLMs v1.
+  c. "Licensee" or "you": The individual or entity entering into this Agreement, including your employer if you are acting on their behalf.
+  d. "Katanemo": The foundational large language models and software provided by Katanemo Labs, Inc., available at https://huggingface.co/katanemolabs.
+  e. "Katanemo Materials": Collectively, Katanemo's proprietary models and Documentation. Some Materials are derived from the Qwen language models licensed under the Qwen RESEARCH LICENSE AGREEMENT.
+  f. "Katanemo Labs" or "we": Katanemo Labs Inc., a Delaware, USA Corporation.
+---
+2.
+By clicking "I Accept" or using any part of the Katanemo Materials, you agree to be bound by this Agreement.
+---
+3. License Rights and Redistribution
+  a. Grant of Rights
+  You are granted a non-exclusive, worldwide, non-transferable, and royalty-free license to:
+- Use, reproduce, distribute, and modify the Katanemo Materials.
+- Create derivative works based on the Katanemo Materials.
+4. Redistribution and Use
+  a. Distribution:
+     If you distribute the Katanemo Materials or a derivative work:
+   - Include a copy of this Agreement.
+   - Prominently display "Built with Katanemo" on a related website or documentation.
+  b. Attribution:
+   Include the following attribution notice:
+   "Katanemo is licensed under the Katanemo Labs Community License, Copyright © Katanemo Labs, Inc. All Rights Reserved."
+  c. Compliance:
+   Your use must adhere to the Acceptable Use Policy, available at https://katanemolabs.com/katanemo/use-policy.
+---
+5. Additional Commercial Terms
+If you are commercially using the Materials, you shall request a license from us.
+---
+6. Disclaimer of Warranty
+The Katanemo Materials are provided "AS IS" without warranties of any kind, either express or implied, including but not limited to warranties of title, non-infringement, or fitness for a particular purpose.
+---
+7. Limitation of Liability
+Katanemo Labs is not liable for any indirect, special, or consequential damages arising out of the use of the Katanemo Materials, even if advised of the possibility of such damages.
+---
+8. Intellectual Property
+  a. Trademarks
+  No trademark licenses are granted, except as required for attribution as described in Section 1.b. You may use the “Katanemo” mark according to Katanemo Labs' brand guidelines.
+  b. Ownership
+  You own any derivative works or modifications you create, except for portions owned by Katanemo Labs.
+  c. Litigation
+  If you file a lawsuit against Katanemo Labs regarding intellectual property, your license under this Agreement terminates.
+---
+9. Term and Termination
+This Agreement continues until terminated. Katanemo Labs may terminate the Agreement if you breach any terms. Upon termination, you must cease using the Katanemo Materials.
+---
+10. Governing Law and Jurisdiction
+This Agreement is governed by the laws of the State of Washington, USA. Any disputes will be resolved in the courts of California.

README.md ADDED

	@@ -0,0 +1,165 @@

+---
+license: other
+license_name: katanemo-research
+license_link: >-
+  https://huggingface.co/katanemo/Arch-Router-1.5B.gguf/blob/main/LICENSE
+base_model:
+- Qwen/Qwen2.5-1.5B-Instruct
+language:
+- en
+pipeline_tag: text-generation
+library_name: transformers
+---
+# katanemo/Arch-Router-1.5B
+## Overview
+Arch-Router is a **preference-based routing model** designed to intelligently select the most appropriate large language model (LLM) for a given prompt by leveraging a structured **domain–action taxonomy**. This framework enables fine-grained control over model selection by aligning user-defined preferences with model capabilities across a wide range of tasks and subject areas.
+### How It Works
+To support effective routing, Arch-Router introduces two key concepts:
+- **Domain** – the high-level thematic category or subject matter of a request (e.g., legal, healthcare, programming).
+- **Action** – the specific type of operation the user wants performed (e.g., summarization, code generation, booking appointment, translation).
+Both domain and action configs are associated with preferred models or model variants. At inference time, Arch-Router analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.
+### Key Features
+- **Structured Preference Routing**: Aligns prompt requests with model strengths using explicit domain–action mappings.
+- **Transparent and Controllable**: Makes routing decisions transparent and configurable, empowering users to customize system behavior.
+- **Flexible and Adaptive**: Supports evolving user needs, model updates, and new domains/actions without retraining the router.
+- **Production-Ready Performance**: Optimized for low-latency, high-throughput applications in multi-model environments.
+Arch-Router powers the open-source [Arch Gateway](https://github.com/katanemo/arch), enabling seamless, preference-based prompt routing in multi-LLM systems.
+# Requirements
+The code for Arch-Router-1.5B is included in the Hugging Face `transformers` library, and we advise you to install the latest version:
+```bash
+pip install "transformers>=4.37.0"
+```
+# How to use
+The following example shows how to use the model for routing tasks. Note that the model works best with the provided prompt format.
+### Quickstart
+````python
+import json
+from typing import Any, Dict, List
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "katanemo/Arch-Router-1.5B"
+model = AutoModelForCausalLM.from_pretrained(
+    model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True
+)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+# Please use our provided prompt for best performance
+TASK_INSTRUCTION = """
+You are a helpful assistant designed to find the best suited route.
+You are provided with route description within <routes></routes> XML tags:
+<routes>
+\n{routes}\n
+</routes>
+<conversation>
+\n{conversation}\n
+</conversation>
+"""
+FORMAT_PROMPT = """
+Your task is to decide which route is best suit with user intent on the conversation in <conversation></conversation> XML tags.  Follow the instruction:
+1. If the latest intent from user is irrelevant or user intent is full filled, response with other route {"route": "other"}.
+2. You must analyze the route descriptions and find the best match route for user latest intent.
+3. You only response the name of the route that best matches the user's request, use the exact name in the <routes></routes>.
+Based on your analysis, provide your response in the following JSON formats if you decide to match any route:
+{"route": "route_name"}
+"""
+# Define route config
+route_config = [
+    {
+        "name": "code_generation",
+        "description": "Generating new code snippets, functions, or boilerplate based on user prompts or requirements",
+    },
+    {
+        "name": "bug_fixing",
+        "description": "Identifying and fixing errors or bugs in the provided code across different programming languages",
+    },
+    {
+        "name": "performance_optimization",
+        "description": "Suggesting improvements to make code more efficient, readable, or scalable",
+    },
+    {
+        "name": "api_help",
+        "description": "Assisting with understanding or integrating external APIs and libraries",
+    },
+    {
+        "name": "programming",
+        "description": "Answering general programming questions, theory, or best practices",
+    },
+]
+# Helper function to create the system prompt for our model
+def format_prompt(
+    route_config: List[Dict[str, Any]], conversation: List[Dict[str, Any]]
+):
+    return (
+        TASK_INSTRUCTION.format(
+            routes=json.dumps(route_config), conversation=json.dumps(conversation)
+        )
+        + FORMAT_PROMPT
+    )
+# Define conversations
+conversation = [
+    {
+        "role": "user",
+        "content": "fix this module 'torch.utils._pytree' has no attribute 'register_pytree_node'. did you mean: '_register_pytree_node'?",
+    }
+]
+route_prompt = format_prompt(route_config, conversation)
+messages = [
+    {"role": "user", "content": route_prompt},
+]
+# 1. Tokenize the chat-formatted prompt
+input_ids = tokenizer.apply_chat_template(
+    messages, add_generation_prompt=True, return_tensors="pt"
+).to(model.device)
+# 2. Generate
+generated_ids = model.generate(
+    input_ids=input_ids,  # or just positional: model.generate(input_ids, …)
+    max_new_tokens=32768,
+)
+# 3. Strip the prompt from each sequence
+prompt_lengths = input_ids.shape[1]  # same length for every row here
+generated_only = [
+    output_ids[prompt_lengths:]  # slice off the prompt tokens
+    for output_ids in generated_ids
+]
+# 4. Decode if you want text
+response = tokenizer.batch_decode(generated_only, skip_special_tokens=True)[0]
+print(response)
+````
+You should then see the following output string in JSON format:
+````python
+{"route": "bug_fixing"}
+````
+To better understand how to create the route descriptions, please take a look at our [Katanemo API](https://docs.archgw.com/guides/llm_router.html).
+# License
+The Katanemo Arch-Router model is distributed under the [Katanemo license](https://huggingface.co/katanemo/Arch-Router-1.5B.gguf/blob/main/LICENSE).
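
The README's Quickstart ends with a JSON string such as `{"route": "bug_fixing"}`, and its overview says user-defined preferences then decide which model handles the request. That last step is not shown in the commit, so the following is only a sketch of how it might look. The `ROUTE_TO_MODEL` table, the model names in it, and `select_model` are all hypothetical placeholders, not part of Arch-Router or Arch Gateway.

```python
import json

# Hypothetical preferences: route name -> model identifier (placeholder names).
ROUTE_TO_MODEL = {
    "code_generation": "model-a",
    "bug_fixing": "model-b",
    "performance_optimization": "model-a",
}
# Used for the "other" route, unknown routes, and unparseable router output.
DEFAULT_MODEL = "model-default"


def select_model(
    router_response: str,
    route_to_model: dict = ROUTE_TO_MODEL,
    default: str = DEFAULT_MODEL,
) -> str:
    """Map the router's JSON response to a model name, falling back to `default`."""
    try:
        parsed = json.loads(router_response.strip())
    except json.JSONDecodeError:
        return default  # the router produced something that is not valid JSON
    if not isinstance(parsed, dict):
        return default  # valid JSON, but not the expected {"route": ...} object
    route = parsed.get("route")
    if not isinstance(route, str):
        return default
    return route_to_model.get(route, default)
```

Calling `select_model('{"route": "bug_fixing"}')` with the table above returns `"model-b"`. Falling back to a default rather than raising is one possible design choice; a stricter caller might prefer to surface malformed router output as an error.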