Samastam Instruct (v1)

This is an instruct variant of the Sarvam-1 model. It is currently finetuned on the Alpaca-cleaned dataset and several Hindi, Kannada, and Bengali datasets.

Samastam responds to instructions fairly well at this point, but I'll probably continue to finetune it with more datasets in other Indic languages.

Usage Example

from transformers import pipeline
import torch

def format_prompt(user_input: str) -> str:
    # Wrap the user's input in the Alpaca-style template
    # the model was finetuned on.
    template = (
        "### Instruction:\n"
        f"{user_input}\n\n"
        "### Response:\n"
    )
    return template

# Load the model in fp16 and let device_map place it automatically.
pipe = pipeline(
    "text-generation",
    model="hathibelagal/samastam-it-v1",
    torch_dtype=torch.float16,
    device_map="auto",
)

output = pipe(
    format_prompt("ಪ್ರೀತಿ ಎಂದರೇನು?"),  # "What is love?"
    pad_token_id=pipe.tokenizer.eos_token_id,
    max_new_tokens=25,
    do_sample=True,
)

print(output[0]["generated_text"])

# Output:
# ### Instruction:
# ಪ್ರೀತಿ ಎಂದರೇನು?
#
# ### Response:
# ಒಂದು ಭಾವನೆ, ಒಂದು ಸ್ಥಿತಿ. ("A feeling, a state.")
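
The pipeline echoes the prompt along with the generated continuation, so generated_text contains the full template. If you only want the model's answer, you can split on the response marker. A minimal sketch, assuming the template above:

def extract_response(generated_text: str) -> str:
    # Everything after the "### Response:" marker is the model's answer.
    marker = "### Response:\n"
    return generated_text.split(marker, 1)[-1].strip()

print(extract_response(output[0]["generated_text"]))
# ಒಂದು ಭಾವನೆ, ಒಂದು ಸ್ಥಿತಿ.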

Usage Example - GGUF

Expect reduced accuracy with the Q8_0 quantization. Here's how to use it with llama-cpp-python:

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_file = "Samastam-2.5B-Q8_0.gguf"

# Download the Q8_0 GGUF weights from the model repo.
hf_hub_download(
    repo_id="hathibelagal/samastam-it-v1",
    filename=gguf_file,
    local_dir=".",
)

def format_prompt(user_input: str) -> str:
    # Same Alpaca-style template as in the transformers example above.
    template = (
        "### Instruction:\n"
        f"{user_input}\n\n"
        "### Response:\n"
    )
    return template

# Load the quantized model with llama-cpp-python.
llm = Llama(model_path=gguf_file)

output = llm(
    format_prompt("ಹಸುಗಳು ಏನು ತಿನ್ನುತ್ತವೆ?"),  # "What do cows eat?"
    max_tokens=100,
    temperature=0.7,
)
print(output["choices"][0]["text"])

# Output:
# ಅವು ಹುಲ್ಲು ತಿನ್ನುತ್ತವೆ. ("They eat grass.")
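
Models tuned on the Alpaca format can keep going after their answer and start a new "### Instruction:" block. llama-cpp-python accepts a stop parameter that cuts generation off at a given marker. A minimal sketch, reusing the llm loaded above:

output = llm(
    format_prompt("ಹಸುಗಳು ಏನು ತಿನ್ನುತ್ತವೆ?"),
    max_tokens=100,
    temperature=0.7,
    stop=["### Instruction:"],  # stop before the model starts a new turn
)
print(output["choices"][0]["text"].strip())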