This is a version of the base HCX-SEED Vision 3B model with the vision encoder removed, the architecture llamafied, and a Hermes-style function call prompt added. The weights themselves are not modified.

vllm serve minpeter/HCX-SEED-FC-3B \
  --enforce-eager --port 4000 --served-model-name base \
  --enable-auto-tool-choice --tool-call-parser llama_hermes --tool-parser-plugin lh_tool_parser.py
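
With the server running, tool calling can be exercised through vLLM's OpenAI-compatible API. The snippet below is a minimal sketch, assuming the default endpoint at http://localhost:4000/v1; the get_weather tool and the prompt are hypothetical and only serve to trigger the Hermes-style tool-call parser. The BFCL numbers below were produced with the generate and evaluate commands that follow.

from openai import OpenAI

# Point the OpenAI client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:4000/v1", api_key="dummy")

# Hypothetical tool definition used only to exercise the tool-call parser.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="base",  # matches --served-model-name
    messages=[{"role": "user", "content": "What is the weather in Seoul right now?"}],
    tools=tools,
    tool_choice="auto",
)
print(response.choices[0].message.tool_calls)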

bfcl generate --model base --test-category simple,parallel,multiple,parallel_multiple,irrelevance,multi_turn_base --num-threads 30 --allow-overwrite --exclude-state-log
bfcl evaluate --model base --test-category simple,parallel,multiple,parallel_multiple,irrelevance,multi_turn_base

Comparison Table of Test Results by Model (BFCL)

Model: base

Test category        Accuracy
-----------------    --------
simple               0.7575
multiple             0.7650
parallel             0.5200
parallel_multiple    0.4750
irrelevance          0.4333
multi_turn_base      0.0100


Overview

HyperCLOVA-X-SEED-Vision-Instruct-3B-Llamafied is based on a model developed by NAVER that can understand and generate text. It demonstrates competitive performance on major benchmarks related to Korean language and culture. In addition, it supports a context length of up to 16k tokens, enabling it to handle a wide range of tasks.

Basic Information

  • Model Architecture: Transformer-based architecture (Dense Model)
  • Number of Parameters: 3.26B
  • Input/Output Format: Text / Text (both input and output are in text format)
  • Context Length: 16k
  • Knowledge Cutoff Date: The model was trained on data prior to August 2024.
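
These figures can be checked locally with transformers. The following is a minimal sketch, assuming the llamafied checkpoint exposes the standard Llama config fields (max_position_embeddings, torch_dtype):

from transformers import AutoConfig, AutoModelForCausalLM

repo = "minpeter/HCX-SEED-FC-3B"

# The config alone is enough to check the advertised context length and dtype.
config = AutoConfig.from_pretrained(repo)
print("max_position_embeddings:", config.max_position_embeddings)  # expected ~16k
print("torch_dtype:", config.torch_dtype)                          # expected bfloat16

# Loading the weights confirms the parameter count (~3.26B).
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto")
print(f"parameters: {model.num_parameters() / 1e9:.2f}B")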

Training and Data

The training data for HyperCLOVA-X-SEED-Vision-Instruct-3B-Llamafied consists of diverse sources, including high-quality datasets. The training process was carried out in four main stages:

  • Pretraining Stage 1: the model learns from a large volume of documents.
  • Pretraining Stage 2: additional training on high-quality data.
  • Rejection Sampling Fine-Tuning (RFT): enhances the model's knowledge across various domains and its complex reasoning abilities.
  • Supervised Fine-Tuning (SFT): improves the model's instruction-following capabilities.

Because smaller models tend to be more vulnerable in long-context handling, long-context understanding was reinforced from the pretraining stages through to the SFT stage, enabling the model to stably support context lengths of up to 16k tokens.

Huggingface Usage Example

from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace "/path/to/ckpt" with a local checkpoint path or the hub id
# (e.g. minpeter/HCX-SEED-FC-3B).
model = AutoModelForCausalLM.from_pretrained("/path/to/ckpt")
tokenizer = AutoTokenizer.from_pretrained("/path/to/ckpt")

chat = [
  # System prompt (Korean): 'The AI language model is named "CLOVA X" and was created by NAVER.
  # Today is Thursday, April 24, 2025.'
  {"role": "system", "content": "- AI 언어모델의 이름은 \"CLOVA X\" 이며 네이버에서 만들었다.\n- 오늘은 2025년 04월 24일(목)이다."},
  # User prompt (Korean): "Explain the relationship between the Schrödinger equation
  # and quantum mechanics in as much detail as possible."
  {"role": "user", "content": "슈뢰딩거 방정식과 양자역학의 관계를 최대한 자세히 알려줘."},
]

# Build the prompt with the chat template and generate until one of the model's
# stop strings is emitted.
inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_dict=True, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=1024, stop_strings=["<|endofturn|>", "<|stop|>"], tokenizer=tokenizer)
print(tokenizer.batch_decode(output_ids))
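
Because the repository adds a Hermes-style function call prompt, tool definitions can also be passed at the template level. Continuing from the model and tokenizer loaded above, the following is a minimal sketch that assumes the bundled chat template accepts the standard tools argument of apply_chat_template; get_weather is a hypothetical tool used only for illustration.

# Hypothetical tool schema, rendered into the prompt by the chat template
# (assumes the template accepts the standard `tools` argument).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

tool_chat = [{"role": "user", "content": "What is the weather like in Seoul right now?"}]
inputs = tokenizer.apply_chat_template(
    tool_chat, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt"
)
output_ids = model.generate(**inputs, max_length=1024, stop_strings=["<|endofturn|>", "<|stop|>"], tokenizer=tokenizer)
# The model is expected to emit a Hermes-style tool call block, which the
# llama_hermes parser converts into structured tool calls when served with vLLM.
print(tokenizer.batch_decode(output_ids))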
