Converted to GGUF using llama.cpp.
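
For reference, here is a minimal sketch of the conversion step, assuming a local llama.cpp checkout and the original Hugging Face model directory (the paths are placeholders, not the exact command used):

```python
# Sketch only: invokes llama.cpp's convert_hf_to_gguf.py converter.
# Assumes llama.cpp is cloned locally and the original HF model has been
# downloaded to ./HyperCLOVAX-SEED-Text-Instruct-1.5B (placeholder paths).
import subprocess

subprocess.run(
    [
        "python", "llama.cpp/convert_hf_to_gguf.py",
        "HyperCLOVAX-SEED-Text-Instruct-1.5B",   # original HF model dir (assumption)
        "--outtype", "bf16",                      # keep the weights in bf16
        "--outfile", "HyperCLOVAX-SEED-Text-Instruct-1.5B-gguf-bf16.gguf",
    ],
    check=True,
)
```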

```python
from llama_cpp import Llama

llm = Llama(
    model_path="HyperCLOVAX-SEED-Text-Instruct-1.5B-gguf-bf16.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    main_gpu=0,
    n_ctx=2048,
)

output = llm(
    # Prompt: "Write me a fun story. It has to be at least 1000 characters. Start:"
    "์žฌ๋ฏธ์žˆ๋Š” ์ด์•ผ๊ธฐ ํ•˜๋‚˜ ๋งŒ๋“ค์–ด์ค˜. 1000์ž ์ด์ƒ์ด์–ด์•ผ ํ•ด. ์‹œ์ž‘:",
    max_tokens=2048,
    echo=True,  # include the prompt in the returned text
)
print(output)
```
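
The call returns an OpenAI-style completion dict rather than a plain string; to print only the text (which, with `echo=True`, includes the prompt), index into `choices`:

```python
# The result is a dict in the OpenAI completion format;
# the generated text lives under output["choices"][0]["text"].
print(output["choices"][0]["text"])
```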

Tested on a GeForce RTX 3070; performance was as follows.

bf16, peak VRAM: 4 GB

```
llama_perf_context_print:        load time =     210.50 ms
llama_perf_context_print: prompt eval time =     210.42 ms /    19 tokens (   11.07 ms per token,    90.30 tokens per second)
llama_perf_context_print:        eval time =   17923.17 ms /  2028 runs   (    8.84 ms per token,   113.15 tokens per second)
llama_perf_context_print:       total time =   21307.79 ms /  2047 tokens
```
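
In other words, prompt processing ran at about 90 tokens/s and generation at about 113 tokens/s (1000 ms ÷ 8.84 ms per token ≈ 113), producing 2028 tokens in roughly 18 seconds.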