Converted to GGUF using llama.cpp.
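
A command along these lines should reproduce the conversion (a sketch, not a record of the actual invocation: the local checkpoint directory name is an assumption, and convert_hf_to_gguf.py's q8_0 output type is used here instead of a separate llama-quantize step):

python convert_hf_to_gguf.py ./HyperCLOVAX-SEED-Text-Instruct-1.5B \
    --outtype q8_0 \
    --outfile HyperCLOVAX-SEED-Text-Instruct-1.5B-gguf-q8_0.gguf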

from llama_cpp import Llama

llm = Llama(
    model_path="HyperCLOVAX-SEED-Text-Instruct-1.5B-gguf-q8_0.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    main_gpu=0,       # index of the GPU to use
    n_ctx=2048        # context window size
)

output = llm(
    "์žฌ๋ฏธ์žˆ๋Š” ์ด์•ผ๊ธฐ ํ•˜๋‚˜ ๋งŒ๋“ค์–ด์ค˜. 1000์ž ์ด์ƒ์ด์–ด์•ผ ํ•ด. ์‹œ์ž‘:",  # Prompt: "Make up a fun story. It has to be at least 1,000 characters. Start:"
    max_tokens=2048,
    echo=True  # include the prompt in the returned text
)
# output is a dict; the generated text itself is in output["choices"][0]["text"]
print(output)
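
Since this is an instruct model, llama-cpp-python's chat API is another option. A minimal sketch, assuming the GGUF file embeds a chat template (the messages below are illustrative, not part of the original test):

# Chat-style inference using the same llm instance.
messages = [
    {"role": "user", "content": "์žฌ๋ฏธ์žˆ๋Š” ์ด์•ผ๊ธฐ ํ•˜๋‚˜ ๋งŒ๋“ค์–ด์ค˜."},  # "Make up a fun story."
]
response = llm.create_chat_completion(messages=messages, max_tokens=2048)
print(response["choices"][0]["message"]["content"])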

Tested on a GeForce RTX 3070; performance was as follows.

q8_0, peak memory: 1.8 GB
llama_perf_context_print:        load time =     186.23 ms
llama_perf_context_print: prompt eval time =     186.15 ms /    19 tokens (    9.80 ms per token,   102.07 tokens per second)
llama_perf_context_print:        eval time =    3141.36 ms /   536 runs   (    5.86 ms per token,   170.63 tokens per second)
llama_perf_context_print:       total time =    3758.06 ms /   555 tokens
Format: GGUF
Model size: 1.59B params
Architecture: llama
Quantization: 8-bit (q8_0)
